LargeScale Graph Analytics Resources
Slides:
 LargeScale Graph Processing: PDF, by Amir Payberah, SICS
 Brief Intro to Spark and Scala (tutorial): PDF, by Amir Payberah, SICS
 Large Scale Graph Processing with Apache Giraph: PDF, by Sebastian Schelter
Papers:
General distributed graph processing platforms
 Challenges in Parallel Graph Processing, 2007: PDF
 GraphLab A New Framework For Parallel Machine Learning, 2010: PDF
 Pregel: A System for LargeScale Graph Processing, 2010: PDF
 Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud, 2012: PDF
 PowerGraph: Distributed GraphParallel Computation on Natural Graphs, 2012: PDF
 GraphX: Unifying DataParallel and GraphParalle Analytics, 2014: PDF
 From “Think Like a Vertex” to “Think Like a Graph”, by Yuanyuan Tian, Andrey Balmin, Severin Andreas Corsten, Shirish Tatikonda, John McPherson, VLDB’13: PDF [Giraph++]
 Processing largescale graph data: A guide to current technology by Sherif Sakr: PDF [overview article by IBM Developer Works]
Recent research on graph processing platforms
 Giraph Unchained: Barrierless Asynchronous Parallel Execution in Pregellike Graph Processing Systems by Minyang Han (University of Waterloo), Khuzaima Daudjee (University of Waterloo), VLDB’15: PDF
 One Trillion Edges: Graph Processing at FacebookScale by Avery Ching (Facebook), Dionysios Logothetis (Facebook), Sergey Edunov (Facebook), Maja Kabiljo (Facebook), Sambavi Muthukrishnan (Facebook), VLDB’15: PDF
 MOCgraph: Scalable Distributed Graph Processing Using Message Online Computing, Chang Zhou, Jun Gao, Binbin Sun, Jeffrey Xu Yu, VLDB’15: PDF [MOCgraph: message online computing (MOC) to improve memory footprint by processing incoming messages in a streaming manner]
 Blogel: A BlockCentric Framework for Distributed Computation on RealWorld Graphs by Da Yan (HKUST), James Cheng (CUHK), Yi Lu (CUHK), Wilfred Ng, The Hong Kong University of Science and Technology), VLDB’15: PDF [Blockoriented distributed graph processing framework to deal with skewed degree distribution, large diameter, and high density of real world graphs, such as social networks; supports flexible partitioning based on “blocks” as opposed to vertices and/or edges]
 GraphTwist: Fast Iterative Graph Computation
with Twotier Optimizations, Zhou, Lee, Liu, and Zhang, VLDB’15: PDF [improvements to handle vertexcentric system bottlenecks]
 A Scalable Distributed Graph Partitioner, Margo and Seltzer, VLDB’15: PDF [Scalable Hosttree Embeddings for Efficient Partitioning
(Sheep): scalable generalpurpose graph partitioning algorithm for distributed/parallel frameworks]
 Pregelix: Big(ger) Graph Analytics on A Dataflow Engine, Bu, Borkar, Jia, Carey, Condie, VLDB’15: PDF [reducing memory load for distributed graph processing systems]
 LargeScale Distributed Graph Computing Systems: An Experimental Evaluation, Lu, Cheng, Yang, and Wu, VLDB’15: PDF [extensive comparative performance study of the mainstream distributed graph processing frameworks]
 Fast Failure Recovery in Distributed Graph Processing
Systems, Shen, Chen, Jagadish, Lu, Ooi, Tudor: PDF [failure recovery by repartitioning vertices]
Probabilistic Analysis

“Balls into Bins” — A Simple and Tight Analysis, Raab and Steger: PDF
Problems on graphs
Community detection
 Defining and Evaluating Communities Based on Ground Truth, Yang and Leskovec, ICDM’12: PDF [Examines and compares 13 commonly used structural definitions of network communities]
 Community Detection in Graphs, Fortunato, 2010: PDF [survey on community detection algorithms]
 Communities from Seed Sets, Andersen and Lang, WWW’06: PDF [local methods]
 Local Community Identification in Social Networks, Chen, Zaiane, Goebel, ASONAM’09: PDF [local methods]
 Local Graph Partitioning Using PageRank Vectors, Andersen, Chung, Lang, FOCS’06: PDF [local methods]
 Near linear time algorithm to detect community structures in largescale networks, Ragahavan, Albert, Kumara: PDF
 Evaluating Local Community Methods in Networks, James Bagrow: PDF [accuracy benchmarking]
 DENGRAPH: A Densitybased Community Detection Algorithm, Falkowski, Bart, Spiliopoulou, ICWI’07: PDF
 Finding community structure in very large networks, Clauset, Newman, Moore, Physics Review, 2004: PDF [classic paper, global algorithm]
 Community Structure in Social and Biological Networks, Girvan and Newman, PNAS 99 (12), 2001: PDF
 Finding and evaluating community structure in networks, Newman and Girvan, PDF
 A Local Method for Detecting Communities, Bagrow and Bollt, Physics Review 2005, PDF
 Finding local community structure in networks, Clauset, Physics Review 2005, PDF
 Stochastic Local Clustering for Massive Graphs, Schaeffer, PAKDD’05: PDF
 Influential Community Search in Large Networks, by RongHua LI (CUHK), Lu Qin (University of Technology (Sydney), Jeffrey Xu Yu (The Chinese University of Hong Kong), Rui Mao (Shenzhen University): PDF [nondistributed, stream], VLDB’15
 Robust Local Community Detection: On Free Rider Effect and Its Elimination by Yubao Wu (Case Western Reserve University), Ruoming Jin (Kent State University), Jing Li (Case Western Reserve University), Xiang Zhang (Case Western Reserve University): PDF, [systematic study of the existing goodness metrics and “free riders” phenomenon], VLDB’15: PDF
 Community Detection in Social Networks: An Indepth Benchmarking Study with a ProcedureOriented Framework by Meng Wang (Tsinghua University), Chaokun Wang (Tsinghua University), Jeffrey Xu Yu (The Chinese University of Hong Kong), Jun Zhang (Tsinghua University), [generalized community detection procedure and procedureoriented framework for benchmarking; allows comparison of the existing community detection approaches and pinpointing their deficiencies], VLDB’15: PDF
Clustering, Hubs and Outliers

SCAN: a structural clustering algorithm for networks, Xu, Yuruk, Feng and Schweiger, KDD’07: PDF [not distributed, but apparently widelyused algortighm]

SCAN++: Efficient Algorithm for Finding Clusters, Hubs and Outliers on Largescale Graphs by Hiroaki Shiokawa (NTT), Yasuhiro Fujiwara (NTT), Makoto Onizuka (Osaka University): VLDB’15: PDF [improved scalability for SCAN; still not distributed]
Link Prediction and Friend Recommendation in Social Networks
 Link Prediction in Social Networks:
the StateoftheArt, Wang, Xu, Wu, Zhou, Science China: Information Sciences: PDF [comprehensive survey]  LINKREC: A Unified Framework for Link Recommendation
with User Attributes and Graph Structure, Yin, Gupta, Weninger, Han, WWW’10: PDF [uses RWbased technique]  Link prediction based on local information considering
preferential attachment, Shan Zeng, Physica A: PDF [exploits neighborhood information to produce link recommendation]
PageRank
 FrogWild! — Fast PageRank Approximations on Graph Engines by Ioannis Mitliagkas (UT Austin), Michael Borokhovich (UT Austin), Alexandros Dimakis (UT Austin), Constantine Caramanis (UT Austin), VLDB’15: PDF [speeding up PageRank on GraphLab by relaxing synchronisation; nondistributed?]
Graph connectivity problems (connected components, strongly/weakly connected components, biconnected components)
 Pregel Algorithms for Graph Connectivity Problems with Performance Guarantees by Da Yan (HKUST), James Cheng (CUHK), Kai Xing (HKUST), Yi Lu (CUHK), Wilfred Ng, Yingyi Bu (UC Irvine), VLDB’15: PDF [practical Pregel algorithms for various graph connectivity problems]

Finding connected components in mapreduce in logarithmic rounds, Laukik Chitnis, Anish Das Sarma, Ashwin Machanavajjhala, and Vibhor Rastog, ICDE’13: PDF [HashMin algorithm for identifying connected components]
BFS, DFS, etc.
 The More the Merrier: Efficient MultiSource Graph Traversal by Manuel Then, Moritz Kaufmann, Fernando Chirigati, TuanAnh HoangVu, Kien Pham, Alfons Kemper, Thomas Neumann, Huy Vo, VLDB’15: PDF [MultipleSource BFS (MSBFS). nondistributed, also looking into applications to Closeness Centrality (see below)]
Centrality
 U. Brandes. A Faster Algorithm for Betweenness Centrality. Journal of Mathematical Sociology, 25:163–177, 2001: PDF [Betweenness Centrality: Wikipedia article, also in Newman’s and Rajaraman et al books]
 P. Olsen, A. Labouseur, and J.H. Hwang. Efficient TopK Closeness Centrality Search. In ICDE ’14, pages 196–207, March 2014: PDF [closeness centrality]
Small world (degree of separation)
 Lars Backstrom, Paolo Boldi, Marco Rosa, Johan Ugander, and Sebastiano Vigna. Four degrees of separation. In ACM Web Science 2012: Conference Proceedings, pages 45−54. ACM Press, 2012: PDF
Graph Streams
 Graph Stream Algorithms: A Survey by Andrew McGregor, Database Principles Column: PDF, Slides: PDF [queries over dynamically changing graphs represented as a stream of edges; mostly theoretical treatment]
SimRank
 SimRank: a measure of structuralcontext similarity by Jeh and Widdom, KDD’02: PDF [seminal paper that introduced the metric; similarity is asserted through structural context in a graph: two vertices are similar if they are referenced by two vertices which are also similar; applies in many contexts: recommender systems, spam filtering, sponsored search, malware detection, etc.; typically topk simrank scores are of interest]
Sampling
 Sampling from Large Graphs by Leskovec & Faloutsos, KDD’06: PDF [survey of graph sampling techniques; not necessarily distributed]
 Walk, Not Wait: Faster Sampling Over Online Social Networks by Azade Nazi (University of Texas at Arlington), Zhuojie Zhou (George Washington University), Saravanan Thirumuruganathan (University of Texas at Arlingt), Nan Zhang (George Washington University), Gautam Das (University of Texas at Arlington), VLDB’15: PDF [improved graph sampling technique compared to random walk, nondistributed]
Books:
 AlbertLászló Barabási, Network Science. Available for free from http://barabasi.com/networksciencebook/
 Rajaraman, Leskovec, Ullman, Mining of Massive Datasets has a chapter on mining social network graphs. Available for free from http://www.mmds.org/
 M. E. J. Newman. Networks: An Introduction. Oxford University Press, March 2010, 1st Edition, ISBN13: 9780199206650, ISBN10: 0199206651, Info: http://wwwpersonal.umich.edu/~mejn/networksanintroduction
 David Peleg. Distributed Computing: A LocalitySensitive Approach, 2000, ISBN: 9780898714647, eISBN: 9780898719772
DOI: http://dx.doi.org/10.1137/1.9780898719772  Safari books available locally from RHUL: http://ezproxy01.rhul.ac.uk/login?url=http://proquest.safaribooksonline.com/?uicode=holloway
 GraphX
Platforms:
 Spark GraphX
 Apache Giraph
 PowerGraph (includes references to GraphLab and PowerGraph papers and dataset collections)
 GraphLab
 GraphChi
Datasets
 SNAP: Stanford Large Network Dataset Collection. A collection of wellmaintained and documented graph datasets arising many diverse domains.
 SNAP Networks with Ground Truth Communities collection (for community detection research).
 Konect: Koblentz Network Collection (massive repository of network dataset of various nature)
 Mark Newman’s graph data repository. Mostly small datasets for testing/debugging purposes.
 UCI data repository includes references to various graph data sets.
 Laboratory for Web Algorithmics dataset repository: large number of diverse (mainly web crawls) datasets.
 LDBC: Social network graph generator: software to synthesise directed graphs whose properties resemble social networks