My research seeks to shed light on the feasibility and limitations of constructing scalable and reliable services for modern large-scale distributed computing environments. Our approach is based on applying principles and paradigms of distributed computing theory to a wide range of today's systems and applications, such as big data platforms, data analytics, cloud services, data centre networking, and multi-core architectures. The goal is to develop a systematic understanding of the tradeoffs involved, as well as their implications for system design. We are also interested in harnessing theoretical insights for devising principled implementation and verification frameworks. Overall, the main research directions currently being pursued are as follows:
Reliable Data Storage: Coding vs. Replication
We study the relationship between two major strategies for achieving data availability in distributed data stores: replication (i.e., storing redundant copies of data) and coding (i.e., storing coded pieces of data). Although conventional wisdom holds that coding-based solutions are more cost-effective than replication-based ones, a closer look reveals a more nuanced picture. In particular, our recent results establish that solutions based on coding alone must, in some scenarios, refrain from overwriting old data, leading to unbounded storage consumption when concurrency is high. To circumvent this limitation, we propose a new adaptive storage strategy that stores the data in a compact coded form as long as concurrency remains below a configured threshold, and automatically switches to replication once the threshold is reached.
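The adaptive strategy can be illustrated with a small sketch. The class names, the concurrency-tracking interface, and the toy two-chunks-plus-XOR-parity "code" below are our own illustrative assumptions, not the algorithm from the paper; the point is only the switch between a compact coded representation (1.5x overhead here) and full replication once concurrent writers exceed a threshold.

```python
def xor_parity(chunks):
    """Compute a single XOR parity chunk over equal-length data chunks."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

class AdaptiveStore:
    """Illustrative sketch: store values in coded form while write
    concurrency is low; fall back to full replication above a threshold."""

    def __init__(self, threshold=2, replicas=3):
        self.threshold = threshold
        self.replicas = replicas
        self.concurrent_writers = 0
        self.mode = "coded"
        self.stored = None          # (mode, payload)

    def begin_write(self):
        self.concurrent_writers += 1
        if self.concurrent_writers > self.threshold:
            self.mode = "replicated"   # too many writers: switch strategy

    def end_write(self, value: bytes):
        if self.mode == "coded":
            if len(value) % 2:         # pad to an even length
                value += b"\0"
            half = len(value) // 2
            chunks = [value[:half], value[half:]]
            # two data chunks + one parity chunk: 1.5x storage overhead
            self.stored = ("coded", chunks + [xor_parity(chunks)])
        else:
            # full replication: `replicas`-fold storage overhead
            self.stored = ("replicated", [value] * self.replicas)
        self.concurrent_writers -= 1
        if self.concurrent_writers <= self.threshold:
            self.mode = "coded"        # concurrency subsided: switch back
```

A real implementation would of course distribute the chunks and replicas across servers; the sketch only captures the mode-switching logic.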
- Space Bounds for Reliable Storage: Fundamental Limits of Coding, Alexander Spiegelman, Yuval Cassuto, Gregory Chockler, Idit Keidar. To appear in ACM PODC 2016. Pre-print: arXiv:1507.05169.
- Tutorial slides, OPODIS’15: [pdf]
- Presentation slides, British Colloquium of Theoretical Computer Science (BCTCS’16): [pdf]
Collaborators: Alex Spiegelman, Yuval Cassuto and Idit Keidar, Technion
Supported by Royal Society
Verification Frameworks for Concurrent Data Structures
Our goal is to simplify the task of proving the correctness of concurrent data structures. We present a methodology that offers algorithm designers a constructive way to analyse their data structures, using the same principles that were used to design them in the first place. Our methodology is based on the notion of a base condition: a predicate relating the shared state of the data structure to the local states of individual operations in sequential executions. The correctness of the data structure can then be derived by arguing that the base conditions hold at specific points (called base points) of each concurrent run. We have so far successfully applied our methodology to prove the correctness of several non-trivial concurrent data structures, such as concurrent lists and search trees. We are currently working on incorporating our methodology into a formal verification framework to enable automated correctness proofs, program testing, and synthesis of concurrent implementations from sequential code.
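As a toy illustration of what a base condition looks like, consider a sequential sorted linked list. The predicate below, checked at every step of a `contains` traversal, is our own illustrative instance and not the formal definition from the papers: it relates the operation's local state (the traversal pointer) to the shared state (the list) by asserting that the search has not skipped past its key. In the methodology, one would argue that such a predicate holds at the base points of every concurrent run.

```python
class Node:
    def __init__(self, key, nxt=None):
        self.key = key
        self.next = nxt

def base_condition(lst, curr, key):
    """Holds in every sequential execution: the search for `key` has not
    skipped it, i.e. `key` does not occur strictly before `curr`."""
    node = lst.head
    while node is not curr:
        if node.key == key:
            return False
        node = node.next
    return True

class SortedList:
    def __init__(self):
        self.head = Node(float("-inf"))   # sentinel smaller than any key

    def insert(self, key):
        prev = self.head
        while prev.next is not None and prev.next.key < key:
            prev = prev.next
        prev.next = Node(key, prev.next)

    def contains(self, key):
        curr = self.head
        while curr is not None:
            # Base condition: relates the operation's local state (curr)
            # to the shared state (the list) at every traversal step.
            assert base_condition(self, curr, key)
            if curr.key == key:
                return True
            if curr.key > key:
                return False
            curr = curr.next
        return False
```

The constructive appeal is that the predicate mirrors the invariant the designer already had in mind when writing the sequential traversal.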
- A Constructive Approach for Proving Data Structures Linearizability, Kfir Lev-Ari, Gregory Chockler and Idit Keidar. In Proceedings of Distributed Computing – 29th International Symposium, DISC 2015, LNCS 9363, pages 356–370, Tokyo, Japan, October 5–9, 2015.
- On Correctness of Data Structures under Reads-Write Concurrency, Kfir Lev-Ari, Gregory Chockler, and Idit Keidar. In Proceedings of Distributed Computing – 28th International Symposium, DISC 2014, pages 273–287, Austin, TX, USA, October 12–15, 2014.
- Presentation slides: DISC’14 [pdf], DISC’15 [pdf]
Collaborators: Kfir Lev-Ari and Idit Keidar, Technion
Supported by Royal Society
Cloud Caching Services
We study techniques for estimating the cost and benefits of memory space in large in-memory object caches, such as the memcached installations widely used to accelerate popular web sites. The goal is to dynamically construct histograms of cache hit rate as a function of the cache size (known as Hit-Rate Curves, or HRCs) by tracking the cache accesses of different tenants. The resulting HRCs can then be used to facilitate cache provisioning decisions by either cloud operators or automated tools. We have so far developed two general and efficient algorithms for dynamically estimating HRCs, both of which can be plugged into cache replacement policies such as LRU. Our ongoing effort focuses on developing new models to capture the data access patterns of different classes of cloud applications, and on devising new cache replacement policies that optimise the cache content based on these models. We are also working on efficient techniques for monitoring HRCs for distributed caches and optimising cache performance in multi-tenant environments.
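To make the notion of an HRC concrete, here is a minimal exact Mattson-style stack-distance estimator for an LRU cache. This is the textbook quadratic-time baseline, not the efficient sampling-based algorithms from the papers above; it shows what those algorithms compute: the hit rate the cache would achieve at every size up to `max_size`, from a single pass over the access trace.

```python
from collections import Counter

def hit_rate_curve(trace, max_size):
    """Return the LRU hit rate at each cache size 1..max_size.

    Quadratic-time baseline: maintains an explicit LRU stack and a
    reuse-distance histogram; an access at stack distance d hits in
    every cache of size > d.
    """
    stack = []               # LRU stack: most recently used at index 0
    distances = Counter()    # reuse-distance histogram
    for key in trace:
        if key in stack:
            depth = stack.index(key)   # stack distance of this access
            distances[depth] += 1
            stack.pop(depth)
        # first references (cold misses) never hit at any size
        stack.insert(0, key)
    total = len(trace)
    hrc, hits = [], 0
    for size in range(max_size):
        hits += distances[size]        # distance < size+1 => hit
        hrc.append(hits / total)
    return hrc
```

A single pass yields the whole curve, which is what makes HRC profiling attractive for provisioning: the operator can read off the marginal benefit of every additional unit of cache memory.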
- Dynamic Performance Profiling of Cloud Caches, Trausti Saemundsson, Hjortur Bjornsson, Gregory Chockler, and Ymir Vigfusson. In Proceedings of the ACM Symposium on Cloud Computing (SOCC ’14). ACM, New York, NY, USA, Article 28, 14 pages.
- Design and implementation of caching services in the cloud, Gregory Chockler, Guy Laden, and Ymir Vigfusson, IBM Journal of Research and Development, Special Issue on Cloud Computing, 55(6): 1–11, December 2011.
- Data Caching as a Cloud Service, Gregory Chockler, Guy Laden, and Ymir Vigfusson, Proceedings of the 5th Workshop on Large-Scale Distributed Systems and Middleware (LADIS’10), Zurich, Switzerland, July 2010.
- Cache Optimization via Predictive Cache Size Modification, Gregory Chockler, Ymir Vigfusson, and Guy Laden, US Patent 8850122, Issued September 30, 2014.
- Mimir Cache project
Collaborators: Ymir Vigfusson, Emory University
Reliable Data Storage: Multi-cloud Data Stores
Cloud-based storage services have established themselves as the paradigm of choice for supporting the bulk storage needs of modern networked services and applications. Although individual storage providers can be trusted to do their best to reliably store user data, exclusive reliance on any single provider or storage service leaves users inherently at risk of being locked out of their data due to outages, connectivity problems, and unforeseen alterations of service contracts. The emerging multi-cloud storage paradigm addresses these concerns by replicating data across multiple cloud storage services, potentially operated by distinct providers. In this project, we study the impact of the storage interfaces and consistency semantics exposed by individual clouds on the complexity of reliable multi-cloud storage implementations. Our results establish several inherent space and time tradeoffs associated with emulating reliable objects over a collection of unreliable storage services with varied interfaces and consistency guarantees.
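The flavour of such emulations can be conveyed with a majority-quorum (ABD-style) sketch of a read/write register over several unreliable key-value stores. The in-memory dictionaries stand in for real cloud stores, and all names are illustrative; a full multi-writer ABD protocol would also run a read phase before writing to obtain the highest timestamp, which the local counter below elides to keep the sketch short.

```python
import itertools

class MultiCloudRegister:
    """Sketch of a register emulated over 2f+1 unreliable stores,
    tolerating up to f unresponsive stores via majority quorums."""

    def __init__(self, clouds):
        self.clouds = clouds                     # list of dict-like stores
        self.quorum = len(clouds) // 2 + 1
        self.counter = itertools.count(1)        # stands in for a read phase

    def write(self, value, writer_id=0):
        ts = (next(self.counter), writer_id)     # writer id breaks ties
        acks = 0
        for cloud in self.clouds:
            try:
                cloud["reg"] = (ts, value)       # a real cloud PUT may fail
                acks += 1
            except Exception:
                pass                             # tolerate failed stores
            if acks >= self.quorum:
                return True                      # quorum reached
        return acks >= self.quorum

    def read(self):
        replies = []
        for cloud in self.clouds:
            try:
                if "reg" in cloud:
                    replies.append(cloud["reg"])
            except Exception:
                pass
            if len(replies) >= self.quorum:
                break
        if not replies:
            return None
        ts, value = max(replies)                 # highest timestamp wins
        return value
```

Because any write quorum and any read quorum intersect, a read always observes the latest completed write; the space bounds studied in the project concern how many such timestamped versions the underlying stores must retain.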
- Preprint: Space Bounds for Reliable Multi-Writer Data Store: Inherent Cost of Read/Write Primitives, Gregory Chockler, Dan Dobre, Alexander Shraer, Alexander Spiegelman: arXiv:1508.03762, 2015.
- Space Bounds for Reliable Multi-Writer Data Store: Brief Announcement, Gregory Chockler, Dan Dobre, Alexander Shraer, and Alexander Spiegelman. In Proceedings of Distributed Computing – 29th International Symposium, DISC 2015, LNCS 9363, pages 673–674, Tokyo, Japan, October 5–9, 2015.
- Presentation slides: [pdf]
Collaborators: Dan Dobre, European Patent Office; Alex Shraer, Google; Alex Spiegelman, Technion
Supported by Royal Society
Large-Scale Push Notification Infrastructure
We are designing RocketSpeed, a large-scale push notification system that enables the global exchange of data between millions of mobile and IoT devices and hosted services. The current effort focuses on a number of challenges brought about by the sheer scale of the system, such as lookup data structure efficiency and the tradeoffs between computation and memory overheads.
Collaborators: Facebook London and Ymir Vigfusson, Emory University
Reliable Management of Software-Defined Networks
Software-Defined Networking (SDN) enables deterministic management of application data flows by compartmentalising the network topology control into a separate distributed tier. We study new approaches for boosting the resilience of SDN virtual data flow networks by leveraging the self-healing paradigm, which maintains global connectivity through localised adjustments in response to failures. Our ongoing effort focuses on generalising self-healing to accommodate both arbitrary connectivity and the heterogeneity of the underlying physical communication network.
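The core self-healing idea can be sketched in a few lines: when a node fails, only its surviving neighbours react, patching the topology locally so that global connectivity is preserved without any global recomputation. The path-reconnection rule below is a deliberately simple illustrative choice (it bounds each neighbour's degree increase by 2), not a specific algorithm from the self-healing literature.

```python
def heal(adjacency, failed):
    """Remove `failed` from the graph and locally reconnect its
    neighbours along a path. Mutates and returns `adjacency`.

    adjacency: dict mapping node -> set of neighbouring nodes.
    """
    neighbours = sorted(adjacency.pop(failed, set()))
    for node in neighbours:
        adjacency[node].discard(failed)     # drop edges to the dead node
    # Reconnect surviving neighbours pairwise along a path; each gains
    # at most 2 new edges, so the adjustment stays local and cheap.
    for a, b in zip(neighbours, neighbours[1:]):
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency
```

For example, healing the centre of a star graph leaves its leaves connected in a path rather than partitioned, which is exactly the guarantee (connectivity preserved by local repair) that the project generalises to heterogeneous physical networks.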
Collaborators: Amitabh Trehan, Queen's University Belfast; Eliezer Dekel, Huawei European Research Centre
Mining Structural Properties of Massive Graphs
We consider a class of data mining problems on large graph data sets aimed at identifying structural properties of the graph, such as connected components and triangles. This class of problems is a common ingredient of many social network analytics tasks, such as community detection and friend recommendation. Our approach is to analyse these problems in the formal framework of distributed computing on graphs so as to identify their complexity tradeoffs and scalability bottlenecks. The outcomes will inform efficient implementations on large-scale graph processing platforms, such as Apache Giraph and GraphX.
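For connected components, the standard vertex-centric scheme run on such platforms has every vertex repeatedly adopt the smallest component label seen among its neighbours until no label changes. The sequential Python below simulates that scheme (one loop iteration per "superstep"); it is an illustrative sketch, not the Giraph or GraphX API.

```python
def connected_components(adjacency):
    """Label-propagation connected components.

    adjacency: dict mapping vertex -> iterable of neighbours.
    Returns a dict mapping each vertex to the smallest vertex id in
    its component.
    """
    label = {v: v for v in adjacency}      # every vertex starts alone
    changed = True
    while changed:                         # one "superstep" per pass
        changed = False
        for v, neighbours in adjacency.items():
            best = min([label[v]] + [label[u] for u in neighbours])
            if best < label[v]:            # adopt a smaller label
                label[v] = best
                changed = True
    return label
```

The number of passes is bounded by the graph diameter, which is one of the complexity measures (communication rounds) the distributed-computing analysis makes precise.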
Collaborators: Chhaya Trehan and Peter Robinson, CS RHUL
Trustworthy Blockchain Abstractions
The blockchain abstraction, originally developed as an integral part of the Bitcoin protocol, is now being increasingly adopted as the paradigm of choice for facilitating secure transactions and information exchange in financial, insurance, and legal settings. The original fully decentralised implementation is, however, not well suited for these scenarios due to its lack of determinism and the increased risk of sensitive data leakage. This project investigates approaches for boosting the determinism of the blockchain abstraction without compromising scalability. The current work focuses on applying Byzantine-resilient data replication techniques in conjunction with data sharding to support deterministic correctness and performance guarantees in large-scale settings.
Collaborators: IBM Research Haifa and Zurich
Boosting Accuracy of Conflict Analysis via Economics Modelling and Data Mining
This is an interdisciplinary project, run jointly with the RHUL Department of Economics, which was recently awarded a fully funded PhD scholarship from the Leverhulme Magna Carta Doctoral Training Centre. The broad idea is to marry economic analysis with big-data methods, particularly large-scale data mining. The specific case study will focus on mining large collections of human-generated conflict event data to extract accurate event statistics in the presence of potentially fabricated data. The approach will leverage economic theory to identify fabricated reports based on the incentive structures within which they were originally collected.
Collaborators: Mike Spagat, Department of Economics, RHUL; Zhiyuan Luo, CS RHUL
Supported by Magna Carta Doctoral Training Centre