Big Data MSc degree
I have been closely involved with the RHUL Big Data MSc programme since its inception in 2013. I have been responsible for designing and running the applied part of the curriculum, for which I developed two new core courses – Computation with Data (CS5800) and Large-Scale Data Storage and Processing (CS5234) – as well as a number of Hadoop-based dissertation projects.
The Computation with Data course is a fast-paced introduction to core computing concepts such as algorithms, data structures, programming, and basic complexity analysis. Its goal is to give students the computing proficiency they need to use computers effectively for data analysis tasks that cannot be accomplished in high-level frameworks.
The Large-Scale Data Storage and Processing course introduces students to the principles underlying the design of modern large-scale computing systems. It covers in depth three primary categories of today’s large-scale data manipulation frameworks: databases, MapReduce, and NoSQL data stores. The course features an extensive practical component in which students gain hands-on experience implementing realistic data analysis tasks on real-world datasets using various large-scale data manipulation tools (such as the PostgreSQL OLAP extension, Hadoop MapReduce, and MongoDB).
In the final project, the students carry out a longitudinal analysis of the communication network induced by the Enron email corpus, a typical “big data” dataset in that it is both unstructured and large. To accomplish this, the students must implement all major stages of the data analytics workflow: (1) extract-transform-load (ETL) to assemble the emails, which are spread over many small files, into a single large file on a Hadoop cluster; (2) parsing, cleansing, and time-slicing the data using a pipeline of MapReduce programs to obtain a time series of communication graph snapshots; (3) quantitative analysis of a sample of the resulting graphs to identify their salient properties (such as degree distribution, presence of hubs, and community structure) and their evolution over time; and (4) discussion and visualisation of the results using Gephi, an open-source network analysis and visualisation tool.
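To give a flavour of stages (2) and (3), the following is a minimal in-memory Python sketch, not the course's actual MapReduce code. It assumes the emails have already been parsed into (sender, recipients, timestamp) records, slices them by calendar month, and computes a degree distribution and the best-connected nodes for one snapshot. The record layout, the monthly granularity, and the function names (time_slice, degree_distribution, top_hubs) are illustrative assumptions; in the project itself the equivalent steps run as a pipeline of Hadoop MapReduce jobs.

```python
from collections import Counter, defaultdict
from datetime import datetime


def time_slice(messages):
    """Group directed (sender, recipient) edges by calendar month.

    `messages` is an iterable of (sender, recipients, sent_at) tuples;
    this record layout is an assumption made for the sketch.
    Returns {"YYYY-MM": Counter({(sender, recipient): message_count})}.
    """
    snapshots = defaultdict(Counter)
    for sender, recipients, sent_at in messages:
        month = sent_at.strftime("%Y-%m")
        for recipient in recipients:
            if recipient != sender:  # ignore self-addressed mail
                snapshots[month][(sender, recipient)] += 1
    return dict(snapshots)


def _neighbours(edges):
    """Build an undirected adjacency map from directed edges."""
    adjacency = defaultdict(set)
    for sender, recipient in edges:
        adjacency[sender].add(recipient)
        adjacency[recipient].add(sender)
    return adjacency


def degree_distribution(edges):
    """Map each degree to the number of nodes having that degree."""
    return Counter(len(peers) for peers in _neighbours(edges).values())


def top_hubs(edges, k=3):
    """Return the k nodes with the most distinct neighbours."""
    adjacency = _neighbours(edges)
    ranked = sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))
    return ranked[:k]


if __name__ == "__main__":
    # Made-up sample records, not taken from the Enron corpus.
    sample = [
        ("a@example.com", ["b@example.com", "c@example.com"], datetime(2001, 1, 5)),
        ("b@example.com", ["a@example.com"], datetime(2001, 1, 9)),
        ("a@example.com", ["d@example.com"], datetime(2001, 2, 1)),
    ]
    for month, edges in sorted(time_slice(sample).items()):
        print(month, dict(degree_distribution(edges)), top_hubs(edges, k=1))
```

Degree here is counted over distinct neighbours in the undirected view of each snapshot; a directed in-degree/out-degree analysis or a weighted variant would be equally reasonable choices.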
Internet of Things and Distributed and Networked Systems MSc degrees
I am the Programme Director for the new Internet of Things and Distributed and Networked Systems MSc degrees, which will launch in 2016. I have developed a core curriculum for both programmes and am closely involved in their administration and publicity activities.
All courses taught
Royal Holloway, University of London
- Large-Scale Data Storage and Processing, compulsory post-graduate course I developed: 2013 –
- Computation with Data, compulsory post-graduate course I developed: 2014 –
- Computing Laboratory (Games), core undergraduate course: 2013
- Software Engineering, core undergraduate course: 2013
- Principles of Fault-Tolerance in Distributed Systems, top graduate course I developed: 2004
Hebrew University of Jerusalem
- Operating Systems, core undergraduate course: 2001 – 2002
- Topics in Distributed Middleware Systems, top graduate course I developed: 2000
- Computer Networks (teaching assistant)
- Computer and Communication Security (teaching assistant)
- Software Engineering (teaching assistant)
Open University of Israel
- Compiler Organisation: 1996 – 1998