Course unit details:
Data Engineering Technologies
Unit code | COMP63502 |
---|---|
Credit rating | 15 |
Unit level | FHEQ level 7 – master's degree or fourth year of an integrated master's degree |
Teaching period(s) | Semester 2 |
Available as a free choice unit? | No |
Overview
In the world of data analytics, preparing and managing data is often the most time-consuming task: many reports and surveys estimate that it takes up to 80% of the total workload. This unit focuses on the essential data engineering techniques that make large-scale data processing and analysis possible and efficient. Students will explore the foundational concepts and tools used in modern data engineering, including scalable data storage systems, advanced querying methods, parallel and distributed data processing, data interpretation, and effective data retrieval strategies. Emphasis is placed not only on theory but also on hands-on, practical skills that prepare students to work with real-world data.
Pre/co-requisites
Unit title | Unit code | Requirement type | Description |
---|---|---|---|
Data Engineering Concepts | COMP63301 | Pre-Requisite | Recommended |
Prior knowledge of machine learning is needed
Aims
This unit aims to provide students with exposure to and experience of specialised technologies that support data storage, access, integration and use at scale. Data engineering relates to the processes, tools and techniques required to maximise the value that can be obtained from the data resources an individual or organisation has access to. Many of the challenges faced by data engineers have been prominent for a considerable period, and have benefited from research and development that has given rise to specialised techniques for obtaining value from data. This unit aims to provide potential data engineers with the ability to select, evaluate and apply data engineering technologies to problems that involve complex data at scale.
Learning outcomes
1. Describe technologies that underpin scalability in data intensive systems and their properties.
2. Describe and discuss data integration and data retrieval techniques.
3. Compare and contrast approaches to the development of data intensive applications.
4. Analyse how different algorithms and data structures affect data intensive system performance.
5. Construct and apply different data representations that support data curation and analysis.
6. Design experiments for comparing and analysing different data engineering techniques.
7. Write reports that analyse properties of data engineering techniques.
Syllabus
Part I: Techniques for Scalability
Week 1: Storage: Storing Datasets for Scalability
• File Systems
• Storage structures
• Indexes on disk and in memory
Week 2: Algorithms
• Algorithmic strategies
• Modelling algorithm behaviour
Week 3: Queries
• Query processing
• Modelling query properties
Week 4: Parallelism/Distribution
• Architectures
• Paradigms
Week 5: Platforms
• Batch
• Interactive
• Streaming
Week 6:
• Complete laboratory work
Part II: Data Curation and Analysis
Week 7: Graph-based Data Analysis
• Graph database
• Graph query
Week 8: Table Representation
• Models and learning methods
• Discussion and applications
Week 9: Semantic Table Interpretation
• Entity annotation
• Type annotation
• Attribute and relation annotation
• Table to graph transformation
Week 10: Data Integration
• Schema inference
• Entity alignment
Week 11: Advanced Topics and Recent Development
• Question answering
• Retrieval augmented generation
Week 12:
• Complete laboratory work
Teaching and learning methods
The unit will adopt a blended learning approach, with videos and quizzes for students to engage with asynchronously, in addition to synchronous activities in the form of: (i) workshops that include both presentation of new material and problem solving; (ii) laboratory sessions that explore specific techniques in more detail and apply them in practice.
Employability skills
- Analytical skills
- Innovation/creativity
- Oral communication
- Problem solving
- Research
- Written communication
Assessment methods
Method | Weight |
---|---|
Written exam | 50% |
Written assignment (inc essay) | 50% |
Feedback methods
Summative lab-based coursework: individual rubric-based feedback after marking.
Formative weekly quizzes: Autograded quizzes providing immediate feedback.
Exam: cohort-level feedback after marking.
Recommended reading
Martin Kleppmann, Designing Data-Intensive Applications, O’Reilly, 2017.
Jure Leskovec, Anand Rajaraman, Jeff Ullman, Mining of Massive Datasets, 3rd Edition, Cambridge University Press, 2020.
Joe Reis and Matt Housley, Fundamentals of Data Engineering, O’Reilly, 2022.
Study hours
Scheduled activity hours | |
---|---|
Assessment written exam | 1.5 |
Lectures | 20 |
Practical classes & workshops | 12 |
Independent study hours | |
---|---|
Independent study | 116.5 |
Teaching staff
Staff member | Role |
---|---|
Jiaoyan Chen | Unit coordinator |