MSc ACS: Data and Knowledge Management

Year of entry: 2023

Course unit details:
Querying Data on the Web

Course unit fact file
Unit code COMP62421
Credit rating 15
Unit level FHEQ level 7 – master's degree or fourth year of an integrated master's degree
Teaching period(s) Semester 1
Offered by
Available as a free choice unit? Yes

Overview

As computing shifts towards a predominance of data-centric and data-intensive approaches in both scientific and industrial contexts, organising and querying data is set to become a primary concern in the construction of contemporary systems. The advance of Artificial Intelligence and Data Analysis applications, and their requirement to process large-scale and heterogeneous data, creates a demand for systems that can efficiently query and operate over this data.

This course unit aims to give students a principled and critical understanding of contemporary mechanisms that support efficient access to large-scale and heterogeneous data. The course is organised around the challenges of processing different types of data on the Web (Tabular, Tree-shaped, Graph and Document-based), and covers the fundamental algorithms and data structures found “under the hood” of database systems.

Pre/co-requisites

Unit title Unit code Requirement type Description
Modelling Data on the Web COMP60411 Pre-Requisite Compulsory

The formal requirement is attendance on Modelling Data on the Web.

However, it is strongly recommended that students have previously attended a course on the fundamentals of databases. Some of the activities (assessments) will require programming skills.

Aims

The aim of this course is to provide the conceptual and practical foundations for building and optimizing systems that require access to large-scale and heterogeneous data.

Learning outcomes

By the end of the course, students will be able to:

  1. Describe and differentiate between types of databases and the query syntax each supports.
  2. Describe and differentiate query processing approaches for different types of data (Tabular, Tree-shaped, Graph, Document-based).
  3. Apply and evaluate query optimization strategies.
  4. Explain how different algorithms and data structures affect query performance for different types of data.
  5. Argue, contrast and compare different architectures and query optimisation strategies.
  6. Demonstrate and program queries over different databases.
  7. Analyse a new data management situation and design the appropriate methods for it.

Syllabus

[Week 1]
Introduction to the Course Unit
Relational Query Processing (1 of 2)

  • The Architectural Paradigm for Query Processing Systems
  • The Relational Model of Data
  • The Relational Calculi and Algebra
  • The SQL Language

 
[Week 2]
Relational Query Processing (2 of 2)

  • Logical Optimization
  • Physical Optimization
  • Classical Query Execution
  • Parallel Query Execution

 
[Week 3]
Semi-Structured Data

  • Querying XML Data
  • XML Query Processing
  • NoSQL Databases
  • NoSQL Rules and Features

  
[Week 4]
Graph Data

  • Graph data management
  • Querying with SPARQL
  • Optimizing SPARQL
  • Evaluating SPARQL

     
[Week 5]
Parallelism and Big Data

  • Parallel Query Processing
  • Parallel Relational Databases
  • Map-Reduce

Data Intensive Systems: Patterns and Trends
 

Teaching and learning methods

The course is structured into 5 full-day lectures and lab sessions. Formative and summative assessments will be performed during the lectures. Some lectures will require active student engagement in the teaching and learning activities (TLAs), e.g. work-along exercises, changing activities, quizzes.

Summative assessment consists of:

  • One closed-book exam
  • Quizzes and lab work

Some exercises might involve lightweight programming tasks.

Employability skills

Analytical skills
Problem solving
Research
Written communication

Assessment methods

Method Weight
Written exam 50%
Written assignment (inc essay) 50%

Feedback methods

Coursework is assigned and lab sessions provide an opportunity for interaction. Coursework is marked offline with feedback given in writing. Lab sessions allow students to discuss the written feedback in more depth with the marker. The course unit will use the standard tools available in virtual learning environments for hints, tips, discussions, etc.

Study hours

Scheduled activity hours
Lectures 25
Practical classes & workshops 10
Independent study hours
Independent study 115

Teaching staff

Staff member Role
Norman Paton Unit coordinator
