MSc Data Science (Social Analytics)

Year of entry: 2024

Course unit details:
Topological Data Analysis

Course unit fact file
Unit code: DATA70302
Credit rating: 15
Unit level: FHEQ level 7 – master's degree or fourth year of an integrated master's degree
Teaching period(s): Semester 2
Available as a free choice unit? No

Overview

Students are assumed to have a working knowledge of statistical approaches, including ordinary least squares regression, correlation and the construction of scatter plots. The first two weeks of the module extend these ideas to form the foundation for Topological Data Analysis (TDA). We will also introduce, from an intuitive perspective, the core mathematical concepts that underpin TDA.

We then introduce the TDA Ball Mapper (TDABM) algorithm of Dłotko (2019), building intuition for how TDABM produces an abstract representation of multi-dimensional data. We will discuss the value of mapping the joint distribution of multi-dimensional datasets.
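To make the TDABM abstraction concrete, the construction described by Dłotko (2019) can be sketched as follows: data points are covered greedily by balls of radius epsilon centred on landmark points, each ball becomes a node, and two nodes are joined when their balls share at least one data point. Nodes are then coloured by the average of an outcome variable over the points they contain. This is a minimal illustration, not the TDABM package used in the course; it is written in Python only to keep it self-contained. The function names `ball_mapper` and `colour_balls`, the use of Euclidean distance, and visiting points in index order when choosing landmarks are all assumptions made for this sketch.

```python
import math
from itertools import combinations


def ball_mapper(points, epsilon):
    """Sketch of a Ball Mapper graph: greedy epsilon-cover, nodes = balls, edges = overlaps."""
    n = len(points)
    covered = [False] * n
    landmarks = []
    # Greedy cover (assumption: uncovered points are visited in index order).
    for i in range(n):
        if not covered[i]:
            landmarks.append(i)
            for j in range(n):
                if math.dist(points[i], points[j]) <= epsilon:
                    covered[j] = True
    # Each node is the set of data points within epsilon of its landmark;
    # balls may overlap, so a point can belong to several balls.
    balls = [
        {j for j in range(n) if math.dist(points[lm], points[j]) <= epsilon}
        for lm in landmarks
    ]
    # Join two nodes when their balls share at least one data point.
    edges = [
        (a, b)
        for a, b in combinations(range(len(balls)), 2)
        if balls[a] & balls[b]
    ]
    return landmarks, balls, edges


def colour_balls(balls, outcome):
    """Average of an outcome variable over the points in each ball (used to colour nodes)."""
    return [sum(outcome[j] for j in ball) / len(ball) for ball in balls]
```

For example, on the four points (0, 0), (1, 0), (2, 0) and (5, 0) with `epsilon=1.0`, the sketch returns three balls, {0, 1}, {1, 2} and {3}, with a single edge joining the first two; the distant fourth point becomes an isolated node. The choice of epsilon controls the resolution of the map: larger balls give a coarser graph with fewer nodes.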

TDABM supports the evaluation of statistical models, the mapping of inequality, policy appraisal and the charting of regional development, amongst many other applications. Within this discussion we will see how TDABM helps researchers visualise how Machine Learning models behave, and how the conclusions drawn from Machine Learning may have different impacts across the joint distribution of demographic characteristics. We will discuss how TDABM can be used to address some of the recent controversies arising from applications of Machine Learning algorithms. Consideration will be given to the functions built in R to aid the analysis of these applications. We will also discuss how the TDABM package (Dłotko, 2019b) can be adapted for wider use in the social sciences.

The final block of content focuses on the topology of time series, beginning with time series embedding and the concept of topological persistence. We will discuss how persistence diagrams are built and how persistence norms are constructed from them. Persistence norms are widely used in the natural sciences to study dynamical systems and as shape summaries. We will see that persistence norms are equally useful in the social sciences.
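The pipeline outlined above (embed a time series, build a persistence diagram, summarise it with a norm) can be sketched in a few lines. This is a minimal illustration under several assumptions: a sliding-window (time-delay) embedding with hypothetical parameters `dim` and `tau`, Euclidean distance, an edge entering the Vietoris-Rips filtration at a scale equal to its length, and only 0-dimensional persistence, which can be read off a minimum spanning tree built with Kruskal's algorithm. The norm here, the L^p norm of the finite 0-dimensional lifetimes, is a simple stand-in: the persistence norms used in the literature are typically built from richer summaries such as persistence landscapes (as in Gidea and Katz, 2018), often of 1-dimensional features. All function names are hypothetical.

```python
import math


def delay_embed(series, dim, tau):
    """Sliding-window embedding: x_t -> (x_t, x_{t+tau}, ..., x_{t+(dim-1)tau})."""
    n = len(series) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this dim and tau")
    return [tuple(series[t + k * tau] for k in range(dim)) for t in range(n)]


def h0_deaths(points):
    """Finite death scales of 0-dimensional classes in the Vietoris-Rips filtration.

    Every point is born at scale 0. When the scale reaches the length of an
    edge joining two separate components, one component dies. Processing
    edges in increasing length with union-find (Kruskal) gives exactly these
    merge scales. The one component that never dies (infinite bar) is omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the component representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components, so one dies here
            parent[ri] = rj
            deaths.append(length)
    return deaths


def persistence_norm(lifetimes, p=2):
    """L^p norm of bar lengths; for H0 every birth is 0, so lifetime = death."""
    return sum(length ** p for length in lifetimes) ** (1 / p)
```

Applied to successive windows of a (hypothetical) series `prices`, `persistence_norm(h0_deaths(delay_embed(window, 3, 1)))` yields one number per window that can be tracked through time; sharp changes in that number flag changes in the shape of the embedded dynamics.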

In all sections the material builds on basic statistics, but a working knowledge of R is useful for the accompanying practical tasks.

Aims

The course unit aims to provide students with a Topological Data Analysis (TDA) toolkit which can be applied to understanding data across the social sciences. An introduction to the concept of data shape is provided, grounding TDA as a study of the shape of data.   

Students will consider TDA tools for the visualisation of multi-dimensional datasets, showing how important messages are often hidden in plain sight. The unit will show why visualising multi-dimensional data should be the first stage in understanding it, and how such visualisation can then be used to evaluate the performance of the statistical models built on that data.

Coverage will also be given to the topology of time series. The course unit will show how time series are embedded in multi-dimensional space and then how the dynamics in that space are captured by TDA. 

Teaching and learning methods

  • Weekly 1-hour lectures with interactive content.
  • Video lectures providing core content and further examples.
  • Weekly 2-hour practical laboratories.
  • All teaching will have accompanying GitHub content and Jupyter notebooks designed to run R code.
  • Active discussion will be promoted through the Blackboard discussion forum.
  • Groups will be created to support the development of the group assignment, including the uploading of videos and supporting content.
  • All feedback will be provided through Blackboard and the Turnitin submission system.

 

Knowledge and understanding

  • Construct and evaluate point clouds from multi-dimensional datasets  

  • Interpret and analyse output from the application of topological data analysis

Intellectual skills

  • Process datasets in preparation for topological data analysis  

  • Interpret topological data analysis ball mapper plots and critically evaluate the messages contained within the plots  

  • Construct and appraise user-designed metrics within topological data analysis in R

  • Reflect upon the commonalities and differences in topological approaches to time series and data mapping.  

Practical skills

  • Write reports for non-academic audiences using topological data analysis ball mapper as a central component of the statistical approach  

  • Present inferences drawn from the topology of social science time series

Transferable skills and personal qualities

  • Plan and implement small projects applying topological data analysis to socio-demographic questions  

  • Present academic resources to a wider audience

Assessment methods

Group application of topological data analysis ball mapper. All students are to contribute to all sections. Videos may be uploaded individually.

Contributions of each group member should be clearly indicated within the submissions. Marks will be individual (4 x 5-minute video presentations).

2 x 500-word contributions to a report of the analysis for policymakers, with a 300-word individual reflection (60%)

Individual report applying topological data analysis to relevant time series (1,500 words plus code appendix, 40%)

Recommended reading

This module will be supported by a comprehensive set of course notes as there are currently no suitable textbooks covering the use of topological data analysis in the social sciences.   

  

Technical Papers:  

  

Bubenik, P., & Dłotko, P. (2017). A persistence landscapes toolbox for topological statistics. Journal of Symbolic Computation, 78, 91-114.  

Carlsson, G. (2009). Topology and data. Bulletin of the American Mathematical Society, 46(2), 255-308.  

Chazal, F., & Michel, B. (2021). An introduction to topological data analysis: fundamental and practical aspects for data scientists. Frontiers in Artificial Intelligence, 4.

Dłotko, P. (2019). Ball mapper: A shape summary for topological data analysis. arXiv preprint arXiv:1901.07410.  

Fasy, B. T., Lecci, F., Rinaldo, A., Wasserman, L., Balakrishnan, S., & Singh, A. (2014). Confidence sets for persistence diagrams. The Annals of Statistics, 42(6), 2301-2339.

Munch, E. (2017). A user’s guide to topological data analysis. Journal of Learning Analytics, 4(2), 47-61.  

Perea, J. A., & Harer, J. (2015). Sliding windows and persistence: An application of topological methods to signal analysis. Foundations of Computational Mathematics, 15(3), 799-838.  

  

Applications (constantly being added to):

  

Dłotko, P., Qiu, W., & Rudkin, S. T. (2021). Financial ratios and stock returns reappraised through a topological data analysis lens. The European Journal of Finance, 1-25.  

Gidea, M., & Katz, Y. (2018). Topological data analysis of financial time series: Landscapes of crashes. Physica A: Statistical Mechanics and its Applications, 491, 820-834.  

Graham, E. (2017). Introduction: Data visualisation and the humanities. English Studies, 98(5), 449-458.  

Monsivais, P., Francis, O., Lovelace, R., Chang, M., Strachan, E., & Burgoine, T. (2018). Data visualisation to support obesity policy: case studies of data tools for planning and transport policy in the UK. International Journal of Obesity, 42(12), 1977-1986.  

Nash, K., Trott, V., & Allen, W. (2022). The politics of data visualisation and policy making. Convergence, 28(1), 3-12.  

Qiu, W., Rudkin, S., & Dłotko, P. (2020). Refining understanding of corporate failure through a topological data analysis mapping of Altman’s Z-score model. Expert Systems w

Teaching staff

Simon Thomas Rudkin (Unit coordinator)
