- UCAS course code: LL15
- UCAS institution code: M20
Course unit details:
Data Science Modelling
Unit code | SOST30062 |
---|---|
Credit rating | 20 |
Unit level | Level 3 |
Teaching period(s) | Semester 2 |
Available as a free choice unit? | Yes |
Overview
This is an introductory course aimed at students interested in methods and models for large, complex datasets comprising numerous variables measured on different scales. The course will benefit students with an interest in the quantitative Social Sciences (including Criminology, Politics, Psychology and Sociology, to mention but a few).
As this is an introductory course, the focus will be on the ideas underlying specific methods and models rather than on formulae or theoretical results. Even so, we will emphasise that Statistical Learning should not be seen as a series of black boxes. The course content will be presented and practised from a problem-oriented perspective using applications from the Social Sciences.
The course is built around five topics, spread across the 10 weeks of the course. This allows a paced introduction to each topic and active engagement from the students.
- Regression (General and Generalized Linear Models)
- Discrimination & Classification (e.g. logistic and nonparametric regression, tree-based methods)
- Regularization (e.g. LASSO and Ridge Regression)
- Dimensionality reduction (e.g. Multidimensional Scaling, Correspondence Analysis, Principal Components)
- Unsupervised Learning (e.g. model-based clustering, hierarchical and non-hierarchical clustering)
Pre/co-requisites
A prior course in statistics (e.g. ECON10072 Advanced Statistics, SOST70151 Statistical Foundations, or similar).
Aims
The aims of the course are:
- To provide students with an understanding of how to handle high-dimensional and complex datasets.
- To enable students to implement a battery of methods and models to address different classification and forecasting problems (in both supervised and unsupervised settings).
- To give students a confident command of the statistical package R.
Learning outcomes
By the end of this course, students should be able to:
- Select, from a pool of competing analytical tools, those most appropriate to a specific application.
- Implement the selected tool using R.
- Write a report explaining and supporting the findings of their analyses, and critically assess the results obtained.
- Critically assess the methods being used.
- Undertake their own projects involving large and complex datasets, given a clear policy or research question.
Teaching and learning methods
The course will use a combination of articles, book chapters, lecture notes and tutorials to achieve the learning outcomes. The tutorials will emphasise applications. Each lecture includes a discussion or debate section in which the interpretation and critical analysis of data are central. Student attendance and engagement are important. Online resources such as DataCamp will also be used to give hands-on experience of coding in R.
Please note that the information on scheduled activity hours is for guidance only and may change.
Knowledge and understanding
The student will be able to motivate, compare and implement a variety of statistical learning techniques.
Intellectual skills
Gain an understanding of the scope and limitations of statistical learning techniques.
Practical skills
- The students will be able to handle large and complex datasets in R.
- The students will be able to identify and estimate suitable supervised and unsupervised learning models for specific empirical applications.
- The students will learn how to write an R script; notes will be distributed on how to use R Markdown for writing their reports.
Transferable skills and personal qualities
- Be able to analyse and pursue research with large and complex datasets.
- Be able to write a blog post or policy advice on questions involving large or complex datasets.
- Progress towards more formal and technically oriented courses in the area of Statistical Learning.
Assessment methods
Three short quizzes, altogether worth 30% of the total mark (10% per quiz).
Final written report worth 60% of the total mark. The report should be a maximum of 2,000 words long.
Weekly activities worth 10% of the total mark.
Formative essay (optional, 0%).
Feedback methods
All Social Statistics courses include both formative feedback - which lets you know how you're getting on and what you could do to improve - and summative feedback - which gives you a mark for your assessed work.
Recommended reading
The core textbook for this course will be:
- James, G., Witten, D., Hastie, T. and Tibshirani, R. (2017) An Introduction to Statistical Learning: with Applications in R. Springer.
Indicative additional textbook:
- Wickham, H. and Grolemund, G. (2017) R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (1st ed.). O'Reilly Media.
The textbooks will be complemented with lecture notes and other learning materials.
Study hours
Scheduled activity hours | |
---|---|
Lectures | 20 |
Practical classes & workshops | 10 |
Independent study hours | |
---|---|
Independent study | 170 |
Teaching staff
Staff member | Role |
---|---|
Tatjana Kecojevic | Unit coordinator |