BASS Philosophy and Data Analytics / Course details

Year of entry: 2024


Course unit details:
Quantitative Text Analysis in the Social Sciences

Course unit fact file
Unit code	SOST30071
Credit rating	20
Unit level	Level 3
Teaching period(s)	Semester 1
Available as a free choice unit?	No

Overview

The volume of available text data has expanded rapidly in recent years, and so has the demand for its analysis. This course introduces students to the quantitative analysis of text from a social science perspective, with applications drawn from economics, sociology and communication, and political science. The course takes an applied approach: while theoretical aspects are addressed, the primary objective is to equip students to formulate research questions that can be explored through text data and to understand the methods required to answer them. To this end, we begin by explaining how text can be conceptualized and modelled quantitatively and by examining methods for comparing texts. We then cover supervised and unsupervised techniques in considerable depth, before turning to several specialized topics relevant to social science research. Ultimately, the course aims to enable students to undertake their own research projects using text as data, providing a foundation for more advanced and technical work.
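Modelling text quantitatively and comparing documents can be illustrated with a minimal bag-of-words sketch in Python. Each document becomes a row of raw term counts over a shared vocabulary (a document-term matrix), and pairs of documents are compared by cosine similarity. The sketch assumes deliberately simple preprocessing: lower-casing, a regex tokenizer, and no stemming or stopword removal. The function names and example sentences are illustrative and are not taken from the course materials.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return the sorted vocabulary and one row of raw term counts per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    docs = [
        "Taxes should be cut to support growth.",
        "Cutting taxes will support economic growth.",
        "The health service needs more funding.",
    ]
    vocab, dtm = document_term_matrix(docs)
    print(vocab)
    # Compare every pair of documents.
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            print(i, j, round(cosine_similarity(dtm[i], dtm[j]), 3))
```

Documents 0 and 1 share "taxes", "support" and "growth", so their similarity is positive. Document 2 shares no terms with either and scores 0. Note that "cut" and "cutting" are treated as different terms here, which is one reason preprocessing choices such as stemming matter.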

Aims

Learning outcomes

The primary objective of this course is to familiarize students with machine learning methods and contemporary quantitative text analysis techniques, equipping them with the skills needed to apply these statistical methods in their own research. In pursuit of this objective, students will also engage with foundational concepts in machine learning and statistics, cultivating skills that are applicable to a broad range of data and inference challenges. Additionally, students will have the opportunity to enhance their programming competencies and develop an original research project.

Syllabus

Lecture Schedule
(10 weeks of 2-hour lectures, plus a weekly 1-hour computer lab session)

1. Introduction to Quantitative Text Analysis
Overview of the field, its applications in social sciences, and fundamental principles of text as data.

2. Descriptive Statistical Methods for Text Analysis
Exploration of foundational descriptive statistics in text analysis, focusing on word frequency, term-document matrices, and other basic text preprocessing and summarization techniques.

3. Supervised Techniques with Text Data I
Dictionary-based approaches, including sentiment analysis and the application of tools such as LIWC and other content dictionaries.

4. Supervised Techniques with Text Data II
Document classification, including precision and recall as evaluation metrics, the role of crowdsourcing in producing labelled training data, and comparisons of commonly used classifiers.

5. Transition from Supervised to Unsupervised Techniques
Introduction to machine learning fundamentals, covering support vector machines, k-nearest neighbours, tree-based methods including random forests, and ensemble models.

6. Unsupervised Techniques with Text Data I
Basics of unsupervised learning, with a focus on dimensionality reduction methods, including principal component analysis and singular value decomposition.

7. Unsupervised Techniques with Text Data II
Clustering methods for grouping documents, scaling techniques, and various topic modelling approaches (e.g., Latent Dirichlet Allocation, Structural Topic Modelling, and BERT-based models).

8. Word Embeddings
Examination of word embeddings for semantic analysis, covering methods such as Word2Vec, GloVe, and embeddings derived from language models.

9. Neural Network-Based Models
Introduction to neural networks for text analysis, with a focus on recurrent neural networks, convolutional neural networks, and transformer architectures.

10. Advanced Applications of Large Language Models (LLMs)
Exploration of recent developments in LLMs, with an emphasis on their applications, limitations, and ethical considerations in text analysis.
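The dictionary-based approach covered in session 3 amounts to counting matches between a document's words and predefined word lists. Here is a minimal Python sketch of that idea. It uses a tiny hypothetical positive/negative word list (not LIWC or any published dictionary) and a net-tone score normalised by document length. Both choices are illustrative assumptions.

```python
import re

# Hypothetical toy word lists for illustration only; real analyses would use a
# validated lexicon (e.g. LIWC categories or another published content dictionary).
POSITIVE = {"good", "great", "improve", "success", "support"}
NEGATIVE = {"bad", "crisis", "fail", "poor", "worse"}


def dictionary_score(text, positive=POSITIVE, negative=NEGATIVE):
    """Return (positive hits, negative hits, net tone per token) for one document."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in positive for token in tokens)
    neg = sum(token in negative for token in tokens)
    # Normalise by document length so that long and short texts are comparable.
    tone = (pos - neg) / len(tokens) if tokens else 0.0
    return pos, neg, tone


if __name__ == "__main__":
    print(dictionary_score("The policy was a great success, not a crisis."))
```

For "The policy was a great success, not a crisis." the function counts two positive hits and one negative hit among nine tokens, giving a net tone of 1/9. The negation "not" is simply ignored, which illustrates a well-known limitation of plain word counting.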
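Precision and recall, the evaluation metrics named in session 4, can be computed directly from gold-standard labels and a classifier's predictions. The gold labels might, for example, be codes produced by human or crowdsourced annotators. This minimal Python sketch assumes binary labels, with `1` marking the class of interest by default.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the `positive` class from paired gold/predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correctly flagged
    fp = sum(t != positive and p == positive for t, p in pairs)  # wrongly flagged
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    gold = [1, 1, 1, 0, 0, 0, 1, 0]
    pred = [1, 0, 1, 1, 1, 0, 1, 0]
    print(precision_recall(gold, pred))
```

With these gold labels and predictions there are 3 true positives, 2 false positives and 1 false negative. Precision is therefore 3/5 = 0.6 and recall is 3/4 = 0.75. Precision asks how many flagged documents are correct; recall asks how many relevant documents were found.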
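The k-nearest neighbours classifier listed in session 5 can be sketched over bag-of-words vectors of the kind used throughout the unit. This minimal Python version rests on two assumptions: cosine similarity as the closeness measure, and an unweighted majority vote. Ties in the vote go to the label encountered first among the most similar neighbours. The example texts and labels are invented for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Sparse term-count representation of a document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train_texts, train_labels, text, k=3):
    """Label `text` by majority vote among its k most cosine-similar training documents."""
    query = bag_of_words(text)
    scored = sorted(
        zip((cosine(query, bag_of_words(t)) for t in train_texts), train_labels),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    texts = [
        "cut taxes to boost growth",
        "growth and jobs need lower taxes",
        "inflation hurts household budgets",
        "hospitals need more nurses",
        "waiting times at hospitals are rising",
        "fund mental health services",
    ]
    labels = ["economy", "economy", "economy", "health", "health", "health"]
    print(knn_predict(texts, labels, "lower taxes will boost jobs"))
    print(knn_predict(texts, labels, "more nurses for hospitals"))
```

Because kNN stores the labelled training documents and defers all computation to prediction time, the choice of representation and similarity measure largely determines how well it works on text.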

Teaching and learning methods

Description of T&L Methods

Instruction is delivered over a 10-week period, with two one-hour lecture sessions each week. In addition, students attend a weekly one-hour computer lab session in which they apply theoretical concepts through hands-on exercises.

Knowledge and understanding

• Demonstrate a theoretical understanding of content analysis approaches and machine learning techniques

Intellectual skills

• Visualise, describe, and critically assess quantitative text analyses in R or Python using advanced methods

Practical skills

• Produce reports for academic and non-academic audiences

Transferable skills and personal qualities

• Design and execute small-scale projects applying machine learning to social science research questions using text data

Assessment methods

Method	Weight
Report	100%

Formative assessment (assignments): modelling and coding of text data, presented in a single R Markdown document (rendered to PDF or HTML) that contains both answers and code.

Final paper: a final data analysis report summarizing the key findings of a quantitative text analysis, including visualizations.
The report may be substantive or technical in nature (2,000 words including code, tables and figures; 100%).

Study hours

Scheduled activity hours
Lectures	20
Tutorials	10

Independent study hours
Independent study	170

Teaching staff

Staff member	Role
Yan Wang	Unit coordinator
