Software
All code for this course should be written in Python. For those who haven’t worked with this language previously, POL42340 Programming for Social Scientists runs in parallel to this course and provides a comprehensive introduction to Python.
Development Environment
We will use VS Code to write and execute code locally. For some lab sessions and the group project, your personal device may not be powerful enough, or execution times may be long; in those cases, use Google Colab. Colab provides limited free access to T4 GPUs, which lets you run code that would be impractical to run locally without a GPU.
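To decide whether a piece of code should run locally or on Colab, it can help to check from Python whether the current session has a GPU. The following is a minimal sketch, not a prescribed course utility. It assumes PyTorch (`torch`) as the deep learning backend, which is an assumption rather than something stated above, and it reports no GPU if PyTorch is not installed. The helper names `running_in_colab` and `gpu_available` are hypothetical.

```python
import importlib.util
import sys


def running_in_colab() -> bool:
    """Return True if the code appears to be running inside Google Colab."""
    # Common idiom: in a Colab runtime the google.colab module is
    # typically already imported, so it shows up in sys.modules.
    return "google.colab" in sys.modules


def gpu_available() -> bool:
    """Return True if PyTorch is installed and can see a CUDA GPU."""
    # Assumption: PyTorch is the backend. Without it we simply report False.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    print("Running in Colab:", running_in_colab())
    print("GPU available:", gpu_available())
```

On Colab, a GPU is only visible after a GPU runtime has been selected for the notebook, so `gpu_available()` may return `False` there until the runtime type is changed.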
Package Management
We will use venv and pip to manage our Python environment and install packages (though feel free to use Anaconda if you are more familiar with this platform). I strongly encourage the use of Git for version control, particularly when working on your group project; however, we will not cover this extensively in class.
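On the command line, an environment is usually created with `python -m venv .venv`, activated, and then populated with `pip install`. The sketch below shows the same idea from Python using the standard library only. The function names `in_virtual_environment` and `create_environment` and the `.venv` directory name are illustrative choices, not course requirements.

```python
import sys
import venv
from pathlib import Path


def in_virtual_environment() -> bool:
    """Return True if the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    # Note: this check does not detect Anaconda (conda) environments.
    return sys.prefix != sys.base_prefix


def create_environment(env_dir: str = ".venv", with_pip: bool = True) -> Path:
    """Create a virtual environment in env_dir and return its path."""
    # Equivalent in spirit to running `python -m venv .venv`.
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir)


if __name__ == "__main__":
    if in_virtual_environment():
        print("Already inside a virtual environment:", sys.prefix)
    else:
        print("No virtual environment is active.")
```

Checking `in_virtual_environment()` before installing packages is a cheap way to avoid accidentally installing them into the system-wide Python.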
Useful Python Packages
The following packages are particularly useful for dealing with textual data:
| Package | Description |
|---|---|
| spaCy | Industrial-strength NLP library |
| NLTK | Classic NLP toolkit |
| Gensim | Topic modelling and word embeddings |
| sentence-transformers | Sentence and document embeddings |
| transformers | HuggingFace model hub and pipelines |
| langchain | Framework for building LLM applications |
| scikit-learn | Machine learning and evaluation metrics |
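To see which of these packages are already installed in the active environment, something like the following sketch can be used. It relies only on the standard library. The list `COURSE_PACKAGES` and the function `installed_versions` are hypothetical names introduced for illustration, and the entries are the pip distribution names as I understand them (for example `spacy` for spaCy).

```python
from importlib.metadata import PackageNotFoundError, version

# Pip distribution names for the packages listed in the table above.
COURSE_PACKAGES = [
    "spacy",
    "nltk",
    "gensim",
    "sentence-transformers",
    "transformers",
    "langchain",
    "scikit-learn",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            report[name] = None
    return report


if __name__ == "__main__":
    for name, found in installed_versions(COURSE_PACKAGES).items():
        print(f"{name}: {found if found is not None else 'not installed'}")
```

Any package reported as not installed can be added with `pip install` followed by its name, run inside the activated environment.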