CodeBook Lab
CodeBook Lab is a Python package for running LLM annotation experiments against human-coded benchmark data. It takes a codebook and labelled dataset exported from CodeBook Studio and lets researchers evaluate local models systematically across the dimensions that matter for text-as-data research:
- Model choice and model size: compare any models available through Ollama
- Prompt style: standard, persona, or chain-of-thought wrappers
- Zero-shot versus few-shot learning: toggle codebook examples on or off
- Sampling hyperparameters: sweep over temperature and top-p settings
Experiments are controlled from Python, so users can start with a single run, scale up to multiple experiment specifications, or iterate through larger parameter grids when needed. Because the codebook and labelled data stay constant across runs, each dimension can be isolated and compared against the same human labels.
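A rough sketch of how such a parameter grid might be expressed in plain Python follows. The model names, prompt styles, and sampling values are illustrative only, and CodeBook Lab's actual experiment-specification API may look different:

```python
from itertools import product

# Illustrative dimensions only -- substitute the models, prompt styles,
# and sampling values relevant to your own study.
models = ["llama3.1:8b", "mistral:7b"]           # any models served by Ollama
prompt_styles = ["standard", "persona", "chain-of-thought"]
few_shot = [False, True]                         # codebook examples off / on
temperatures = [0.0, 0.7]
top_ps = [0.9, 1.0]

# Cross all dimensions into individual experiment specifications.
grid = [
    {
        "model": m,
        "prompt_style": style,
        "few_shot": fs,
        "temperature": temp,
        "top_p": tp,
    }
    for m, style, fs, temp, tp in product(
        models, prompt_styles, few_shot, temperatures, top_ps
    )
]

print(f"{len(grid)} experiment specifications")  # 2 * 3 * 2 * 2 * 2 = 48
```

Because the codebook and human labels are held fixed across all of these specifications, differences in the scored output can be attributed to the dimension being varied.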
CodeBook Studio
- Define the annotation task
- Annotate texts with humans
- Export codebook.json and ground-truth.csv

CodeBook Lab
- Take codebook.json and the human-annotated ground-truth.csv as input
- Run LLM annotation experiments
- Isolate model, prompt, and hyperparameter effects
- Score outputs against human labels
- Report accuracy, agreement, runtime, energy, and emissions metrics
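As a quick sanity check before running experiments, the two exported files can be inspected with the standard library alone. This sketch assumes only that codebook.json is valid JSON and that ground-truth.csv has a header row; it makes no assumptions about their specific fields:

```python
import csv
import json

# Peek at the codebook exported from CodeBook Studio.
with open("codebook.json", encoding="utf-8") as f:
    codebook = json.load(f)
print("Codebook top level:", list(codebook) if isinstance(codebook, dict) else type(codebook))

# Peek at the human-annotated labels.
with open("ground-truth.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    rows = list(reader)
print("Label columns:", reader.fieldnames)
print("Human-annotated rows:", len(rows))
```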
CodeBook Lab also tracks runtime, prompt and response length, energy consumption, and estimated carbon emissions for each run, with aggregate metrics logged to a single CSV for cross-experiment comparison.
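Because every run appends to one CSV, cross-experiment comparison works with ordinary data tooling. A minimal pandas sketch, assuming hypothetical column names such as model, prompt_style, accuracy, and emissions_kg (check the actual header of your results file):

```python
import pandas as pd

# Hypothetical file and column names -- adjust to the actual results CSV.
results = pd.read_csv("results.csv")

# Compare mean accuracy and estimated emissions per model / prompt style.
summary = (
    results
    .groupby(["model", "prompt_style"])[["accuracy", "emissions_kg"]]
    .mean()
    .sort_values("accuracy", ascending=False)
)
print(summary)
```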
The package ships with a bundled starter task so you can test the workflow before plugging in your own data. For a step-by-step walkthrough covering both tools, see the CodeBook Studio & Lab Tutorial.