CodeBook Lab


CodeBook Lab is a Python package for running LLM annotation experiments against human-coded benchmark data. It takes a codebook and labelled dataset exported from CodeBook Studio and lets researchers systematically evaluate local models across the dimensions that matter for text-as-data research: model choice, prompt design, and hyperparameters.

Experiments are controlled from Python, so users can start with a single run, scale up to multiple experiment specifications, or iterate through larger parameter grids when needed. Because the codebook and labelled data stay constant across runs, each dimension can be isolated and compared against the same human labels.
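A parameter grid of the kind described above can be sketched in plain Python. The axis names and values here are illustrative, not CodeBook Lab's actual API:

```python
from itertools import product

# Hypothetical experiment axes; names and values are illustrative,
# not CodeBook Lab's actual configuration schema.
grid = {
    "model": ["llama3.1:8b", "mistral:7b"],
    "prompt_style": ["zero_shot", "few_shot"],
    "temperature": [0.0, 0.7],
}

# Expand the grid into one spec dict per run. Because the codebook and
# labelled data are held constant, each axis can be compared in isolation.
specs = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(specs))   # 2 * 2 * 2 = 8 runs
print(specs[0])
```

Starting from a single spec and growing toward a grid like this keeps each comparison anchored to the same human labels.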

CodeBook Studio

  • Define the annotation task
  • Annotate texts with humans
  • Export codebook.json and ground-truth.csv

CodeBook Lab

  • Run LLM annotation experiments
  • Isolate model, prompt, and hyperparameter effects
  • Score outputs against human labels
Shared inputs

  • codebook.json + human-annotated ground-truth.csv

Experiment outputs

  • Accuracy, agreement, runtime, energy, and emissions metrics
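The two shared inputs are ordinary JSON and CSV files, so loading them takes only the standard library. The field names below are stand-ins, not CodeBook Studio's actual export schema:

```python
import csv
import io
import json

# Minimal stand-ins for the exported files; the keys and columns are
# assumptions for illustration, not the real export format.
codebook_json = '{"task": "stance", "labels": ["favor", "against", "neutral"]}'
ground_truth_csv = "text,label\nTaxes should rise.,favor\nTaxes should fall.,against\n"

codebook = json.loads(codebook_json)
rows = list(csv.DictReader(io.StringIO(ground_truth_csv)))

print(codebook["labels"])   # the label set the annotators used
print(rows[0]["label"])     # the human label for the first text
```

In practice the same two files would be read from disk with `json.load` and `csv.DictReader` over open file handles.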

CodeBook Lab also tracks runtime, prompt and response length, energy consumption, and estimated carbon emissions for each run, with aggregate metrics logged to a single CSV for cross-experiment comparison.
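Because the aggregate metrics land in a single CSV, cross-experiment comparison is a small scripting task. The column names here are assumptions about what such a log might contain, not the package's actual output schema:

```python
import csv
import io

# Hypothetical aggregate metrics log; column names are illustrative.
metrics_csv = (
    "run_id,model,accuracy,runtime_s,energy_kwh\n"
    "1,llama3.1:8b,0.82,41.0,0.003\n"
    "2,mistral:7b,0.79,35.5,0.002\n"
)

rows = list(csv.DictReader(io.StringIO(metrics_csv)))

# Rank runs by accuracy against the human labels.
best = max(rows, key=lambda r: float(r["accuracy"]))
print(best["model"], best["accuracy"])
```

The same pattern extends to any of the logged dimensions, e.g. ranking runs by energy consumption instead of accuracy.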

The package ships with a bundled starter task so you can test the workflow before plugging in your own data. For a step-by-step walkthrough covering both tools, see the CodeBook Studio & Lab Tutorial.