CodeBook Lab


CodeBook Lab is a Python package for running LLM annotation experiments against human-coded benchmark data. It takes a codebook and labelled dataset exported from CodeBook Studio and lets researchers systematically evaluate local models across the dimensions that matter for text-as-data research: model choice, prompt design, and hyperparameters.

Experiments are controlled from Python, so users can start with a single run, scale up to multiple experiment specifications, or iterate through larger parameter grids when needed. Because the codebook and labelled data stay constant across runs, each dimension can be isolated and compared against the same human labels.
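A parameter grid of the kind described above can be sketched in plain Python. The axis names and values here are illustrative, not CodeBook Lab's actual API:

```python
from itertools import product

# Hypothetical experiment axes; names and values are illustrative,
# not CodeBook Lab's actual configuration schema.
grid = {
    "model": ["llama3.1:8b", "mistral:7b"],
    "prompt_style": ["zero_shot", "few_shot"],
    "temperature": [0.0, 0.7],
}

# Expand the grid into one spec dict per run. Because the codebook and
# labelled data are held constant, each axis can be compared in isolation.
specs = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(specs))   # 2 * 2 * 2 = 8 runs
print(specs[0])
```

Starting from a single spec and growing toward a grid like this keeps each comparison anchored to the same human labels.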

CodeBook Studio

  • Define the annotation task
  • Annotate texts with humans
  • Export codebook.json and ground-truth.csv

CodeBook Lab

  • Run LLM annotation experiments
  • Isolate model, prompt, and hyperparameter effects
  • Score outputs against human labels
Shared inputs

  • codebook.json + human-annotated ground-truth.csv

Experiment outputs

  • Accuracy, agreement, runtime, energy, and emissions metrics
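The two shared inputs are ordinary JSON and CSV files, so loading them takes only the standard library. The field names below are stand-ins, not CodeBook Studio's actual export schema:

```python
import csv
import io
import json

# Minimal stand-ins for the exported files; the keys and columns are
# assumptions for illustration, not the real export format.
codebook_json = '{"task": "stance", "labels": ["favor", "against", "neutral"]}'
ground_truth_csv = "text,label\nTaxes should rise.,favor\nTaxes should fall.,against\n"

codebook = json.loads(codebook_json)
rows = list(csv.DictReader(io.StringIO(ground_truth_csv)))

print(codebook["labels"])   # the label set the annotators used
print(rows[0]["label"])     # the human label for the first text
```

In practice the same two files would be read from disk with `json.load` and `csv.DictReader` over open file handles.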

CodeBook Lab also tracks runtime, prompt and response length, energy consumption, and estimated carbon emissions for each run, with aggregate metrics logged to a single CSV for cross-experiment comparison.
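Because the aggregate metrics land in a single CSV, cross-experiment comparison is a small scripting task. The column names here are assumptions about what such a log might contain, not the package's actual output schema:

```python
import csv
import io

# Hypothetical aggregate metrics log; column names are illustrative.
metrics_csv = (
    "run_id,model,accuracy,runtime_s,energy_kwh\n"
    "1,llama3.1:8b,0.82,41.0,0.003\n"
    "2,mistral:7b,0.79,35.5,0.002\n"
)

rows = list(csv.DictReader(io.StringIO(metrics_csv)))

# Rank runs by accuracy against the human labels.
best = max(rows, key=lambda r: float(r["accuracy"]))
print(best["model"], best["accuracy"])
```

The same pattern extends to any of the logged dimensions, e.g. ranking runs by energy consumption instead of accuracy.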

The package ships with a bundled starter task so you can test the workflow before plugging in your own data. For a step-by-step walkthrough covering both tools, see the CodeBook Studio & Lab Tutorial.