CodeBook Lab

CodeBook Lab is a Python package for running local LLM annotation experiments against human-coded benchmark data. It takes a CodeBook Studio codebook and a labelled CSV, removes the human labels before annotation, runs one or more Ollama models, and scores the model outputs against the held-out human labels.

The package is built for computational social science workflows where the core question is not simply “can a model label this text?”, but which model, prompt, examples, and sampling settings reproduce a human codebook most reliably.

These docs intentionally go beyond the package README. They cover conditionals, span annotations, response validation, retry strategies, tidy metrics logs, human reliability checks, and adjudication workflows.

Define once

Use CodeBook Studio to design the annotation task, collect human labels, and export codebook.json plus ground-truth.csv.

Run locally

Use Ollama-backed models from Python, including small local models for quick tests and larger models for benchmark runs.

Compare systematically

Sweep over models, prompt wrappers, examples, temperature, top-p, textbox processing, span processing, and run metadata.

What The Package Does

CodeBook Lab gives you a reproducible path from human-coded data to model evaluation:

  1. Load a task folder containing codebook.json and ground-truth.csv.
  2. Strip annotation columns from the input before sending text to the model.
  3. Prompt a local Ollama model for each applicable annotation field.
  4. Save row-level model outputs and run metadata.
  5. Compare model labels with human labels using metrics appropriate to each annotation type.
  6. Log aggregate quality, runtime, energy, and emissions metrics for comparison across runs.
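The hold-out logic in steps 2 and 5 can be illustrated with a minimal standard-library sketch. This is not CodeBook Lab's API: the column names, the inline CSV, and the helpers `split_inputs_and_labels`, `stub_annotator`, and `accuracy` are all hypothetical, and the stub stands in for a real Ollama call.

```python
import csv
import io

# Hypothetical data: these column names are assumptions for illustration,
# not CodeBook Lab's actual ground-truth.csv schema.
GROUND_TRUTH_CSV = """id,text,sentiment
1,The policy is a clear success.,positive
2,This reform will hurt families.,negative
3,The bill passed on Tuesday.,neutral
"""
ANNOTATION_COLUMNS = ["sentiment"]


def split_inputs_and_labels(csv_text, annotation_columns):
    """Separate model inputs from the held-out human labels."""
    inputs, labels = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        labels.append({col: row[col] for col in annotation_columns})
        inputs.append({k: v for k, v in row.items() if k not in annotation_columns})
    return inputs, labels


def stub_annotator(row):
    """Stand-in for a local model call; a real run would prompt a model here."""
    text = row["text"].lower()
    if "success" in text:
        return {"sentiment": "positive"}
    if "hurt" in text:
        return {"sentiment": "negative"}
    return {"sentiment": "positive"}  # deliberately wrong on neutral text


def accuracy(predictions, labels, field):
    """Share of rows where the model label equals the human label."""
    matches = sum(p[field] == h[field] for p, h in zip(predictions, labels))
    return matches / len(labels)


inputs, labels = split_inputs_and_labels(GROUND_TRUTH_CSV, ANNOTATION_COLUMNS)
predictions = [stub_annotator(row) for row in inputs]
score = accuracy(predictions, labels, "sentiment")
```

The annotator only ever sees `inputs`, so the human labels cannot leak into the prompt; they are reintroduced only at scoring time.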

Supported annotation types include checkbox, dropdown, Likert, textbox, and span annotations. CodeBook Lab also respects conditional annotations, validates model responses before storing them, retries invalid answers, and keeps skipped conditional fields out of metric denominators.
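As a rough illustration of validation, retries, and conditional skipping, the sketch below uses only plain Python. The option set, the `(parent, trigger)` condition format, the use of `None` for a skipped field, and all three function names are assumptions made for this example rather than the package's actual behaviour.

```python
ALLOWED = {"positive", "negative", "neutral"}  # hypothetical dropdown options


def annotate_with_retries(ask_model, text, allowed, max_retries=2):
    """Return the first valid answer, or None if every attempt is invalid."""
    for _ in range(max_retries + 1):
        answer = ask_model(text).strip().lower()
        if answer in allowed:
            return answer
    return None


def is_applicable(row_answers, condition):
    """A conditional field applies only when its parent holds the trigger value."""
    if condition is None:
        return True
    parent, trigger = condition
    return row_answers.get(parent) == trigger


def accuracy_over_applicable(pairs):
    """Score (prediction, human_label) pairs, ignoring skipped fields.

    A human_label of None marks a conditional field that did not apply,
    so it is excluded from the denominator instead of counting as an error.
    """
    scored = [(p, h) for p, h in pairs if h is not None]
    if not scored:
        return None
    return sum(p == h for p, h in scored) / len(scored)


# A fake model that answers badly once, then correctly.
replies = iter(["I think it is Positive!", "positive"])
answer = annotate_with_retries(lambda text: next(replies), "Great policy.", ALLOWED)

applies = is_applicable({"mentions_policy": "yes"}, ("mentions_policy", "yes"))
skipped = is_applicable({"mentions_policy": "no"}, ("mentions_policy", "yes"))
result = accuracy_over_applicable(
    [("positive", "positive"), ("negative", "neutral"), (None, None)]
)
```

Excluding skipped fields matters because counting them as misses would penalise a model for correctly leaving a non-applicable field blank.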

Categorical and ordinal fields receive classification and agreement metrics; textbox fields can receive lexical and embedding-based similarity metrics; span fields receive token F1, exact-match F1, and character IoU.
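To make two of the span metrics concrete, here is a sketch of token-level F1 and character IoU under their common definitions. It assumes whitespace tokenisation, lowercasing, and half-open `(start, end)` character offsets; CodeBook Lab's own tokenisation and span representation may differ, and the function names are hypothetical.

```python
from collections import Counter


def token_f1(predicted, reference):
    """Bag-of-tokens F1 between two whitespace-tokenised strings."""
    pred_tokens = predicted.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty spans agree perfectly; one empty span is a total miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def char_iou(pred_span, ref_span):
    """Intersection over union of two half-open (start, end) character ranges."""
    intersection = max(0, min(pred_span[1], ref_span[1]) - max(pred_span[0], ref_span[0]))
    union = (pred_span[1] - pred_span[0]) + (ref_span[1] - ref_span[0]) - intersection
    return intersection / union if union else 0.0


f1 = token_f1("cut taxes for families", "cut taxes")
iou = char_iou((0, 10), (5, 15))
```

Token F1 rewards partial overlap in wording, while character IoU rewards overlap in position, so the two can disagree when a model selects the right words from the wrong place in the text.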

Where To Start

If you want the shortest path, begin with Installation, then run the bundled policy-sentiment task in Experiments. If you already have CodeBook Studio exports, go straight to Tasks and Annotation Types to check the expected codebook and CSV formats.

The upstream annotation workflow — designing a codebook and labelling data — is covered in the CodeBook Studio guide.

License & Developers

License

GNU AGPL v3.0 — a strong copyleft license. You may use, modify, and distribute CodeBook Lab, including over a network, provided derivative works remain under the same license.

Developers

Lorcan McLaren — author, maintainer

To cite CodeBook Lab in research, see the Citation page.