API Reference

Experiment Types

Dataclasses describing experiment specs and the results each run returns.

ExperimentSpec Declarative specification for one experiment run in a sweep.
ExperimentRunResult Combined result returned by run_experiment.
AnnotationRunResult Result returned by run_annotation.
MetricsRunResult Result returned by run_metrics.
HumanReliabilityResult Result returned by calculate_human_reliability.
HumanGroundTruthResult Result returned by build_human_ground_truth.

Experiment Functions

High-level entry points for running and sweeping experiments.

run_experiment Run one end-to-end experiment and evaluate it against ground truth.
run_experiment_grid Run a whole parameter sweep from a grid or a prebuilt spec list.
expand_param_grid Expand a parameter grid into concrete ExperimentSpec runs.
resolve_task_dir Resolve a task directory from either a user path or bundled examples.
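
To make the idea of a parameter grid concrete, the sketch below expands a dictionary of option lists into one plain dictionary per run. It is not the package's implementation: `expand_grid_sketch` is a hypothetical helper, the keys `model` and `temperature` are made-up examples, and it assumes the grid is expanded as a Cartesian product. The real `expand_param_grid` returns `ExperimentSpec` objects, whose fields are not shown here.

```python
from itertools import product


def expand_grid_sketch(param_grid):
    """Return one dict per combination of the grid's values.

    Hypothetical stand-in for illustration only; assumes a Cartesian
    product over every list in the grid.
    """
    keys = list(param_grid)
    combos = product(*(param_grid[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


# Made-up parameter names and values, used only for this example.
grid = {"model": ["model-a", "model-b"], "temperature": [0.0, 0.7]}
runs = expand_grid_sketch(grid)

for run in runs:
    print(run)
```

Two options for each of two parameters produce four runs, so sweep size grows multiplicatively with every parameter added to the grid.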

Lower-Level Workflow

Run annotation, scoring, and human-reliability steps independently.

run_annotation Run one annotation job and persist its outputs to disk.
run_metrics Evaluate one model-output CSV against ground truth and persist the results.
calculate_human_reliability Validate human coder CSVs and calculate inter-coder reliability metrics.
build_human_ground_truth Build consensus human ground truth from coder CSVs and optional adjudications.
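
As a rough illustration of the two human-coding steps, the sketch below computes Cohen's kappa for two coders and a majority-vote consensus label. Both are assumptions chosen for the example: this reference does not state which reliability metrics `calculate_human_reliability` reports or how `build_human_ground_truth` resolves disagreements, and the function names and labels here are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each coder's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


def majority_label(votes):
    """Return the most common label, or None when the top two labels tie."""
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # a tie would need adjudication
    return counts[0][0]


coder_a = ["yes", "yes", "no", "no"]
coder_b = ["yes", "no", "no", "no"]
kappa = cohen_kappa(coder_a, coder_b)
print(round(kappa, 3))
print(majority_label(["yes", "yes", "no"]))
```

Returning `None` on a tie mirrors the role of the optional adjudications mentioned above: items without a clear consensus need a separate decision.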

Validation and Retry Helpers

Response parsing and retry utilities used inside the annotation loop.

annotate.extract_json_response Extract and validate a JSON response based on the annotation type.
annotate.normalize_retry_strategy Return a supported retry strategy, falling back to "identical".
annotate.classify_text Annotate one text row across all sections in a codebook.
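
The sketch below shows the general shape of these two helpers using only the standard library. It is an assumption-laden illustration, not the code in `annotate`: the function names are hypothetical, the strategy name `"reminder"` is a made-up placeholder, and the validation against an annotation type is omitted. Only the fallback value `"identical"` comes from the reference above.

```python
import json

# Hypothetical set of strategy names; only "identical" is documented above.
SUPPORTED_STRATEGIES = {"identical", "reminder"}


def normalize_retry_strategy_sketch(name):
    """Return a supported strategy name, falling back to "identical"."""
    if isinstance(name, str):
        cleaned = name.strip().lower()
        if cleaned in SUPPORTED_STRATEGIES:
            return cleaned
    return "identical"


def extract_json_sketch(text):
    """Parse the outermost {...} span of a model reply, or return None."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    # Only accept JSON objects; lists or scalars count as a failed parse.
    return parsed if isinstance(parsed, dict) else None


reply = 'Sure, here is the answer: {"label": "yes"} Hope that helps.'
print(extract_json_sketch(reply))
print(normalize_retry_strategy_sketch("unknown-strategy"))
```

A caller inside an annotation loop could treat a `None` result as the signal to retry with the normalized strategy.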

Example Tasks

Discover and copy the bundled starter tasks.

list_example_tasks List bundled example task names shipped with the package.
get_example_task_dir Return the filesystem path to a bundled example task.
get_example_task_files Return the standard file paths for a bundled example task.
copy_example_task Copy a bundled example task to a user-controlled directory.
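
Copying a bundled task into a working directory can be pictured with the standard-library sketch below. It does not call the package: `copy_task_sketch`, the `overwrite` flag, and the file name `codebook.txt` are hypothetical, and the temporary directories stand in for the installed package and the user's project folder.

```python
import shutil
import tempfile
from pathlib import Path


def copy_task_sketch(source_dir, destination, overwrite=False):
    """Copy a task directory, refusing to clobber an existing target."""
    source = Path(source_dir)
    target = Path(destination)
    if not source.is_dir():
        raise FileNotFoundError(f"no such task directory: {source}")
    if target.exists() and not overwrite:
        raise FileExistsError(f"destination already exists: {target}")
    # dirs_exist_ok requires Python 3.8 or newer.
    shutil.copytree(source, target, dirs_exist_ok=overwrite)
    return target


# Build a throwaway "bundled" task so the example runs anywhere.
workspace = Path(tempfile.mkdtemp())
bundled = workspace / "bundled" / "demo_task"
bundled.mkdir(parents=True)
(bundled / "codebook.txt").write_text("example codebook\n")

copied = copy_task_sketch(bundled, workspace / "work" / "demo_task")
print(sorted(p.name for p in copied.iterdir()))
```

Working on a copy keeps edits to the task files separate from the read-only files shipped with the package.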

Prompts

Inspect built-in prompt wrappers and register custom ones.

list_prompt_wrappers Return the sorted names of all registered prompt wrappers.
get_prompt_wrapper Return a registered prompt wrapper by name.
register_prompt_wrapper Register a prompt wrapper for use in Python and CLI experiment configs.
PromptContext Structured prompt-building context passed to prompt wrapper functions.
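
A name-to-function registry is one common way to support this kind of lookup, and the sketch below shows that pattern in isolation. Everything in it is hypothetical: `SketchContext` and its two fields are invented stand-ins (the real fields of `PromptContext` are not listed here), and the three functions only mimic the roles of `register_prompt_wrapper`, `get_prompt_wrapper`, and `list_prompt_wrappers`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SketchContext:
    """Invented stand-in for a prompt-building context."""
    text: str
    instructions: str


WrapperFn = Callable[[SketchContext], str]
_REGISTRY: Dict[str, WrapperFn] = {}


def register_sketch(name: str, fn: WrapperFn) -> None:
    """Store a wrapper under a non-empty name."""
    if not name:
        raise ValueError("wrapper name must be non-empty")
    _REGISTRY[name] = fn


def get_sketch(name: str) -> WrapperFn:
    """Look up a wrapper, raising KeyError with a readable message."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown prompt wrapper: {name!r}") from None


def list_sketch() -> List[str]:
    """Return registered wrapper names in sorted order."""
    return sorted(_REGISTRY)


def plain_wrapper(ctx: SketchContext) -> str:
    return f"{ctx.instructions}\n\nText: {ctx.text}"


register_sketch("plain", plain_wrapper)
prompt = get_sketch("plain")(SketchContext(text="Hello", instructions="Label the text."))
print(list_sketch())
print(prompt)
```

Registering by name is what lets a configuration file refer to a wrapper with a plain string instead of importing a function.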

Ollama Helpers

Verify that the local Ollama server is reachable, start it if needed, and pull models before a run.

ensure_ollama_available Check that the Ollama server is reachable, optionally starting it locally.
ensure_ollama_model Pull an Ollama model so it is available locally before a run.
get_ollama_base_url Return the Ollama base URL used for connectivity checks.