API Reference

Experiment Types

Dataclasses describing experiment specs and the results each run returns.

ExperimentSpec	Declarative specification for one experiment run in a sweep.
ExperimentRunResult	Combined result returned by `run_experiment`.
AnnotationRunResult	Result returned by `run_annotation`.
MetricsRunResult	Result returned by `run_metrics`.
HumanReliabilityResult	Result returned by `calculate_human_reliability`.
HumanGroundTruthResult	Result returned by `build_human_ground_truth`.

High-level entry points for running and sweeping experiments.

run_experiment	Run one end-to-end experiment and evaluate it against ground truth.
run_experiment_grid	Run a whole parameter sweep from a grid or a prebuilt spec list.
expand_param_grid	Expand a parameter grid into concrete `ExperimentSpec` runs.
resolve_task_dir	Resolve a task directory from either a user path or bundled examples.
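
As a rough illustration of what expanding a parameter grid involves, the sketch below takes the Cartesian product of a dict of value lists. It is not the package's implementation: `expand_grid_sketch`, the plain-dict output, and the parameter names `model` and `temperature` are assumptions made for this example, and the real `expand_param_grid` returns `ExperimentSpec` objects whose fields are not shown here.

```python
from itertools import product
from typing import Any


def expand_grid_sketch(grid: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Return one dict per combination of values in ``grid``.

    Hypothetical helper for illustration only; it stands in for the idea
    behind a grid expansion, not for the package's own function.
    """
    keys = list(grid)
    value_lists = [grid[key] for key in keys]
    # product() walks the combinations with the last key varying fastest.
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


# Hypothetical parameter names and values.
runs = expand_grid_sketch(
    {"model": ["model-a", "model-b"], "temperature": [0.0, 0.7]}
)
for run in runs:
    print(run)
```

Two values for each of two parameters give four concrete runs; each added parameter multiplies the sweep size by its number of values.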

Run annotation, scoring, and human-reliability steps independently.

run_annotation	Run one annotation job and persist its outputs to disk.
run_metrics	Evaluate one model-output CSV against ground truth and persist the results.
calculate_human_reliability	Validate human coder CSVs and calculate inter-coder reliability metrics.
build_human_ground_truth	Build consensus human ground truth from coder CSVs and optional adjudications.
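
To make the two human-coding steps concrete, here is a minimal sketch of one common reliability metric (Cohen's kappa for two coders) and of a majority-vote consensus. Both are assumptions chosen for illustration: this reference does not say which metrics `calculate_human_reliability` reports or how `build_human_ground_truth` resolves disagreement, and the helper names below are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def cohens_kappa(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Cohen's kappa for two coders who labelled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items with identical labels.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    labels = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


def majority_label(labels: Sequence[str]) -> Optional[str]:
    """Return the most frequent label, or None when the top spot is tied."""
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: a case that would need adjudication
    return ranked[0][0]


kappa = cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"])
print(kappa)
```

Returning `None` on a tie leaves room for a separate adjudication step to supply the final label, which is one plausible reading of "optional adjudications" above.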

Response parsing and retry utilities used inside the annotation loop.

annotate.extract_json_response	Extract and validate the JSON response for the given annotation type.
annotate.normalize_retry_strategy	Return a supported retry strategy, falling back to `"identical"`.
annotate.classify_text	Annotate one text row across all sections in a codebook.
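
The sketch below shows the general shape of the two utilities above: pulling a JSON object out of free-form model output, and normalizing a retry-strategy name with a fallback to `"identical"`. The function names, the `required_keys` parameter, and the `"rephrase"` strategy are hypothetical; only the `"identical"` fallback comes from this reference, and the real functions may validate differently.

```python
import json
from typing import Any, Iterable, Optional

# "identical" is the documented fallback; "rephrase" is a made-up
# second strategy so the example has something to accept.
SUPPORTED_SKETCH = {"identical", "rephrase"}


def extract_json_sketch(
    text: str, required_keys: Iterable[str] = ()
) -> Optional[dict[str, Any]]:
    """Return the first JSON object in ``text`` that has all required keys."""
    required = list(required_keys)
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at index ``start``
            # and ignores whatever follows it.
            candidate, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            candidate = None
        if isinstance(candidate, dict) and all(k in candidate for k in required):
            return candidate
        start = text.find("{", start + 1)
    return None


def normalize_retry_sketch(value: Optional[str]) -> str:
    """Return a supported strategy name, falling back to "identical"."""
    name = value.strip().lower() if isinstance(value, str) else ""
    return name if name in SUPPORTED_SKETCH else "identical"


reply = 'Sure, here is the result: {"label": "positive"} Hope that helps!'
print(extract_json_sketch(reply, required_keys=["label"]))
print(normalize_retry_sketch("unknown-strategy"))
```

Scanning from each `{` tolerates chatty replies that wrap the JSON in prose, at the cost of accepting the first object that happens to satisfy the key check.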

Discover and copy the bundled starter tasks.

list_example_tasks	List bundled example task names shipped with the package.
get_example_task_dir	Return the filesystem path to a bundled example task.
get_example_task_files	Return the standard file paths for a bundled example task.
copy_example_task	Copy a bundled example task to a user-controlled directory.

Inspect built-in prompt wrappers and register custom ones.

list_prompt_wrappers	Return the sorted names of all registered prompt wrappers.
get_prompt_wrapper	Return a registered prompt wrapper by name.
register_prompt_wrapper	Register a prompt wrapper for use in Python and CLI experiment configs.
PromptContext	Structured prompt-building context passed to prompt wrapper functions.
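
A name-keyed registry is one simple way to support listing, lookup, and registration of prompt wrappers. The following is a minimal sketch under that assumption: `SketchContext` and its two fields, and the `*_sketch` functions, are hypothetical stand-ins and do not reflect the actual fields of `PromptContext` or the signatures of the functions listed above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SketchContext:
    """Hypothetical stand-in for a prompt-building context."""

    instructions: str
    text: str


Wrapper = Callable[[SketchContext], str]
_REGISTRY: dict[str, Wrapper] = {}


def register_wrapper_sketch(name: str, wrapper: Wrapper) -> None:
    """Store ``wrapper`` under ``name``, replacing any earlier entry."""
    _REGISTRY[name] = wrapper


def get_wrapper_sketch(name: str) -> Wrapper:
    """Look up a wrapper, naming the known ones if the lookup fails."""
    try:
        return _REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY)) or "(none)"
        raise KeyError(f"unknown wrapper {name!r}; registered: {known}") from None


def list_wrappers_sketch() -> list[str]:
    """Return registered wrapper names in sorted order."""
    return sorted(_REGISTRY)


def plain(ctx: SketchContext) -> str:
    # Simplest possible wrapper: instructions, blank line, then the text.
    return f"{ctx.instructions}\n\n{ctx.text}"


def quoted(ctx: SketchContext) -> str:
    # Variant that delimits the text so it is not read as instructions.
    return f'{ctx.instructions}\n\nText: """{ctx.text}"""'


register_wrapper_sketch("quoted", quoted)
register_wrapper_sketch("plain", plain)

ctx = SketchContext(instructions="Label the sentiment.", text="Great service!")
prompt = get_wrapper_sketch("plain")(ctx)
print(list_wrappers_sketch())
print(prompt)
```

Keying wrappers by string name is what would let a config file or CLI flag select one without importing Python code.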

Check the local Ollama server, start it if needed, and pull models.

ensure_ollama_available	Check that the Ollama server is reachable, optionally starting it locally.
ensure_ollama_model	Pull an Ollama model so it is available locally before a run.
get_ollama_base_url	Return the Ollama base URL used for connectivity checks.
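
For orientation, a connectivity check against Ollama can be as small as an HTTP GET with a short timeout. The sketch below uses only the standard library and assumes Ollama's usual default address, `http://localhost:11434`; `is_server_reachable` is a hypothetical helper, not the package's `ensure_ollama_available`, and it neither starts the server nor pulls models.

```python
import urllib.error
import urllib.request

# Ollama's usual default address; an assumption about the local setup.
DEFAULT_BASE_URL = "http://localhost:11434"


def is_server_reachable(base_url: str = DEFAULT_BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to ``base_url`` answers with status 200.

    Hypothetical helper for illustration; connection failures, timeouts,
    HTTP error statuses, and malformed URLs all count as "not reachable".
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        return False


if __name__ == "__main__":
    if is_server_reachable():
        print("Ollama server is reachable")
    else:
        print("Ollama server is not reachable at", DEFAULT_BASE_URL)
```

Returning a boolean instead of raising keeps the check cheap to call before a long run; a caller can then decide whether to start the server or stop early.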