run_metrics

run_metrics(
    ground_truth_csv,
    llm_output_csv,
    label,
    output_csv,
    model_id,
    codebook_path,
    report_file,
    columns=None,
    quantization_type=None,
    temperature=None,
    top_p=None,
    prompt_type=None,
    use_examples=None,
    chat_mode=None,
    reasoning=None,
    process_textbox=False,
    process_span=False,
    emissions_file=None,
    experiment_directory=None,
    timestamp=None,
    timing_file=None,
    char_counts_file=None,
    run_id=None,
)

Evaluate one model-output CSV against ground truth and persist the results.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `ground_truth_csv` | | Path to the human-labeled reference CSV. | required |
| `llm_output_csv` | | Path to the model-generated annotation CSV. | required |
| `label` | | Experiment label written to the metrics CSV, usually the task name. | required |
| `output_csv` | | Path to the aggregate metrics CSV to create or update. | required |
| `model_id` | | Stable identifier for the model and prompt configuration. | required |
| `codebook_path` | | Path to the codebook used for the run. | required |
| `report_file` | | Path to the per-column report text file. | required |
| `columns` | | Optional subset of annotation column names to evaluate. | `None` |
| `quantization_type` | | Optional quantization metadata string. | `None` |
| `temperature` | | Optional temperature metadata value. | `None` |
| `top_p` | | Optional top-p metadata value. | `None` |
| `prompt_type` | | Optional prompt wrapper name stored as metadata. | `None` |
| `use_examples` | | Optional boolean-like metadata flag. | `None` |
| `chat_mode` | | Optional chat-history policy stored as metadata. | `None` |
| `reasoning` | | Optional Ollama reasoning mode stored as metadata. | `None` |
| `process_textbox` | | Whether textbox metrics should be computed. | `False` |
| `emissions_file` | | Optional path to emissions.csv. | `None` |
| `experiment_directory` | | Optional path to the per-run output directory. | `None` |
| `timestamp` | | Optional timestamp string. | `None` |
| `timing_file` | | Optional path to timing_data.json. | `None` |
| `char_counts_file` | | Optional path to char_counts.json. | `None` |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | `codebook_lab.types.MetricsRunResult` | Result for the completed evaluation. |
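
Example

The following is a minimal sketch of the calling convention, not the library's implementation. Because the import path of `run_metrics` is not shown in this section, the example defines a stand-in stub with the same signature that simply echoes its arguments. All paths, the label, and the model identifier are hypothetical values chosen for illustration. It shows that the first seven parameters must be supplied and that everything else falls back to its documented default.

```python
import inspect


def run_metrics(
    ground_truth_csv,
    llm_output_csv,
    label,
    output_csv,
    model_id,
    codebook_path,
    report_file,
    columns=None,
    quantization_type=None,
    temperature=None,
    top_p=None,
    prompt_type=None,
    use_examples=None,
    chat_mode=None,
    reasoning=None,
    process_textbox=False,
    process_span=False,
    emissions_file=None,
    experiment_directory=None,
    timestamp=None,
    timing_file=None,
    char_counts_file=None,
    run_id=None,
):
    """Stand-in stub (NOT the real implementation): echo the bound arguments."""
    return dict(locals())


# Parameters without a default must be supplied by the caller.
REQUIRED = [
    name
    for name, param in inspect.signature(run_metrics).parameters.items()
    if param.default is inspect.Parameter.empty
]

# Hypothetical paths and values, used only to illustrate the call shape.
received = run_metrics(
    ground_truth_csv="data/ground_truth.csv",
    llm_output_csv="runs/demo/llm_output.csv",
    label="sentiment",
    output_csv="results/metrics.csv",
    model_id="demo-model",
    codebook_path="codebooks/sentiment_codebook",
    report_file="runs/demo/report.txt",
    columns=["sentiment"],  # evaluate only this annotation column
    temperature=0.0,        # stored as metadata, per the parameter table
    process_textbox=True,   # request textbox metrics
)

print(REQUIRED)
print(received["columns"], received["run_id"])
```

Passing the optional arguments by keyword keeps the call readable given the long parameter list. With the real function, the return value would be a `codebook_lab.types.MetricsRunResult` rather than the dictionary this stub returns.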