run_metrics
run_metrics(
    ground_truth_csv,
    llm_output_csv,
    label,
    output_csv,
    model_id,
    codebook_path,
    report_file,
    columns=None,
    quantization_type=None,
    temperature=None,
    top_p=None,
    prompt_type=None,
    use_examples=None,
    chat_mode=None,
    reasoning=None,
    process_textbox=False,
    process_span=False,
    emissions_file=None,
    experiment_directory=None,
    timestamp=None,
    timing_file=None,
    char_counts_file=None,
    run_id=None,
)
Evaluate one model-output CSV against ground truth and persist the results.
Parameters
| Name | Description | Default |
| --- | --- | --- |
| ground_truth_csv | Path to the human-labeled reference CSV. | required |
| llm_output_csv | Path to the model-generated annotation CSV. | required |
| label | Experiment label written to the metrics CSV, usually the task name. | required |
| output_csv | Path to the aggregate metrics CSV to create or update. | required |
| model_id | Stable identifier for the model and prompt configuration. | required |
| codebook_path | Path to the codebook used for the run. | required |
| report_file | Path to the per-column report text file. | required |
| columns | Optional subset of annotation column names to evaluate. | None |
| quantization_type | Optional quantization metadata string. | None |
| temperature | Optional temperature metadata value. | None |
| top_p | Optional top-p metadata value. | None |
| prompt_type | Optional prompt wrapper name stored as metadata. | None |
| use_examples | Optional boolean-like metadata flag. | None |
| chat_mode | Optional chat-history policy stored as metadata. | None |
| reasoning | Optional Ollama reasoning mode stored as metadata. | None |
| process_textbox | Whether textbox metrics should be computed. | False |
| emissions_file | Optional path to emissions.csv. | None |
| experiment_directory | Optional path to the per-run output directory. | None |
| timestamp | Optional timestamp string. | None |
| timing_file | Optional path to timing_data.json. | None |
| char_counts_file | Optional path to char_counts.json. | None |
Returns
codebook_lab.types.MetricsRunResult for the completed evaluation.
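
Because the call takes seven required paths and identifiers plus many optional metadata fields, it can help to assemble the keyword arguments in one place before calling `run_metrics`. The sketch below is only an illustration: `build_run_metrics_kwargs` is a hypothetical helper, the directory layout and the `ground_truth.csv`, `llm_output.csv`, `metrics.csv`, `codebook.json`, and `report.txt` file names are assumptions, and the import path of `run_metrics` is not shown in this section, so the final call is left commented out.

```python
from pathlib import Path

# Required parameter names, copied from the run_metrics signature.
REQUIRED = (
    "ground_truth_csv", "llm_output_csv", "label", "output_csv",
    "model_id", "codebook_path", "report_file",
)


def build_run_metrics_kwargs(run_dir, label, model_id, **optional):
    """Hypothetical helper: assemble keyword arguments for run_metrics.

    The file layout under run_dir is an assumption made for this example;
    only emissions.csv, timing_data.json and char_counts.json are file
    names mentioned in the parameter descriptions.
    """
    run_dir = Path(run_dir)
    kwargs = {
        "ground_truth_csv": str(run_dir / "ground_truth.csv"),  # assumed name
        "llm_output_csv": str(run_dir / "llm_output.csv"),      # assumed name
        "label": label,
        "output_csv": str(run_dir / "metrics.csv"),             # assumed name
        "model_id": model_id,
        "codebook_path": str(run_dir / "codebook.json"),        # assumed name
        "report_file": str(run_dir / "report.txt"),             # assumed name
        "emissions_file": str(run_dir / "emissions.csv"),
        "timing_file": str(run_dir / "timing_data.json"),
        "char_counts_file": str(run_dir / "char_counts.json"),
        "experiment_directory": str(run_dir),
    }
    # Caller-supplied optional metadata (temperature, top_p, ...) wins.
    kwargs.update(optional)
    missing = [name for name in REQUIRED if kwargs.get(name) is None]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return kwargs


kwargs = build_run_metrics_kwargs(
    "runs/demo",
    label="sentiment",
    model_id="example-model",
    temperature=0.0,
    process_textbox=True,
)
# result = run_metrics(**kwargs)  # import run_metrics from your installation first
```

Keeping the optional metadata as plain keyword overrides means new fields from the signature can be passed through without changing the helper.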