RUNNING EVALUATIONS

To run an evaluation against an existing model, go to Evaluations and click Run an Evaluation. In the Builder, select Judge-Based Evaluation. Under the INPUTS tab, provide the following information:

Model - A hosted or custom model for evaluation.
Evaluators - One or more evaluators to score model outputs.
Dataset - The dataset to evaluate against.
Failure Mode Analysis (optional) - Whether to generate failure modes automatically.
Inference Configurations (optional) - Inference parameters like Temperature, Max Tokens, Seed, Requests Per Minute.

After confirming and launching the evaluation job, you can view the results on the Evaluations page.
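
To make the required and optional inputs easier to see at a glance, the INPUTS tab can be pictured as a small data structure. The sketch below is illustrative only: the class names, field names, defaults, and validation rules are assumptions chosen to mirror the list above, and none of it is an Oumi API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InferenceConfig:
    """Optional inference parameters (hypothetical field names)."""
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None
    seed: Optional[int] = None
    requests_per_minute: Optional[int] = None


@dataclass
class EvaluationInputs:
    """Mirror of the INPUTS tab fields; not an Oumi API object."""
    model: str                           # hosted or custom model to evaluate
    evaluators: List[str]                # one or more evaluators
    dataset: str                         # dataset to evaluate against
    failure_mode_analysis: bool = False  # optional; this default is assumed
    inference: InferenceConfig = field(default_factory=InferenceConfig)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means nothing is missing."""
        problems = []
        if not self.model:
            problems.append("model is required")
        if not self.evaluators:
            problems.append("at least one evaluator is required")
        if not self.dataset:
            problems.append("dataset is required")
        # The range checks below are illustrative assumptions.
        temperature = self.inference.temperature
        if temperature is not None and temperature < 0:
            problems.append("temperature must be non-negative")
        for name in ("max_tokens", "requests_per_minute"):
            value = getattr(self.inference, name)
            if value is not None and value <= 0:
                problems.append(f"{name} must be positive")
        return problems
```

Calling `validate()` on an instance with a model, at least one evaluator, and a dataset returns an empty list; the optional fields can be left at their defaults.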
