To run an evaluation against an existing model, go to Evaluations and click on Run an Evaluation.
Click on Judge-Based Evaluation from the Builder. Under the INPUTS tab, provide the following information:
- Model - A hosted or custom model for evaluation.
- Evaluators - One or more evaluators to score model outputs.
- Dataset - The dataset to evaluate against.
- Failure Mode Analysis (optional) - Whether to generate failure modes automatically.
- Inference Configurations (optional) - Inference parameters such as Temperature, Max Tokens, Seed, and Requests Per Minute.
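
As a planning aid, the inputs listed above can be sketched as a small Python structure that separates required from optional fields. This is only an illustration: the class names, field names, and defaults below are hypothetical and do not correspond to any platform API. In particular, treating Failure Mode Analysis as off by default is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InferenceConfig:
    # Hypothetical container for the optional inference parameters.
    # None stands for "not set"; how unset values are handled is not
    # specified here.
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None
    seed: Optional[int] = None
    requests_per_minute: Optional[int] = None


@dataclass
class JudgeEvaluationInputs:
    # Hypothetical mirror of the INPUTS tab fields.
    model: str                           # required
    evaluators: List[str]                # required, one or more
    dataset: str                         # required
    failure_mode_analysis: bool = False  # optional (default is an assumption)
    inference: InferenceConfig = field(default_factory=InferenceConfig)

    def missing_required(self) -> List[str]:
        """Return the names of required inputs that are still empty."""
        missing = []
        if not self.model:
            missing.append("Model")
        if not self.evaluators:
            missing.append("Evaluators")
        if not self.dataset:
            missing.append("Dataset")
        return missing
```

For example, `JudgeEvaluationInputs(model="my-model", evaluators=[], dataset="my-dataset").missing_required()` reports that only the Evaluators input is still missing, where `"my-model"` and `"my-dataset"` are placeholder names.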