The Oumi Agent makes it easy to generate high-quality at any stage of the machine learning workflow. Using natural language prompts, you can create training data from scratch, synthesize datasets from failure modes, or analyze and refine existing data, without writing pipeline code. What traditionally requires weeks of manual collection, cleaning, and formatting can be completed in hours. The Oumi Agent automates the most time-consuming parts of dataset preparation such as schema validation, format conversion, and iterative refinement, so your team spends less time on data plumbing and more time on model quality. Because datasets are generated on-demand and scoped to your task, you also avoid the cost of sourcing or licensing large generic datasets that may not fit your use case.Documentation Index
Fetch the complete documentation index at: https://docs.oumi.ai/llms.txt
Use this file to discover all available pages before exploring further.
STRUCTURE & CONTENTS
An is a structured collection of prompts and responses used to either train a model or evaluate its performance. Depending on your workflow, a dataset may include:- Prompt–response pairs for supervised fine-tuning
- Prompts only, where model outputs are generated and evaluated separately
- Multi-turn conversations for dialogue-based training or benchmarking
UPLOADING DATASETS
You can upload datasets directly into Oumi in a variety of common formats, including JSON, JSONL, CSV, and Parquet. All Oumi datasets follow a standardized internal that defines how messages, roles, and metadata are structured. During upload, Oumi automatically validates and converts your data into this format, ensuring it works seamlessly with training, evaluation, data synthesis, and analysis tools across the platform as well as modern machine learning pipelines.RAW FILES
Oumi also supports uploading raw files to ground your models in proprietary or domain-specific data. This allows you to incorporate internal documents, knowledge bases, or other private content into your workflows. To learn more, please see Uploading raw files.EXAMPLE USAGE
Here’s an example of a properly-formed dataset for Oumi in format:messages field, you can also specify a metadata field that is a dictionary of metadata for your data row.
WHAT’S NEXT
Add datasets
Upload and import datasets into the Oumi platform.
Add raw files
Upload and import raw files to contextualize and ground your data.
Data explorer
Explore, inspect, and validate your datasets.
Recipes
Adding new datasets using guided workflows.