Data underpins both model training and evaluation, so data quality is essential to model performance. Oumi makes it easy to create high-quality training and test data, whether from scratch, by extending existing datasets, or from unlabeled data you already have. With a simple prompt, the Oumi Agent can generate complete datasets tailored to your task (e.g., “Build a test set for a model that classifies customer support tickets by urgency”). You retain full control and can inspect and refine any part of the data generation workflow before it runs.

The Agent turns synthetic data generation into a repeatable, prompt-driven process. It translates your task definition into a structured data recipe, generates data at scale, and iteratively improves outputs based on your objectives, all within a single loop.

What once took weeks of manual effort and costly labeling can now be done in hours. By automating everything from schema design to quality validation, Oumi removes the operational burden of synthetic data, making it faster to prototype, expand, and evolve datasets without increasing headcount or vendor spend.

WHAT YOU CAN SYNTHESIZE

Oumi makes data synthesis an iterative, repeatable part of your machine learning workflow. You can rapidly prototype datasets, expand small or imbalanced data, and evolve training data alongside your models. Examples of what you can build with Oumi’s data synthesis include:
  • Question-answer datasets for training chatbots
  • Instruction-following datasets with varied complexity levels
  • Domain-specific training data (legal, medical, technical)
  • Conversation datasets with different personas or styles
  • Data augmentation to expand existing small datasets

With Oumi’s data synthesis capabilities, teams can rapidly prototype, iterate, and scale training datasets while maintaining control over structure, diversity, and quality. By shifting data creation from manual effort to rule-driven generation, you can accelerate model development and unlock use cases that would otherwise be limited by data availability.
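
As a rough illustration of what rule-driven generation with control over structure and diversity can look like, the sketch below crosses a few attribute lists to produce labeled generation prompts for the support-ticket urgency example. It is plain Python, not Oumi's API or recipe format. The attribute values and the `build_generation_prompts` helper are hypothetical, and in practice each prompt would still need to be sent to a language model to produce the ticket text.

```python
import itertools
import json
import random

# Hypothetical attribute values for illustration; none of these names come from Oumi.
URGENCY_LEVELS = ["low", "medium", "high"]
PRODUCT_AREAS = ["billing", "login", "shipping"]
TONES = ["polite", "frustrated"]


def build_generation_prompts(samples_per_combo=1, seed=0):
    """Return one labeled prompt record per attribute combination, shuffled."""
    rng = random.Random(seed)  # fixed seed keeps the ordering repeatable
    records = []
    for urgency, area, tone in itertools.product(
        URGENCY_LEVELS, PRODUCT_AREAS, TONES
    ):
        for _ in range(samples_per_combo):
            records.append(
                {
                    "label": urgency,
                    "prompt": (
                        f"Write a {tone} customer support ticket about {area} "
                        f"whose urgency is {urgency}."
                    ),
                }
            )
    rng.shuffle(records)
    return records


if __name__ == "__main__":
    # Print a few records as JSON lines to inspect them before generating at scale.
    for record in build_generation_prompts()[:3]:
        print(json.dumps(record))
```

Because every attribute combination appears the same number of times, the urgency labels stay balanced by construction, and editing one of the lists changes the coverage of the resulting dataset.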

WHAT’S NEXT

How it works

Learn how Oumi data synthesis works.

Recipes

Find out what goes inside a data synthesis recipe.