Generate 1M training rows in under a minute
No scraping. No PII. No data-sharing agreements. Describe your ML schema in plain English — Hadarac delivers production-scale training data instantly.
How it works
Three lines from prompt to dataset
import hadarac
# 1. Connect with your API key
client = hadarac.Client(api_key="hdr_••••••••••••")
# 2. Describe any ML schema in plain English
dataset = client.generate(
    prompt="Medical records: patient_id, age (18-90), diagnosis_code (ICD-10), "
           "medication, dosage_mg, outcome (recovered/ongoing/critical)",
    rows=1_000_000,    # scale to any size
    format="parquet",  # parquet / csv / json
)
# 3. Save and load into your training pipeline
dataset.save("medical_training.parquet")
import pandas as pd
df = pd.read_parquet("medical_training.parquet")
print(df.head())
✓ Generated 1,000,000 rows in 54s → medical_training.parquet
Use cases
Built for every stage of ML
Fine-tune LLMs without scraping the web
Generate domain-specific instruction–response pairs, Q&A datasets, or chain-of-thought examples at any scale. Define the schema, describe the domain — get labelled data in seconds.
Augment sparse training sets
Got 200 real examples but need 50,000? Use Hadarac to generate statistically consistent synthetic samples that mirror your real distribution — without overfitting to edge cases.
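One way to check that synthetic samples mirror a real distribution is to compare the two empirical distributions column by column. The sketch below computes a two-sample Kolmogorov-Smirnov statistic using only the standard library. It is an illustrative check, not Hadarac's own method, and the age values are made up for the example.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Largest gap between the empirical CDFs of two numeric samples (0 = identical, 1 = disjoint)."""
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(real), sorted(synthetic)
    gap = 0.0
    # The largest gap between two step functions occurs at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical values standing in for one numeric column.
real_ages = [23, 35, 41, 52, 67, 70, 78, 84]
synthetic_ages = [25, 33, 44, 50, 65, 72, 75, 88]
shifted_ages = [age + 40 for age in real_ages]  # deliberately mismatched sample

print("real vs synthetic:", ks_statistic(real_ages, synthetic_ages))
print("real vs shifted:  ", ks_statistic(real_ages, shifted_ages))
```

A small statistic suggests the column is distributed similarly in both sets. It says nothing about correlations between columns, so treat it as a first-pass check only.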
Generate edge-case and adversarial examples
Prompt Hadarac to create rare scenarios such as fraudulent transactions, malformed inputs, and out-of-distribution records. Stress-test your model before it hits production.
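One use for such records is confirming that your own input validation rejects them before they reach the model. The sketch below is a minimal validator for the medical schema from the example prompt in "How it works". The field names and the 18-90 age range come from that prompt. The rule that dosage_mg must be a positive finite number, and the sample records themselves, are assumptions made for illustration.

```python
import math

VALID_OUTCOMES = {"recovered", "ongoing", "critical"}


def validate_record(record):
    """Return a list of problems found in one record; an empty list means it passed."""
    problems = []

    age = record.get("age")
    if isinstance(age, bool) or not isinstance(age, int) or not 18 <= age <= 90:
        problems.append("age outside 18-90")

    dosage = record.get("dosage_mg")
    # Assumption: a dose must be a positive, finite number.
    if (isinstance(dosage, bool) or not isinstance(dosage, (int, float))
            or not math.isfinite(dosage) or dosage <= 0):
        problems.append("dosage_mg not a positive number")

    outcome = record.get("outcome")
    if not isinstance(outcome, str) or outcome not in VALID_OUTCOMES:
        problems.append("unknown outcome")

    return problems


# Hypothetical edge-case records of the kind you might ask for.
edge_cases = [
    {"age": 17, "dosage_mg": 50.0, "outcome": "recovered"},   # just below the age range
    {"age": 45, "dosage_mg": -5, "outcome": "ongoing"},       # negative dose
    {"age": 45, "dosage_mg": 50.0, "outcome": "unknown"},     # label outside the allowed set
    {"age": "45", "dosage_mg": 50.0, "outcome": "critical"},  # age stored as a string
]

for record in edge_cases:
    print(record, "->", validate_record(record))
```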
Privacy-safe data for regulated industries
Healthcare, finance, and legal are sectors where you often can't use real patient or customer data for training. Hadarac generates statistically faithful synthetic records with zero PII.
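Teams in regulated settings may still want an independent check before a dataset enters a training pipeline. The sketch below scans string fields for a few PII-like patterns with regular expressions. The patterns, the column names, and the sample rows are illustrative assumptions. A regex scan is a heuristic: an empty result does not prove that PII is absent.

```python
import re

# Illustrative patterns only; real audits usually need a broader, locale-aware set.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_pii(rows):
    """Return (row_index, column, pattern_name) for every string cell matching a pattern."""
    hits = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # only free-text cells are scanned
            for name, pattern in PII_PATTERNS.items():
                if pattern.search(value):
                    hits.append((index, column, name))
    return hits


# Hypothetical rows: the second one contains a planted email address.
rows = [
    {"patient_id": "P-000001", "medication": "metformin", "outcome": "recovered"},
    {"patient_id": "P-000002", "medication": "contact jane.doe@example.com", "outcome": "ongoing"},
]

print(find_pii(rows))  # [(1, 'medication', 'email')]
```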
Why synthetic over real data?
Start generating training data today
Free plan includes 15 generations/month. No credit card required. Enterprise plans with custom volume available.
Need 100M+ rows or a dedicated instance? Talk to us →