The Datasets We Actually Build
No generic "legal AI data." We design training datasets for real-world use:
standard jurisdiction packs when you need to move fast, custom builds when
you have a very specific problem, and long-term partnerships when the law
is moving and your models need to keep up.
How We Think About Data
Start where you are. If you need a fast way to de-risk an experiment, we have standard datasets.
If you have a very specific problem, we build bespoke. If you want your models to stay current,
we maintain and extend what you already have.
Standard Regional Datasets
Ready-made, professionally validated jurisdiction packs. Ideal if you want to move fast and
get something into your pipeline without a 3-month scoping exercise.
- Predefined scope, jurisdiction and practice area
- Structured Q&A with correct and strategic wrong answers
- Straightforward per-question pricing
Custom Builds
When your problem does not fit into a template: rare jurisdictions, edge practice areas,
multilingual coverage, or data aligned to your internal frameworks.
- Co-designed with your team and your lawyers
- Scope, size, and validation levels defined together
- Milestone-based payments tied to real deliverables
Partnerships & Updates
Laws change. Your models should not quietly drift out of date.
We maintain, amend, and extend your datasets over time.
- Annual maintenance or “data insurance” packages
- Amendment packs when laws or guidance change
- Advisory on when and how to retrain
What you actually get
- Q&A pairs based on real attorney surveys and interviews
- Correct answers grounded in statutes, case law, and practice
- Strategically incorrect answers that reflect real hallucinations
- Full metadata: jurisdiction, practice area, difficulty, validation notes
- JSONL or CSV, ready to drop into your training pipeline
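To make the deliverable concrete, here is a minimal sketch of what one row of a JSONL pack might look like and how a pipeline could write and read it back. The field names (`question`, `correct_answer`, `incorrect_answers`, `jurisdiction`, `practice_area`, `difficulty`, `validation_notes`) and the placeholder values are illustrative assumptions, not a published schema. The actual schema is agreed per order.

```python
import io
import json

# Hypothetical record shape: field names and values are placeholders,
# not a fixed schema (the real one is agreed per order).
record = {
    "question": "<scenario with imaginary parties>",
    "correct_answer": "<answer grounded in statute or case law>",
    "incorrect_answers": ["<plausible but wrong answer>"],
    "jurisdiction": "UAE",
    "practice_area": "Tax",
    "difficulty": "hard",
    "validation_notes": "<reviewer notes>",
}


def write_jsonl(records, fh):
    """Write one JSON object per line (the JSONL convention)."""
    for rec in records:
        # ensure_ascii=False keeps non-Latin text (e.g. Arabic) readable in the file
        fh.write(json.dumps(rec, ensure_ascii=False) + "\n")


def read_jsonl(fh):
    """Parse a JSONL stream back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in fh if line.strip()]


buf = io.StringIO()
write_jsonl([record], buf)
buf.seek(0)
rows = read_jsonl(buf)
```

JSONL keeps one record per line. Large packs can therefore be streamed row by row instead of being loaded as one document.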
How pricing works
Per-question pricing, with transparent minimums:
- Standard practice areas: roughly 3–8 USD per Q&A pair
- Hard or niche domains: roughly 12–25 USD per Q&A pair
- Rare or premium jurisdictions: up to 50–100 USD per pair
Licensing is flexible: time-limited, perpetual, exclusive or non-exclusive depending on
how much competitive edge you want.
Where we are active today
| Jurisdiction | Coverage | Status |
| UAE (incl. DIFC) | Commercial, employment, tax | Shipping now |
| Jordan | Commercial, contracts, tax | Shipping now |
| Saudi Arabia | Commercial, employment, regulatory | Priority pipeline |
| US Federal | Corporate, securities, IP | Pilot builds |
Need something not on this list? That is usually a custom build or partnership conversation.
Product 2
When You Need Something That Does Not Exist Yet
Custom datasets for domains, jurisdictions or workflows that do not fit into a template.
We design the data with you, your lawyers, and your models in mind.
When this makes sense
- You are working in niche practice areas or cross-border transactions.
- You need Arabic–English or other multilingual coverage done properly.
- You want the data to reflect your internal playbooks and risk models.
- You need a very explicit validation standard (for example, 50–100 percent lawyer-certified).
How a typical engagement runs
- Weeks 1–2: Requirements and use-case mapping with your team.
- Weeks 2–3: Finalise scope, size, quality bar, and licensing.
- Weeks 3–4: Recruit and brief the right attorneys.
- Weeks 4–8: Scenario authoring and Q&A creation.
- Weeks 8–10: Multi-layer legal validation and revisions.
- Weeks 10–12: Final delivery, documentation, and integration support.
Pricing is project-based and milestone-based. You pay as we hit the agreed gates, not months in advance.
Typical custom project ranges
- Specialised practice area (for example: tax, healthcare): 30k–80k USD
- Cross-border or multi-jurisdictional: 50k–120k USD
- Heavily regulated domains (for example: securities, banking): 60k–200k USD
Exact numbers depend on jurisdiction, depth, and validation level. The point is simple:
we commit to a clear scope, clear deliverables, and clear milestones before work begins.
Tell Us What You Need Built
Product 3
Keeping Your Data Current
Laws change quietly. Models do not, unless you tell them to. We monitor the jurisdictions you care about
and deliver updates, amendments, and retraining guidance so your AI does not drift.
Annual Maintenance
For a fixed percentage of the initial dataset cost (typically 10–20 percent per year),
we keep your data aligned with legal reality.
- Amendment packs when statutes, regulations or guidance change
- Deprecated items flagged when they become wrong
- Updated citations and metadata
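As a rough budgeting aid, here is a sketch of the 10–20 percent annual band applied to an initial dataset cost. The function name and the 40,000 USD example figure are hypothetical. The real percentage is set per contract.

```python
def annual_maintenance_band(initial_cost_usd: float,
                            low_pct: int = 10,
                            high_pct: int = 20) -> tuple[float, float]:
    """Return the (low, high) yearly maintenance fee for an initial dataset cost.

    Defaults reflect the typical 10-20 percent band; the actual rate is
    agreed per contract.
    """
    return initial_cost_usd * low_pct / 100, initial_cost_usd * high_pct / 100


# Hypothetical example: a dataset that initially cost 40,000 USD.
low, high = annual_maintenance_band(40_000)
print(f"Maintenance: {low:,.0f}-{high:,.0f} USD per year")
```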
Data Insurance Model
For customers who do not want to think about individual updates.
A fixed annual fee that covers unlimited incremental changes.
- Predictable budget, no per-update surprises
- Continuous small updates instead of big, disruptive overhauls
- Ideal for production systems in regulated environments
Premium Services
Beyond Just Delivering Files
If you are serious about performance, governance, and smooth integration, we can go further than
“here is a dataset, good luck.”
Advanced Metadata & Analytics
Extra signal for model training and evaluation.
- Difficulty ratings and ambiguity flags
- Hallucination risk and stability scores
- Cross-references across scenarios and concepts
Integration Support
We sit with your engineers and help wire the data into your stack properly.
- Schema mapping and data pipelines
- Versioning and rollback strategies
- Evaluation harnesses and test sets
Consulting & Training
Help your team use the data well and measure the right things.
- Fine-tuning strategy and evaluation design
- Workshops with your ML and legal teams
- Ongoing check-ins as you scale
Audit-Ready Provenance
For organisations that need to prove how their data was created and validated.
- Full audit trails and validator records
- Certification packs for compliance teams
- Optional cryptographic or ledger-based sealing
Questions We Get About Products
Can we start small before committing to a larger deal?
Yes. Many teams start with a smaller standard dataset or a scoped pilot for one jurisdiction and practice area.
Once you see the impact on your models, we expand together.
Can we mix your data with our own internal examples?
That is usually the best setup. We provide the structured legal backbone; you layer in
your own proprietary workflows, templates, and edge cases. We can help with schema design.
Do you lock us into a specific model provider?
No. We stay model-agnostic on purpose. Our job is to be the best possible data layer,
whether you are fine-tuning your own models or using third-party APIs.
How Our Pricing Works
We want you to know exactly what you are paying for. We would rather be very clear
than “competitive but vague.”
Per-question transparency
You see the math: price per Q&A pair times number of pairs. No hidden platform fees,
no vague “AI uplift” charges.
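The math can be sketched in a few lines. The per-pair bands below use the indicative standard, hard/niche and rare/premium ranges quoted earlier on this page. The 5,000-row minimum comes from the Pricing & Minimums terms. The function name and band keys are hypothetical. Real quotes also depend on exclusivity, license terms and reviewer seniority, so treat this as an illustration rather than a quote.

```python
# Indicative per-pair price bands in USD (low, high), from the ranges on this page.
PRICE_BANDS_USD = {
    "standard": (3, 8),
    "hard_or_niche": (12, 25),
    "rare_or_premium": (50, 100),
}
MIN_ROWS = 5_000  # minimum dataset size


def estimate_cost(band: str, pairs: int) -> tuple[int, int]:
    """Return the (low, high) total: price per Q&A pair times number of pairs."""
    if pairs < MIN_ROWS:
        raise ValueError(f"minimum order is {MIN_ROWS} rows, got {pairs}")
    low, high = PRICE_BANDS_USD[band]
    return low * pairs, high * pairs
```

For example, `estimate_cost("standard", 5_000)` returns `(15000, 40000)`. That corresponds to a 5,000-pair standard pack at 3–8 USD per pair.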
Quality never gets discounted
If we lower price, we do it by adjusting license duration or exclusivity, not by
cutting corners on legal quality.
Milestone-based for customs
For custom builds, you pay when specific deliverables are accepted.
Deposits, then gated releases. It aligns incentives on both sides.
Volume rewards commitment
Multi-dataset and multi-year partnerships come with meaningful discounts.
The more we work together, the cheaper it becomes per dataset.
Pricing & Minimums
Minimum dataset: 5,000 rows.
Pricing: from 5.98 to 100+ USD, depending on jurisdiction, exclusivity, scope, and reviewer seniority.
(Exclusive categories & complex jurisdictions priced higher.)
Terms: milestone-based with an upfront deposit. Non-exclusive and exclusive licenses available (with terms that protect our IP and business).
Ready To See What This Looks Like For You?
Whether you need a standard UAE or Jordan pack, a custom build in a niche area, or ongoing updates
for a production system, we can design the right data layer with you.
What Better Data Actually Buys You
Lower hallucination rates are not marketing copy for us. They show up in how your product behaves
with real users, real lawyers, and real regulators.
Higher accuracy on real tasks
Customers see significant gains on their internal benchmarks when they shift from generic
legal crawls to our attorney-built datasets. That is the difference between a demo and a product.
Fewer hallucinations that matter
Because we deliberately include plausible but wrong answers, models learn to avoid entire classes
of legal mistakes: fictional cases, misapplied rules, jurisdiction drift.
Audit trails for compliance
You get provenance, validator records, and documentation to bring to internal risk, compliance,
or external regulators. It is not just “we used good data”; it is provable.
A foundation you can build on
Once the foundation is in place, you can layer your own knowledge, workflows, and models on top.
You are not rebuilding from scratch every time the law or your product shifts.
What This Actually Changes
Customers use our datasets to move from “interesting prototype” to systems
their legal teams can actually trust.
Higher Accuracy
Models trained on our data see substantial accuracy gains on real-world
legal questions compared to generic web or case-law-only corpora.
Fewer Hallucinations
Strategic wrong answers teach the model to recognize and avoid subtle
legal mistakes instead of generating them confidently.
Audit-Ready Provenance
Attorney validation, citations, and metadata give you the documentation
your compliance and legal teams demand.
- 5,000+: rows per minimum dataset
- 2: jurisdictions available now
- IRAC: structured reasoning labels
- 0 PII: imaginary parties, real law
Coverage
Available now: Tax Law (UAE, Jordan).
On request: Any country / additional areas (Contracts, Litigation, Regulatory, Compliance, Privacy, IP, Employment, M&A).
- Tax Law (UAE/Jordan)
- Contracts
- Litigation
- Regulatory
- Compliance
- Privacy & Data
- IP & Patents
Case Studies
See how curated, IRAC-structured data reduces hallucinations and strengthens jurisdiction-specific reasoning.
Tax Reasoning (UAE)
IRAC scenarios with explicit correct/incorrect variants reduced hallucinations on tricky VAT timing prompts.
Cross-Jurisdiction Consistency
Adversarial variants improved robustness on LegalBench-style tasks across similar fact patterns.
Evaluation You Can Trust
Held-out test sets include at least one variation per law/case check, with lawyer verification.
Why Choose Us
| Feature | Ours | Generic Web Crawl |
| IRAC-Structured Q&A | Yes | No |
| Jurisdiction Tagging | Fine-grained | Sparse |
| Lawyer QA | Yes | No |
| Benchmark-Aware Design | Yes (LegalBench) | No |
Data Ethics & Provenance
No Personal Data
Scenarios use imaginary parties; real law application is drafted and double-checked by lawyers.
Open Sources
Derived from our legal research and public information with citations and provenance.
Train/Test Split
Every order includes a held-out test set with variations per law/case check, reviewed by a lawyer.
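One way to guarantee "at least one variation per law/case check" in the held-out set is to group rows by the check they exercise and reserve one variant from each group for test. This is a minimal sketch that assumes each row carries a hypothetical `check_id` field. It is not necessarily how our pipeline assigns splits, and lawyer review of the held-out set still happens separately.

```python
import random
from collections import defaultdict


def split_by_check(rows, seed=0):
    """Hold out one variant per law/case check for test; the rest go to train.

    Assumes each row is a dict with a hypothetical "check_id" key.
    Returns (train, test) lists of copies tagged with a "split" field.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["check_id"]].append(row)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train, test = [], []
    for check_id in sorted(groups):
        variants = groups[check_id]
        held_out = rng.randrange(len(variants))  # index of the test variant
        for i, row in enumerate(variants):
            split = "test" if i == held_out else "train"
            tagged = {**row, "split": split}
            (test if split == "test" else train).append(tagged)
    return train, test


rows = [{"check_id": "A", "variant": i} for i in range(3)] + [
    {"check_id": "B", "variant": 0}
]
train, test = split_by_check(rows)
```

A check with only one variant lands entirely in the test set. In practice, you would want two or more variants per check so the model also trains on it.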
Licensing Options
Non-exclusive and exclusive packages with protective terms.
Close the Information Entropy Gap
We collaborate with seasoned attorneys to survey,
author, refine, and annotate hypothetical legal scenarios.
Each pack includes grounded citations and structured labels so your model learns the
why, not just the what. That combination is tailored to reduce hallucinations.
- Quality & QA: Review led by top lawyers and their teams.
- Custom Metadata Add-Ons: Extend with negotiable extras (difficulty tiers, statute IDs, tags, and more).
- Benchmark Alignment: Crafted to reduce hallucinations and improve LegalBench-style outcomes.
How We Deliver
Step 1
Tell us your jurisdiction(s), practice area, and target tasks/models.
- Jurisdictions: UAE Tax and Jordanian Tax Law available now; any country on request.
- Practice area: Tax now, others on request.
- Targets: eval tasks (e.g., LegalBench-style), downstream apps, model families.
- Size/splits: minimum 5,000 rows; train/test with lawyer-verified holdout.
Step 2
Licensing & Budget
We align on rights and price band before work begins.
- License options:
  - Non-exclusive: for internal model training; no redistribution.
  - Exclusive: category/jurisdiction exclusivity available at a premium.
- Pricing: 5.98–100+ USD, based on jurisdiction, exclusivity, scope, and reviewer seniority (we’ll agree the unit: per row/example or per pack).
- Add-ons: custom metadata, deeper reviews, or domain specialists priced separately.
- Protections: terms favoring our IP and business; NDA available on request.
Step 3
Authoring & Research
We identify AI weak points with lawyers and create varied scenario Q&A with explicit correct/incorrect examples.
- Lawyer interviews to map failure modes; adversarial and near-miss variants.
- Imaginary parties; real law application grounded in public sources.
Step 4
Issue, Rule, Application, Conclusion (IRAC), with custom labels as needed.
- Core fields: IRAC, citations, jurisdiction, split.
- Negotiable extras: difficulty tiers, statute IDs, tagging taxonomies, dates, provenance notes.
Step 5
Double-checked by lawyers; the held-out test set includes at least one variation per law/case check.
- Review led by Iyad Barakat and team.
- Explicit labeling of correct vs. incorrect reasoning for robust evaluation.
Step 6
JSONL or CSV to your spec. Minimum dataset: 5,000 rows.
- Schema doc + sample rows included.
- Provenance and audit notes supplied where applicable.
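Before loading a pack, a team might run a quick acceptance check. It would look for the core fields (IRAC, citations, jurisdiction, split) and confirm the 5,000-row minimum. This is a sketch only. The lowercase key names in `REQUIRED_FIELDS` and the allowed split values are assumptions, since the real schema is defined in the schema doc shipped with each order.

```python
import json

# Hypothetical key names for the core fields (IRAC, citations, jurisdiction, split).
REQUIRED_FIELDS = ("issue", "rule", "application", "conclusion",
                   "citations", "jurisdiction", "split")
MIN_ROWS = 5_000


def check_pack(lines, min_rows=MIN_ROWS):
    """Return a list of problems found in a JSONL pack (empty list means OK)."""
    problems = []
    count = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        count += 1
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        missing = [f for f in REQUIRED_FIELDS if f not in row]
        if missing:
            problems.append(f"line {lineno}: missing {', '.join(missing)}")
        if "split" in row and row["split"] not in ("train", "test"):
            problems.append(f"line {lineno}: unexpected split {row['split']!r}")
    if count < min_rows:
        problems.append(f"only {count} rows; minimum is {min_rows}")
    return problems
```

An open file handle works as input, for example `with open("pack.jsonl", encoding="utf-8") as fh: print(check_pack(fh))`. The file name is a placeholder.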
Step 7
Payment & Milestones
Milestone-based with an upfront deposit.
- Kickoff: deposit & scope freeze.
- Mid-project sample drop for feedback.
- Final delivery & acceptance window; materials released on completion.
Typical custom delivery: about 3 months for the minimum scope.
From Brief to Delivery
- Discovery call & scope confirmation.
- Weak-point analysis with lawyers; scenario design.
- Authoring with IRAC & custom metadata add-ons.
- Legal QA & adversarial variants (explicit correct/incorrect).
- Export to JSONL/CSV with schema & examples.
- Typical custom delivery: ~3 months for minimum scope.
FAQ
What do I get with a standard order?
Train and test splits in JSONL or CSV. Scenario Q&A with IRAC fields (Issue, Rule, Application, Conclusion), citations, and negotiable metadata add-ons.
How is pricing determined?
Pricing ranges from 5.98 to 100+ USD, based on jurisdiction, exclusivity, scope, and reviewer seniority. The minimum dataset is 5,000 rows. Payments are milestone-based with a deposit.
Which jurisdictions are available now?
UAE Tax and Jordanian Tax Law. We can produce custom datasets for any country on request.
Do you support LegalBench?
We design examples to reflect patterns found in LegalBench-style tasks. See the paper and rankings.
Is this legal advice?
No. Our datasets are for machine learning training and research. Scenarios use imaginary parties; the law application is drafted and reviewed by lawyers.
Why This Matters Long-Term
The AI companies that win will not be the ones with the largest models,
but the ones with the best data: data designed specifically for their
domain, with deep practitioner input, clear provenance, and real-world
mistakes baked in.
Every dataset we build compounds our advantage: our attorney network gets
stronger, our pipelines get sharper, and your models benefit from a
growing foundation of trusted legal understanding.
Ready to Build Your Dataset?
Reduce hallucinations and boost LegalBench-style performance with jurisdiction-specific, IRAC-structured data.
PDF translation is coming soon.