Train Legal AI with Expert-Curated, Jurisdiction-Specific Data

We create high-fidelity legal datasets that reduce hallucinations and improve LegalBench-style performance. Start with UAE Tax and Jordanian Tax Law, or request a custom dataset for any country. Each example follows a scenario Q&A format with embedded IRAC (Issue, Rule, Application, Conclusion) reasoning.


Heads up: PDF translation is coming soon.

Jurisdiction-Specific Data

Purpose-built datasets for real practice. Available now: UAE Tax and Jordanian Tax Law. Additional countries on request.

IRAC Scenario Q&A

Lawyer-authored hypotheticals and prompts with Issue, Rule, Application, and Conclusion (IRAC) annotations, designed for reasoning-heavy fine-tuning.
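To show how IRAC annotations can feed reasoning-heavy fine-tuning, here is a minimal Python sketch that turns one row into a prompt/target pair whose target walks through Issue, Rule, Application, and Conclusion in order. The keys follow the sample JSONL row on this page. The prompt template, the `to_sft_pair` helper, and the prompt/target output shape are assumptions for illustration, not a required training format.

```python
# Hypothetical row; keys mirror the sample JSONL row shown on this page.
ROW = {
    "id": "uae-tax-000123",
    "jurisdiction": "UAE",
    "practice_area": "Tax",
    "question": (
        "A VAT-registered company receives an advance payment for a future "
        "supply. When is the output VAT due?"
    ),
    "irac": {
        "issue": "VAT timing on advance",
        "rule": "Tax due at earliest of invoice/payment/supply",
        "application": "Payment received before invoice/supply -> VAT due now",
        "conclusion": "Output VAT due upon advance",
    },
}

# IRAC sections in the order the target should present them.
IRAC_ORDER = ("issue", "rule", "application", "conclusion")


def to_sft_pair(row: dict) -> dict:
    """Build a prompt/target pair for supervised fine-tuning (illustrative template)."""
    prompt = (
        f"Jurisdiction: {row['jurisdiction']} ({row['practice_area']})\n"
        f"Question: {row['question']}\n"
        "Answer using Issue, Rule, Application, Conclusion."
    )
    # One labelled line per IRAC section, e.g. "Issue: VAT timing on advance".
    target = "\n".join(f"{part.capitalize()}: {row['irac'][part]}" for part in IRAC_ORDER)
    return {"prompt": prompt, "target": target}


pair = to_sft_pair(ROW)
print(pair["prompt"])
print(pair["target"])
```

Keeping the IRAC sections in a fixed order gives the model a consistent reasoning scaffold to imitate. A different template works just as well, as long as it is used consistently across the training and test splits.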

Benchmark-Aware Curation

We tailor questions to the reasoning patterns LegalBench tasks test, to help models score higher on the benchmark's rankings, and include explicit correct and incorrect answer variants.
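Explicit correct/incorrect variants let you score more than raw accuracy. The minimal Python sketch below uses hypothetical rows and assumed field names (`scenario`, `variant`, `correct`). It checks whether a model's verdicts agree with the labels, and whether the model gets every variant of a scenario right. The `score` helper and its consistency metric are our illustration, not an official LegalBench metric.

```python
from collections import defaultdict

# Hypothetical labelled rows: each scenario has a correct base answer and an
# incorrect near-miss variant ("correct" flattens the sample row's labels.correct).
ROWS = [
    {"scenario": "vat-advance", "variant": "base", "correct": True},
    {"scenario": "vat-advance", "variant": "near-miss", "correct": False},
    {"scenario": "vat-import", "variant": "base", "correct": True},
    {"scenario": "vat-import", "variant": "near-miss", "correct": False},
]


def score(rows, verdicts):
    """Compare model verdicts (True = "this answer is correct") with the labels."""
    hits = [verdict == row["correct"] for row, verdict in zip(rows, verdicts)]
    by_scenario = defaultdict(list)
    for row, hit in zip(rows, hits):
        by_scenario[row["scenario"]].append(hit)
    return {
        "accuracy": sum(hits) / len(hits),
        # A scenario counts only if every one of its variants was judged correctly.
        "scenario_consistency": sum(all(h) for h in by_scenario.values()) / len(by_scenario),
    }


# A model that accepts every answer looks half right but is never consistent.
print(score(ROWS, [True, True, True, True]))
# {'accuracy': 0.5, 'scenario_consistency': 0.0}
```

A model that simply agrees with every answer reaches 50% accuracy on these rows but 0% scenario consistency. The near-miss variants are what expose that gap.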

How It Works

  1. Scope — Tell us your jurisdiction(s), practice area, and target tasks/models.
    • Jurisdictions: UAE Tax and Jordanian Tax Law available now; any country on request.
    • Practice area: Tax now, others on request.
    • Targets: eval tasks (e.g., LegalBench-style), downstream apps, model families.
    • Size/splits: minimum 5,000 rows; train/test with lawyer-verified holdout.
  2. Licensing & Budget — We align on rights and price band before work begins.
    • License options:
      • Non-exclusive: for internal model training; no redistribution.
      • Exclusive: category/jurisdiction exclusivity available at a premium.
    • Pricing: ranges from $5.98 to $100+ based on jurisdiction, exclusivity, scope, and reviewer seniority (the unit, per row/example or per pack, is agreed during scoping).
    • Add-ons: custom metadata, deeper reviews, or domain specialists priced separately.
    • Protections: terms favoring our IP and business; NDA available on request.
  3. Authoring & Research — We identify AI weak points with lawyers and create varied scenario Q&A with explicit correct/incorrect examples.
    • Lawyer interviews to map failure modes; adversarial and near-miss variants.
    • Imaginary parties; real law application grounded in public sources.
  4. IRAC & Metadata — Issue, Rule, Application, Conclusion with custom labels as needed.
    • Core fields: IRAC, citations, jurisdiction, split.
    • Negotiable extras: difficulty tiers, statute IDs, tagging taxonomies, dates, provenance notes.
  5. Legal QA — Double-checked by lawyers; a held-out test set includes at least one variation per law/case check.
    • Review led by Iyad Barakat and team.
    • Explicit labeling of correct vs. incorrect reasoning for robust evaluation.
  6. Delivery — JSONL or CSV to your spec. Minimum dataset: 5,000 rows.
    • Schema doc + sample rows included.
    • Provenance and audit notes supplied where applicable.
  7. Payment & Milestones — Milestone-based with an upfront deposit.
    • Kickoff: deposit & scope freeze.
    • Mid-project sample drop for feedback.
    • Final delivery & acceptance window; materials released on completion.
  8. Timeline — Typical custom delivery ~3 months for the minimum scope.

What Our Users Say

“Their IRAC scenarios cut down hallucinations and made our tax model far more grounded on jurisdiction-specific prompts.”
— Head of AI, LegalTech Startup
“The held-out test set with lawyer review gave us trustworthy evals out of the box.”
— NLP Research Lead

  • 5,000+ rows per minimum dataset
  • 2 jurisdictions available now
  • IRAC structured reasoning labels
  • 0 PII: imaginary parties, real law

Coverage

Available now: Tax Law (UAE, Jordan).
On request: any country, plus additional practice areas (Contracts, Litigation, Regulatory, Compliance, Privacy, IP, Employment, M&A).

Pricing & Minimums

Minimum dataset: 5,000 rows.

Pricing: ranges from $5.98 to $100+ depending on jurisdiction, exclusivity, scope, and reviewer seniority; the unit (per row/example or per pack) is agreed during scoping.
(Exclusive categories & complex jurisdictions priced higher.)

Terms: milestone-based with an upfront deposit. Non-exclusive and exclusive licenses available (with terms that protect our IP and business).


Data Ethics & Provenance

No Personal Data

Scenarios use imaginary parties; the application of real law is drafted and double-checked by lawyers.

Open Sources

Derived from our legal research and public information with citations and provenance.

Train/Test Split

Every order includes a held-out test set with at least one variation per law/case check, reviewed by a lawyer.
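The held-out rule (at least one variation per law/case check) can be sketched as a grouped split. Bucket rows by the check they exercise, then send one randomly chosen variant of each check to the test set. This is a minimal Python sketch. `check_id` is a hypothetical field, and rejecting checks with fewer than two variants (so training always keeps one) is our assumption, not a documented rule.

```python
import random
from collections import defaultdict


def holdout_split(rows, key="check_id", seed=0):
    """Move one variant of every law/case check into test; keep the rest in train.

    `check_id` is a hypothetical field naming the law/case check a row exercises.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train, test = [], []
    for check in sorted(groups):
        variants = groups[check]
        if len(variants) < 2:
            # Assumption: every check needs a variant left over for training.
            raise ValueError(f"check {check!r} has only {len(variants)} variant(s)")
        held = rng.randrange(len(variants))
        for i, row in enumerate(variants):
            (test if i == held else train).append(row)
    return train, test


rows = [
    {"id": f"row-{i}", "check_id": check}
    for i, check in enumerate(["art-a", "art-a", "art-a", "art-b", "art-b"])
]
train, test = holdout_split(rows)
print(len(train), len(test))  # 3 2
```

This mirrors the rule as stated, where other variants of a held-out check stay in training. To evaluate on laws the model has never seen, you would move whole groups to the test set instead.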

Licensing Options

Non-exclusive and exclusive packages with protective terms.

From Brief to Delivery

  1. Discovery call & scope confirmation.
  2. Weak-point analysis with lawyers; scenario design.
  3. Authoring with IRAC & custom metadata add-ons.
  4. Legal QA & adversarial variants (explicit correct/incorrect).
  5. Export to JSONL/CSV with schema & examples.
  6. Typical custom delivery: ~3 months for minimum scope.
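For CSV delivery, nested fields such as `irac` have to be flattened into columns. Below is a minimal Python sketch that uses only the standard `json` and `csv` modules. The column names (`irac_issue`, `dataset_split`, and so on), the `; ` join for citations, and the `jsonl_to_csv` helper are illustrative assumptions, since the actual schema is agreed per order.

```python
import csv
import io
import json

IRAC_PARTS = ("issue", "rule", "application", "conclusion")
# Illustrative column layout; a real order follows the agreed schema doc.
COLUMNS = ["id", "jurisdiction", "question", "answer",
           *[f"irac_{p}" for p in IRAC_PARTS], "citations", "dataset_split"]


def jsonl_to_csv(jsonl_text: str) -> str:
    """Flatten each JSONL record into one CSV row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        row = json.loads(line)
        flat = {
            "id": row["id"],
            "jurisdiction": row["jurisdiction"],
            "question": row["question"],
            "answer": row["answer"],
            "citations": "; ".join(row.get("citations", [])),
            "dataset_split": row["metadata"]["dataset_split"],
        }
        flat.update({f"irac_{p}": row["irac"][p] for p in IRAC_PARTS})
        writer.writerow(flat)
    return out.getvalue()


# json.dumps without indent emits one line per record, as JSONL requires.
SAMPLE_JSONL = json.dumps({
    "id": "uae-tax-000123",
    "jurisdiction": "UAE",
    "question": "When is the output VAT due on an advance payment?",
    "answer": "VAT is due upon receipt of the advance.",
    "irac": {"issue": "VAT timing on advance",
             "rule": "Tax due at earliest of invoice/payment/supply",
             "application": "Payment received first -> VAT due now",
             "conclusion": "Output VAT due upon advance"},
    "citations": ["UAE VAT Decree-Law No. 8 of 2017, Art. X"],
    "metadata": {"variant": "base", "dataset_split": "train"},
}) + "\n"

print(jsonl_to_csv(SAMPLE_JSONL))
```

`csv.DictWriter` quotes any field that contains a comma, so citations such as "Decree-Law No. 8 of 2017, Art. X" survive the round trip intact.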

Quick Look: Sample JSONL Row

{"id":"uae-tax-000123",
 "jurisdiction":"UAE",
 "practice_area":"Tax",
 "question":"A VAT-registered company receives an advance payment for a future supply. When is the output VAT due?",
 "answer":"Under UAE VAT law, tax becomes due at the earlier of invoice issuance, receipt of payment, or supply date. Here, VAT is due upon receipt of the advance.",
 "irac":{"issue":"VAT timing on advance","rule":"Tax due at earlier of invoice/payment/supply","application":"Payment received before invoice/supply -> VAT due now","conclusion":"Output VAT due upon advance"},
 "citations":["UAE VAT Decree-Law No. 8 of 2017, Art. X"],
 "labels":{"correct":true,"difficulty":"medium"},
 "metadata":{"variant":"base","dataset_split":"train"}}
  

Case Studies

See how curated, IRAC-structured data reduces hallucinations and strengthens jurisdiction-specific reasoning.

Tax Reasoning (UAE)

IRAC scenarios with explicit correct/incorrect variants reduced hallucinations on tricky VAT timing prompts.


Cross-Jurisdiction Consistency

Adversarial variants improved robustness on LegalBench-style tasks across similar fact patterns.


Evaluation You Can Trust

Held-out test sets include at least one variation per law/case check, with lawyer verification.


Why Choose Us

| Feature | Ours | Generic Web Crawl |
|---|---|---|
| IRAC-Structured Q&A | Yes | No |
| Jurisdiction Tagging | Fine-grained | Sparse |
| Lawyer QA | Yes | No |
| Benchmark-Aware Design | Yes (LegalBench) | No |

FAQ

What do I get with a standard order?

Train and test splits in JSONL or CSV. Scenario Q&A with IRAC fields (Issue, Rule, Application, Conclusion), citations, and negotiable metadata add-ons.

How is pricing determined?

Pricing ranges from $5.98 to $100+ based on jurisdiction, exclusivity, scope, and reviewer seniority. The minimum dataset is 5,000 rows. Payments are milestone-based, with an upfront deposit.

Which jurisdictions are available now?

UAE Tax and Jordanian Tax Law. We can produce custom datasets for any country on request.

Do you support LegalBench?

We design examples to reflect patterns found in LegalBench-style tasks. For details of the benchmark itself, see the LegalBench paper and rankings.

Is this legal advice?

No. Our datasets are for machine learning training and research. Scenarios use imaginary parties; the application of the law is drafted and reviewed by lawyers.

Ready to Build Your Dataset?

Reduce hallucinations and boost LegalBench-style performance with jurisdiction-specific, IRAC-structured data.

