The Datasets We Actually Build

No generic "legal AI data." We design training datasets for real-world use: standard jurisdiction packs when you need to move fast, custom builds when you have a very specific problem, and long-term partnerships when the law is moving and your models need to keep up.

Talk About Your Use Case
See Standard Datasets

How We Think About Data

Start where you are. If you need a fast way to de-risk an experiment, we have standard datasets. If you have a very specific problem, we build bespoke. If you want your models to stay current, we maintain and extend what you already have.

Standard Regional Datasets

Ready-made, professionally validated jurisdiction packs. Ideal if you want to move fast and get something into your pipeline without a 3-month scoping exercise.

Predefined scope, jurisdiction and practice area
Structured Q&A with correct and strategic wrong answers
Straightforward per-question pricing

Custom Builds

When your problem does not fit into a template: rare jurisdictions, edge practice areas, multilingual coverage, or data aligned to your internal frameworks.

Co-designed with your team and your lawyers
Scope, size, and validation levels defined together
Milestone-based payments tied to real deliverables

Partnerships & Updates

Laws change. Your models should not quietly drift out of date. We maintain, amend, and extend your datasets over time.

Annual maintenance or “data insurance” packages
Amendment packs when laws or guidance change
Advisory on when and how to retrain

Product 1

Standard Datasets For Popular Jurisdictions

Professionally validated Q&A datasets for high-demand jurisdictions and practice areas. Built from real conversations with lawyers, not just archives and statutes.

What you actually get

Q&A pairs based on real attorney surveys and interviews
Correct answers grounded in statutes, case law, and practice
Strategically incorrect answers that reflect real hallucinations
Full metadata: jurisdiction, practice area, difficulty, validation notes
JSONL or CSV, ready to drop into your training pipeline
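
As a rough illustration of what one JSONL row might look like, the sketch below serialises a single Q&A record and reads it back. The field names (`question`, `correct_answer`, `incorrect_answers`, and so on) and all sample values are placeholders invented for this example; the real schema is the one documented with each pack.

```python
import json

# Hypothetical record layout: every field name and value here is an
# illustrative assumption, not the delivered schema.
record = {
    "id": "example-0001",
    "jurisdiction": "UAE",
    "practice_area": "tax",
    "difficulty": "medium",
    "question": "Placeholder question about a fictional company.",
    "correct_answer": "Placeholder answer grounded in a cited source.",
    "incorrect_answers": [
        "Placeholder answer that misapplies the rule.",
        "Placeholder answer that cites a non-existent case.",
    ],
    "validation_notes": "Placeholder reviewer note.",
}


def to_jsonl_line(row: dict) -> str:
    """Serialise one row as a single JSONL line (no raw newlines)."""
    return json.dumps(row, ensure_ascii=False)


def from_jsonl_line(line: str) -> dict:
    """Parse one JSONL line back into a dict."""
    return json.loads(line)


line = to_jsonl_line(record)
restored = from_jsonl_line(line)
```

Keeping the wrong answers in their own list, separate from the correct one, is one way to make the correct/incorrect distinction explicit for training and evaluation code.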

How pricing works

Per-question pricing, with transparent minimums:

Standard practice areas: roughly 3–8 USD per Q&A pair
Hard or niche domains: roughly 12–25 USD per Q&A pair
Rare or premium jurisdictions: up to 50–100 USD per pair

Licensing is flexible: time-limited, perpetual, exclusive or non-exclusive depending on how much competitive edge you want.

Request A Scope & Quote
Browse Current Packs

Where we are active today

Jurisdiction	Coverage	Status
UAE (incl. DIFC)	Commercial, employment, tax	Shipping now
Jordan	Commercial, contracts, tax	Shipping now
Saudi Arabia	Commercial, employment, regulatory	Priority pipeline
US Federal	Corporate, securities, IP	Pilot builds

Need something not on this list? That is usually a custom build or partnership conversation.

Product 2

When You Need Something That Does Not Exist Yet

Custom datasets for domains, jurisdictions or workflows that do not fit into a template. We design the data with you, your lawyers, and your models in mind.

When this makes sense

You are working in niche practice areas or cross-border transactions.
You need Arabic–English or other multilingual coverage done properly.
You want the data to reflect your internal playbooks and risk models.
You need a very explicit validation standard (for example, 50–100 percent lawyer-certified).

How a typical engagement runs

Weeks 1–2: Requirements and use-case mapping with your team.
Weeks 2–3: Finalise scope, size, quality bar, and licensing.
Weeks 3–4: Recruit and brief the right attorneys.
Weeks 4–8: Scenario authoring and Q&A creation.
Weeks 8–10: Multi-layer legal validation and revisions.
Weeks 10–12: Final delivery, documentation, and integration support.

Pricing is project-based and milestone-based. You pay as we hit the agreed gates, not months in advance.

Typical custom project ranges

Specialised practice area (for example: tax, healthcare): 30k–80k USD
Cross-border or multi-jurisdictional: 50k–120k USD
Heavily regulated domains (for example: securities, banking): 60k–200k USD

Exact numbers depend on jurisdiction, depth, and validation level. The point is simple: we commit to a clear scope, clear deliverables, and clear milestones before work begins.

Tell Us What You Need Built

Product 3

Keeping Your Data Current

Laws change quietly. Models do not, unless you tell them to. We monitor the jurisdictions you care about and deliver updates, amendments, and retraining guidance so your AI does not drift.

Annual Maintenance

For a fixed percentage of the initial dataset cost (typically 10–20 percent per year), we keep your data aligned with legal reality.

Amendment packs when statutes, regulations or guidance change
Deprecated items flagged when they become wrong
Updated citations and metadata
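
To make the maintenance idea concrete, here is a minimal sketch of applying an amendment pack to an existing dataset. The `AmendmentPack` structure, the `deprecated` flag, and the row fields are hypothetical names chosen for illustration; an actual amendment pack may be shaped differently.

```python
from dataclasses import dataclass, field


@dataclass
class AmendmentPack:
    # Hypothetical structure, assumed for this example only.
    deprecated_ids: set = field(default_factory=set)
    updated_rows: dict = field(default_factory=dict)  # id -> replacement row


def apply_amendments(rows: list, pack: AmendmentPack) -> list:
    """Return new rows with updates applied and deprecated rows flagged.

    Deprecated rows are flagged rather than deleted, so downstream code
    can decide whether to drop them or keep them as negative examples.
    """
    result = []
    for row in rows:
        # Copy so the caller's original rows are left untouched.
        current = dict(pack.updated_rows.get(row["id"], row))
        current["deprecated"] = row["id"] in pack.deprecated_ids
        result.append(current)
    return result


rows = [
    {"id": "q1", "citation": "old citation"},
    {"id": "q2", "citation": "unchanged citation"},
]
pack = AmendmentPack(
    deprecated_ids={"q2"},
    updated_rows={"q1": {"id": "q1", "citation": "new citation"}},
)
amended = apply_amendments(rows, pack)
```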

Data Insurance Model

For customers who do not want to think about individual updates: a fixed annual fee that covers unlimited incremental changes.

Predictable budget, no per-update surprises
Continuous small updates instead of big, disruptive overhauls
Ideal for production systems in regulated environments

Premium Services

Beyond Just Delivering Files

If you are serious about performance, governance, and smooth integration, we can go further than “here is a dataset, good luck.”

Advanced Metadata & Analytics

Extra signal for model training and evaluation.

Difficulty ratings and ambiguity flags
Hallucination risk and stability scores
Cross-references across scenarios and concepts

Integration Support

We sit with your engineers and help wire the data into your stack properly.

Schema mapping and data pipelines
Versioning and rollback strategies
Evaluation harnesses and test sets

Consulting & Training

Help your team use the data well and measure the right things.

Fine-tuning strategy and evaluation design
Workshops with your ML and legal teams
Ongoing check-ins as you scale

Audit-Ready Provenance

For organisations that need to prove how their data was created and validated.

Full audit trails and validator records
Certification packs for compliance teams
Optional cryptographic or ledger-based sealing

Questions We Get About Products

Can we start small before committing to a larger deal?

Yes. Many teams start with a smaller standard dataset or a scoped pilot for one jurisdiction and practice area. Once you see the impact on your models, we expand together.

Can we mix your data with our own internal examples?

That is usually the best setup. We provide the structured legal backbone; you layer in your own proprietary workflows, templates, and edge cases. We can help with schema design.

Do you lock us into a specific model provider?

No. We stay model-agnostic on purpose. Our job is to be the best possible data layer, whether you are fine-tuning your own models or using third-party APIs.

How Our Pricing Works

We want you to know exactly what you are paying for. We would rather be very clear than “competitive but vague.”

Per-question transparency

You see the math: price per Q&A pair times number of pairs. No hidden platform fees, no vague “AI uplift” charges.
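
The arithmetic is simple enough to sketch. The example below plugs in the standard practice-area range quoted on this page (roughly 3–8 USD per pair) and the 5,000-row minimum; the function name is made up for illustration, and a real quote will depend on jurisdiction, exclusivity, and scope.

```python
def dataset_cost(price_per_pair_usd: float, num_pairs: int) -> float:
    """Total cost = price per Q&A pair x number of pairs."""
    if price_per_pair_usd < 0 or num_pairs < 0:
        raise ValueError("price and pair count must be non-negative")
    return price_per_pair_usd * num_pairs


# Illustrative inputs only: low and high ends of the standard range,
# applied to the minimum dataset size.
low = dataset_cost(3.0, 5_000)
high = dataset_cost(8.0, 5_000)
print(f"Estimated range: {low:,.0f} to {high:,.0f} USD")
```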

Quality never gets discounted

If we lower price, we do it by adjusting license duration or exclusivity, not by cutting corners on legal quality.

Milestone-based for custom builds

For custom builds, you pay when specific deliverables are accepted. Deposits, then gated releases. It aligns incentives on both sides.

Volume rewards commitment

Multi-dataset and multi-year partnerships come with meaningful discounts. The more we work together, the cheaper it becomes per dataset.

Pricing & Minimums

Minimum dataset: 5,000 rows.

Pricing: ranges from 5.98 to 100+ USD depending on jurisdiction, exclusivity, scope, and reviewer seniority.
(Exclusive categories & complex jurisdictions priced higher.)

Terms: milestone-based with an upfront deposit. Non-exclusive and exclusive licenses available (with terms that protect our IP and business).

Request Custom Dataset
Browse Tax Packs

Ready To See What This Looks Like For You?

Whether you need a standard UAE or Jordan pack, a custom build in a niche area, or ongoing updates for a production system, we can design the right data layer with you.

Request A Custom Quote
Talk To Us About A Pilot

What Better Data Actually Buys You

Lower hallucination rates are not marketing copy for us. They show up in how your product behaves with real users, real lawyers, and real regulators.

Higher accuracy on real tasks

Customers see significant gains on their internal benchmarks when they shift from generic legal crawls to our attorney-built datasets. That is the difference between a demo and a product.

Fewer hallucinations that matter

Because we deliberately include plausible but wrong answers, models learn to avoid entire classes of legal mistakes: fictional cases, misapplied rules, jurisdiction drift.

Audit trails for compliance

You get provenance, validator records, and documentation to bring to internal risk, compliance, or external regulators. It is not just “we used good data,” it is provable.

A foundation you can build on

Once the foundation is in place, you can layer your own knowledge, workflows, and models on top. You are not rebuilding from scratch every time the law or your product shifts.

What This Actually Changes

Customers use our datasets to move from “interesting prototype” to systems their legal teams can actually trust.

Higher Accuracy

Models trained on our data see substantial accuracy gains on real-world legal questions compared to generic web or case-law-only corpora.

Fewer Hallucinations

Strategic wrong answers teach the model to recognize and avoid subtle legal mistakes instead of generating them confidently.

Audit-Ready Provenance

Attorney validation, citations, and metadata give you the documentation your compliance and legal teams demand.

5,000+ Rows per Minimum Dataset
2 Jurisdictions Available Now
IRAC: Structured Reasoning Labels
0 PII: Imaginary Parties, Real Law

Coverage

Available now: Tax Law (UAE, Jordan).
On request: Any country / additional areas (Contracts, Litigation, Regulatory, Compliance, Privacy, IP, Employment, M&A).

Tax Law (UAE/Jordan)
Contracts
Litigation
Regulatory
Compliance
Privacy & Data
IP & Patents

Case Studies

See how curated, IRAC-structured data reduces hallucinations and strengthens jurisdiction-specific reasoning.

Tax Reasoning (UAE)

IRAC scenarios with explicit correct/incorrect variants reduced hallucinations on tricky VAT timing prompts.

Browse Packs

Cross-Jurisdiction Consistency

Adversarial variants improved robustness on LegalBench-style tasks across similar fact patterns.

Request Custom

Evaluation You Can Trust

Held-out test sets include at least one variation per law/case check, with lawyer verification.

See Examples

Why Choose Us

Feature	Ours	Generic Web Crawl
IRAC-Structured Q&A	Yes	No
Jurisdiction Tagging	Fine-grained	Sparse
Lawyer QA	Yes	No
Benchmark-Aware Design	Yes (LegalBench)	No

Data Ethics & Provenance

No Personal Data

Scenarios use imaginary parties; real law application is drafted and double-checked by lawyers.

Open Sources

Derived from our legal research and public information with citations and provenance.

Train/Test Split

Every order includes a held-out test set with variations per law/case check, reviewed by a lawyer.

Licensing Options

Non-exclusive and exclusive packages with protective terms.

Close the Information Entropy Gap

We collaborate with seasoned attorneys to survey, author, refine, and annotate theoretical legal scenarios. Each pack includes grounded citations and structured labels so your model learns the why, not just the what, and is tailor-made to reduce hallucinations.

Quality & QA: Review led by top lawyers and their teams.
Custom Metadata Add-Ons: Extend with negotiable extras (difficulty tiers, statute IDs, tags, and more).
Benchmark Alignment: Crafted to reduce hallucinations and improve LegalBench-style outcomes.

How We Deliver

Step 1

Scope

Tell us your jurisdiction(s), practice area, and target tasks/models.

Jurisdictions: UAE Tax and Jordanian Tax Law available now; any country on request.
Practice area: Tax now, others on request.
Targets: eval tasks (e.g., LegalBench-style), downstream apps, model families.
Size/splits: minimum 5,000 rows; train/test with lawyer-verified holdout.

Step 2

Licensing & Budget

We align on rights and price band before work begins.

License options:
- Non-exclusive: for internal model training; no redistribution.
- Exclusive: category/jurisdiction exclusivity available at a premium.
Pricing: ranges from 5.98 to 100+ USD based on jurisdiction, exclusivity, scope, and reviewer seniority (we’ll agree the unit: per row/example or per pack).
Add-ons: custom metadata, deeper reviews, or domain specialists priced separately.
Protections: terms favoring our IP and business; NDA available on request.

Step 3

Authoring & Research

We identify AI weak points with lawyers and create varied scenario Q&A with explicit correct/incorrect examples.

Lawyer interviews to map failure modes; adversarial and near-miss variants.
Imaginary parties; real law application grounded in public sources.

Step 4

IRAC & Metadata

Issue, Rule, Application, Conclusion with custom labels as needed.

Core fields: IRAC, citations, jurisdiction, split.
Negotiable extras: difficulty tiers, statute IDs, tagging taxonomies, dates, provenance notes.
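
One way to picture a row with these core fields is as a small typed record. The sketch below is an assumption-laden illustration: the class name, attribute names, and the `"train"`/`"test"` split values are invented here, and the delivered schema document is the authority on the real layout.

```python
from dataclasses import dataclass, field


@dataclass
class IracRow:
    # Core fields named on this page; attribute names are assumptions.
    issue: str
    rule: str
    application: str
    conclusion: str
    citations: list
    jurisdiction: str
    split: str  # assumed values: "train" or "test"
    extras: dict = field(default_factory=dict)  # negotiable add-ons

    def validate(self) -> None:
        """Raise ValueError if an IRAC field is empty or the split is unknown."""
        for name in ("issue", "rule", "application", "conclusion"):
            if not getattr(self, name).strip():
                raise ValueError(f"empty IRAC field: {name}")
        if self.split not in ("train", "test"):
            raise ValueError(f"unexpected split: {self.split!r}")


# Placeholder content only; not taken from any real pack.
row = IracRow(
    issue="Placeholder issue statement",
    rule="Placeholder rule with a cited source",
    application="Placeholder application to fictional parties",
    conclusion="Placeholder conclusion",
    citations=["Placeholder citation"],
    jurisdiction="UAE",
    split="train",
    extras={"difficulty_tier": "hard"},
)
row.validate()
```

Keeping the negotiable extras in a separate mapping means the core IRAC fields stay stable while add-ons vary per order.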

Step 5

Legal QA

Double-checked by lawyers; a held-out test set includes at least one variation per law/case check.

Review led by Iyad Barakat and team.
Explicit labeling of correct vs. incorrect reasoning for robust evaluation.
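
Explicit correct/incorrect labels make a held-out set straightforward to score. The sketch below assumes a hypothetical row layout (`question`, plus `options` each carrying an `is_correct` flag) and a stand-in `pick` function in place of a real model call; none of these names come from the delivered data.

```python
from typing import Callable


def accuracy(rows: list, pick: Callable[[str, list], int]) -> float:
    """Fraction of rows where `pick` returns the index of a correct option."""
    if not rows:
        raise ValueError("no rows to evaluate")
    hits = 0
    for row in rows:
        texts = [option["text"] for option in row["options"]]
        chosen = pick(row["question"], texts)
        if row["options"][chosen]["is_correct"]:
            hits += 1
    return hits / len(rows)


# Tiny placeholder test set; the correct option sits at different positions
# so a position-biased picker cannot score perfectly.
held_out = [
    {
        "question": "Placeholder question 1",
        "options": [
            {"text": "Plausible but wrong", "is_correct": False},
            {"text": "Correct", "is_correct": True},
        ],
    },
    {
        "question": "Placeholder question 2",
        "options": [
            {"text": "Correct", "is_correct": True},
            {"text": "Plausible but wrong", "is_correct": False},
        ],
    },
]


def always_first(question: str, options: list) -> int:
    """Stand-in for a model: always chooses the first option."""
    return 0


baseline = accuracy(held_out, always_first)
```

In practice `pick` would wrap whatever model or API is being evaluated; a trivial baseline like `always_first` is mainly useful as a sanity check on the harness itself.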

Step 6

Delivery

JSONL or CSV to your spec. Minimum dataset: 5,000 rows.

Schema doc + sample rows included.
Provenance and audit notes supplied where applicable.
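
On the receiving side, a delivered JSONL file can be read line by line and grouped into its splits. This is a minimal sketch: the required field names (`id`, `split`) are assumptions for the example, and an in-memory buffer stands in for the delivered file.

```python
import io
import json
from collections import defaultdict

REQUIRED = {"id", "split"}  # assumed minimal fields, not the real schema


def load_by_split(lines) -> dict:
    """Group JSONL rows by their 'split' value, rejecting incomplete rows."""
    groups = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)
        missing = REQUIRED - set(row)
        if missing:
            raise ValueError(f"line {number}: missing {sorted(missing)}")
        groups[row["split"]].append(row)
    return dict(groups)


# An in-memory stand-in for a delivered file; with a real file, pass the
# open file handle instead.
sample = io.StringIO(
    '{"id": "q1", "split": "train"}\n'
    '{"id": "q2", "split": "test"}\n'
    '{"id": "q3", "split": "train"}\n'
)
splits = load_by_split(sample)
counts = {name: len(rows) for name, rows in splits.items()}
```

Failing loudly on a missing field at load time is a deliberate choice here: it surfaces schema mismatches before they reach a training run.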

Step 7

Payment & Milestones

Milestone-based with an upfront deposit.

Kickoff: deposit & scope freeze.
Mid-project sample drop for feedback.
Final delivery & acceptance window; materials released on completion.

Step 8

Timeline

Typical custom delivery ~3 months for the minimum scope.

From Brief to Delivery

Discovery call & scope confirmation.
Weak-point analysis with lawyers; scenario design.
Authoring with IRAC & custom metadata add-ons.
Legal QA & adversarial variants (explicit correct/incorrect).
Export to JSONL/CSV with schema & examples.
Typical custom delivery: ~3 months for minimum scope.

FAQ

What do I get with a standard order?

Train and test splits in JSONL or CSV. Scenario Q&A with IRAC fields (Issue, Rule, Application, Conclusion), citations, and negotiable metadata add-ons.

How is pricing determined?

Pricing ranges from 5.98 to 100+ USD based on jurisdiction, exclusivity, scope, and reviewer seniority. Minimum dataset is 5,000 rows. Payments are milestone-based with a deposit.

Which jurisdictions are available now?

UAE Tax and Jordanian Tax Law. We can produce custom datasets for any country on request.

Do you support LegalBench?

We design examples to reflect patterns found in LegalBench-style tasks. See the paper and rankings.

Is this legal advice?

No. Our datasets are for machine learning training and research. Scenarios use imaginary parties; the law application is drafted and reviewed by lawyers.

Why This Matters Long-Term

The AI companies that win will not be the ones with the largest models, but the ones with the best data: data designed specifically for their domain, with deep practitioner input, clear provenance, and real-world mistakes baked in.

Every dataset we build compounds our advantage: our attorney network gets stronger, our pipelines get sharper, and your models benefit from a growing foundation of trusted legal understanding.

Ready to Build Your Dataset?

Reduce hallucinations and boost LegalBench-style performance with jurisdiction-specific, IRAC-structured data.

Request Custom Dataset
Browse Tax Packs
