forge · 23 Jun 2026

EXP-0011 — dottxt-ai/outlines: structured outputs for LLMs, install + DSL verified

#forge #outlines #structured-generation #json-schema #llm #apache-2.0 #python

David Olsson

When you ask an AI for an answer that needs a specific shape — say, "give me a JSON object with a name field and an age field" — the model will often produce something close to that shape but with subtle errors: an extra quote, a missing comma, a number where you wanted a string. Those errors break downstream code. The library this post is about, outlines, makes the model produce exactly the shape you ask for, every time, by constraining what tokens it's allowed to emit at each step. The result: 100% schema compliance, no parsing failures, no retry logic.

According to the project's README, it's used by NVIDIA, Cohere, Hugging Face, and vLLM in their inference stacks. That kind of cross-vendor adoption is rare for an independent open-source project, and it's a good signal that this is a real piece of infrastructure rather than a hobby experiment.

Forge is our experiment harness. It cloned the project, installed the dependencies cleanly in a sandboxed Python 3.12 container, and verified that the core domain-specific language (the part of outlines that takes a JSON schema and turns it into a constraint on the model's outputs) imports correctly and produces the right tree representation. The headline value-proposition — "take a schema, get a constraint" — works on a fresh checkout.

Forge couldn't run a full generation in the sandbox because that would require either a real Hugging Face model (gigabytes of weights, GPU memory) or a paid API call to OpenAI/Anthropic/etc. — neither of which the no-secrets sandbox is built for. Those are the natural follow-up experiments. The library's design and surface area, however, check out cleanly.


Status: experimented, result partial → strong. Install clean, DSL imports and renders, schema-to-constraint construction verified on a real JSON schema. Backend-specific tests (transformers, torch, vllm) require model dependencies that exceed the sandbox budget; they're documented as follow-ups.

This is a forge writeup of dottxt-ai/outlines at commit be486d5. The pitch: "structured outputs for LLMs" — generate model output that is guaranteed to satisfy a JSON schema, a regex, or a context-free grammar, with zero validation retries.

TL;DR

  • License: Apache 2.0, an OSI-approved license.
  • Stack: Python 3.10+, 46 source .py files, 53 test files, 15 examples, and both uv.lock and setup.cfg for compatibility. Used by NVIDIA, Cohere, Hugging Face, and vLLM per the README badges.
  • Install: clean. uv pip install -e . (≈11 s) brought in the full dependency graph, including heavy transitive dependencies (tensorflow, nvidia-* wheels); the install does not require those ML dependencies to be functional.
  • Smoke probe verified. import outlines works. from outlines.types import JsonSchema, Regex works. Building a JsonSchema(...) from a real schema ({"type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "integer"}}, "required": ["name", "age"]}) produces a clean DSL tree representation.
  • Backend-coupled tests skipped. tests/backends/, tests/models/, and tests/processors/ require transformers + torch + model weights. Doable in a richer sandbox; out of scope this run.

What it is

The big idea behind outlines: large language models emit tokens one at a time, and at each step you can constrain which tokens are legal by masking the model's output probability distribution. If you compile a JSON schema into a constraint over allowed tokens at each parse state, the model can only ever emit valid JSON conforming to the schema. The library does that compilation efficiently and exposes a clean Python API.
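As a rough illustration of the masking idea, here is a toy sketch in plain Python. Everything in it is an assumption made for illustration: the eight-token vocabulary, the hard-coded toy_model scores, and the digits-only constraint are hypothetical and say nothing about how outlines is implemented internally. It only shows the principle that disallowed tokens get a score of negative infinity before the next token is picked.

```python
import math

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["0", "1", "7", "42", "a", "-", " cat", "<eos>"]


def allowed(prefix: str, token: str) -> bool:
    """Constraint: the finished output must match the regex [0-9]+."""
    if token == "<eos>":
        return len(prefix) > 0  # stopping is only legal after at least one digit
    return token.isdigit()


def mask_logits(logits, prefix):
    """Replace the score of every illegal token with -inf."""
    return [
        score if allowed(prefix, token) else -math.inf
        for score, token in zip(logits, VOCAB)
    ]


def toy_model(prefix):
    """Stand-in for a language model: returns one score per vocabulary entry."""
    # Unconstrained, this "model" would pick " cat" first.
    logits = [0.1, 0.2, 0.3, 1.5, 2.0, 0.5, 3.0, 1.0]
    if prefix:
        logits[-1] = 5.0  # once something was emitted, it prefers to stop
    return logits


def greedy_decode(score_fn, max_steps=4):
    """Pick the highest-scoring legal token at each step."""
    prefix = ""
    for _ in range(max_steps):
        masked = mask_logits(score_fn(prefix), prefix)
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        if VOCAB[best] == "<eos>":
            break
        prefix += VOCAB[best]
    return prefix


if __name__ == "__main__":
    print(greedy_decode(toy_model))
```

The model's favourite token (" cat") never gets a chance because its score is masked out, so the decoded text is digits only. Checking each token with a Python function is fine for a toy; the point of a library like outlines is to do the equivalent bookkeeping efficiently for real vocabularies and real schemas.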

The headline classes are the three you'd guess:

  • JsonSchema — wraps a JSON schema as a constraint.
  • Regex — wraps a regex pattern.
  • CFG / grammar types — wraps a context-free grammar (handy for SQL, math expressions, custom DSLs).

The library glues these to many backends: transformers, vllm, llamacpp, OpenAI, Anthropic, etc. Each backend has its own adapter that applies the constraint to that backend's logits or completion endpoint. Adapters live under outlines/models/ and depend on the relevant backend SDK being installed — which is why the unit-test suite is so coupled to optional dependencies.

A library that NVIDIA, Cohere, Hugging Face, and vLLM build on needs an extension surface this broad; the design fits the scale it serves.

How forge bench-tested it

bash
git clone https://github.com/dottxt-ai/outlines.git
cd outlines && git checkout be486d5

# inside ghcr.io/astral-sh/uv:python3.12-bookworm (copy-in pattern to dodge overlayfs)
uv pip install --system -e . pytest pytest-asyncio

Install succeeded. The dependency graph includes wheels for torch, tensorflow, and nvidia-* CUDA libraries — the package's setup.cfg declares these as optional extras through dependency markers, and uv happily downloads them even when CUDA isn't available (they install but won't load without the driver).

Then the smoke probe:

python
>>> import outlines
>>> from outlines.types import JsonSchema, Regex
>>> schema = {
...   "type": "object",
...   "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
...   "required": ["name", "age"]
... }
>>> JsonSchema(schema)
└── JsonSchema('{"type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "integer"}}, "required": ["name", "age"]}')

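The DSL tree renders correctly. Wrapping a schema as a constraint object is the entry point to the project's headline value; we exercised it on a real schema and got the expected output. This is the strongest evidence forge can produce in a no-model, no-API-key sandbox: the package imports, and the DSL layer constructs and renders a term from a real schema. Applying that term to a model's tokens happens in the backend adapters, which this run did not exercise.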
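To make "take a schema, get a constraint" concrete, here is a hand-written sketch that turns the same name/age schema into a regular expression using only the standard library. This is an assumption-heavy simplification, not the pattern outlines produces: the key order is fixed, whitespace handling is minimal, and the names STRING, INTEGER, WS, and conforms are made up for this example.

```python
import re

# Simplified building blocks for JSON values (illustrative, not exhaustive).
STRING = r'"(?:[^"\\\x00-\x1f]|\\["\\/bfnrt]|\\u[0-9a-fA-F]{4})*"'
INTEGER = r'-?(?:0|[1-9][0-9]*)'
WS = r'[ \n\t]*'

# An object with exactly the two required keys, in a fixed order.
PATTERN = re.compile(
    r'\{' + WS
    + r'"name"' + WS + ':' + WS + STRING + WS + ','
    + WS + r'"age"' + WS + ':' + WS + INTEGER + WS
    + r'\}'
)


def conforms(text: str) -> bool:
    """True if the whole text matches the hand-written schema pattern."""
    return PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    print(conforms('{"name": "Ada", "age": 36}'))    # valid shape
    print(conforms('{"name": "Ada", "age": "36"}'))  # age is a string
    print(conforms('{"name": "Ada"}'))               # age is missing
```

Once a schema is expressed as a pattern like this, "only emit tokens that keep the output matchable" becomes a well-defined question at every generation step, which is the property a constrained decoder relies on.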

What forge could not test

Two things, both reasonable follow-ups:

  1. Backend-coupled tests. tests/backends/test_backends.py requires transformers. tests/processors/ requires torch. tests/models/test_vllm.py requires vllm. Running these in a forge sandbox would need a richer image (outlines[transformers,torch] extras + GPU access for some of them).
  2. End-to-end generation. The point of the library is to generate constrained output from a real model. That requires either a downloaded Hugging Face model or an API call to OpenAI / Anthropic / etc. Full-size models mean gigabytes of weights, but the no-secrets sandbox could handle a tiny (gpt2-class) model, which is worth a follow-up bench.

Both are sandbox-tier problems, not library problems. The library is in good shape.
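A follow-up run could start with a preflight check like the sketch below, which reports which of the test groups named above have their optional dependency importable. The mapping is taken from the list above; the helper names (is_installed, runnable_groups) are hypothetical, and this is not part of forge or of the outlines test suite.

```python
import importlib.util

# Test group -> top-level modules it needs, as listed in the writeup.
BACKEND_DEPS = {
    "tests/backends/test_backends.py": ["transformers"],
    "tests/processors/": ["torch"],
    "tests/models/test_vllm.py": ["vllm"],
}


def is_installed(module_name: str) -> bool:
    """True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def runnable_groups(deps=BACKEND_DEPS):
    """Map each test group to whether all of its dependencies are present."""
    return {
        group: all(is_installed(module) for module in modules)
        for group, modules in deps.items()
    }


if __name__ == "__main__":
    for group, ok in runnable_groups().items():
        status = "run" if ok else "skip (missing dependency)"
        print(f"{group}: {status}")
```

Being importable is necessary but not sufficient: a test group may still need model weights or a GPU, so this only narrows down what is worth attempting in a given sandbox tier.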

Comparables

| Project | Posture |
| --- | --- |
| jxnl/instructor | Pydantic-shaped, retry-on-validation. Outlines is more rigorous (constraint at the logits level, not retry). |
| microsoft/guidance | Similar idea, different DSL, less broad backend coverage. |
| vllm-project/vllm | A consumer rather than a competitor: outlines is one of vLLM's recommended constrained-generation backends. |
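For contrast, the retry-on-validation posture can be sketched as a loop that generates, validates, and tries again. This is a generic illustration and not instructor's actual API: validate, generate_with_retries, and the flaky_model stub are all hypothetical names, and the stub simply returns a malformed reply on its first call.

```python
import json


def validate(text):
    """Return the parsed object if it has a string name and an integer age, else None."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    name, age = obj.get("name"), obj.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(name, str) and isinstance(age, int) and not isinstance(age, bool):
        return obj
    return None


def generate_with_retries(generate, max_attempts=3):
    """Call generate(attempt) until its output validates or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        obj = validate(generate(attempt))
        if obj is not None:
            return obj, attempt
    raise ValueError(f"no valid output after {max_attempts} attempts")


def flaky_model(attempt):
    """Hypothetical stand-in for an LLM call; the first reply has a trailing comma."""
    if attempt == 1:
        return '{"name": "Ada", "age": 36,}'
    return '{"name": "Ada", "age": 36}'


if __name__ == "__main__":
    print(generate_with_retries(flaky_model))
```

Every failed attempt costs a full model call, and the loop can still run out of attempts. Constraining tokens during generation removes the loop entirely, which is the difference the table is pointing at.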

The outlines team also runs .txt — a startup productizing structured generation. The OSS library is what's in this repo; the commercial version is the linked "API in early access."

Reproducibility

upstream repo: https://github.com/dottxt-ai/outlines
commit pinned: be486d548ef77f88e79371378fdba5d5c0142d51
license: Apache 2.0
base image: ghcr.io/astral-sh/uv:python3.12-bookworm
install: uv pip install -e . (exit 0)
smoke probe: JsonSchema({...}) builds clean DSL tree (exit 0)
backend tests: not attempted (require transformers / torch / vllm)

Companion gist holds the install log, the env manifest, the upstream README + LICENSE, and the pyproject.toml.

Built and verified by forge. The library's core design is exercised; backend-specific tests and end-to-end generation are documented follow-ups that need a richer sandbox tier.
