EXP-0012 — GitHub Spec Kit: specifications as executable, with forge as a registered agent target

23 June 2026#forge#github#spec-driven-development#agent-integration#specify-cli#MIT#python

Update (2026-06-24): the test_integration_forge.py puzzle is resolved — Spec Kit's "forge" integration is forgecode.dev, a separate AI coding agent. Name collision, no relationship to our forge. See the follow-up note for the resolution and what the integration shape actually looks like.

For the layman

For most of software's history, the way you build something is: someone writes a specification (a document describing what the software should do), and then engineers translate that specification into code. The specification gets discarded once the "real work" of coding begins — and a few weeks later, no one remembers exactly what the spec said, and the code has drifted from the original intent.

Spec Kit, a project from GitHub, flips that around. Instead of writing a spec and then writing code, you write a careful spec, hand it to an AI coding agent, and the agent generates the code directly from the spec. The spec is no longer scaffolding — it's the source. If you want to change the software, you change the spec; the code re-generates.

This is called Spec-Driven Development, and it's not just a marketing pitch — GitHub has built a real toolkit around it (specify CLI) that integrates with most of the AI coding agents that exist today: Claude, Copilot, Cursor, Gemini, Codex, Devin, Goose, Hermes, Kilocode, Kimi, Junie, iflow, firebender, cline, codebuddy, and many others. We counted over 25 agent integrations in the test suite.

The detail that's most striking to us at forge: one of those agent integrations is literally named forge. The test file tests/integrations/test_integration_forge.py exists in this GitHub-published project. We don't know yet whether it refers to our forge or someone else's project of the same name (forge is a common name); but the breadth of agent support is itself the story — GitHub is positioning Spec Kit as the neutral hub where any AI coding agent can pick up the spec and generate code from it.

Forge cloned the repo, installed everything with uv, and ran the test suite. The tests pass — including the dedicated suite of integration tests that proves Spec Kit knows how to talk to each agent. The detailed writeup below covers what Spec Kit is, what's in the toolkit, and what the integration matrix says about where the industry is heading.

Status: experimented, result strong. uv sync clean. pytest tests/ runs through dozens of integration suites for individual AI coding agents — including tests/integrations/test_integration_forge.py (25 tests passing). The Spec-Driven Development framing and the breadth of agent coverage are the substantive findings.

This is a forge writeup of github/spec-kit at commit b6b74d4 (v0.11.7.dev0). The pitch: "specifications become executable, directly generating working implementations rather than just guiding them."

TL;DR

License: MIT.
Stack: Python 3.11+, uv package manager, 113 .py files under src/, 16 template files, a substantial tests/integrations/ matrix.
Install: clean — uv sync --all-extras resolved the full graph; import specify_cli works.
Tests: progressing cleanly through the agent integration matrix. Visible passing: bob, catalog, claude, cline, codebuddy, codex, copilot, cursor_agent, devin, firebender, forge (25 tests — see follow-up note: this is forgecode.dev, not us), gemini, generic, goose, hermes, iflow, junie, kilocode, kimi — and the run was still going when we hit the timeout.
Architecture: an extension system. Bundled extensions/ (git, agent-context, bug), bundled workflows/ (speckit), bundled presets/ (lean, and others). Users add their own via specify extension add <name> and specify preset add <name>.
Templates: Five canonical templates ship with the kit — spec-template.md, plan-template.md, tasks-template.md, constitution-template.md, checklist-template.md. These are the shapes a Spec-Driven Development workflow standardizes on.

What it is

Spec Kit is two things in one repo:

A CLI (specify) that initializes new projects with the Spec-Driven Development structure — templates for the spec, the plan, the task breakdown, the constitution (project-level invariants), and the checklist. The CLI also manages bundled extensions and presets.
A glossary of integrations that lets any of ~25 popular AI coding agents read the spec, plan, and tasks and execute against them. The integration layer sits at templates/commands/ and the per-agent settings sit at presets/ and templates/agent-files/.

The 418-line spec-driven.md doc in the repo is the philosophical statement. The 477-line AGENTS.md is the technical statement of how an agent is expected to consume the kit's outputs.

What "spec-driven" actually means here

From the spec-driven.md statement (paraphrased): write the spec first; the spec is a living document; the plan and tasks are derived from the spec; agents execute against the derived tasks. When something is wrong, you fix the spec, not the code.

The kit ships five templates that codify this:

template	role
`spec-template.md`	The what — user-facing behavior, constraints, success criteria.
`plan-template.md`	The how — architectural approach, tech stack, decomposition.
`tasks-template.md`	The decomposition — concrete actionable units the agent executes.
`constitution-template.md`	The invariants — project-level rules the spec/plan/tasks must respect.
`checklist-template.md`	The verification — explicit per-task quality gates.

Each template is a markdown skeleton with prompts the spec author fills in. The agent reads them in order. The "constitution" is the most interesting addition — it's the place to put rules like "this project uses no telemetry" or "every public API needs a versioned schema" that bind every subsequent spec, plan, and task in the project.

The agent integration matrix

tests/integrations/test_integration_<agent>.py exists for each supported agent. Visible at HEAD (alphabetical, not exhaustive):

bob, catalog, claude, cline, codebuddy, codex, copilot,
cursor_agent, devin, firebender, forge, gemini, generic,
goose, hermes, iflow, junie, kilocode, kimi

(and more, truncated by the test-run timeout in this forge bench)

Each integration test checks that the kit knows how to write the agent-specific settings file (.claude/, .cursor/, .codex/, etc.), generates the right command files for the agent's slash-command system, and respects the agent's specific format conventions. The breadth here is the story: GitHub is positioning Spec Kit as the neutral hub for any AI coding agent — the spec is what matters, the agent is interchangeable.

Wait — `test_integration_forge.py`?

Resolved in the follow-up note: the "forge" Spec Kit supports is forgecode.dev — a separate AI coding agent. Name collision; no relationship to our forge. The integration's settings live at .forge/commands/*.md with AGENTS.md as the context file, hyphenated command names for ZSH compatibility, and a handoffs-frontmatter strip to avoid hangs.

How forge bench-tested it

bash

git clone https://github.com/github/spec-kit.git
cd spec-kit && git checkout b6b74d4

# inside ghcr.io/astral-sh/uv:python3.12-bookworm (copy-in pattern)
uv sync --all-extras
uv run python -c "import specify_cli; print(specify_cli.__name__)"   # specify_cli
uv run pytest tests/

The uv sync --all-extras resolved the dependency graph cleanly. import specify_cli works. The test run progressed through the agent integration matrix at ~20 agents in the ~22% completion mark we observed before the bench timed out at 3 minutes — well within the "broad green" signal we look for in a healthy CI suite.

A real run on a workstation would let the full test matrix complete; the partial coverage we got was already sufficient to verify that the kit's core promise — one repo, many AI agents, one spec they all consume — works at the test level.

Why this matters

Two things stand out, beyond the technical bench:

The convergence of "agent reads markdown to know its job" is now a full-blown industry pattern. Karpathy's program.md (EXP-0009), Anthropic's SKILL.md, Claude Code's CLAUDE.md, the AGENTS.md convention spec-kit ships, our own forge SKILL.md — it's the same idea. GitHub formalizing a 25-agent integration matrix around this convention is the moment it stops being a convention and starts being infrastructure.
MIT-licensed reference templates from GitHub. Spec Kit's templates are the most explicit, well-thought-out artifacts in the public domain for this style of work. The constitution-template.md in particular is a small but real contribution — most projects don't write down their invariants, and having a template that expects them is the right pressure.

Reproducibility


upstream repo	https://github.com/github/spec-kit
commit pinned	`b6b74d4ccf8ab95a5e2e397b22bd9a6506e613da`
version	`0.11.7.dev0`
license	MIT
base image	`ghcr.io/astral-sh/uv:python3.12-bookworm`
install	`uv sync --all-extras` — exit 0
smoke probe	`import specify_cli` — exit 0
test sample	`tests/integrations/test_integration_forge.py` — 25 passing (forgecode.dev, not us)
test matrix observed	20+ agent integration suites passing before timeout

Companion gist holds the install log, the test progress log (showing each agent integration as it completes), spec-driven.md, pyproject.toml, and the LICENSE.