Themes at 18: what forge's first eighteen experiments add up to, and three new harvesters that follow

25 June 2026

This blog has now run eighteen experiments — eighteen separate projects forge has cloned, built in a clean sandbox, tried to use, and written up. Some were small (a 200-line dashboard); some were large (a 674 MB smart-glasses operating system); a couple weren't projects at all but blog posts forge turned into projects by implementing the system they described. Each writeup stood alone.

This post is different. It's the cross-cutting view: after eighteen experiments, what patterns do we see repeating? Where is the wider open-source AI ecosystem actually converging? And — most usefully — what does that imply for what forge should bench next?

The short answer to the last question: the original way forge finds projects (someone reacts with 🧪 in our Slack #development channel) is too narrow. We've now seen enough about what makes a good forge candidate that we can also go look for those candidates in the wild. This post introduces three new skills that do exactly that — they watch GitHub, RSS feeds, and a curated list of productive authors for projects that match the patterns forge knows how to bench well.

The five themes

After 18 experiments, five themes recur strongly.

1. The SKILL.md / AGENTS.md / program.md convention is now industry consensus

By far the strongest pattern. Independently invented or adopted by ten different projects forge has benched, with no shared lineage:

forge itself — 10+ skills under plugin/skills/forge-*, each a SKILL.md with YAML frontmatter.
Karpathy's autoresearch — program.md, which Karpathy explicitly calls "a super lightweight skill."
GitHub Spec Kit — AGENTS.md plus 25 agent-specific integrations.
HKUDS Vibe-Trading — agent/SKILL.md.
calesthio OpenMontage — 115 SKILL.md files in one repo, plus dedicated AGENTS.md, CLAUDE.md, CODEX.md, COPILOT.md, CURSOR.md per agent.
Mentra-Community MentraOS — own SKILL.md-like protocol convention.
safishamsi Graphify — graphify install --platform <one of 17 agents> flips an agent-specific SKILL.md into that agent's config dir.
Anthropic's own skills marketplace — mcp__anthropic-skills:* cluster.
The vibe-studio suite — 10 sibling SKILL.md files (3d-vibe, doc-vibe, etc.)
nolly-studio cult-ui — registry pattern (convention-adjacent).

This isn't convergence anymore — it's the convention. The shape: YAML frontmatter declaring name, version, description, dependencies, optional env, optional mcp: block, with the agent-facing body as markdown below.
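To make that shape concrete, here is a minimal sketch of splitting a SKILL.md-style file into its frontmatter and its agent-facing body. It assumes the frontmatter sits between two `---` lines and only reads flat `key: value` pairs; nested blocks such as `mcp:` would need a real YAML parser. The function name and the sample skill are hypothetical and not taken from any of the projects listed above.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md-style document into (frontmatter, body).

    Assumption: frontmatter is delimited by '---' lines at the top of the
    file and holds flat 'key: value' pairs. Indented or valueless lines
    (for example the children of an mcp: block) are skipped.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, text  # no closing delimiter: treat the whole file as body

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if line.startswith((" ", "\t")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        if value.strip():
            meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body


# Hypothetical sample, for illustration only.
SAMPLE = """---
name: example-skill
version: 0.1.0
description: Hypothetical skill used only for illustration
---
# Example skill

Agent-facing instructions go here.
"""

meta, body = split_frontmatter(SAMPLE)
```

A harvester could use the presence of a non-empty `name` and `description` as a cheap sanity check before enqueueing a repo.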

Implication: any GitHub repo with a SKILL.md, AGENTS.md, or program.md file in the root is a forge candidate. The presence of the convention is the signal.

2. Article-as-spec is forge's highest-leverage template

Three experiments used the article-as-spec template, which turns a blog post or Substack essay into a working Python package. Three out of three shipped runnable code:

EXP-0006 — agentic-rl-runner from Cameron Wolfe's Agentic RL essay. Shipped to github.com/worksona/agentic-rl-runner.
EXP-0013 — ard-tools from Hugging Face's ARD launch. Shipped to github.com/worksona/ard-tools.
EXP-0018 — Graphify writeup from a MarkTechPost recipe.

The template's success rate is 100% so far. The bottleneck isn't the template; it's that we discover article-as-spec candidates incidentally (someone 🧪s a link). The fix is active discovery.

Implication: watch the RSS feeds of MarkTechPost, the Hugging Face blog, Cameron Wolfe's Substack, the Anthropic blog, and dottxt.co. The first three account for all three article-as-spec wins so far.

3. Productive authors keep being productive

Ten owners produced eighteen experiments. A few produced more than one:

owner	benched experiments	hit rate
Karpathy	EXP-0006 (Wolfe-shape), EXP-0009, EXP-0010	3/3 strong-shape
HKUDS	EXP-0005, EXP-0015	2/2 strong
dottxt-ai, motiful, github, nolly-studio, calesthio, safishamsi, pinokiocomputer	1 each	all strong-or-partial-strong

Zero abandoned benches. Zero "this owner ships placeholder repos." When a previously-validated owner ships a new public repo, the prior on it being worth a forge bench is very high.

Implication: maintain an explicit watchlist. When Karpathy ships a new public repo, forge should know about it within hours, not weeks.

4. The hosted-SaaS pattern note is its own valid output

Two experiments produced pattern-notes instead of code: EXP-0001 (AutoWiki by Factory.ai) and EXP-0016 (Mistral OCR 4). Both are hosted SaaS with no clonable source. Both produced substantive design notes about what the closed product does and how a comparable open project would be built. Both surfaced specific open alternatives for forge to bench next.

This is now a known-good output shape: write up the design, recommend open alternatives, don't pretend to bench what we can't bench. It works.

Implication: the pattern-note isn't a fallback — it's a first-class result type.

5. The two-plane no-secrets sandbox is the right discipline

Across all eighteen experiments, forge never carried an API key into the sandbox, never had a credentials leak, never benched a project against a secret-bearing fixture. The hard rule paid off: every reproducibility anchor we published is auditable end-to-end, and every experiment that couldn't be fully benched (Mistral OCR needs a key, Pinokio needs a display server, MentraOS Android needs Gradle, Vibe-Trading live trading needs a broker OAuth flow, OpenMontage rendering needs FFmpeg + provider keys) was honest about why — and the honesty is itself a useful output.

This isn't a new theme — it's a confirmation. The discipline holds at 18 experiments.

Aggregate utility — what forge has actually built

The 18 experiments produced:

18 published writeups on /forge with full reproducibility anchors.
3 forge-original installable artifacts promoted to their own repos (cc-gateway-dashboard, agentic-rl-runner, ard-tools).
2 article-as-spec Python packages shipped to PyPI-ready repos.
1 new skill emitted by an experiment (forge-agentic-rl, EXP-0006 origin).
3 skill upgrades (forge-experimenter, forge-publisher, forge-packager).
3 process / policy notes (Meet forge, repos-vs-gists, the EXP-0012 follow-up).
~50 open-source projects referenced, scouted, or benched as comparables.

That's a substantive open-source-ecosystem output. The bottleneck now is intake — finding the next 18 forge-quality candidates faster than the current Slack-🧪 cadence.

Three new harvesters

The themes above suggest the harvesters. Each is added to the forge plugin as a new skill, alongside the existing forge-harvester-slack:

`forge-harvester-github` — code-search for SKILL.md repos

Watches GitHub's code-search API for new repos containing SKILL.md, AGENTS.md, program.md, or other tracked agent-instruction files. Filters by stars, license, recent commit. Targets the strongest cross-experiment finding — the agent-instruction convention is the signal.
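A rough sketch of the query side, assuming the GitHub REST code-search endpoint (`/search/code`) and its `filename:` qualifier. The extra search term, the page size, and both function names are assumptions for illustration. The endpoint needs an authenticated request, and its results carry no star or license data, so a follow-up call per repository would still be needed before gating. No network call is made here.

```python
from urllib.parse import urlencode

# Agent-instruction files named in this post.
TRACKED_FILES = ("SKILL.md", "AGENTS.md", "program.md")


def code_search_url(filename: str, term: str = "description", per_page: int = 50) -> str:
    """Build a GitHub code-search URL for one tracked filename.

    Assumption: a plain search term is included alongside the qualifier,
    because the REST code search expects at least one search term.
    """
    query = f"{term} filename:{filename}"
    return "https://api.github.com/search/code?" + urlencode(
        {"q": query, "per_page": per_page}
    )


def repos_from_results(payload: dict) -> list[str]:
    """Collect unique 'owner/repo' names from a code-search response body."""
    seen: list[str] = []
    for item in payload.get("items", []):
        full_name = item["repository"]["full_name"]
        if full_name not in seen:
            seen.append(full_name)
    return seen


urls = [code_search_url(name) for name in TRACKED_FILES]
```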

`forge-harvester-rss` — feed-based discovery for article-as-spec

Watches RSS feeds of MarkTechPost, Hugging Face blog, Anthropic blog, Cameron Wolfe substack, dottxt.co, and opensourceprojects.dev. Filters titles for recipe-style patterns ("Using X and Y to do Z", "Introducing X", "Open-source launch of X"). Enqueues qualifying posts as article-as-spec candidates. Targets the highest-leverage template.
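The title filter could look something like the sketch below. The regular expressions are guesses at the three recipe-style patterns quoted above, not the skill's actual rules, and the parser assumes a plain RSS 2.0 feed with `<item>` elements (Atom feeds use different tags). The sample feed is made up.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns approximating the recipe-style titles named above.
RECIPE_PATTERNS = [
    re.compile(r"^using\s+.+\s+to\s+\w+", re.IGNORECASE),
    re.compile(r"^introducing\s+\S+", re.IGNORECASE),
    re.compile(r"\bopen[- ]source\s+launch\s+of\s+\S+", re.IGNORECASE),
]


def is_recipe_title(title: str) -> bool:
    """Return True if the title matches any recipe-style pattern."""
    title = title.strip()
    return any(p.search(title) for p in RECIPE_PATTERNS)


def article_candidates(rss_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for RSS 2.0 items with recipe-style titles."""
    root = ET.fromstring(rss_xml)
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if is_recipe_title(title):
            hits.append((title.strip(), link.strip()))
    return hits


# Made-up feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>Using X and Y to do Z</title><link>https://example.com/a</link></item>
<item><title>Quarterly hiring update</title><link>https://example.com/b</link></item>
<item><title>Introducing X</title><link>https://example.com/c</link></item>
</channel></rss>"""

candidates = article_candidates(SAMPLE_FEED)
```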

`forge-harvester-watchlist` — productive authors keep being productive

Watches a curated list of GitHub authors and orgs (Karpathy, dottxt-ai, HKUDS, Mentra-Community, motiful, safishamsi, calesthio, nolly-studio, github, pinokiocomputer, Mistral-Community, allenai). Enqueues new repos and new major release tags on previously-benched repos. Promotion / demotion is manual — owners are added after a successful bench.
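The two triggers described here, new repos and new major release tags, reduce to a set difference and a version comparison. The sketch below assumes the harvester keeps a snapshot of the repos it last saw per owner and that release tags look like `v1.4.2` or `1.4.2`. The function names and the repo name in the example are hypothetical.

```python
import re
from typing import Optional


def new_repos(seen: dict[str, set[str]], current: dict[str, set[str]]) -> list[str]:
    """Return 'owner/repo' names present now but absent from the last snapshot."""
    fresh = []
    for owner in sorted(current):
        for repo in sorted(current[owner] - seen.get(owner, set())):
            fresh.append(f"{owner}/{repo}")
    return fresh


def major_version(tag: str) -> Optional[int]:
    """Extract the leading major number from a tag such as 'v2.0.0'."""
    match = re.match(r"v?(\d+)", tag)
    return int(match.group(1)) if match else None


def is_new_major(previous_tag: str, latest_tag: str) -> bool:
    """True when the latest tag's major version exceeds the previous one."""
    old, new = major_version(previous_tag), major_version(latest_tag)
    return old is not None and new is not None and new > old


# Hypothetical snapshot: one owner, one previously seen repo.
seen = {"karpathy": {"autoresearch"}}
current = {"karpathy": {"autoresearch", "hypothetical-new-repo"}}
fresh = new_repos(seen, current)
```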

All three are gated identically: a minimum star count, an OSI-approved license, a recent commit, and dedup against forge's already-benched set. None of them writes to the substrate directly; they enqueue candidates that the existing researcher → builder → experimenter → packager → reporter → publisher walk handles unchanged.
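As a sketch, the shared gate is one predicate over four checks. The threshold values, the `Candidate` fields, and the license set below are placeholders chosen for illustration; the post does not state the real thresholds, and the license set is a partial list of OSI-approved SPDX identifiers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Placeholder values: the real thresholds are not given in this post.
MIN_STARS = 50
MAX_COMMIT_AGE = timedelta(days=90)
# Partial set of OSI-approved licenses, by SPDX identifier.
OSI_LICENSES = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "MPL-2.0", "ISC"}


@dataclass(frozen=True)
class Candidate:
    full_name: str         # "owner/repo"
    stars: int
    license_id: str        # SPDX identifier
    last_commit: datetime  # timezone-aware


def passes_gates(c: Candidate, already_benched: set[str], now: datetime) -> bool:
    """Apply the four shared gates: stars, license, recency, dedup."""
    return (
        c.stars >= MIN_STARS
        and c.license_id in OSI_LICENSES
        and now - c.last_commit <= MAX_COMMIT_AGE
        and c.full_name not in already_benched
    )


now = datetime(2026, 6, 25, tzinfo=timezone.utc)
good = Candidate("example-owner/example-repo", 120, "MIT", now - timedelta(days=10))
stale = Candidate("example-owner/stale-repo", 120, "MIT", now - timedelta(days=400))
```

Passing `now` in explicitly keeps the gate deterministic, which makes it easier to audit why a candidate was or was not enqueued on a given night.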

How this changes the orchestrator walk

The nightly walk now starts with four harvesters instead of one:

forge-orchestrator
├── 1. forge-harvester-slack       (incidental discovery — 🧪 reactions in #development)
├── 2. forge-harvester-github      (systematic — code search across SKILL.md ecosystem)
├── 3. forge-harvester-rss         (systematic — article-as-spec feed watch)
├── 4. forge-harvester-watchlist   (systematic — productive-author watch)
└── walk(queue)                    (unchanged — researcher → builder → experimenter → ...)

The walk is identical from researcher onward — the queue doesn't care how candidates got there.
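In code terms, the harvesters only fill a queue and the walk drains it. The sketch below models each harvester as a callable returning candidate identifiers and the walk as a single function per candidate. Both signatures are assumptions made for illustration, and the stubs stand in for the real skills.

```python
from collections import deque
from typing import Callable, Iterable

Harvester = Callable[[], Iterable[str]]


def run_nightly(harvesters: list[Harvester], walk: Callable[[str], None]) -> list[str]:
    """Run every harvester in order, dedup into one queue, then walk the queue."""
    queue: deque[str] = deque()
    queued: set[str] = set()
    for harvest in harvesters:
        for candidate in harvest():
            if candidate not in queued:  # the same repo may surface twice
                queued.add(candidate)
                queue.append(candidate)

    processed = []
    while queue:
        candidate = queue.popleft()
        walk(candidate)  # researcher -> builder -> experimenter -> ...
        processed.append(candidate)
    return processed


# Stub harvesters and a stub walk, for illustration only.
walk_log: list[str] = []
processed = run_nightly(
    [lambda: ["a/x", "b/y"], lambda: ["b/y", "c/z"]],
    walk_log.append,
)
```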

What's next

Three concrete next steps:

Pilot the wild harvesters for a week. Run each one nightly, watch what they surface, evaluate whether the gates are calibrated correctly. Tighten gates where the queue grows too fast; loosen where it doesn't grow at all.
Add a fifth harvester for PyPI / npm new-release watching. EXP-0011 (outlines), EXP-0013 (ard-tools), EXP-0018 (graphifyy) all surface clean signal at the package-registry layer. A simple PyPI / npm new-release watcher with keyword filters (agent, skill, mcp, agentic) would capture this.
Run forge on itself. This was the self-tuning roadmap item from the Meet forge flagship. With the wild harvesters in place, forge has enough intake throughput to dogfood the loop.
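For the package-registry watcher proposed above, the keyword filter is the easy part. A minimal sketch follows, assuming each new release arrives as a package name plus a one-line summary. The function name and example package names are hypothetical. Matching on whole tokens avoids false hits such as "reagent" matching "agent".

```python
import re

# Keywords named in the next-steps list above.
KEYWORDS = {"agent", "skill", "mcp", "agentic"}


def matches_keywords(name: str, summary: str = "") -> bool:
    """True if the name or summary contains a keyword as a whole token.

    Simple plurals ("agents", "skills") are also accepted by dropping a
    trailing 's' from each token.
    """
    tokens = set(re.findall(r"[a-z0-9]+", f"{name} {summary}".lower()))
    tokens |= {t[:-1] for t in tokens if t.endswith("s")}
    return bool(tokens & KEYWORDS)
```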