EXP-0013 — Agentic Resource Discovery: turning Hugging Face's launch into a working package

23 June 2026#forge#huggingface#ard#agentic-discovery#article-as-spec#python#MIT#MCP-adjacent

Update (2026-06-24): ard-tools now has its own canonical home at github.com/worksona/ard-tools — clone, fork, file issues, install via pip install git+https://github.com/worksona/ard-tools. The companion gist remains the frozen reproducibility anchor for this writeup. Forge's forge-packager skill now codifies the promote-to-repo rule for forge-original installable artifacts; see the follow-up post on repos vs gists.

For the layman

When you give an AI agent a task, the agent often needs tools to complete it — a calculator if the task involves math, a search tool if it needs to find facts, an image generator if it needs to produce a picture. Today, those tools have to be pre-installed by whoever set up the agent. If you want the agent to do something nobody anticipated, you're stuck.

Hugging Face just launched a system called Agentic Resource Discovery (ARD) that changes this. The idea: every organization that publishes tools (Hugging Face, OpenAI, you, a research lab, anybody) hosts a small catalog file describing what they have. An agent that needs a new tool searches across all the published catalogs at runtime, finds the best match, and uses it — no install, no setup, no asking a human. It's like the difference between only being able to install apps that came with your phone vs. being able to search the App Store.

ARD is a specification, not a product — anyone can implement it. Hugging Face shipped a reference implementation called the "Discover Tool" that searches across the HF Hub. The launch was announced via a blog post in their huggingface/blog repo.

Forge is our experiment harness. Because the source here is a blog post (not a clonable codebase), forge applied the new article-as-spec template — instead of just summarizing the spec, we tried to build the smallest runnable system the spec describes. The result is a Python package called ard-tools with 12 passing tests. It includes:

A schema definition for what a valid ai-catalog.json looks like.
A validator (CLI: ard-validate yourcatalog.json) that checks any catalog against the spec.
A minimal search engine that takes a query and ranks capabilities across multiple catalogs.

Anyone who wants to publish an ARD catalog or run an ARD search can pip-install this package and have a working starting point in seconds. The Hugging Face reference implementation is more sophisticated (it uses embeddings instead of lexical matching), but the contract — what an ARD catalog has to look like, what a search call has to return — is what we operationalized.

Status: experimented, result success. Source was the Hugging Face blog post launching ARD — not a clonable repo. Forge applied the article-as-spec template (added in EXP-0006) and shipped a runnable Python package, ard-tools, that operationalizes both halves of the spec (the static ai-catalog.json manifest and the dynamic POST /search ranking API). 12/12 tests pass. The CLI validates a sample catalog. The search demo returns the expected ranking.

This is a forge writeup of the Agentic Resource Discovery launch post by Hugging Face.

Canonical home: github.com/worksona/ard-tools. Frozen anchor: gist.github.com/worksona/c03eb6c9….

Install + use

bash

pip install git+https://github.com/worksona/ard-tools
ard-validate path/to/ai-catalog.json

TL;DR

Source: huggingface/blog/agentic-resource-discovery-launch.md — Hugging Face's announcement of ARD as an open specification.
Spec in one paragraph. ARD defines two components: (1) a static ai-catalog.json manifest that any publisher hosts at a stable URL, listing their skills, mcp-servers, agents, ml-apps, and tools with id, name, description, tags; (2) a dynamic POST /search endpoint that takes a natural-language query and returns ranked capabilities aggregated across catalogs.
What forge shipped. ard-tools — a minimal, MIT-licensed, zero-runtime-dependency Python package that ships the schema, the validator, and a lexical-overlap scorer that stands in for the embedding ranker the HF reference uses.
Tests: 12/12 pass on python:3.12 in under 0.1 s. Coverage: schema rejection paths (missing fields, bad enum, extra fields), zero-overlap and tag-overlap scoring, multi-catalog aggregation, top-k ordering.
End-to-end demo. ard-validate examples/example-catalog.json → OK: ai-catalog.json @ ARD 0.1, publisher=Demo Lab, 2 capabilities. python examples/demo.py → 0.40 Demo Lab Fact Check Agent for a query like "verify a claim with evidence."

What ARD actually is

The launch post describes a federated, search-first model for agent capabilities:

The capability manifest. Each publisher hosts an ai-catalog.json at a stable URL. The file lists every skill, MCP server, agent, ML app, or tool the publisher offers, with enough metadata for a search engine to rank it. Forge's schema:
- Top level: ard_version, publisher ({name, url, contact}), capabilities[], updated.
- Per capability: id, kind (one of skill, mcp-server, agent, ml-app, tool), name, description, optional tags, url, license, version, metadata.
- Strict: no unexpected top-level fields, no unexpected fields per capability.
The search registry. A POST /search endpoint accepts a natural-language query and returns ranked capabilities. The HF reference implementation lives at https://huggingface-hf-discover.hf.space/search and exposes the same shape via the hf discover search CLI command.
The contract. Anything that consumes ARD doesn't care whether the search ranker uses embeddings, BM25, or lexical overlap; whether the catalogs are hosted on HF, GitHub Pages, or someone's homelab; whether the publisher is Hugging Face, OpenAI, or a research lab. The shape of the request and the shape of the response are the spec.

What forge built

ard-tools is the smallest runnable system that lets you publish an ARD catalog and search across catalogs. Four modules:

src/ard_tools/
├── schema.py       JSON schemas for ai-catalog.json and individual capabilities
├── validate.py     Pure-Python validator with path-precise error messages
├── search.py       Lexical-overlap scorer + multi-catalog aggregator
└── __init__.py     Public API

The deliberate non-features:

No embedding model. The lexical scorer is a stand-in. Real implementations swap it for sentence-transformers or an API call. The function signature stays the same.
No HTTP server in this v0.1. A FastAPI shim around search_across_catalogs is ~30 lines; left as the obvious follow-up.
No external schema library. Pure-Python validator, ~50 lines. Keeps the package zero-runtime-dependency.

What the demo shows

bash

$ ard-validate examples/sample-catalog.json
OK: ai-catalog.json @ ARD 0.1, publisher=Demo Lab, 2 capabilities

$ python examples/demo.py
0.40  Demo Lab  Fact Check Agent

The query was "verify a claim with evidence". The catalog had two capabilities: a Calculator (tagged arithmetic, math) and a Fact Check Agent (tagged verify, evidence, described as "Verify claims against trusted sources"). The scorer correctly ranked the Fact Check Agent above the Calculator with a 40% overlap score — a real lexical-search behavior that mirrors what the embedding-based HF reference implementation does on the same query shape.

Tests

$ pytest -q tests/
............                                                             [100%]
12 passed in 0.08s

Breakdown:

group	tests	what it proves
`TestValidation`	5	A valid manifest passes; missing required fields raise with the field name; bad `kind` enums are rejected; unexpected top-level fields are rejected; capabilities must be an array.
`TestScoring`	4	Empty query → 0. Exact word match → > 0. Unrelated query → 0. Tag-only matches score above 0.
`TestSearchAcrossCatalogs`	3	Top-k results are sorted descending. No matches → empty. Multi-catalog aggregation merges publishers correctly.

The "unexpected field rejected" test is the one that matters most — it's what enforces forward-compatibility discipline. If the spec adds new fields, publishers update; in the meantime, rogue fields don't silently slip into manifests.

How forge bench-tested it

bash

pip install -e .[dev]   # zero runtime deps, pytest only
pytest -q tests/        # 12 passed in 0.08s
ard-validate sample-catalog.json
python examples/demo.py

Everything happens inside a python:3.12 Docker container, no host environment, no secrets. The package would install identically on any developer's machine.

Why this matters

The "agent + tools" stack has been fragmenting for the last 18 months. MCP standardized one piece (the protocol for an agent to call a tool). What MCP didn't standardize: how an agent finds tools it doesn't already know about. ARD is the answer to that question, and the answer is the right shape — federated, queryable, no central authority needed.

Forge's contribution here isn't research — it's operationalization. The Hugging Face blog post is the spec; this package is the smallest runnable system that anyone can pip-install today and immediately have a working ARD catalog publisher and search consumer. Three months from now, when someone reads the HF blog and wants to publish their own catalog, they should land on this package and be productive in 30 seconds.

Comparables

Project	Posture
Hugging Face Discover Tool	The reference implementation. Hosted. Uses embeddings.
MCP (Model Context Protocol)	The protocol for calling tools, not for finding them. ARD is the discovery layer; MCP is the invocation layer.
Anthropic's tools catalog	Vendor-specific, not federated.
OpenAPI / Swagger	Adjacent but for HTTP APIs, not agent capabilities.

Reproducibility


canonical repo	https://github.com/worksona/ard-tools
frozen gist	https://gist.github.com/worksona/c03eb6c9166dfc5fb763404e52eceedf
source	https://github.com/huggingface/blog/blob/main/agentic-resource-discovery-launch.md
source type	`article-as-spec`
base image	`python:3.12`
install	`pip install git+https://github.com/worksona/ard-tools`
tests	`pytest -q tests/` — 12 passed in 0.08 s
validator	`ard-validate <path>` — pure-Python, no external schema lib
scorer	lexical-overlap, zero ML deps

For the layman

Install + use

TL;DR

What ARD actually is

What forge built

What the demo shows

Tests

How forge bench-tested it

Why this matters

Comparables

Reproducibility

See also