# How We Built the ReACT Report Agent in Atlas
#atlas #devlog #feature #building-in-public
David Olsson

When a simulation finishes, you have thousands of agent actions, a populated knowledge graph, and a log of inter-agent interactions. Turning that into a readable, evidence-based report is not a summarisation problem — it is an investigation problem. The agent needs to ask questions, retrieve evidence, and decide when it knows enough.
We built the Report Agent around the ReACT (Reasoning + Acting) pattern for exactly this reason.
## The ReACT loop
Each section of the report is generated through an iterative cycle, not a single prompt. The agent alternates between reasoning about what it still needs and acting by calling a retrieval tool. It cannot move to a Final Answer until it has called at least three tools.
The loop runs for up to five iterations per section. If the agent tries to output a Final Answer before hitting the three-tool minimum, the system rejects it and asks it to keep retrieving. If the agent somehow emits both a tool call and a Final Answer in the same reply, the system catches the conflict and asks it to choose one — falling back to truncating to the first tool call after two retries.
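The control rules above can be sketched as follows. This is an illustrative reconstruction, not Atlas's actual code: the constants and the `llm_step` callback (which stands in for one LLM turn plus response parsing) are stand-ins.

```python
# Sketch of the per-section loop control. All names here are illustrative.
MIN_TOOL_CALLS = 3        # floor before a Final Answer is accepted
MAX_ITERATIONS = 5        # hard cap on iterations per section
MAX_CONFLICT_RETRIES = 2  # re-asks before truncating to the tool call

def run_section_loop(llm_step):
    """llm_step(tool_calls) returns a dict with optional 'tool' / 'final' keys."""
    tool_calls = 0
    for _ in range(MAX_ITERATIONS):
        reply = llm_step(tool_calls)
        retries = 0
        # Conflict: the reply contains both a tool call and a Final Answer.
        while reply.get("tool") and reply.get("final") and retries < MAX_CONFLICT_RETRIES:
            reply = llm_step(tool_calls)  # ask the model to pick one
            retries += 1
        if reply.get("tool") and reply.get("final"):
            reply.pop("final")            # fallback: keep the first tool call
        if reply.get("final"):
            if tool_calls >= MIN_TOOL_CALLS:
                return reply["final"]     # enough evidence gathered
            continue                      # reject early answers, keep retrieving
        if reply.get("tool"):
            tool_calls += 1               # execute tool, inject observation (omitted)
    return None                           # budget exhausted without a Final Answer
```

The key property is that a Final Answer is never a terminal state by itself; it only terminates the loop once the evidence floor is met.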
## Five specialized tools
The agent has five tools, each covering a different retrieval strategy:
| Tool | Strategy |
| --- | --- |
| `insight_forge` | Deep analysis — auto-decomposes the query into sub-questions, runs semantic search, entity analysis, and relationship chain tracing |
| `panorama_search` | Broad graph traversal — retrieves full event timelines, distinguishing current vs. expired facts |
| `quick_search` | Fast keyword lookup — lightweight fact verification |
| `interview_agents` | Live interviews — calls the OASIS simulation API to get in-character responses from agents on both Twitter and Reddit |
| `simulation_analytics` | Quantitative metrics — reads raw actions.jsonl logs to produce activity curves, turning-point detection, and platform divergence scores |
The agent is nudged toward tool diversity. After each observation the system injects a hint listing which tools have not been used yet, encouraging multi-angle coverage rather than repeated calls to a single tool.
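A minimal sketch of how such a diversity hint could be built — the tool names come from the table above, but the function and its wording are illustrative:

```python
ALL_TOOLS = ["insight_forge", "panorama_search", "quick_search",
             "interview_agents", "simulation_analytics"]

def unused_tools_hint(used_tools):
    """Build the nudge injected after each observation (illustrative wording)."""
    unused = [t for t in ALL_TOOLS if t not in used_tools]
    if not unused:
        return ""  # every tool has been tried; no nudge needed
    return (f"Tools not yet used: {', '.join(unused)}. "
            "Consider retrieving from a different angle.")
```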
## The tool dispatch
The core dispatch method is straightforward. Each branch calls into GraphToolsService or reads log files directly:
```python
def _execute_tool(self, tool_name: str, parameters: Dict[str, Any], report_context: str = "") -> str:
    if tool_name == "insight_forge":
        result = self.zep_tools.insight_forge(
            graph_id=self.graph_id,
            query=parameters.get("query", ""),
            simulation_requirement=self.simulation_requirement,
            report_context=report_context or parameters.get("report_context", "")
        )
        return result.to_text()
    elif tool_name == "interview_agents":
        result = self.zep_tools.interview_agents(
            simulation_id=self.simulation_id,
            interview_requirement=parameters.get("interview_topic", ""),
            simulation_requirement=self.simulation_requirement,
            max_agents=min(int(parameters.get("max_agents", 5)), 10)
        )
        return result.to_text()
    elif tool_name == "simulation_analytics":
        return self._run_simulation_analytics()  # reads actions.jsonl directly
    # ... quick_search, panorama_search follow the same pattern
```
`simulation_analytics` is the only tool that does not go through the graph service. It reads the `actions.jsonl` files written by the simulation runner, counts actions per round, detects rounds where volume spiked to at least twice the rolling mean (turning points), and computes a platform divergence score — `abs(twitter_count - reddit_count) / total` — that tells the analyst whether the two platforms moved together or diverged.
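Those metrics can be sketched like this. The `round` and `platform` field names are assumptions about the actions.jsonl schema, and the rolling-mean window size is illustrative:

```python
import json
from collections import Counter

def analyze_actions(lines, window=3):
    """Sketch of the analytics pass: per-round counts, turning points,
    and the platform divergence score. Field names are assumed."""
    rounds, platforms = Counter(), Counter()
    for line in lines:
        action = json.loads(line)
        rounds[action["round"]] += 1
        platforms[action["platform"]] += 1

    keys = sorted(rounds)
    counts = [rounds[k] for k in keys]
    turning_points = []
    for i, count in enumerate(counts):
        prior = counts[max(0, i - window):i]  # rolling window of prior rounds
        if prior and count >= 2 * (sum(prior) / len(prior)):
            turning_points.append(keys[i])    # volume spiked to >= 2x the mean

    total = sum(platforms.values())
    divergence = (abs(platforms["twitter"] - platforms["reddit"]) / total
                  if total else 0.0)
    return {"per_round": dict(rounds),
            "turning_points": turning_points,
            "divergence": divergence}
```

A divergence near 0 means the two platforms produced similar volumes; near 1 means activity was almost entirely on one platform.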
## How observations are injected
After each tool call the system appends the result back into the conversation as a user message. This keeps the full Thought, Tool, Observation chain in the LLM's context window, so later reasoning can reference earlier findings:
```python
messages.append({"role": "assistant", "content": response})
messages.append({
    "role": "user",
    "content": REACT_OBSERVATION_TEMPLATE.format(
        tool_name=call["name"],
        result=result,
        tool_calls_count=tool_calls_count,
        max_tool_calls=self.MAX_TOOL_CALLS_PER_SECTION,
        used_tools_str=", ".join(used_tools),
        unused_hint=unused_hint,  # lists untried tools
    ),
})
```
The observation template also restates the budget — how many calls have been used and how many remain — so the agent can self-regulate without the system needing to intercept every response.
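For illustration, a template of roughly this shape carries the observation plus the budget restatement — the actual wording of `REACT_OBSERVATION_TEMPLATE` in Atlas may differ:

```python
# Illustrative template; only the placeholder names are taken from the
# format() call shown above.
REACT_OBSERVATION_TEMPLATE = """Observation from {tool_name}:
{result}

Tool calls used: {tool_calls_count}/{max_tool_calls}.
Tools used so far: {used_tools_str}.
{unused_hint}
Decide: call another tool, or give a Final Answer if you have enough evidence."""
```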
## Two-phase generation
Before any section work starts, the agent runs a planning phase. It calls `get_simulation_context` on the graph (statistics, entity types, a sample of facts), then asks the LLM to produce a JSON outline: a title, a one-sentence summary, and 2-5 sections with descriptions. Section generation then runs sequentially, passing completed section text into each subsequent prompt so the agent avoids repeating itself.
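A sketch of validating the planner's outline, assuming a title/summary/sections schema as described — the exact field names inside each section are assumptions:

```python
import json

def parse_outline(raw):
    """Validate the planning phase's JSON outline (illustrative schema)."""
    outline = json.loads(raw)
    assert isinstance(outline.get("title"), str), "outline needs a title"
    assert isinstance(outline.get("summary"), str), "outline needs a summary"
    sections = outline.get("sections", [])
    assert 2 <= len(sections) <= 5, "outline must have 2-5 sections"
    for section in sections:
        assert "name" in section and "description" in section
    return outline
```

Validating up front matters because every later section prompt is built from this structure; a malformed outline should fail fast rather than produce a half-planned report.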
The report model category is used throughout — typically the most capable model available, since report quality is user-facing and the call volume is low (one planning call plus a few calls per section).
## What ships to the frontend
Sections are written to disk as they complete (`section_00.md`, `section_01.md`, ...). The frontend polls `/api/report/<id>/sections` for the list of completed sections and fetches each one individually, so the report appears incrementally rather than waiting for the entire run to finish.
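Because sections land on disk with a predictable naming pattern, listing completed sections is just a directory scan. A sketch of what the endpoint's handler might do — the handler itself and its return shape are assumptions:

```python
import os
import re

def list_completed_sections(report_dir):
    """Return section files present on disk, in order (section_NN.md pattern)."""
    return sorted(name for name in os.listdir(report_dir)
                  if re.fullmatch(r"section_\d{2}\.md", name))
```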
Every Thought, tool call, and Observation is written to `agent_log.jsonl` as structured JSON entries. The UI surfaces this log as an auditable trace — you can follow the agent's reasoning step by step and see exactly what evidence each sentence in the report is grounded in.
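Appending a structured trace entry is a one-liner per step. A sketch with illustrative field names — the real agent_log.jsonl schema may differ:

```python
import json
import time

def log_step(path, kind, payload):
    """Append one trace entry ('thought', 'tool_call', or 'observation')."""
    entry = {"ts": time.time(), "kind": kind, **payload}
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")  # one JSON object per line
```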
The `interview_unlocked` flag on the status endpoint is set only once the report reaches `COMPLETED`, which gates the post-report interview feature that lets users ask follow-up questions to simulation agents directly.
## What we learned
Enforcing the minimum tool-call floor before allowing a Final Answer meaningfully improved report depth. Without it the agent would sometimes short-circuit after one search and produce thin sections. The unused-tools hint also helped — agents that received it were more likely to call interview_agents, which produces the most distinctive content (direct agent quotes) and is the hardest for the model to generate on its own.
The conflict handling — rejecting responses that contain both a tool call and a Final Answer — was more important than we expected. Certain prompts reliably triggered this behaviour, and the two-retry-then-truncate fallback prevents the section from stalling indefinitely.