16 Apr 2026 · David Olsson

From MiroFish to Atlas: Rewriting a Chinese Simulation Engine

#atlas #devlog #building-in-public #reflection


Atlas did not start from scratch. It started from a fork.

MiroFish is an open-source swarm intelligence engine built by a team at Shanda Group. It runs multi-agent social simulations on top of the OASIS framework from CAMEL-AI. When we found it, it did the core thing we needed: upload documents, generate agents, run a Twitter/Reddit simulation, produce a report. The architecture was sound. The language was not -- every LLM prompt, every variable name, every docstring, every frontend label was in Chinese.

We forked it on March 19. The initial commit is 4c55d2a. Three weeks and 27 commits later, we had Atlas. One developer, several sessions co-authored with Claude.

What the first week looked like

The first real commit after the initial fork was ffe6c90: translate all LLM prompts to English. This turned out to be more than a find-and-replace. MiroFish's prompts were baked into service files scattered across ontology_generator, simulation_config_generator, zep_tools, and report_agent. Each one had to be read, understood, and rewritten in English -- not just transliterated, but rewritten so the LLM on the other end would actually follow the intent. We added explicit "Always respond in English" instructions to every system prompt while we were in there.
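The "Always respond in English" pattern can be sketched roughly like this; the constant, the prompt text, and the function name are illustrative, not the actual Atlas identifiers:

```python
# Hypothetical sketch of the pattern described above: every service that owns a
# system prompt appends an explicit language instruction. Names are
# illustrative, not the real Atlas code.

ENGLISH_ONLY = "Always respond in English, regardless of the input language."

ONTOLOGY_SYSTEM_PROMPT = (
    "You are an ontology extraction assistant. Given a document, identify the "
    "key entities and the relationships between them, and return them as JSON.\n"
    + ENGLISH_ONLY
)

def build_system_prompt(base: str) -> str:
    """Attach the English-only instruction if it is not already present."""
    if ENGLISH_ONLY in base:
        return base
    return base.rstrip() + "\n" + ENGLISH_ONLY
```

Centralizing the instruction in one constant means a single place to tune the wording when a model ignores it.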

The full code translation -- variable names, comments, frontend labels -- landed in b43bdc0 three weeks in, alongside a larger feature drop. That commit touched 20+ files. The git diff is not pretty. That is what catching up looks like.

package.json still says "name": "mirofish". We left it. It is an accurate record of where this came from.

The dependency we had to cut

MiroFish used Zep Cloud for its knowledge graph -- an external managed service that required a separate account, API key, and network call for every graph operation. For our use case (local-first, self-hostable, no external account requirements), that was a non-starter.

We replaced it with LocalGraphStore: a SQLite-backed graph database, one .db file per graph, 774 lines of Python, no external dependencies. Nodes, edges, optional embeddings stored as binary-packed floats. The file lives at backend/app/services/local_graph_store.py. It is not a general-purpose graph database -- it does exactly what Atlas needs and nothing more.
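A minimal sketch of the same idea, assuming a SQLite file with nodes, edges, and embeddings packed as binary floats; the schema and method names here are illustrative, not the actual LocalGraphStore implementation:

```python
import sqlite3
import struct
from typing import Optional, Sequence

# Toy SQLite-backed graph store in the spirit of LocalGraphStore: one .db file
# per graph, nodes and edges as rows, optional embeddings stored as
# binary-packed 32-bit floats. Illustrative only.

class TinyGraphStore:
    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.executescript(
            """
            CREATE TABLE IF NOT EXISTS nodes (
                id TEXT PRIMARY KEY,
                label TEXT NOT NULL,
                embedding BLOB
            );
            CREATE TABLE IF NOT EXISTS edges (
                src TEXT NOT NULL,
                dst TEXT NOT NULL,
                relation TEXT NOT NULL,
                PRIMARY KEY (src, dst, relation)
            );
            """
        )

    def add_node(self, node_id: str, label: str,
                 embedding: Optional[Sequence[float]] = None) -> None:
        # Pack the embedding as little-endian float32s, or store NULL.
        blob = struct.pack(f"{len(embedding)}f", *embedding) if embedding else None
        self.db.execute(
            "INSERT OR REPLACE INTO nodes VALUES (?, ?, ?)",
            (node_id, label, blob),
        )

    def add_edge(self, src: str, dst: str, relation: str) -> None:
        self.db.execute(
            "INSERT OR IGNORE INTO edges VALUES (?, ?, ?)", (src, dst, relation)
        )

    def neighbors(self, node_id: str) -> list[tuple[str, str]]:
        """Outgoing (target, relation) pairs for a node."""
        return self.db.execute(
            "SELECT dst, relation FROM edges WHERE src = ?", (node_id,)
        ).fetchall()

    def embedding(self, node_id: str) -> Optional[list[float]]:
        row = self.db.execute(
            "SELECT embedding FROM nodes WHERE id = ?", (node_id,)
        ).fetchone()
        if row is None or row[0] is None:
            return None
        return list(struct.unpack(f"{len(row[0]) // 4}f", row[0]))
```

The appeal of this shape is exactly what the post describes: no server, no account, no network call; the whole graph is one file you can copy, diff, or delete.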

Cutting Zep also meant rewriting the graph memory updater and the five report agent tools that previously called into the Zep API. The new tools query LocalGraphStore directly.

The agents were shallow

MiroFish's agent personas were thin: a name, a description, maybe a stance. Enough to distinguish agents in a simulation, not enough to make them behave differently in ways that mattered.

We extended the schema to eight fields: bio, MBTI personality type, prior stance on the topic, personal stakes, reaction triggers, communication style, follower count, and karma. Follower counts and karma scores are drawn from power-law distributions rather than uniform random ranges, which gives the agent population something closer to realistic social network structure -- a few high-follower accounts, a long tail of low-follower ones.
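The eight-field schema and the power-law draw can be sketched like this; the field names mirror the post, but the dataclass shape, the Pareto exponent, and the cap are assumptions, not the actual Atlas code:

```python
import random
from dataclasses import dataclass

# Sketch of the eight-field persona schema described above. Follower counts are
# drawn from a power-law (Pareto) distribution rather than a uniform range:
# a few large accounts, a long tail of small ones. Exponent and cap are
# illustrative guesses.

@dataclass
class Persona:
    bio: str
    mbti: str
    prior_stance: str
    personal_stakes: str
    reaction_triggers: str
    communication_style: str
    follower_count: int
    karma: int

def sample_followers(alpha: float = 1.5, cap: int = 1_000_000) -> int:
    """Pareto-distributed follower count, clipped at a hard cap."""
    return min(int(random.paretovariate(alpha)), cap)
```

Sampling a 500-agent population with `sample_followers()` produces a median near the minimum and a maximum orders of magnitude higher, which is the skew real social networks show.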

The persona generator now produces structured JSON that maps directly into OASIS agent initialization. The prompt for persona generation is one of the longer ones in the codebase; it has to hold all eight fields and produce consistent output across a population of up to 500 agents.

Infrastructure we added

A few things MiroFish did not have at all:

Per-category model routing. Six workflow categories -- graph building, profile generation, config generation, simulation, report, interaction -- each resolve to a different LLM model via Config.get_model_for_category(). The simulation step runs high call volume at low cost (gpt-4.1-nano). The report step runs lower volume where quality matters (gpt-4.1). Configurable at runtime through a settings panel without restarting anything.

Real-time LLM cost monitoring. Every LLM call is tracked: model, tokens, cost, duration, caller. Persisted to backend/data/llm_calls.jsonl. Streamed to the frontend via Server-Sent Events. A status bar at the bottom of every screen shows running cost. We built this because we kept losing track of what simulations actually cost during development, and the answer was sometimes surprising.
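The tracking described above reduces to one JSON object per call, appended to a .jsonl file; a rough sketch, where the field names, function name, and pricing table are assumptions (the real log lives at backend/data/llm_calls.jsonl):

```python
import json
import time
from pathlib import Path

# Illustrative per-call cost log: model, tokens, cost, duration, caller, one
# JSON line per call. Prices are assumed USD per 1M tokens, not authoritative.
PRICE_PER_1M_TOKENS = {
    "gpt-4.1-nano": (0.10, 0.40),  # (input, output)
    "gpt-4.1": (2.00, 8.00),
}

def record_llm_call(log_path: Path, model: str, input_tokens: int,
                    output_tokens: int, duration_s: float, caller: str) -> dict:
    pin, pout = PRICE_PER_1M_TOKENS[model]
    entry = {
        "ts": time.time(),
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": round(input_tokens / 1e6 * pin + output_tokens / 1e6 * pout, 6),
        "duration_s": duration_s,
        "caller": caller,
    }
    with log_path.open("a") as f:  # append-only, so the file is the audit trail
        f.write(json.dumps(entry) + "\n")
    return entry
```

An append-only JSONL file keeps writes cheap and crash-safe, and a frontend can tail it (or receive the same entries over SSE) to show a running total.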

3D force-directed graph. The original used a 2D SVG rendered with D3. We replaced it with a 3D force-directed visualization. The graph evolves in real time as entities are extracted and as agents act during simulation.

Docker and Railway config. MiroFish was source-only. We added a Dockerfile, docker-compose, and a railway.toml for one-command cloud deployment.

Where we are honest about what is still MiroFish

The simulation engine itself is OASIS -- we did not write that, and neither did MiroFish. Both projects sit on top of camel-oasis 0.2.5 and camel-ai 0.2.78. The dual-platform Twitter/Reddit simulation loop, the agent action mechanics, the post/repost/comment primitives -- that is OASIS's work.

There are still Chinese docstrings in the backend. The state-of-development table in the README acknowledges this honestly: "Chinese docstrings remain from fork." We will get to them. They do not affect runtime behavior, which is why they have stayed lower on the priority list than things that do.

No automated tests. Zero. pytest is in the dependencies. There are no test files. This is the most significant technical debt we are carrying. The codebase grew fast and the test suite did not grow with it.

The shape of what we built

Twenty-seven commits. Three weeks. A working multi-agent simulation platform that takes a document and a question, runs a 500-agent dual-platform social simulation, and produces a structured analytical report with an auditable reasoning trace.

MiroFish made that possible. We would not have shipped a working prototype in three weeks starting from zero. The fork gave us a skeleton with the right bones -- the OASIS integration, the five-step workflow concept, the basic service boundaries. What we did was fill it in: English throughout, richer agents, local-first infrastructure, cost visibility, model flexibility, deployment options.

The package.json name is a fair summary of the situation. We are building on what was there.
