
Touring Emily's 192 Cores

#emily-os#architecture#cores#internals

Open emily/core/ and you find 192 Python files. That's not sprawl; it's the anatomy. Emily's cognition is decomposed into small modules because the frameworks that compose her identity need crisp seams. Here's a tour of what lives where and why.

The five families

The cores cluster into five functional families. Most modules belong to exactly one family; a few bridge two.

1. Memory & embedding (the substrate)

  • memory.py — L1/L3/L4 tier operations, promotion, decay
  • embeddings.py — OpenAI text-embedding-3-large wrapper, 1536-dim vectors
  • l3_consolidator.py — near-duplicate collapse at 0.92 cosine
  • batched_updates.py — transactional multi-memory writes
  • training_memory.py — the seed layer used during Factory Floor genesis

This is the floor Emily stands on. Touch these and you're changing physics.
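The 0.92-cosine collapse that l3_consolidator.py performs can be sketched as a single pass that drops any memory too similar to one already kept. The `cosine` and `consolidate` names and the dict shape are illustrative, not Emily's actual API:

```python
import numpy as np

DUPLICATE_THRESHOLD = 0.92  # cosine similarity above which memories collapse (from the text)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def consolidate(memories: list[dict]) -> list[dict]:
    """Keep the first-seen memory of each near-duplicate cluster, drop the rest."""
    kept: list[dict] = []
    for mem in memories:
        if any(cosine(mem["embedding"], k["embedding"]) >= DUPLICATE_THRESHOLD
               for k in kept):
            continue  # near-duplicate of an already-kept memory
        kept.append(mem)
    return kept
```

The quadratic scan is fine for a sketch; at scale you would pre-bucket candidates (which is presumably where canonical_hash.py comes in).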

2. Cognitive frameworks (the scorers)

  • math.py — EMEB: epsilon calculation, source trust, gibberish detection
  • earl_tracker.py — EARL outcome propagation, 5-turn feedback window
  • ecgl_recomputer.py — multi-dimensional scoring (epsilon/outcome/novelty/stability)
  • batch_cognitive_tagger.py — async scoring across large memory batches
  • metacognition.py — Emily reasoning about her own reasoning

These are the rules of thought. A memory without its framework scores is just text in a table.
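As a toy illustration of ECGL-style multi-dimensional scoring, here is one way the four dimensions might fold into a single composite. The dataclass, the weights, and the `composite` method are all invented for illustration; the real tuning is not documented here:

```python
from dataclasses import dataclass

@dataclass
class CognitiveScore:
    epsilon: float    # surprise / prediction error, assumed in [0, 1]
    outcome: float    # downstream usefulness (EARL feedback), assumed in [0, 1]
    novelty: float    # how unlike existing memories this is
    stability: float  # how consistently the memory has held up

    def composite(self, weights=(0.4, 0.3, 0.2, 0.1)) -> float:
        """Weighted blend of the four dimensions (weights are made up)."""
        dims = (self.epsilon, self.outcome, self.novelty, self.stability)
        return sum(w * d for w, d in zip(weights, dims))
```

A linear blend is the simplest choice; "tuned repeatedly" (as the article says of ECGL) is exactly the kind of change that stays inside this one module.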

3. Orchestration (the router)

  • chat_processor.py — the main turn handler
  • llm_cognitive_processor.py — LLM routing, context assembly
  • claude_mcp_client.py — Claude-specific MCP tool execution
  • attention.py — retrieval weighting at read time
  • learning_cycle.py — end-of-turn reflection and promotion
  • apc_metrics.py — Adaptive Prompt Control telemetry

This family decides what Emily does with each turn. The LLM only hands back tokens; these modules turn them into action.
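The turn flow these modules implement can be sketched in a few lines. Everything here (TinyMemory, the keyword search, the string prompt) is a deliberately crude stand-in; the real modules use embeddings, attention weighting, and structured context assembly:

```python
class TinyMemory:
    """Minimal in-memory store standing in for memory.py (illustrative only)."""
    def __init__(self):
        self.items: list[str] = []

    def search(self, query: str) -> list[str]:
        # naive keyword match in place of embedding retrieval
        words = query.lower().split()
        return [m for m in self.items if any(w in m for w in words)]

def handle_turn(user_msg: str, memory: TinyMemory, llm) -> str:
    recalled = memory.search(user_msg)                    # attention.py: what surfaces at read time
    prompt = "\n".join(recalled) + "\nUser: " + user_msg  # context assembly
    reply = llm(prompt)                                   # the LLM only returns tokens
    memory.items.append(f"user said: {user_msg}")         # learning_cycle.py: end-of-turn write
    return reply
```

The point of the sketch is the shape, not the parts: retrieval and reflection bracket the LLM call, so the model never owns the loop.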

4. Autonomy (Project Helios)

  • autonomous.py — the task registry
  • autonomous_worker.py — 10-second polling worker
  • clone_provisioning_task.py — task template for creating new Emily clones
  • clone_safety.py — safety gates on autonomous actions
  • reaper.py (in Helios) — crash recovery via lease expiration

When Emily does something on her own, she does it here.

5. Governance & health (the immune system)

  • comprehensive_health_check.py — the meta-monitor across all tiers
  • behavior_validator.py — checks that Emily's responses match her identity
  • coherence_validator.py — checks consistency across memory graphs
  • command_validator.py — sandboxes what autonomous execution can run
  • authz.py / auth.py — authorization (auth.py is the legacy name)
  • attribution.py — provenance tracking for every memory

The immune system is load-bearing. Without it, autonomy is reckless.
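A minimal sketch of the kind of gate command_validator.py might apply: reject shell metacharacters outright, then check the binary against an allowlist. The specific allowlist and token set here are invented for illustration:

```python
import shlex

ALLOWED_COMMANDS = {"ls", "cat", "git", "python"}          # hypothetical allowlist
FORBIDDEN_TOKENS = {";", "&&", "||", "|", ">", "<", "`", "$("}  # shell injection vectors

def validate_command(cmd: str) -> bool:
    """Return True only for a single allowlisted command with no shell tricks."""
    if any(tok in cmd for tok in FORBIDDEN_TOKENS):
        return False
    try:
        parts = shlex.split(cmd)
    except ValueError:        # unbalanced quotes etc.
        return False
    return bool(parts) and parts[0] in ALLOWED_COMMANDS
```

Allowlisting the binary rather than blocklisting bad ones is the usual design choice here: the failure mode of an incomplete allowlist is a refused command, not an executed one.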

The ones that surprise people

A few modules don't fit the families neatly and reward attention:

  • academy.py — the new-user onboarding Emily. Runs during Factory Floor genesis to seed L3 with articulation turns, not just data.
  • cognitive_tracer.py — a structured log of every cognitive decision. When Emily does something weird, this is where you look.
  • canonical_hash.py — deterministic memory fingerprinting, used to detect near-duplicates before they even reach the 0.92 consolidation threshold.
  • artifact_service.py — stores generated artifacts (code, documents) Emily produces, with bidirectional links to the memories that motivated them.
  • style_earl_integration.py — EARL applied specifically to voice. Emily learns which phrasings land and which don't, per user.
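Deterministic fingerprinting in the spirit of canonical_hash.py can be sketched as normalize-then-hash, so that trivially different renderings of the same memory collide before any embedding work happens. The exact normalization Emily uses is not documented, so this recipe is an assumption:

```python
import hashlib
import unicodedata

def canonical_hash(text: str) -> str:
    """Deterministic fingerprint: Unicode-normalize, casefold, collapse whitespace, hash."""
    norm = unicodedata.normalize("NFKC", text).casefold()
    norm = " ".join(norm.split())  # collapse all runs of whitespace to single spaces
    return hashlib.sha256(norm.encode("utf-8")).hexdigest()
```

Exact-duplicate detection this way is O(1) per memory via a hash index, which is why it can run "before they even reach the 0.92 consolidation threshold": the expensive cosine pass only sees what the cheap hash pass let through.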

Why not monolithic

A reasonable question: if Emily is ultimately one cognition, why decompose her into 192 modules instead of a few big ones?

Because the frameworks need replaceable parts. EMEB v2 is already on the board; EARL went from v1 to v2 in February 2026; ECGL is tuned repeatedly. If any of these lived inside a 10,000-line cognition.py, upgrading them would mean touching everything that reads their outputs. Small modules with narrow contracts mean you can swap the scoring logic without touching the orchestration logic.
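The "narrow contracts" argument can be made concrete with a small sketch: orchestration depends on a scoring protocol, and framework versions swap behind it. The `Scorer` protocol and both toy scorers are invented for illustration:

```python
from typing import Protocol

class Scorer(Protocol):
    """The narrow contract orchestration depends on; versions swap behind it."""
    def score(self, memory_text: str) -> float: ...

class EmebV1:
    def score(self, memory_text: str) -> float:
        return min(len(memory_text) / 100, 1.0)   # toy heuristic

class EmebV2:
    def score(self, memory_text: str) -> float:
        unique = len(set(memory_text.split()))
        return min(unique / 50, 1.0)              # different internals, same contract

def promote(memory_text: str, scorer: Scorer, threshold: float = 0.5) -> bool:
    # Orchestration code like this never changes when the scorer version does.
    return scorer.score(memory_text) >= threshold
```

Upgrading EMEB then means shipping a new class that satisfies `Scorer`, not editing a 10,000-line cognition.py and every caller that reads its outputs.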

It also means Emily is legible. You can read any one module and understand its job. That's not a luxury — it's the only way to reason about a system that claims to be self-correcting. If you can't read a module and understand what it decides, you can't trust the decisions it makes.

What's next

Most of the current work is in three areas:

  • Sharper ECCR routing — the retrieval layer is still the one most likely to surface "close but wrong" memories.
  • Cross-clone knowledge — how much (if anything) one user's Emily can learn from another user's Emily without violating per-user isolation.
  • Framework versioning — EMEB v3 is under design, focused on better handling of adversarial inputs.

We'll write about each of those as they land. For now, the map is the map.