Inside MiroFish: How a 644-Symbol Codebase Simulates the Future with Swarm Intelligence
Code Deep Dives · Analysis · March 26, 2026 · Copenhagen, Denmark

A deep code-level investigation into MiroFish — the open-source multi-agent prediction engine that topped GitHub's Global Trending. Using Code Indexer, we dissect its 85-file architecture, trace call chains from graph construction to report generation, and expose the LLM orchestration that powers its digital sandboxes.

Key Takeaways

  • MiroFish is a multi-agent swarm intelligence engine built on OASIS (CAMEL-AI) and Zep Cloud GraphRAG.
  • It uses OpenAI-compatible APIs (model-agnostic) to create parallel digital worlds where thousands of AI agents interact on simulated Twitter and Reddit, generating prediction reports via a ReACT-based Report Agent.
  • Code Indexer analysis reveals 644 symbols across 85 files, with complexity hotspots reaching cyclomatic complexity of 31.
  • The project scored 33/100 on a deep code audit — impressive functionality, but significant technical debt in test coverage (0%) and maintainability.
  • Academic research validates that OASIS simulations replicate real-world social phenomena with ~30% normalized RMSE.


When MiroFish hit #1 on GitHub's Global Trending in March 2026, the pitch was irresistible: upload a news article or policy draft, describe what you want to predict, and the system returns a detailed forecast report — powered by thousands of AI agents living in a simulated parallel world. Created by Guo Hangjiang, a Chinese undergraduate developer, MiroFish bills itself as a "Simple and Universal Swarm Intelligence Engine, Predicting Anything."

The ambition is enormous. But what actually happens inside the code when you press 'Run'? Is this genuine multi-agent simulation with emergent intelligence, or just a sophisticated wrapper around LLM API calls? To find out, I indexed the entire MiroFish codebase using Code Indexer [3] — a semantic code search engine — and ran a full deep audit: symbol extraction, call chain analysis, complexity profiling, dead code detection, and security scanning. Here's what I found.

The Investigation: 644 Symbols, 5,886 Cross-References

Code Indexer processed MiroFish's 85 source files (34 Python, 15 Vue, 8 JavaScript) in under 2 seconds, producing 1,692 indexed chunks, 644 symbols, and 5,886 cross-references. The project report immediately revealed the architectural center of gravity: the backend services layer — not the Vue frontend — is where MiroFish lives and breathes.

| Module | File | Symbols | Role |
| --- | --- | --- | --- |
| Report Agent | report_agent.py | 72 | ReACT-based prediction report generator |
| Zep Tools | zep_tools.py | 49 | Graph retrieval: InsightForge, PanoramaSearch, InterviewAgents |
| Simulation Runner | simulation_runner.py | 37 | OASIS process lifecycle management |
| Simulation API | simulation.py | 36 | Flask API endpoints for simulation control |
| Parallel Simulation | run_parallel_simulation.py | 35 | Dual-platform (Twitter + Reddit) async runner |
| Profile Generator | oasis_profile_generator.py | 32 | LLM-driven persona creation from graph entities |
| Graph Memory Updater | zep_graph_memory_updater.py | 32 | Real-time agent action → graph memory sync |
| Config Generator | simulation_config_generator.py | 25 | LLM-driven simulation parameter tuning |

The symbol distribution tells a clear story: report_agent.py alone contains 72 symbols — nearly double the simulation runner's 37. This is a system that invests heavily in post-simulation intelligence synthesis, not just agent execution.

Architecture: The Five-Stage Pipeline

MiroFish operates as a five-stage pipeline that transforms raw seed information into predictive reports. Tracing the call chains through Code Indexer's find_callers and find_references tools reveals a surprisingly deep architecture.

MiroFish Pipeline Architecture
graph TD
  A["1. Seed Material Upload"] --> B["2. Knowledge Graph Construction"]
  B --> C["3. Environment & Agent Creation"]
  C --> D["4. Dual-Platform Simulation"]
  D --> E["5. Report Generation & Deep Interaction"]
  B -->|Zep Cloud API| F["Standalone Graph"]
  F -->|GraphRAG| C
  C -->|LLM Persona Gen| G["OASIS Agent Profiles"]
  G --> D
  D -->|Twitter + Reddit| H["Agent Actions DB"]
  H -->|Graph Memory Updater| F
  E -->|InsightForge + PanoramaSearch| F

Stage 1–2: From Text to Knowledge Graph

When you upload a document — a news article, policy draft, or financial report — MiroFish first runs it through an OntologyGenerator that uses LLM calls to extract entity types and relationships from the text. The result is a structured ontology definition (entity types like "Student," "MediaOutlet," "GovernmentAgency"; edge types like "reports_on," "opposes"). This ontology is then pushed to Zep Cloud via the GraphBuilderService.
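To make the ontology output concrete, here is a minimal sketch of the data shape described above, using the entity and edge types from the article's own example. The dataclass name and field layout are illustrative assumptions, not the repo's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    """Illustrative ontology shape: entity types plus typed edges."""
    entity_types: list = field(default_factory=list)
    # Edge types modeled as (source_type, relation, target_type) triples.
    edge_types: list = field(default_factory=list)

# Example for a university-policy scenario:
ontology = Ontology(
    entity_types=["Student", "MediaOutlet", "GovernmentAgency"],
    edge_types=[
        ("MediaOutlet", "reports_on", "GovernmentAgency"),
        ("Student", "opposes", "GovernmentAgency"),
    ],
)
```

A structure like this is what the GraphBuilderService would then push to Zep Cloud as the graph's type system.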

Code Indexer's find_references('OpenAI') reveals the LLM integration pattern: three separate files instantiate OpenAI clients — llm_client.py, oasis_profile_generator.py, and simulation_config_generator.py. All use the same OpenAI-compatible SDK, configured via LLM_BASE_URL and LLM_API_KEY environment variables. This means MiroFish is model-agnostic: it works with any OpenAI API-compatible provider — OpenAI itself, Alibaba's Qwen-plus (the recommended model), Anthropic via proxy, or even a local Ollama setup.
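The environment-driven configuration can be sketched as follows. LLM_BASE_URL and LLM_API_KEY are the variables traced above; the helper name, the LLM_MODEL variable, and the default values are assumptions for illustration.

```python
import os

def resolve_llm_config(default_model: str = "qwen-plus") -> dict:
    """Resolve OpenAI-compatible client settings from the environment.

    Pointing LLM_BASE_URL at any compatible endpoint (OpenAI, Bailian,
    a proxy, or local Ollama) is what makes the system model-agnostic.
    """
    api_key = os.environ.get("LLM_API_KEY", "")
    if not api_key:
        raise RuntimeError("LLM_API_KEY is not set")
    return {
        "base_url": os.environ.get("LLM_BASE_URL", "https://api.openai.com/v1"),
        "api_key": api_key,
        "model": os.environ.get("LLM_MODEL", default_model),
    }

# Each of the three services would then build its client roughly as:
#   client = OpenAI(base_url=cfg["base_url"], api_key=cfg["api_key"])
```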

Stage 3: Persona Engineering — Where LLMs Become People

This is where MiroFish gets genuinely interesting. The OasisProfileGenerator (32 symbols, cyclomatic complexity of 28 for entity search alone) doesn't just assign random attributes to agents. It queries the Zep knowledge graph for each entity, retrieves their relationships and context, then feeds this to an LLM with a carefully crafted system prompt to generate a rich persona profile.

For individual agents (students, professors, journalists), the generator creates detailed backstories, personality dimensions (agreeableness, openness, assertiveness), opinion biases, and social media behavior patterns. For institutional agents (universities, media outlets), it generates organizational communication styles and official stances. Each persona includes MBTI type, a biographical narrative, and a set of 'core beliefs' that will guide their behavior during simulation.
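A persona record with those ingredients might look like the sketch below. The field names and value ranges are illustrative; the generator's real profile schema may differ.

```python
from dataclasses import dataclass

@dataclass
class AgentPersona:
    """Illustrative persona schema for an individual agent."""
    name: str
    mbti: str             # e.g. "INTJ"
    bio: str              # LLM-written biographical narrative
    core_beliefs: list    # stances that steer in-simulation behavior
    agreeableness: float  # personality dimensions on an assumed 0..1 scale
    openness: float
    assertiveness: float

persona = AgentPersona(
    name="Li Wei",
    mbti="INTJ",
    bio="Third-year economics student, active on campus forums.",
    core_beliefs=["tuition policy should be debated publicly"],
    agreeableness=0.4, openness=0.8, assertiveness=0.7,
)
```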

Code Indexer's complexity analysis flags _search_zep_for_entity (CC=28) and _build_entity_context (CC=20) as the two most complex functions in this module. The high complexity comes from handling the numerous edge cases in Zep's graph API — paginated node retrieval, edge temporal filtering (valid_at, expired_at), and fallback to local keyword matching when Zep's semantic search API fails.
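The fallback behavior that drives much of that complexity can be sketched in a few lines. This is a hypothetical simplification, not the repo's code: try Zep's semantic search first, and drop to local keyword matching when the API fails or returns nothing.

```python
def search_entity(name, zep_semantic_search, local_nodes):
    """Fallback sketch: Zep semantic search first, keyword match second.

    `zep_semantic_search` is any callable hitting the remote API;
    `local_nodes` is a cached list of {"name": ...} node dicts.
    """
    try:
        hits = zep_semantic_search(name)
        if hits:
            return hits
    except Exception:
        pass  # API error or timeout: fall through to the local path
    needle = name.lower()
    return [n for n in local_nodes if needle in n["name"].lower()]
```

The real _search_zep_for_entity layers pagination and temporal edge filtering on top of this basic try-then-fallback shape.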

Stage 4: The OASIS Simulation Engine

MiroFish's simulation layer is built on OASIS (Open Agent Social Interaction Simulations) [1] — an open-source framework from the CAMEL-AI team [4] designed for scalable multi-agent social simulation. The key insight from the OASIS paper is that meaningful social phenomena like group polarization and herd behavior only emerge at scale — typically requiring 10,000+ agents.

The run_parallel_simulation.py script (35 symbols, the largest simulation file) orchestrates dual-platform simulation: agents simultaneously interact on a simulated Twitter and a simulated Reddit. Code Indexer's find_by_signature(is_async=true) reveals 19 async functions driving this engine — from run_twitter_simulation (CC=29) and run_reddit_simulation (CC=31) down to individual agent interview handlers.

Each simulation round proceeds as follows: an activity scheduler (modeled on Chinese timezone patterns — peak hours 19:00-22:00, near-zero activity 0:00-5:00) determines which agents are active. Active agents receive their persona, recent memory, and the current social feed. They choose from OASIS's 23 possible actions — post, comment, like, repost, follow, mute — and the results are written to a SQLite database and simultaneously pushed back to the Zep graph via the ZepGraphMemoryUpdater (32 symbols).
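The hour-based scheduling described above can be sketched as a simple probability curve plus a sampling step. The specific probabilities here are assumptions for illustration; MiroFish prompts the LLM to tune the real curve per scenario.

```python
import random

def activity_probability(hour: int) -> float:
    """Illustrative activity curve matching the described pattern:
    near-zero overnight, a morning ramp-up, an evening peak."""
    if hour < 6:
        return 0.02                        # 0:00-5:00 near-zero
    if hour < 9:
        return 0.15 + 0.05 * (hour - 6)    # 6:00-8:00 morning ramp-up
    if 19 <= hour < 22:
        return 0.85                        # 19:00-22:00 peak
    return 0.40                            # daytime baseline

def sample_active_agents(agents, hour, seed=0):
    """Pick which agents act this round, given the hour's probability."""
    rng = random.Random(seed)
    p = activity_probability(hour)
    return [a for a in agents if rng.random() < p]
```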

This feedback loop is critical: agent actions update the knowledge graph in real-time, which means later agents in the same round can see and react to earlier agents' posts. Opinions evolve. Echo chambers form. Viral content spreads. The simulation isn't just running LLM calls in parallel — it's constructing an evolving social reality.

Stage 5: The Report Agent — ReACT Meets Social Science

With 72 symbols, report_agent.py is the single most complex module in MiroFish. Code Indexer's complexity analysis confirms this: _generate_section_react (CC=31) and _post_process_report (CC=25) are among the highest-complexity functions in the entire codebase.

The Report Agent implements a full ReACT (Reasoning + Acting) loop using LangChain-style tool calling. When generating a prediction report, it has access to four specialized retrieval tools:

  • InsightForge (CC=19): The most powerful tool. Automatically decomposes the user's question into sub-queries, runs multi-dimensional semantic search across the Zep graph, and synthesizes entity insights with relationship chains. Code Indexer's find_callers('search_graph') traces this tool directly to the core graph search API.
  • PanoramaSearch: Fetches the complete graph panorama including expired/historical facts — critical for understanding opinion evolution over time.
  • QuickSearch: Lightweight semantic search for rapid fact retrieval.
  • InterviewAgents: The most remarkable tool — it selects relevant agents from the simulated world based on their persona type and role, then conducts LLM-powered 'interviews' where each agent responds in character based on their lived simulation experience.

The Report Agent plans which tools to use, executes them in sequence, reflects on the results, and iterates until it has enough evidence to write a comprehensive section. Each section is generated independently, then a final post-processing step (_post_process_report, CC=25) reconciles the sections into a coherent report with proper citations from the simulation data.
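That plan-act-iterate loop can be reduced to a minimal sketch. The names and control flow here are assumptions, not the repo's code; the real _generate_section_react (CC=31) adds reflection, citation tracking, and error handling on top of this skeleton.

```python
def react_generate(question, tools, decide, max_steps=8):
    """Tiny ReACT-style loop: `decide` (the LLM policy) inspects the
    evidence gathered so far and either requests a tool call or emits
    the final section text."""
    evidence = []
    for _ in range(max_steps):
        step = decide(question, evidence)
        if step["action"] == "finish":
            return step["text"]
        tool = tools[step["action"]]                    # e.g. "InsightForge"
        evidence.append((step["action"], tool(step.get("query", question))))
    return "[section inconclusive after max_steps]"
```

In MiroFish's case, `tools` would map the four retrieval tools above, and `decide` would typically open with InsightForge before narrowing with QuickSearch or InterviewAgents.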

The Deep Audit: Score 33/100

Code Indexer's deep audit paints a nuanced picture. The project scores 33/100 overall — a grade of F. But the scoring breakdown reveals that MiroFish's weaknesses are concentrated in engineering discipline, not in the core innovation.

Source: Code Indexer Deep Audit

The security score is perfect: 20/20, no critical findings. This matters — MiroFish handles API keys and runs background processes, so injection vulnerabilities would be catastrophic. The architecture score (4/10) reflects zero dependency cycles but 20 coupling hotspots and 11 low-cohesion files. The biggest pain points:

  • Test Coverage: 0% — zero test files detected. For a system making 'predictions,' this is a critical gap.
  • Maintainability: 0/15 — 30 code duplicates and 18 'god files' (files doing too many things). The parallel simulation scripts share massive amounts of duplicated logic between Twitter and Reddit runners.
  • Documentation: 2/10 — only 48 of 568 functions have docstrings.
  • Hygiene: 0/15 — 30 magic numbers scattered through the codebase (hardcoded timeouts, array indices, threshold values).

The dead code analysis flagged 100+ potentially unused functions — many of which are Flask route handlers registered via decorators (false positives) but also genuine orphaned code from the frontend API layer. The coupling analysis detected that report_agent.py and zep_tools.py are changed together 60% of the time — a sign of tight coupling that could benefit from a cleaner interface boundary.

LLM Requirements: What Does It Actually Cost?

MiroFish requires two external services: an OpenAI-compatible LLM API and Zep Cloud for graph memory. Let's trace where every LLM call goes.

Code Indexer's find_references('OpenAI') identifies exactly 6 OpenAI client instantiation points across 3 files. The LLM is called during:

  • Ontology generation — extracting entity/relationship types from seed documents (1-2 calls)
  • Profile generation — creating agent personas (1 call per agent, so potentially hundreds)
  • Simulation config — generating time parameters, events, and platform configs (3-5 calls)
  • Simulation execution — every agent action during every round requires an LLM call via OASIS
  • Report generation — the ReACT loop makes 5-15 calls per report section, plus InsightForge sub-query decomposition

For a typical simulation with 50 agents running 72 simulated hours (72 rounds at 1 hour per round, with 5-20 agents active per round), you're looking at roughly 500-1,400 LLM calls for the simulation alone, plus another 50-100 for report generation. The recommended model is Alibaba's Qwen-plus via the Bailian API, which is significantly cheaper than OpenAI GPT-4. A community-driven offline fork (nikmcfly/MiroFish-Offline) even replaces Zep with Neo4j and uses Ollama for fully local operation.
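As a back-of-envelope check on those figures (assuming one LLM call per agent action; the article's 500-1,400 range suggests some actions cost more than one call in practice):

```python
def estimate_llm_calls(rounds=72, active_low=5, active_high=20,
                       calls_per_action=1, report_low=50, report_high=100):
    """Rough call-count envelope for one simulation plus one report,
    using the parameters stated in the text as defaults."""
    sim_low = rounds * active_low * calls_per_action
    sim_high = rounds * active_high * calls_per_action
    return sim_low + report_low, sim_high + report_high

low, high = estimate_llm_calls()  # (410, 1540) with the defaults above
```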

Does It Work? The Research Behind Multi-Agent Prediction

The scientific foundation for MiroFish's approach rests on the OASIS paper [1], published on arXiv in November 2024. The CAMEL-AI team validated OASIS across three key experiments:

  • Information Propagation: OASIS achieved ~30% normalized RMSE when modeling how information spreads on simulated Twitter, matching patterns from real-world news dissemination studies.
  • Group Polarization: Simulated agents consistently adopted more extreme opinions during interactions, replicating the well-documented polarization effect — especially prominent with uncensored LLM models.
  • Herd Effect: AI agents proved more susceptible to herd behavior than humans, particularly in following negative trends — a finding with implications for prediction accuracy.

Broader 2024-2025 research supports the viability of multi-agent LLM simulation for prediction. A 2024 study used 2,686 'scholar agents' to forecast research ideas, achieving higher similarity scores with actual 2024 publications than baseline methods. Another framework (PREDICT, EMNLP 2024) used multi-agent debate simulation for hate speech detection, demonstrating that agent consensus can outperform single-model classification.

However, the predictive power has clear boundaries. Multi-agent simulations excel at modeling emergent social dynamics — opinion formation, information cascades, group behavior. They are significantly less reliable for precise quantitative predictions (stock prices, election margins) because LLM agents inevitably carry biases from their training data. MiroFish's strength is in qualitative scenario analysis: 'What might public reaction look like if this policy is announced?' rather than 'Will the market go up 3%?'

Architectural Critique: Brilliance and Technical Debt

Code Indexer's coupling analysis reveals the strongest co-change pattern: run_parallel_simulation.py, run_reddit_simulation.py, and run_twitter_simulation.py change together 83% of the time with near-identical logic. This is classic code duplication — the Twitter and Reddit simulation loops are essentially the same algorithm with different OASIS platform constructors. A Strategy pattern or platform-agnostic simulation base class would eliminate hundreds of duplicated lines.
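A platform-agnostic base class of the kind suggested above could look like this hypothetical sketch: the shared round loop lives in the base class, and each platform overrides only construction and stepping.

```python
from abc import ABC, abstractmethod

class PlatformSimulation(ABC):
    """Hypothetical refactor: one round loop, per-platform specialization."""

    @abstractmethod
    def build_env(self) -> dict:
        """Construct the platform environment (OASIS constructor in reality)."""

    @abstractmethod
    def step(self, env: dict, active_agents: list) -> list:
        """Run one round for the active agents, returning their actions."""

    def run(self, agents, rounds, scheduler):
        env = self.build_env()
        actions = []
        for r in range(rounds):
            actions.extend(self.step(env, scheduler(agents, r)))
        return actions

class TwitterSimulation(PlatformSimulation):
    def build_env(self):
        return {"platform": "twitter"}

    def step(self, env, active_agents):
        return [(env["platform"], a, "post") for a in active_agents]
```

A RedditSimulation subclass would differ only in its two overrides, collapsing the duplicated loop logic into one place.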

The Zep integration is both the project's greatest strength and its biggest architectural risk. Zep Cloud provides sophisticated graph-based memory with temporal edges (facts can expire), cross-encoder reranking, and semantic search. But the ZepToolsService (1,736 lines, 49 symbols) implements its own local keyword-match fallback for when Zep's API fails — effectively duplicating search logic. And the dependency on Zep Cloud means the simulation can't run without internet access (unless using the community offline fork).

The SimulationConfigGenerator deserves special mention for its attention to cultural modeling. The code hardcodes Chinese internet behavior patterns — peak activity 19:00-22:00 Beijing time, near-zero activity 0:00-5:00, morning ramp-up 6:00-8:00. The LLM is prompted to adjust these patterns based on the specific scenario (student populations might peak later, media entities might be active 24/7). This level of sociocultural calibration is rare in simulation frameworks and hints at the project's Chinese academic origin.

The Investigation Methodology: Code Indexer in Practice

This analysis was conducted entirely using Code Indexer [3] — a semantic code search engine designed for AI-assisted code understanding. The tool chain used:

  • index_project — Indexed MiroFish's 85 files in under 2 seconds, generating 1,692 chunks and 644 symbols
  • project_report — Instant tech stack detection (Python 51%, Vue 22%, JavaScript 12%), dependency analysis, key module identification by symbol density
  • audit_project (deep mode) — Comprehensive quality audit: security scanning, complexity analysis, dead code detection, coupling hotspots, cohesion scoring, taint analysis. Completed in ~3 minutes
  • find_complex_functions — Identified 15 functions with cyclomatic complexity ≥ 8, pinpointing the architectural hotspots
  • find_references — Traced OpenAI SDK usage across the entire codebase, revealing the exact LLM integration points
  • find_callers (transitive) — Mapped the call graph from API endpoints down to core graph search operations
  • find_dead_code — Flagged 100+ potentially unused functions for cleanup
  • find_by_signature — Located all 19 async functions driving the simulation engine

The workflow went from 'never seen this code' to 'complete architectural understanding' in about 4 minutes of indexing and exploration. For a 15,000+ line Python backend with complex inter-service dependencies, that's the difference between a weekend of manual exploration and an afternoon of systematic discovery.

Conclusion: The Future Is a Digital Fish Tank

MiroFish represents a fascinating convergence of three rapidly maturing technologies: large language model agents, graph-based memory systems, and social simulation frameworks. Its five-stage pipeline — from seed material to knowledge graph to agent personas to dual-platform simulation to ReACT-powered report generation — is genuinely novel in how it stitches these components together.

The code quality tells the story of ambitious open-source development: brilliant architectural ideas (the InsightForge multi-dimensional retrieval, the cultural timezone modeling, the agent interview system) wrapped in the typical technical debt of a fast-moving project — zero tests, duplicated simulation logic, and 18 god files. The audit score of 33/100 is harsh but fair; the security score of 20/20 is reassuring.

Can MiroFish predict the future? Not in any precise, quantitative sense. But it can simulate complex social reactions to new information at a scale that would be impossible with traditional surveys or focus groups. Feed it a policy proposal and 50 carefully crafted agent personas, and you get a qualitative sandbox where opinion dynamics play out — polarization emerges, echo chambers form, viral narratives spread. Whether those emergent patterns match real-world outcomes depends entirely on the quality of the initial seed data and the LLM's ability to role-play convincingly.

For researchers, policy analysts, and anyone curious about collective human behavior, MiroFish is worth watching. For engineers, it's a masterclass in multi-agent system architecture — and a cautionary tale about what happens when you ship 15,000 lines of Python without a single test.

📚 Sources & References

[1] Ziyi Yang, Zaibin Zhang, et al. "OASIS: Open Agent Social Interaction Simulations with One Million Agents." 2024. arxiv.org
[2] Guo Hangjiang (666ghj). "MiroFish — GitHub Repository." 2025. github.com
[3] Code Indexer. "Code Indexer — Semantic Code Search Engine." 2025. codeindexer.dev
[4] CAMEL-AI. "CAMEL-AI: Communicative Agents for AI Society." 2024. github.com
[5] Zep AI. "Zep — Long-Term Memory for AI Agents." 2025. github.com