## 6-Tier Architecture
Memory is organized into six tiers, each serving a distinct purpose. The agent queries only the tiers relevant to the current task, keeping token usage low.
| Tier | Purpose | Contains | Priority |
|---|---|---|---|
| Active | Current session state | Active checklist, in-progress tasks, live status | Highest |
| Short-term | Recent session extracts | Facts, preferences, decisions, todos, constraints from recent sessions | High |
| Mid-term | Goals, milestones, learning indexes | Current goals, risks, conversation-to-tool mappings | Medium |
| Long-term | Accumulated knowledge | Mission, principles, constraints, user contract, learned patterns | High |
| Specialist | Domain expertise | Per-category JSONL: findings, playbooks, experiments, notes | High |
| Contract | User-agent agreements | Behavioral terms, explicit user preferences, operational boundaries | Highest |
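The tier routing described above can be sketched in TypeScript. This is an illustrative sketch only: the names `MemoryTier`, `TIER_ROUTES`, and `routeQuery` are assumptions, not the real ClawMagic API.

```typescript
// Hypothetical sketch of the six memory tiers and a query router.
type MemoryTier =
  | "active"
  | "short-term"
  | "mid-term"
  | "long-term"
  | "specialist"
  | "contract";

// Which tiers a task class consults; the agent queries only these,
// keeping retrieved context (and token usage) small.
// Task-class names here are invented for illustration.
const TIER_ROUTES: Record<string, MemoryTier[]> = {
  "resume-session": ["active", "contract"],
  "plan-goal": ["mid-term", "long-term", "contract"],
  "domain-question": ["specialist", "long-term"],
};

function routeQuery(taskClass: string): MemoryTier[] {
  // Fall back to the two highest-priority tiers for unrecognized tasks.
  return TIER_ROUTES[taskClass] ?? ["active", "contract"];
}
```

The point of the sketch is the shape of the decision: retrieval is scoped per task, never "query everything."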
## BM25 Search
ClawMagic uses BM25 (Best Matching 25) for memory retrieval, the same ranking function used by Lucene-based search engines such as Elasticsearch. It runs locally with zero API calls.
| Feature | Basic Token Overlap | BM25 |
|---|---|---|
| Term frequency | No (binary match) | Yes (repeated terms weighted) |
| Inverse document frequency | No | Yes (rare terms matter more) |
| Document length normalization | No | Yes |
| Relevance quality | Basic | Industry standard |
| API cost | Zero | Zero |
BM25 is implemented as a pure TypeScript module with zero dependencies. Recall@10 and MRR (mean reciprocal rank) both reach 1.0 on internal benchmarks.
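For readers unfamiliar with the formula, the table's three features (term frequency saturation, inverse document frequency, and length normalization) can be seen in a minimal scorer. This is a generic BM25 sketch with the standard `k1` and `b` parameters, not the ClawMagic module's actual code.

```typescript
// Minimal BM25: scores each tokenized document against a tokenized query.
function bm25Scores(
  query: string[],
  docs: string[][],
  k1 = 1.2, // term-frequency saturation
  b = 0.75, // strength of document-length normalization
): number[] {
  const N = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / N;

  // df: number of documents containing each query term.
  const df = new Map<string, number>();
  for (const term of query) {
    df.set(term, docs.filter((d) => d.includes(term)).length);
  }

  return docs.map((doc) => {
    let score = 0;
    for (const term of query) {
      const n = df.get(term)!;
      if (n === 0) continue;
      // IDF: rare terms matter more.
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      // TF with saturation (k1) and length normalization (b).
      const tf = doc.filter((t) => t === term).length;
      score +=
        (idf * (tf * (k1 + 1))) /
        (tf + k1 * (1 - b + b * (doc.length / avgLen)));
    }
    return score;
  });
}
```

A document that repeats a query term outscores one that mentions it once, and documents with no overlap score exactly zero, which is what distinguishes BM25 from binary token overlap.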
## Distill Pipeline
After each session, the distill pipeline extracts structured knowledge from the conversation and stores it in the appropriate memory tier. Five distill types enable filtered retrieval later.
- Fact — objective information learned during the session.
- Preference — user preferences and style choices.
- Decision — choices made and the reasoning behind them.
- Todo — action items identified for future sessions.
- Constraint — boundaries and rules that apply going forward.
Distilled points can be promoted from short-term to long-term memory based on repeated relevance. Stale entries decay naturally.
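The five distill types and the promotion rule suggest a record shape like the following. The field names (`session`, `hits`) and the promotion threshold are assumptions made for illustration; the doc only states that promotion is based on repeated relevance.

```typescript
// Hypothetical shape for a distilled point; field names are assumptions.
type DistillType = "fact" | "preference" | "decision" | "todo" | "constraint";

interface DistilledPoint {
  type: DistillType;   // enables filtered retrieval by type
  text: string;        // the extracted knowledge itself
  session: string;     // session the point was extracted from
  hits: number;        // how often retrieval found it relevant
}

// Promotion sketch: repeatedly relevant short-term points graduate
// to long-term memory; points that never accumulate hits decay.
function shouldPromote(p: DistilledPoint, threshold = 3): boolean {
  return p.hits >= threshold;
}
```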
## Thought Chains
Thought chains are procedural memory. When the agent solves a task, it can save the approach as a chain. Next time a similar task appears, the chain fires first, skipping the planning phase entirely.
- Chains include metadata: name, tags, description for workflow matching.
- Auto-reuse means repeat tasks execute faster with fewer tokens.
- Chain-first routing skips planning for recognized patterns.
- Users can inspect, edit, and delete chains from the control panel.
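A chain record and its matching step might look like the sketch below. The interface mirrors the metadata listed above (name, tags, description); the tag-overlap matcher is a deliberately simple stand-in, since the doc does not specify the matching algorithm (BM25 over the metadata would be a natural fit).

```typescript
// Illustrative thought-chain record; `steps` is an assumed field.
interface ThoughtChain {
  name: string;
  tags: string[];
  description: string;
  steps: string[]; // saved procedure, replayed when the chain fires
}

// Chain-first routing sketch: pick the chain whose tags best overlap
// the task; if one matches, the planning phase is skipped.
function matchChain(taskTags: string[], chains: ThoughtChain[]): ThoughtChain | null {
  let best: ThoughtChain | null = null;
  let bestOverlap = 0;
  for (const chain of chains) {
    const overlap = chain.tags.filter((t) => taskTags.includes(t)).length;
    if (overlap > bestOverlap) {
      bestOverlap = overlap;
      best = chain;
    }
  }
  return best; // null means no recognized pattern: fall back to planning
}
```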
## Storage and Configuration
Memory storage follows the same zero-dependency philosophy as the rest of ClawMagic. Defaults work everywhere. Upgrades are opt-in.
- Default (JSONL + BM25): flat files on disk, zero external services, Git-backup friendly. Works on any machine with Node.js.
- SQLite toggle: enables FTS5 for BM25 at larger scale. Faster disk I/O for big document counts. Independent of embeddings.
- Embeddings toggle: adds cosine similarity search via embedding API calls. Works with or without SQLite. Opt-in only.
All toggles are configurable from the control panel. API contracts remain the same regardless of which backends are active. The LLM never generates SQL, even when SQLite is enabled.
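The independence of the two toggles can be made concrete with a config shape. The interface and field names here are assumptions; the doc only specifies the behaviors (JSONL default, SQLite/FTS5 toggle, embeddings toggle, each independent).

```typescript
// Hypothetical config shape for the storage toggles.
interface MemoryStorageConfig {
  backend: "jsonl" | "sqlite"; // SQLite enables FTS5-backed BM25 at scale
  embeddings: boolean;          // cosine-similarity search via embedding API
}

// Defaults work everywhere: flat JSONL files plus local BM25, zero services.
const defaults: MemoryStorageConfig = { backend: "jsonl", embeddings: false };

// The toggles are independent: embeddings work with either backend.
const upgraded: MemoryStorageConfig = { backend: "sqlite", embeddings: true };
```

Because the API contract is the same regardless of backend, switching `backend` or `embeddings` changes only how retrieval is executed, not what callers see.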
## Token Efficiency
The memory system is designed around a core principle: retrieve what is needed, not everything available.
- Default mode uses 1 API call per chat turn (LLM only). No embedding calls.
- Context budgets are bounded per turn. Overflow triggers a 4-stage recovery: compact, no_tool_result, manifest_only, graceful stop.
- Pull-based retrieval means the agent asks for context rather than receiving it all upfront.
- Tool schemas load as manifests first, expanding to full schema only when selected.
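The 4-stage overflow recovery can be sketched as an escalation ladder. The stage names come from the list above; the dispatch logic is an assumption.

```typescript
// Overflow recovery stages, in escalation order (names from the doc).
type RecoveryStage = "compact" | "no_tool_result" | "manifest_only" | "stop";

const STAGES: RecoveryStage[] = ["compact", "no_tool_result", "manifest_only", "stop"];

// Each successive overflow escalates to the next, more aggressive stage;
// "stop" (graceful stop) is terminal.
function nextStage(current: RecoveryStage | null): RecoveryStage {
  if (current === null) return STAGES[0];
  const i = STAGES.indexOf(current);
  return STAGES[Math.min(i + 1, STAGES.length - 1)];
}
```

Compaction is tried first because it is cheapest; dropping tool results and collapsing schemas to manifests progressively shed more context before the turn is abandoned.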