EchoVessel Memory · Four Layers, One Picture

The simplest possible view of how L1–L4 connect.
[Diagram: one turn end-to-end. The user writes "我养了只猫，叫小黑" ("I got a cat, named Xiaohei"); the persona replies "小黑听起来很可爱呢" ("Xiaohei sounds adorable"). Arrows trace the four flows across L1–L4: write (per message, by human / daemon / LLM), distill (per session, LLM call), read (into the prompt every turn), and the filling[] evidence chain linking each L4 thought back to its L3 events.]

L1 · core_blocks

Five authored identity documents — persona, self, user, relationship, mood. Written by humans during onboarding or via the admin UI. Always sit at the top of every LLM prompt.

1 row per label · never auto-distilled

L2 · recall_messages

Every message verbatim — user and persona, both sides of every turn. No transformation. The ground truth that the other layers can be reconstructed from.

1 row per message · FTS5 indexed · recent-window into prompt
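The recent-window read is just a tail query over that table. A minimal sketch, assuming a `recall_messages` table with `id`, `role`, and `content` columns (the real schema may differ):

```python
import sqlite3

def recent_window(db: sqlite3.Connection, limit: int = 20):
    """Return the last `limit` messages, oldest-first, for prompt assembly."""
    rows = db.execute(
        "SELECT role, content FROM recall_messages ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return rows[::-1]  # reverse so the prompt reads chronologically
```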

L3 · events

Episodic memory. When a session closes, an LLM reads L2 and distills 0–3 discrete events, each with a signed emotional_impact in [-10, +10] and up to 3 relational tags from a closed vocabulary.

concept_nodes WHERE type=EVENT · embedded via sqlite-vec
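The extraction contract (signed impact in [-10, +10], up to 3 tags from the closed vocabulary) can be pinned down as a small validated record. This is an illustrative sketch, not the project's actual types; the field names are assumptions:

```python
from dataclasses import dataclass, field

# Closed vocabulary of relational tags (from the retrieval divergence notes)
RELATIONAL_TAGS = {
    "identity-bearing", "unresolved", "vulnerability",
    "turning-point", "correction", "commitment",
}

@dataclass
class Event:
    summary: str
    emotional_impact: int                          # signed, in [-10, +10]
    tags: list = field(default_factory=list)       # up to 3, closed vocabulary

    def __post_init__(self):
        if not -10 <= self.emotional_impact <= 10:
            raise ValueError("emotional_impact must be in [-10, +10]")
        if len(self.tags) > 3 or not set(self.tags) <= RELATIONAL_TAGS:
            raise ValueError("up to 3 tags, all from the closed vocabulary")
```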

L4 · thoughts

Long-term impressions. After the gate passes (shock or timer, never more than 3 reflections per 24 h), an LLM reflects on recent L3 events and produces 1–2 thoughts. Every thought must cite its source event IDs via filling[].

concept_nodes WHERE type=THOUGHT · edges in concept_node_filling
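The evidence chain is plain parent→child rows. A hypothetical sketch of writing one thought plus its filling[] edges, assuming simplified `concept_nodes` / `concept_node_filling` schemas (real columns will differ):

```python
import sqlite3

def write_thought(db: sqlite3.Connection, content: str, event_ids: list) -> int:
    """Insert a THOUGHT node and its filling[] edges to the cited L3 events."""
    if not event_ids:
        raise ValueError("a thought must cite at least one source event")
    cur = db.execute(
        "INSERT INTO concept_nodes (type, content) VALUES ('THOUGHT', ?)",
        (content,),
    )
    thought_id = cur.lastrowid
    # parent = thought, child = event: the evidence chain in one table
    db.executemany(
        "INSERT INTO concept_node_filling (parent_id, child_id) VALUES (?, ?)",
        [(thought_id, eid) for eid in event_ids],
    )
    return thought_id
```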

write · Immediate · per message

L2 captures every incoming and outgoing message before the LLM streams a reply. This is the only write that happens on the turn's critical path. Latency matters here; keep it fast.

L1 is also a write target, but only by admin edits and the onboarding bootstrap — never per-turn.
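The ordering (capture, stream, capture) can be sketched in a few lines. `capture` and `handle_turn` are hypothetical names, and joining a token list stands in for the real streaming channel:

```python
import sqlite3

def capture(db, channel_id, role, content):
    # Single synchronous INSERT: the only write on the turn's critical path
    db.execute(
        "INSERT INTO recall_messages (channel_id, role, content) VALUES (?, ?, ?)",
        (channel_id, role, content),
    )

def handle_turn(db, channel_id, user_msg, token_stream):
    capture(db, channel_id, "user", user_msg)    # before the LLM streams anything
    reply = "".join(token_stream)                # stand-in for streaming the reply
    capture(db, channel_id, "persona", reply)    # outgoing side, verbatim
    return reply
```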

distill · Deferred · per session

After a session idle-closes (default 30min), a background worker pulls L2 messages and asks a small LLM to extract 0–3 events (L3). If the reflection gate allows, a medium LLM then reflects on recent events to produce 1–2 thoughts (L4), each with explicit filling[] evidence.
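The reflection gate ("shock OR timer, never over 3/24h") reduces to a small predicate. The shock threshold and timer interval below are illustrative assumptions, not values from the project:

```python
MAX_REFLECTIONS_PER_DAY = 3   # hard cap from the spec: never over 3 per 24 h
SHOCK_THRESHOLD = 8           # assumption: |impact| at or above this trips the gate
TIMER_SECONDS = 6 * 3600      # assumption: periodic reflection interval

def gate_allows(recent_impacts, reflections_last_24h, seconds_since_last):
    """Shock OR timer, subject to the daily cap."""
    if reflections_last_24h >= MAX_REFLECTIONS_PER_DAY:
        return False
    shock = any(abs(i) >= SHOCK_THRESHOLD for i in recent_impacts)
    timer = seconds_since_last >= TIMER_SECONDS
    return shock or timer
```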

read · Fused · every turn

At assembly time, every new user message triggers an L1 read (all 5 blocks), an L2 read (recent window), and an L3+L4 vector search ranked with the formula:

total = 0.5·recency + 3·relevance + 2·|impact| + 1·relational_bonus

Nothing ever filters by channel_id (iron rule D4).
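The ranker is just that weighted sum applied to a candidate list. One assumption in this sketch: recency, relevance, and relational_bonus arrive pre-normalised to [0, 1], while impact is the raw signed value in [-10, +10], so its magnitude is divided by 10:

```python
def total_score(c: dict) -> float:
    """total = 0.5·recency + 3·relevance + 2·|impact| + 1·relational_bonus
    (|impact| / 10 normalises the signed value; the sign itself never ranks)"""
    return (0.5 * c["recency"] + 3.0 * c["relevance"]
            + 2.0 * abs(c["impact"]) / 10.0 + 1.0 * c["relational_bonus"])

def rank(candidates: list) -> list:
    # Highest fused score first; note channel_id never appears here (iron rule D4)
    return sorted(candidates, key=total_score, reverse=True)
```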

📚 Where this scoring formula comes from

The retrieval scoring is a direct descendant of Stanford's 2023 Generative Agents paper (Park et al., "Interactive Simulacra of Human Behavior"). In their implementation at reverie/backend_server/persona/cognitive_modules/retrieve.py, the new_retrieve() function fuses three signals: recency × 0.5, relevance × 3.0, importance × 2.0. EchoVessel's weights are lifted verbatim from there.

Four deliberate divergences from the Stanford original:

  1. Added relational_bonus × 1.0 — a closed-vocabulary tag bonus (identity-bearing / unresolved / vulnerability / turning-point / correction / commitment) tuned for long-term companionship scenarios, where "core facts about who the user is" deserve extra pull.
  2. Time-based recency instead of position-based decay. Stanford: decay^i where i is rank in last-accessed order — being retrieved refreshes the memory (cognitive-psychology "rehearsal" effect). EchoVessel: 14-day half-life from created_at, access never refreshes. Diary-style vs brain-style.
  3. Signed emotional_impact ∈ [-10, +10] instead of their unsigned poignancy ∈ [1, 10]. Lets grief (-9) and joy (+9) stay separated in retrieval; the magnitude drives salience, the sign drives valence grouping.
  4. Min-relevance floor of 0.4 — candidates whose cosine similarity to the query is below the floor get dropped before scoring. Stops truly-orthogonal high-impact memories from surfacing on totally unrelated queries. Added after the Over-recall eval metric flagged the failure mode.
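Divergences 2 and 4 are each one line of arithmetic. A sketch of the 14-day half-life recency and the 0.4 relevance floor, with timestamps in epoch seconds:

```python
HALF_LIFE_SECONDS = 14 * 86400   # 14-day half-life, measured from created_at
MIN_RELEVANCE = 0.4              # drop near-orthogonal candidates before scoring

def recency(created_at: float, now: float) -> float:
    """Diary-style decay: halves every 14 days; retrieval never refreshes it."""
    return 0.5 ** ((now - created_at) / HALF_LIFE_SECONDS)

def passes_floor(relevance: float) -> bool:
    return relevance >= MIN_RELEVANCE
```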

Park's original comment in the reference code: "these weights should likely be learned, perhaps through an RL-like process". For now both projects hand-tune to [0.5, 3, 2]. If you find a better set via eval, the whole ranker is just four floats in memory/retrieve.py.