EchoVessel Memory · Six Layers, One Picture

L1–L4 form the conversation loop · L5 entities and L6 episodic state sit alongside, feeding the same prompt.
[Diagram: the six memory layers around the conversation loop. A user message (example: "I have a cat named 小黑") and the persona reply ("小黑 sounds adorable") flow through assemble_turn() into the LLM prompt, and the reply streams back to the channel. L1 core_blocks is human-authored and read every turn. L2 recall_messages captures each message verbatim and feeds a recent window (last ~20) into the prompt. ① When a session closes, prompts/extraction.py distills 0–3 L3 events (signed impact, 6 relational tags). ② When the gate passes, prompts/reflection.py produces 1–2 L4 thoughts whose filling[] cites L3 event IDs (parent=thought → child=event). L5 entities + aliases and L6 episodic_state sit alongside and feed the same prompt. Legend: write (human / daemon / LLM) · distill (LLM call) · read into prompt (every turn) · evidence chain.]

L1 · core_blocks

Three authored identity documents: persona, user, style. They are written by humans during onboarding or via the admin UI, and they always sit at the top of every LLM prompt. L1 never auto-updates · reflection grows via L4 thoughts (subject='persona') and people/places grow via L5 entities with descriptions.

1 row per label · human-authored only

L2 · recall_messages

Every message verbatim — user and persona, both sides of every turn. No transformation. The ground truth other layers can reconstruct from.

1 row per message · FTS5 indexed · recent-window into prompt

L3 · events

Episodic memory. When a session closes, an LLM reads L2 and distills 0–3 discrete events, each with a signed emotional_impact in [-10, +10], up to 3 relational tags, and an optional event_time_* day-precision window for time-anchored recall.

concept_nodes WHERE type=EVENT · embedded via sqlite-vec
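The constraints on an extracted event (0–3 per session, signed impact in [-10, +10], at most 3 tags from the six-tag vocabulary) can be checked with a small validator. This is only a sketch: the class name `ExtractedEvent`, the field names other than `emotional_impact`, and the function `validate_extraction` are hypothetical and are not taken from the EchoVessel code.

```python
from dataclasses import dataclass

# The six relational tags named in this document's scoring notes.
RELATIONAL_TAGS = frozenset({
    "identity-bearing", "unresolved", "vulnerability",
    "turning-point", "correction", "commitment",
})


@dataclass(frozen=True)
class ExtractedEvent:  # hypothetical name, for illustration only
    description: str
    emotional_impact: float          # signed, must lie in [-10, +10]
    relational_tags: tuple = ()      # at most 3, drawn from RELATIONAL_TAGS

    def __post_init__(self):
        if not -10 <= self.emotional_impact <= 10:
            raise ValueError("emotional_impact must be in [-10, +10]")
        if len(self.relational_tags) > 3:
            raise ValueError("at most 3 relational tags per event")
        unknown = set(self.relational_tags) - RELATIONAL_TAGS
        if unknown:
            raise ValueError(f"unknown relational tags: {sorted(unknown)}")


def validate_extraction(events):
    """Reject an extraction result that exceeds the 0-3 events budget."""
    events = list(events)
    if len(events) > 3:
        raise ValueError("a session yields 0-3 events")
    return events
```

Validating at the boundary keeps malformed LLM output from reaching concept_nodes; whether the real pipeline rejects or clamps such output is not stated here.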

L4 · thoughts + intentions + expectations

Reflection layer · three sub-types share the same table. thought: backward-looking insight; intention: strict commitment with subject='persona'; expectation: forward prediction with event_time_end as due_at. Each carries a filling[] chain back to its source events.

concept_nodes WHERE type IN (THOUGHT, INTENTION, EXPECTATION)

L5 · entities + aliases

Canonical names for third-party people / places / orgs / pets, with many-to-one alias rows ("Scott" = "黄逸扬" = "Yiyang"). Three-tier dedup at extraction time: alias match → embedding similarity (thresholds 0.65 / 0.85) → uncertain branch where the persona naturally asks the user. At retrieval time an alias hit pulls every linked ConceptNode into the candidate pool with an entity_anchor rerank bonus.

tables · entities · entity_aliases · junction concept_node_entities
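The three-tier dedup can be sketched roughly as below. The source gives the two thresholds but not how they are applied, so this sketch assumes that similarity at or above 0.85 merges, 0.65 up to 0.85 is the uncertain band that triggers a question to the user, and anything lower creates a new entity. The function `resolve_entity`, its arguments, and the decision labels are hypothetical.

```python
import math

MERGE_THRESHOLD = 0.85  # assumption: at or above this, same entity
ASK_THRESHOLD = 0.65    # assumption: between the two, ask the user


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def resolve_entity(name, embedding, alias_index, entity_embeddings):
    """Return (decision, entity_id) for a freshly extracted name.

    alias_index:       dict of casefolded alias -> entity_id
    entity_embeddings: dict of entity_id -> embedding vector
    """
    # Tier 1: exact alias match.
    entity_id = alias_index.get(name.casefold())
    if entity_id is not None:
        return ("alias_match", entity_id)

    # Tier 2: nearest existing entity by embedding similarity.
    best_id, best_sim = None, -1.0
    for eid, vec in entity_embeddings.items():
        sim = cosine(embedding, vec)
        if sim > best_sim:
            best_id, best_sim = eid, sim
    if best_sim >= MERGE_THRESHOLD:
        return ("merge", best_id)

    # Tier 3: uncertain band, so the persona asks the user.
    if best_sim >= ASK_THRESHOLD:
        return ("ask_user", best_id)
    return ("new_entity", None)
```

For example, with `alias_index = {"scott": 1, "黄逸扬": 1}`, the name "Scott" resolves at tier 1 without touching embeddings at all.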

L6 · episodic_state

The persona's current affect as a single-row JSON snapshot: {mood, energy, last_user_signal, updated_at}. Written as a side-effect of extraction's session_mood_signal (no extra LLM call). On entry, assemble_turn decays mood back to neutral once 12h have passed. The state renders as # How you feel right now; the section is skipped while mood is neutral.

column · personas.episodic_state
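A minimal sketch of the decay-then-render step, under several assumptions: the decay is treated as a hard reset once `updated_at` is 12 hours old (the source does not say whether it is gradual), `updated_at` is an ISO-8601 string, the reset also refreshes `updated_at`, and the rendered body text is invented for illustration. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DECAY_AFTER = timedelta(hours=12)
NEUTRAL = "neutral"


def decay_episodic_state(state, now):
    """Return the state for this turn, reset to neutral if it is stale."""
    updated_at = datetime.fromisoformat(state["updated_at"])
    if state["mood"] != NEUTRAL and now - updated_at >= DECAY_AFTER:
        # Assumption: a hard reset that also refreshes the timestamp.
        return {**state, "mood": NEUTRAL, "updated_at": now.isoformat()}
    return state


def render_mood_section(state):
    """Render the prompt section, or None while mood is neutral."""
    if state["mood"] == NEUTRAL:
        return None
    # The body format below is illustrative, not the project's template.
    return f"# How you feel right now\n{state['mood']} (energy: {state['energy']})"


now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
stale = {
    "mood": "warm", "energy": 7, "last_user_signal": "excited",
    "updated_at": (now - timedelta(hours=13)).isoformat(),
}
section = render_mood_section(decay_episodic_state(stale, now))  # None
```

Doing the decay lazily at read time means no background timer is needed; the snapshot is only corrected when a turn actually needs it.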

write · Immediate · per message

L2 captures every incoming and outgoing message before the LLM streams a reply. This is the only write that happens on the turn's critical path. Latency matters here; keep it fast.

L1 is also a write target, but only by admin edits and the onboarding bootstrap — never per-turn · never by slow_tick.

distill · Deferred · per session

After a session idle-closes (default 30 min), a background worker pulls L2 messages and asks a small LLM to extract 0–3 events (L3) plus any new L5 entities; the same call emits a session_mood_signal that lands in L6. If the reflection gate allows, a medium LLM reflects to produce 1–2 thoughts (L4) with explicit filling[] evidence. The optional G phase then runs slow_cycle for between-session reflection, which produces forward-looking expectation nodes under cool-down, token-wall, and daily-cap gates.

read · Fused · every turn

At assembly time, every new user message triggers: L1 read (all 3 blocks), L6 read (episodic state · 12h decay check), L2 read (recent window grouped into day buckets), L5 alias scan (find_query_entities), and an L3+L4 vector search with ranking formula:

total = 0.5·recency + 3·relevance + 2·|impact| + 1·relational_bonus + 1.5·entity_anchor

Nothing ever filters by channel_id (iron rule D4).
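The ranking formula above is a plain weighted sum, which can be sketched as follows. The weights come from the formula; everything else is an assumption made for illustration: the `Candidate` container, the scaling of recency and relevance to [0, 1], the division of |impact| by 10, and the treatment of relational_bonus and entity_anchor as 0/1 flags.

```python
from dataclasses import dataclass

# Weights as given in the formula above.
W_RECENCY, W_RELEVANCE, W_IMPACT = 0.5, 3.0, 2.0
W_RELATIONAL, W_ENTITY_ANCHOR = 1.0, 1.5


@dataclass
class Candidate:  # hypothetical container, not the project's type
    node_id: int
    recency: float           # assumed to be scaled to [0, 1]
    relevance: float         # cosine similarity to the query
    impact: float            # signed emotional_impact in [-10, +10]
    relational_bonus: float  # assumed 1.0 if a relational tag applies, else 0.0
    entity_anchor: float     # assumed 1.0 if linked to a query entity, else 0.0


def total_score(c):
    """Weighted sum of the five signals; only the magnitude of impact counts."""
    impact_magnitude = abs(c.impact) / 10.0  # assumed normalization
    return (W_RECENCY * c.recency
            + W_RELEVANCE * c.relevance
            + W_IMPACT * impact_magnitude
            + W_RELATIONAL * c.relational_bonus
            + W_ENTITY_ANCHOR * c.entity_anchor)


def rank(candidates, top_k=5):
    """Return the top_k candidates by descending total score."""
    return sorted(candidates, key=total_score, reverse=True)[:top_k]


recent_grief = Candidate(1, recency=1.0, relevance=0.5, impact=-8,
                         relational_bonus=0.0, entity_anchor=0.0)
anchored = Candidate(2, recency=0.2, relevance=0.5, impact=2,
                     relational_bonus=1.0, entity_anchor=1.0)
ordered = [c.node_id for c in rank([recent_grief, anchored])]
```

With these assumed inputs, the entity-anchored, tagged memory outranks the fresher high-impact one at equal relevance, which is the effect the two added bonus terms are meant to have.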

📚 Where this scoring formula comes from

The retrieval scoring is a direct descendant of Stanford's 2023 Generative Agents paper (Park et al., "Interactive Simulacra of Human Behavior"). In their implementation at reverie/backend_server/persona/cognitive_modules/retrieve.py, the new_retrieve() function fuses three signals: recency × 0.5, relevance × 3.0, importance × 2.0. EchoVessel's weights are lifted verbatim from there.

Four deliberate divergences from the Stanford original:

  1. Added relational_bonus × 1.0 — a closed-vocabulary tag bonus (identity-bearing / unresolved / vulnerability / turning-point / correction / commitment) tuned for long-term companionship scenarios, where "core facts about who the user is" deserve extra pull.
  2. Time-based recency instead of position-based decay. Stanford: decay^i where i is rank in last-accessed order — being retrieved refreshes the memory (cognitive-psychology "rehearsal" effect). EchoVessel: 14-day half-life from created_at, access never refreshes. Diary-style vs brain-style.
  3. Signed emotional_impact ∈ [-10, +10] instead of their unsigned poignancy ∈ [1, 10]. Lets grief (-9) and joy (+9) stay separated in retrieval; the magnitude drives salience, the sign drives valence grouping.
  4. Min-relevance floor of 0.4 — candidates whose cosine similarity to the query is below the floor get dropped before scoring. Stops truly-orthogonal high-impact memories from surfacing on totally unrelated queries. Added after the Over-recall eval metric flagged the failure mode.
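Divergences 2 and 4 are easy to state in code. The sketch below contrasts a 14-day half-life measured from created_at with a position-based `decay ** i`, and applies the 0.4 relevance floor before any scoring. Function names are hypothetical, and the `decay=0.99` default is only an illustrative value, not a figure taken from this document.

```python
HALF_LIFE_DAYS = 14.0
MIN_RELEVANCE = 0.4


def time_recency(age_days):
    """EchoVessel-style recency: halves every 14 days since created_at."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


def rank_recency(rank_index, decay=0.99):
    """Position-based recency: decay ** i, where i is the rank in
    last-accessed order (0 = most recently accessed)."""
    return decay ** rank_index


def apply_relevance_floor(candidates):
    """Drop (node_id, cosine_similarity) pairs below the floor before scoring."""
    return [(nid, sim) for nid, sim in candidates if sim >= MIN_RELEVANCE]


kept = apply_relevance_floor([(1, 0.39), (2, 0.40), (3, 0.90)])
```

Because `time_recency` depends only on age, retrieving a memory never changes its score, whereas under `rank_recency` a retrieval moves the memory to rank 0 and refreshes it.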

Park's original comment in the reference code: "these weights should likely be learned, perhaps through an RL-like process". For now both projects hand-tune the three shared weights to [0.5, 3, 2]. If you find a better set via eval, the whole ranker is just a handful of floats in memory/retrieve.py.