EchoVessel Memory · Four Layers, One Picture

The simplest possible view of how L1–L4 connect.
[Diagram: one turn end-to-end. The user writes "我养了只猫，叫小黑" ("I got a cat, named Xiaohei"); the persona replies "小黑听起来很可爱呢" ("Xiaohei sounds adorable"). Arrows trace the four flows across L1–L4: write (per message, by human / daemon / LLM), distill (per session, LLM call), read (into the prompt every turn), and the filling[] evidence chain linking each L4 thought back to its L3 events.]

L1 · core_blocks

Five authored identity documents — persona, self, user, relationship, mood. Written by humans during onboarding or via the admin UI. Always sit at the top of every LLM prompt.

1 row per label · never auto-distilled

L2 · recall_messages

Every message verbatim — user and persona, both sides of every turn. No transformation. The ground truth that the other layers can be reconstructed from.

1 row per message · FTS5 indexed · recent-window into prompt
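The recent-window read is just a tail query over that table. A minimal sketch, assuming a `recall_messages` table with `id`, `role`, and `content` columns (the real schema may differ):

```python
import sqlite3

def recent_window(db: sqlite3.Connection, limit: int = 20):
    """Return the last `limit` messages, oldest-first, for prompt assembly."""
    rows = db.execute(
        "SELECT role, content FROM recall_messages ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return rows[::-1]  # reverse so the prompt reads chronologically
```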

L3 · events

Episodic memory. When a session closes, an LLM reads L2 and distills 0–3 discrete events, each with a signed emotional_impact in [-10, +10] and up to 3 relational tags from a closed vocabulary.

concept_nodes WHERE type=EVENT · embedded via sqlite-vec
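The extraction contract (signed impact in [-10, +10], up to 3 tags from the closed vocabulary) can be pinned down as a small validated record. This is an illustrative sketch, not the project's actual types; the field names are assumptions:

```python
from dataclasses import dataclass, field

# Closed vocabulary of relational tags (from the retrieval divergence notes)
RELATIONAL_TAGS = {
    "identity-bearing", "unresolved", "vulnerability",
    "turning-point", "correction", "commitment",
}

@dataclass
class Event:
    summary: str
    emotional_impact: int                          # signed, in [-10, +10]
    tags: list = field(default_factory=list)       # up to 3, closed vocabulary

    def __post_init__(self):
        if not -10 <= self.emotional_impact <= 10:
            raise ValueError("emotional_impact must be in [-10, +10]")
        if len(self.tags) > 3 or not set(self.tags) <= RELATIONAL_TAGS:
            raise ValueError("up to 3 tags, all from the closed vocabulary")
```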

L4 · thoughts

Long-term impressions. After the gate passes (shock or timer, never more than 3 reflections per 24 h), an LLM reflects on recent L3 events and produces 1–2 thoughts. Every thought must cite its source event IDs via filling[].

concept_nodes WHERE type=THOUGHT · edges in concept_node_filling
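The evidence chain is plain parent→child rows. A hypothetical sketch of writing one thought plus its filling[] edges, assuming simplified `concept_nodes` / `concept_node_filling` schemas (real columns will differ):

```python
import sqlite3

def write_thought(db: sqlite3.Connection, content: str, event_ids: list) -> int:
    """Insert a THOUGHT node and its filling[] edges to the cited L3 events."""
    if not event_ids:
        raise ValueError("a thought must cite at least one source event")
    cur = db.execute(
        "INSERT INTO concept_nodes (type, content) VALUES ('THOUGHT', ?)",
        (content,),
    )
    thought_id = cur.lastrowid
    # parent = thought, child = event: the evidence chain in one table
    db.executemany(
        "INSERT INTO concept_node_filling (parent_id, child_id) VALUES (?, ?)",
        [(thought_id, eid) for eid in event_ids],
    )
    return thought_id
```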

write · Immediate · per message

L2 captures every incoming and outgoing message before the LLM streams a reply. This is the only write that happens on the turn's critical path. Latency matters here; keep it fast.

L1 is also a write target, but only by admin edits and the onboarding bootstrap — never per-turn.
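The ordering (capture, stream, capture) can be sketched in a few lines. `capture` and `handle_turn` are hypothetical names, and joining a token list stands in for the real streaming channel:

```python
import sqlite3

def capture(db, channel_id, role, content):
    # Single synchronous INSERT: the only write on the turn's critical path
    db.execute(
        "INSERT INTO recall_messages (channel_id, role, content) VALUES (?, ?, ?)",
        (channel_id, role, content),
    )

def handle_turn(db, channel_id, user_msg, token_stream):
    capture(db, channel_id, "user", user_msg)    # before the LLM streams anything
    reply = "".join(token_stream)                # stand-in for streaming the reply
    capture(db, channel_id, "persona", reply)    # outgoing side, verbatim
    return reply
```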

distill · Deferred · per session

After a session idle-closes (default 30min), a background worker pulls L2 messages and asks a small LLM to extract 0–3 events (L3). If the reflection gate allows, a medium LLM then reflects on recent events to produce 1–2 thoughts (L4), each with explicit filling[] evidence.
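The reflection gate ("shock OR timer, never over 3/24h") reduces to a small predicate. The shock threshold and timer interval below are illustrative assumptions, not values from the project:

```python
MAX_REFLECTIONS_PER_DAY = 3   # hard cap from the spec: never over 3 per 24 h
SHOCK_THRESHOLD = 8           # assumption: |impact| at or above this trips the gate
TIMER_SECONDS = 6 * 3600      # assumption: periodic reflection interval

def gate_allows(recent_impacts, reflections_last_24h, seconds_since_last):
    """Shock OR timer, subject to the daily cap."""
    if reflections_last_24h >= MAX_REFLECTIONS_PER_DAY:
        return False
    shock = any(abs(i) >= SHOCK_THRESHOLD for i in recent_impacts)
    timer = seconds_since_last >= TIMER_SECONDS
    return shock or timer
```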

read · Fused · every turn

At assembly time, every new user message triggers an L1 read (all 5 blocks), an L2 read (recent window), and an L3+L4 vector search ranked with the formula:

total = 0.5·recency + 3·relevance + 2·|impact| + 1·relational_bonus

Nothing ever filters by channel_id (iron rule D4).
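The ranker is just that weighted sum applied to a candidate list. One assumption in this sketch: recency, relevance, and relational_bonus arrive pre-normalised to [0, 1], while impact is the raw signed value in [-10, +10], so its magnitude is divided by 10:

```python
def total_score(c: dict) -> float:
    """total = 0.5·recency + 3·relevance + 2·|impact| + 1·relational_bonus
    (|impact| / 10 normalises the signed value; the sign itself never ranks)"""
    return (0.5 * c["recency"] + 3.0 * c["relevance"]
            + 2.0 * abs(c["impact"]) / 10.0 + 1.0 * c["relational_bonus"])

def rank(candidates: list) -> list:
    # Highest fused score first; note channel_id never appears here (iron rule D4)
    return sorted(candidates, key=total_score, reverse=True)
```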

📚 Where this scoring formula comes from

The retrieval scoring is a direct descendant of Stanford's 2023 Generative Agents paper (Park et al., "Interactive Simulacra of Human Behavior"). In their implementation at reverie/backend_server/persona/cognitive_modules/retrieve.py, the new_retrieve() function fuses three signals: recency × 0.5, relevance × 3.0, importance × 2.0. EchoVessel's weights are lifted verbatim from there.

Four deliberate divergences from the Stanford original:

  1. Added relational_bonus × 1.0 — a closed-vocabulary tag bonus (identity-bearing / unresolved / vulnerability / turning-point / correction / commitment) tuned for long-term companionship scenarios, where "core facts about who the user is" deserve extra pull.
  2. Time-based recency instead of position-based decay. Stanford: decay^i where i is rank in last-accessed order — being retrieved refreshes the memory (cognitive-psychology "rehearsal" effect). EchoVessel: 14-day half-life from created_at, access never refreshes. Diary-style vs brain-style.
  3. Signed emotional_impact ∈ [-10, +10] instead of their unsigned poignancy ∈ [1, 10]. Lets grief (-9) and joy (+9) stay separated in retrieval; the magnitude drives salience, the sign drives valence grouping.
  4. Min-relevance floor of 0.4 — candidates whose cosine similarity to the query is below the floor get dropped before scoring. Stops truly-orthogonal high-impact memories from surfacing on totally unrelated queries. Added after the Over-recall eval metric flagged the failure mode.
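Divergences 2 and 4 are each one line of arithmetic. A sketch of the 14-day half-life recency and the 0.4 relevance floor, with timestamps in epoch seconds:

```python
HALF_LIFE_SECONDS = 14 * 86400   # 14-day half-life, measured from created_at
MIN_RELEVANCE = 0.4              # drop near-orthogonal candidates before scoring

def recency(created_at: float, now: float) -> float:
    """Diary-style decay: halves every 14 days; retrieval never refreshes it."""
    return 0.5 ** ((now - created_at) / HALF_LIFE_SECONDS)

def passes_floor(relevance: float) -> bool:
    return relevance >= MIN_RELEVANCE
```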

Park's original comment in the reference code: "these weights should likely be learned, perhaps through an RL-like process". For now both projects hand-tune to [0.5, 3, 2]. If you find a better set via eval, the whole ranker is just four floats in memory/retrieve.py.