EchoVessel Runtime Interactions & Memory Flows

"how each layer wakes up" edition · 2026.04 · main @ 1f9bdf6 · companion to architecture.html
What this page answers: when a user sends one message, which memory layers wake up in what order, which write their output back where, and how information loops between layers. The static architecture.html shows what exists; this one shows what happens.
L1 core_blocks · persona frame
L2 recall_messages · raw log
L3 concept_nodes/EVENT · episodic
L4 concept_nodes/THOUGHT,INTENTION,EXPECTATION · reflected
L5 entities + aliases · canonical names
L6 personas.episodic_state · current affect

1. How each memory layer comes into being · the distillation rules that graduate raw text into persistent memory

Memory is the whole point · six layers, three transitions
L1 authored · L2 captured · L3 distilled · L4 reflected · L5 entities resolved · L6 episodic state tracked
CORE

L1 · core_blocks — how the 3 blocks are born

L1 is the only layer that is human-authored. Three blocks — persona / user / style — exist from persona inception onward, written directly by a human (or a bootstrap LLM pass on imported material that the user reviews before commit).

Path
Trigger
Writer
Which blocks
Mechanism
Onboarding
First run · /api/admin/persona/onboarding
persona / user
memory.onboarding.bootstrap_persona() · one CoreBlock row per label
Bootstrap from material
Onboarding path 2 · upload → LLM drafts the blocks
all 3
LLM synthesises from imported events/thoughts · user reviews before commit
Admin edit
Admin → Persona tab · save
persona / user
POST /api/admin/persona · partial update · atomic on-disk write · extra='forbid' rejects legacy keys (422)
Style-only edit
Admin → Persona tab · style preferences
style
POST /api/admin/persona/style · the only writer for the style block
L1 is NEVER auto-updated. No code path inside slow_tick / consolidate / extraction may write to core_blocks — pinned by the tests/memory/test_slow_cycle.py "L1-never-auto-update invariant" test. Per-turn affect lives in L6 (personas.episodic_state), which consolidate writes through automatically from extraction's session_mood_signal. Persona reflection grows in L4 thoughts (subject='persona'); third-party people grow in L5 entities.description. Neither lands in L1.

L2 · recall_messages — every message captured, verbatim

L2 is the only layer that is written during the user-facing turn. There's no distillation here — every incoming line and every persona reply is persisted verbatim as an atomic row.

When | What is written | Schema fields | Indexes | Output
🟢 User msg arrives | memory.ingest_message(..., role=USER) before LLM sees anything | role, content, channel_id, session_id, turn_id, created_at, token_count, day | FTS5 trigram · session join · day bucket | 1 row + FTS index update
🟢 Persona reply finalised | ingest_message(..., role=PERSONA) after streaming completes | same shape · same turn_id · same session_id | FTS5 gets both sides of the turn | 1 row
🟢 Session open/close | Session row tracks status, first_message_at, last_message_at, message count, token count | sessions table | status transitions: open → closing → consolidating → closed | 1 row per session
Why verbatim: L2 is the ground truth. Later distillations (L3, L4) can go wrong — LLM hallucinates, eval catches a drift. When that happens, we need to go back to L2 to see what actually happened. Never paraphrase at ingest time.
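The verbatim-capture rule can be illustrated with a minimal sketch. It assumes a simplified recall_messages table built only from the schema fields listed above; the SCHEMA string, the whitespace token count, and this ingest_message signature are illustrative assumptions, not the project's actual code, and the FTS5 index is left out.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical, simplified schema: only the fields named in the table above.
SCHEMA = """
CREATE TABLE recall_messages (
    id INTEGER PRIMARY KEY,
    role TEXT NOT NULL CHECK (role IN ('user', 'persona')),
    content TEXT NOT NULL,
    channel_id TEXT NOT NULL,
    session_id TEXT NOT NULL,
    turn_id TEXT NOT NULL,
    created_at TEXT NOT NULL,
    token_count INTEGER NOT NULL,
    day TEXT NOT NULL
)
"""


def ingest_message(conn, *, role, content, channel_id, session_id, turn_id, now=None):
    """Store one message exactly as received: no trimming, no paraphrase."""
    now = now or datetime.now(timezone.utc)
    token_count = len(content.split())  # crude stand-in for a real tokenizer
    cur = conn.execute(
        "INSERT INTO recall_messages "
        "(role, content, channel_id, session_id, turn_id, created_at, token_count, day) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (role, content, channel_id, session_id, turn_id,
         now.isoformat(), token_count, now.date().isoformat()),
    )
    conn.commit()
    return cur.lastrowid
```

Both sides of a turn would go through the same function with the same turn_id and session_id, so a later audit can line up what the user said against what was distilled from it.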

L3 · concept_nodes WHERE type=EVENT — the first real distillation

When a session closes, prompts.extraction.EXTRACTION_SYSTEM_PROMPT asks a SMALL-tier LLM: "given this closed conversation, what is worth remembering?" Output is 0–3 event rows, each with a signed emotional impact and structured tags. This is the layer where raw text first becomes episodic memory.

What the extraction prompt instructs the LLM to produce

For each event:
   description · 1–3 sentences in the source language · third-person reference to user
   emotional_impact · signed integer in [-10, +10]
   emotion_tags · 0–4 free-form lowercase labels
   relational_tags · 0–3 from the closed vocabulary of 6 values

Emotional impact scale (from the prompt)

Value | Meaning (extraction prompt verbatim) | Example
-10 | catastrophic loss, trauma, crisis | death of close family · suicidal ideation voiced · violence disclosed
-7 | severe sadness, grief, serious conflict | breakup · job loss · long-buried secret first disclosed
-4 | meaningful stress, disappointment | argument with boss · sleep deprivation · anxiety attack
-1 | mild low, slight frustration | bad commute · minor annoyance
0 | pure neutral · rare (most shared things have some valence) | used sparingly
+1 | mild pleasant | nice weather · good meal
+4 | meaningful joy, satisfaction, connection | promotion at work · fun weekend · first real laugh in weeks
+7 | major positive milestone | engagement · big win · deep reconciliation
+10 | life-defining joy | birth of child · surviving a crisis · long-awaited reunion
Why a signed scale: |impact| drives retrieval salience, while the sign separates grief from joy. "用户妈妈去世了" ("the user's mother passed away") is -9; storing it as +9 would make it surface alongside joyful memories. The prompt explicitly warns against sign flips.

Relational tag vocabulary (closed · exactly 6)

Tag | Triggers on | Example
identity-bearing | a core fact about who the user is | "user is a single mom" · "user has depression"
unresolved | an emotional thread opened but not closed this session | user hints at fight but changes topic before resolution
vulnerability | a rare moment of being unusually open or exposed | "I've never told anyone this…"
turning-point | a shift in the relationship itself | first real trust · first conflict · first private share
correction | user corrected something the persona assumed | "实际上不是那样" · "actually that's not what I meant"
commitment | explicit promise or follow-up | "下次聊" · "I'll tell you how it goes"
Retrieval bonus hook: having any relational_tag grants a +0.5 bonus to retrieval score (see §8). The prompt warns the LLM: "if you're tagging every event, you're over-tagging." Target: roughly 20–30% of events should carry a relational tag.
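The field rules and the closed tag vocabulary above lend themselves to a parse-time check. The sketch below is an assumption about how such a check could look: validate_event and this ExtractedEvent shape are hypothetical, and raising on every violation (rather than truncating) is a choice made for the example, not something the source specifies.

```python
from dataclasses import dataclass, field

# The six relational tags from the table above (closed vocabulary).
RELATIONAL_TAGS = frozenset({
    "identity-bearing", "unresolved", "vulnerability",
    "turning-point", "correction", "commitment",
})


@dataclass
class ExtractedEvent:
    description: str
    emotional_impact: int
    emotion_tags: list = field(default_factory=list)
    relational_tags: list = field(default_factory=list)


def validate_event(raw: dict) -> ExtractedEvent:
    """Check one raw LLM event against the documented field rules."""
    impact = int(raw["emotional_impact"])
    if not -10 <= impact <= 10:
        raise ValueError(f"emotional_impact out of range: {impact}")
    emotion_tags = [t.strip().lower() for t in raw.get("emotion_tags", [])]
    if len(emotion_tags) > 4:
        raise ValueError("at most 4 emotion_tags allowed")
    relational = list(raw.get("relational_tags", []))
    if len(relational) > 3:
        raise ValueError("at most 3 relational_tags allowed")
    unknown = [t for t in relational if t not in RELATIONAL_TAGS]
    if unknown:
        raise ValueError(f"unknown relational_tags: {unknown}")
    return ExtractedEvent(raw["description"], impact, emotion_tags, relational)
```

Keeping the sign intact through validation matters here: the check only bounds the magnitude, so a -9 stays a -9.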

Mandatory self-check (the safety net)

The prompt requires the LLM, after drafting events, to ask itself: "did this session contain any emotional PEAKS I failed to extract?" Typical missed peaks listed in the prompt:

  • a single casual mention of someone dying buried in mundane chat
  • a quick vulnerable disclosure followed by deflection
  • a question that is actually a cry for help ("你觉得活着累吗?", roughly "do you find being alive exhausting?")
  • understated positive milestones the user downplays ("btw, I got engaged")

The self_check_notes output field records the LLM's answer. Missing a peak is the #1 reason the Emotional Peak Retention eval metric fails — so the prompt treats this as non-skippable.

L4 · concept_nodes WHERE type=THOUGHT — reflection, not summary

After extraction writes L3 events, if the session passes the reflection gate (see §8, Algorithm C), a MEDIUM-tier LLM runs prompts.reflection.REFLECTION_SYSTEM_PROMPT. It's told: "you are the reflective inner voice · produce 1 or 2 quiet, honest impressions about what you have been noticing."

Tonal constraints (from the prompt · most important)

Rule | What's forbidden | What's required instead
no clinical | "subject exhibits patterns of avoidance" | "something Alan does is…" · "Alan tends to…"
no advice | "Alan should talk to a professional" | impressions describe · they do not prescribe
no labels | "depression" · "anxiety disorder" · "PTSD" | describe what you observe without naming it
source language | translating Chinese events into English thoughts | match whatever language the events were in
no meta | "the session was 10 messages long" · "via Discord" | describe the user, not the conversation's metadata (F10 iron rule)

Good vs bad reflections (from the prompt)

✅ Good
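"Alan 把真正重的话都留到深夜才说。白天的他稳定只是保护色。" (roughly: "Alan saves the truly heavy things for late at night. His daytime steadiness is just protective colouring.")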
"I've been noticing that Alan only lets himself be tired when no one is around to see it."
❌ Bad
"Subject demonstrates nocturnal disclosure pattern suggestive of attachment avoidance."
"The user appears to be experiencing suppressed grief symptoms."

Structural constraints

# | Field | Constraint | Why | Enforcement | Parse behaviour on violation
1 | count | 1 or 2 thoughts, never 0 (unless empty input), never 3+ | most sessions produce 1 · a second only if genuinely distinct | MAX_THOUGHTS = 2 | truncate to 2; warn
2 | description | 1–3 sentences · natural warm observation · first person or neutral third | long rants → hard to read · short clinical labels → wrong tone | soft cap ~500, hard cap 2000 | truncate at hard cap; warn
3 | emotional_impact | signed [-10, +10] · but reflections rarely exceed ±8 | impressions are processed · rarely touch raw extremes | RECOMMENDED_IMPACT_BOUND = 8 | warn at ±9/±10 · clamp to -10/+10
4 | filling (evidence) | MUST cite ≥1 L3 event id from the input · no invention allowed | without provenance, forgetting-rights cascade can't work | every id must exist in input | rejected · thought dropped
The filling chain is load-bearing: it's what lets "forget this event" cascade correctly. If the user deletes an L3 event that five L4 thoughts cited, the system can either drop those thoughts (cascade) or mark their filling as orphaned (preserve thought, lose evidence). Without cited filling, neither option is possible — the chain is untraceable and the thought becomes a dangling insight. That's why the prompt treats missing filling as a rejection rather than a warning.
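The four structural constraints can be read as a small parse routine. The following is a sketch under stated assumptions: parse_thoughts is a hypothetical name, the hard cap is assumed to count characters, the raw field is assumed to be called filling_ids, and truncating to MAX_THOUGHTS before validating is a choice made for the example.

```python
MAX_THOUGHTS = 2
DESCRIPTION_HARD_CAP = 2000      # assumed to be a character count
RECOMMENDED_IMPACT_BOUND = 8


def parse_thoughts(raw_thoughts, input_event_ids):
    """Apply the structural constraints; return (kept_thoughts, warnings)."""
    warnings, kept = [], []
    if len(raw_thoughts) > MAX_THOUGHTS:
        warnings.append(f"truncated {len(raw_thoughts)} thoughts to {MAX_THOUGHTS}")
    for raw in raw_thoughts[:MAX_THOUGHTS]:
        filling = list(raw.get("filling_ids", []))
        # Missing or invented evidence is a rejection, not a warning.
        if not filling or any(eid not in input_event_ids for eid in filling):
            warnings.append("dropped thought: filling missing or cites unknown event id")
            continue
        description = raw["description"]
        if len(description) > DESCRIPTION_HARD_CAP:
            description = description[:DESCRIPTION_HARD_CAP]
            warnings.append("description truncated at hard cap")
        impact = int(raw["emotional_impact"])
        if abs(impact) > RECOMMENDED_IMPACT_BOUND:
            warnings.append(f"impact {impact} exceeds recommended bound")
        impact = max(-10, min(10, impact))  # clamp to the legal range
        kept.append({"description": description,
                     "emotional_impact": impact,
                     "filling_ids": filling})
    return kept, warnings
```

Because every kept thought carries at least one verified event id, a later "forget this event" request can always find the thoughts that depend on it.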
Zoom-out: L1 is authored (by humans or bootstrap LLM). L2 is captured (verbatim, never distilled). L3 is the first place where raw text becomes memory: extracted by an LLM at each session close, with strict rules on valence sign and the 6 relational tags. L4 is the inner voice: fewer thoughts, softer tone, cited evidence. Each transition is an LLM call guarded by the gates in §8.

2. One turn in motion · every layer, every second, chronological

User types one sentence → persona streams one reply
runtime.interaction.assemble_turn · called by TurnDispatcher from IncomingTurn.channel_id → channel.send() back
HOT PATH
#
Who wakes up
What it does
Reads
Writes / emits
1
Channel
Debounce burst into one IncomingTurn; mint a turn_id.
internal: per-user debounce timer
queue → TurnDispatcher
2
L2 write
Persist each raw user line as a RecallMessage row before the LLM sees anything.
L2 · recall_messages, FTS5 index, session last_message_at
3
L1 read
Pull the 3 core blocks verbatim — persona / user / style. Always first in the prompt.
L1 · core_blocks (1 row per label)
4
L3+L4 retrieve
Embed the user's message · run vector search over L3+L4 concept_nodes · walk the relational graph for bonus scoring. No channel_id filter (iron rule D4).
L3+L4 · concept_nodes, concept_nodes_vec (sqlite-vec ANN), concept_node_filling (graph edges)
5
L2 read (window)
Load the last memory.recent_window_size messages from the current session as short-term chat context.
L2 · recall_messages WHERE session_id = current, DESC LIMIT N
6
Prompt assemble
Splice system prompt + L1 blocks + retrieved L3/L4 memories + L2 window + new user message. No transport identity anywhere (iron rule F10).
ephemeral prompt string → LLM
7
LLM stream
Call llm.complete(prompt, stream=True); every token arrives via on_token callback.
network: OpenAI / Anthropic / stub
tokens → channel + SSE broadcaster
8
Channel send + mirror
Channel delivers the reply on its native surface (Discord DM / SSE frame). Runtime also mirrors to the runtime-owned broadcaster tagged with source_channel_id.
SSE · user_appended, token×N, done, voice_ready
9
L2 write (persona)
Persist the persona's reply as its own RecallMessage row (role=persona). Same session_id, same turn_id.
L2 · recall_messages, FTS5 index, cost ledger
10
on_turn_done
Clear the channel's in_flight_turn_id; log the LLM call into the cost ledger; hand control back to the dispatcher queue.
llm_calls; chat.settings.updated (if flags toggled)
Zoom-out: the reply to the user takes steps 1-8 (channel → memory → LLM → channel). Steps 9-10 are post-turn housekeeping; they happen while the user is reading the reply. This is why consolidation to L3 / L4 / L5 / L6 is not here: it's asynchronous background work, covered in §4. L6 episodic_state lands at session-close as a side-effect of extraction emitting session_mood_signal (no extra LLM call); the per-turn read of L6 just decays the snapshot if it's older than 12 hours.
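The splice order of step 6 can be sketched as a pure function. Everything here is illustrative: assemble_prompt, the "# persona"-style headings, and the "# Relevant memories" heading are hypothetical stand-ins, and the window is passed as (role, content) pairs precisely so that no channel or transport identity can leak into the prompt.

```python
def assemble_prompt(system_prompt, core_blocks, memories, window, user_message):
    """Splice the prompt in the documented order: system, L1, L3/L4, L2 window, new line."""
    sections = [system_prompt]
    # L1 always comes first after the system prompt, in a fixed label order.
    for label in ("persona", "user", "style"):
        sections.append(f"# {label}\n{core_blocks[label]}")
    # Retrieved L3/L4 memories; the section is omitted when retrieval is empty.
    if memories:
        sections.append("# Relevant memories\n" + "\n".join(f"- {m}" for m in memories))
    # L2 short-term window: role and content only, no channel_id.
    for role, content in window:
        sections.append(f"{role}: {content}")
    sections.append(f"user: {user_message}")
    return "\n\n".join(sections)
```

On a cold database the memories list and the window are both empty, so the prompt reduces to the system prompt, the three core blocks, and the new sentence.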

3. Same 10 steps, 8-column sequence · who is active when

Channel
Runtime
L1
L2
L3
L4
LLM
UI
debounce
Step 1 · Channel mints turn_id, emits IncomingTurn
→ runtime
ingest
write
user_appended
Step 2 · Runtime writes each IncomingMessage into L2. UI sees the user's bubble via SSE.
assemble
read 3 blocks
Step 3 · L1 reads — persona / user / style. Always first. L6 read also runs here; if mood snapshot is > 12h old it's decayed back to neutral on the way in.
retrieve
vector search
vector search
Step 4 · L3+L4 retrieve via sqlite-vec + relational-graph walk. No channel_id filter (D4).
recent window
read N recent
Step 5 · Pull recent_window_size L2 messages from current session.
prompt
Step 6 · Splice system prompt + L1 + retrieved L3/L4 + L2 window + new user line. F10 guard.
stream
token×N
Step 7 · LLM streams tokens. Each token also mirrors to the runtime broadcaster (tag source_channel_id).
send
mirror
done
Step 8 · Originating channel delivers reply; runtime publishes chat.message.done to every SSE subscriber.
persist persona
write
Step 9 · Write persona's reply to L2 with same session_id + turn_id.
on_turn_done
log cost
Step 10 · Dispatcher releases the slot; llm_calls ledger records tokens + cost. L6 episodic_state is NOT written per-turn: it lands once when the session closes (as a side-effect of extraction's session_mood_signal); see §4 / §7.

4. After you stop typing · session lifecycle & consolidation, the slow write

Idle → session closes → memory graduates L2 → L3 → L4
consolidate_worker + idle_scanner · background coroutines · user never waits on this
SLOW WRITE
t = 0 · first user msg
+3s · persona reply
+15s · user follow-up
+20s · persona reply
+8 min · user goes quiet
+30 min · idle_scanner closes session
+30m 5s · L3 extraction
+30m 20s · L4 reflection

Step-by-step · what runs, in order

When
Worker
What it does
Reads
Writes
🌛
idle_scanner
Every idle_scanner.interval_seconds (default 60s) · finds sessions where now - last_message_at > SESSION_IDLE_MINUTES (default 30min).
sessions WHERE status='open'
sessions.status = 'closing'
📦
consolidate_worker
Polls every consolidate.worker_poll_seconds (default 5s) for sessions in status='closing'. Picks one at a time.
sessions WHERE status='closing'
sessions.status = 'consolidating'
📝
L2 read
Loads all recall_messages for the session. Skips entirely if < trivial_message_count OR < trivial_token_count (default 3 / 200).
L2 · all messages in session
🧩
LLM extract
Calls prompts.extract_fn(session_messages) with SMALL tier model. Produces a short list of ExtractedEvent(description, emotional_impact, tags...).
L2 · session window
LLM tokens → memory ingester
💚
L3 write
Each ExtractedEvent becomes a concept_node row (type=EVENT). Sentence-transformers embedder generates a 384-d vector; sqlite-vec stores it.
L1 · persona / user blocks (for context)
L3 · concept_nodes, concept_nodes_vec, concept_nodes_fts
🎯
reflection gate
Counts L4 thoughts written in last 24h. If ≥ reflection_hard_gate_24h (default 3) · skip reflection. Otherwise continue.
L4 · COUNT(*) WHERE created_at > now-24h
🪞
LLM reflect
Calls prompts.reflect_fn(recent_events) with MEDIUM tier. Produces 0-2 ExtractedThought(description, filling_ids).
L3 · recent events (last N sessions)
LLM tokens → memory ingester
🧡
L4 write
Each thought becomes a concept_node (type=THOUGHT) + one concept_node_filling row per source event (parent=thought, child=event).
L4 · concept_nodes, concept_node_filling (graph edges), concept_nodes_vec
close
Marks session status='closed'. Fires on_session_closed hook → SSE chat.session.boundary event → Web UI draws a timestamped rule.
SSE · chat.session.boundary
Why this is async: extracting events needs an LLM call; reflecting needs another. If we did this inline per turn, every user message would wait 2-3 seconds for extraction before seeing the persona's reply. Instead, we let the reply hit the user immediately (steps 1-8 in §2) and push L3/L4 writing to the background (this section). The cost: L3/L4 lag the conversation by the idle timeout (~30 minutes). The benefit: the chat feels real-time.
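The session status chain and the idle scan can be modelled in a few lines. This is a minimal in-memory sketch: scan_idle, advance, and the dict-based session records are hypothetical, and the real workers poll a sessions table rather than a Python list.

```python
from datetime import datetime, timedelta, timezone

SESSION_IDLE_MINUTES = 30

# Documented status chain: open → closing → consolidating → closed.
NEXT_STATUS = {"open": "closing", "closing": "consolidating", "consolidating": "closed"}


def advance(status):
    """Return the next status; raises KeyError for the terminal 'closed' state."""
    return NEXT_STATUS[status]


def scan_idle(sessions, now, idle_minutes=SESSION_IDLE_MINUTES):
    """Mark open sessions idle for longer than the threshold as 'closing'."""
    cutoff = timedelta(minutes=idle_minutes)
    marked = []
    for session in sessions:
        if session["status"] == "open" and now - session["last_message_at"] > cutoff:
            session["status"] = advance(session["status"])
            marked.append(session["id"])
    return marked
```

The scanner only ever performs the first transition; the consolidate worker would own the remaining two, which keeps the user-facing turn free of any LLM call for extraction or reflection.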

5. Story trace · one real sentence, all six layers lit up

Scenario · user DMs the Discord bot

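"我养了只白猫,叫小黑。他超调皮,老在半夜跳到我脸上。" (roughly: "I have a white cat named 小黑. He's super naughty and keeps jumping on my face in the middle of the night.")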
t+0.0s
CH
discord.py on_message → DiscordChannel.push_user_message · debounce timer armed for 2s.
t+2.0s
CH
Debounce expires · mint turn_id = turn-33bec7 · push IncomingTurn into queue.
t+2.1s
L2
Write ↘ One row into recall_messages · role=user · content = the full sentence · channel_id="discord" · session_id="s_d4e82a" · linked to turn-33bec7.
t+2.2s
L1
Read ↗ 3 core_blocks loaded · persona="她的性格是…" · user="你是一个在旧金山的软件工程师…" · style="don't say 'haha'". L6 read also runs · personas.episodic_state.mood="neutral" so the # How you feel right now section is skipped.
t+2.3s
L3
Read ↗ Embed the sentence · sqlite-vec ANN search · cold DB so 0 hits. Relational walk also 0 edges. No episodic memory to pull in yet.
t+2.3s
L4
Read ↗ Same search on thoughts · 0 hits · persona has no long-term impressions about you yet.
t+2.4s
LLM
Prompt assembled (core_blocks + empty retrieval + empty window + new sentence) · sent to gpt-4o · streaming begins.
t+2.4–5.8s
CH
Tokens stream back · Discord channel builds the reply · Web broadcaster mirrors every token tagged source_channel_id="discord".
t+5.8s
L2
Write ↘ Persona reply written as second row · role=persona · same turn_id.
t+31m
BG
idle_scanner tick · session s_d4e82a last_message_at is 31 min ago · mark status='closing'.
t+31m 5s
L3
Write ↘ Extract LLM produces: {description: "user has a white cat named 小黑 that's energetic and jumps on them at night", emotional_impact: +2, tags: [pet, night], session_mood_signal: {mood: "warm", energy: 6, last_user_signal: "shared a fond detail about pet"}} · stored with 384-d embedding. This is the moment 'persona remembers'. Same call also writes an L5 entity for "小黑" (kind=pet) and an L6 episodic_state update from the session_mood_signal field — no extra LLM round-trip; SSE chat.mood.update fires from on_mood_updated.
t+31m 20s
L4
Gate check · 24h thought count = 0 < 3 · OK to reflect. Reflection LLM produces: {description: "user has a playful bond with animals · their life has warmth at night", filling_ids: [<event_id>]}. Filling edge created thought → event.
later
CH
User returns, asks "你还记得我的猫吗" ("do you still remember my cat?") on Web (not Discord).
later
L3
Read ↗ Embed "你还记得我的猫吗" · sqlite-vec finds the 小黑 event with high cosine similarity. Persona mentions "小黑" by name. Cross-channel memory works — because D4 never filtered by channel_id.
Key moments: the raw sentence lives in L2 immediately (step at +2.1s). But the fact "user has a cat named 小黑" only becomes memory at +31m, when the session idle-closes and consolidation runs. Ask the persona at +10min and it won't know about 小黑 via L3 retrieval; it would only remember because the raw text is still in the L2 recent-window of the same session. Once the session closes and a new one opens, the L2 window starts empty, but L3 now carries the permanent fact.

6. Cross-layer read/write matrix · who touches each layer, when

Layer Turn hot path reads Turn hot path writes Background reads Background writes Admin API
L1 load_core_blocks() · every prompt · 3 labels (persona / user / style) extract_fn may read persona / user blocks for context
  • GET /api/admin/persona
  • POST /api/admin/persona (rejects legacy keys with 422)
  • POST /api/admin/persona/style (style block)
  • onboarding · bootstrap-from-material
L2 recent window · recent_window_size msgs every ingested message · user + persona whole-session read for extraction
  • GET /api/chat/history
  • GET /api/admin/memory/search (FTS)
  • DEL /api/admin/memory/messages/{id}
  • DEL /api/admin/memory/sessions/{id}
L3 vector + graph walk · every turn reflect_fn reads recent events for input extract_fn output · per closed session
  • GET /api/admin/memory/events (paginated)
  • GET /api/admin/memory/events/{id}/dependents
  • DEL /api/admin/memory/events/{id} (orphan or cascade)
  • import pipeline also writes L3
L4 vector search + force_load_user_thoughts · every turn gate reads 24h count · slow_tick reads recent events for forward-looking inference reflect_fn output (thoughts) · slow_tick output (thoughts + expectations + intentions)
  • GET /api/admin/memory/thoughts
  • GET /api/admin/memory/thoughts/{id}/trace
  • DEL /api/admin/memory/thoughts/{id}
  • GET /api/admin/slow-tick/transcripts
L5 alias scan via find_query_entities() · every turn extract_fn checks for known aliases during dedup extract_fn writes new entities + alias rows + junction edges
  • GET /api/admin/memory/entities
  • POST /api/admin/memory/entities/{id}/merge
L6 personas.episodic_state · 12h decay check on assemble_turn entry update_episodic_state() on session close from extraction's session_mood_signal
  • GET /api/admin/persona (returns episodic_state)
Observation: hot path (per-turn) is dominated by reads from L1+L2+L3+L4+L5+L6, with L2 writes as the only hot-path write. Everything that creates new memory (L3 / L4 / L5 / L6) is background. This is by design — it's what keeps turn latency low while still building up long-term knowledge.

7. L6 episodic state lifecycle · how persona affect updates and reaches the browser

How L6 actually updates: the extraction LLM that runs on session close is asked, in the same JSON output, for a session_mood_signal field. update_episodic_state() writes that signal into personas.episodic_state as a side effect — no extra LLM call, no separate observer pass over L2. The on_mood_updated hook is preserved (the SSE topic is still chat.mood.update), but its trigger has moved from "every turn" to "every consolidate-driven session close".
extraction.session_mood_signal → update_episodic_state → SSE → Web UI header
single LLM call writes both L3 events and L6 affect
LIVE
1. Session closes
consolidate_worker picks up status='closing'
2. Extraction LLM runs
SMALL tier · returns L3 events + L5 entities + session_mood_signal in one JSON
3. Consolidate writes through
events + entities persisted; mood signal validated
Validated signal funnels through memory.update_episodic_state().
4. L6 write
personas.episodic_state JSON replaced; on_mood_updated hook fires on the lifecycle queue
5. SSE broadcast
chat.mood.update · runtime-owned broadcaster · every subscriber
6. Web UI re-renders
usePersona hook slices the new episodic_state into the chat header
Read-side closure: the next turn's Step 3 (L1+L6 read) pulls the freshly-written episodic_state. assemble_turn entry checks the snapshot's updated_at; if it's older than 12 hours, mood resets to neutral so a long quiet period doesn't open the next conversation under stale affect. When mood is neutral, the # How you feel right now system-prompt section is skipped entirely to keep the prompt terse.
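The read-side decay can be sketched as two small functions. The names read_episodic_state and mood_section, and the dict shape with mood and updated_at keys, are assumptions made for illustration; only the 12-hour rule, the reset to neutral, and the skipped "# How you feel right now" section come from the text above.

```python
from datetime import datetime, timedelta, timezone

MOOD_TTL = timedelta(hours=12)


def read_episodic_state(state, now):
    """Return the snapshot, reset to neutral if it is older than 12 hours."""
    if now - state["updated_at"] > MOOD_TTL:
        return dict(state, mood="neutral")
    return state


def mood_section(state):
    """Render the prompt section, or None when mood is neutral (section skipped)."""
    if state["mood"] == "neutral":
        return None
    return f"# How you feel right now\n{state['mood']}"
```

The decay is computed on the way in, so a long quiet period costs nothing until the next turn actually reads the snapshot.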

8. Emotion & scoring algorithms · the math behind what gets remembered and what gets surfaced

Three distinct algorithms · three distinct jobs
retrieval fusion score · consolidation trivial-gate · shock / timer / hard-gate reflection triggers
MATH

Algorithm A · Retrieval fusion score

Every L3/L4 candidate returned from sqlite-vec ANN gets re-ranked by a weighted sum of 4 signals. The weights are in src/echovessel/memory/retrieve.py.

total = 0.5 · recency + 3.0 · relevance + 2.0 · |impact| + 1.0 · relational_bonus
Signal
Weight
Source
Formula
Range
R
recency
how fresh this memory is
exp(-ln(2) · days_since / 14) · 14-day half-life
[0, 1]
V
relevance
semantic similarity of query → candidate
1 - distance/2 · clamped · min-floor 0.4 drops orthogonal hits
[0, 1]
I
|impact|
how emotionally loaded the memory is
min(|emotional_impact| / 10, 1.0)
[0, 1]
G
relational_bonus
has the node got relational_tags?
0.5 if relational_tags non-empty, else 0
{0, 0.5}
Why these weights? Relevance (3.0) dominates: what a memory is about matters most. Impact (2.0) ensures peak emotional moments surface even when their semantics drift from the query. Recency (0.5) gently prefers fresh memories without drowning old-but-important ones. Relational bonus (1.0) lifts nodes that carry relational tags. The min-relevance floor of 0.4 is important: without it, orthogonal candidates would occasionally bubble up on |impact| and relational_bonus alone (together worth up to 2.0 + 0.5 of the total), causing false-positive recall. With the floor, truly unrelated memories can't enter the ranked set even if they're emotionally loaded.
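The weighted sum can be written out directly from the table. This is a sketch, not the contents of retrieve.py: fusion_score is a hypothetical name, distance is assumed to lie in [0, 2], and a candidate strictly below the floor is assumed to be dropped (returned as None).

```python
import math

HALF_LIFE_DAYS = 14
MIN_RELEVANCE = 0.4


def fusion_score(days_since, distance, emotional_impact, relational_tags):
    """Weighted sum of the four signals; None when relevance is under the floor."""
    relevance = max(0.0, min(1.0, 1.0 - distance / 2.0))
    if relevance < MIN_RELEVANCE:
        return None  # dropped before ranking, however emotionally loaded
    recency = math.exp(-math.log(2) * days_since / HALF_LIFE_DAYS)
    impact = min(abs(emotional_impact) / 10, 1.0)
    relational_bonus = 0.5 if relational_tags else 0.0
    return 0.5 * recency + 3.0 * relevance + 2.0 * impact + 1.0 * relational_bonus
```

As a worked example, a 14-day-old memory (recency 0.5) at distance 0.4 (relevance 0.8) with impact -9 and one relational tag scores 0.25 + 2.4 + 1.8 + 0.5 = 4.95. The same memory at distance 1.5 has relevance 0.25 and never enters the ranked set.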

Algorithm B · Strong-emotion override (consolidation)

When deciding whether a session is "trivial" (skip extraction), a session that falls below the size thresholds is still sent to extraction if it contains any strong-emotion keyword. This is a keyword-matched safety net for peak emotional moments that happen in short sessions.

Category | zh keywords | en keywords | Why this list | Effect
💧 Bereavement / loss | 走了 · 去世 · 死了 · 离世 · 葬礼 · 没了 | died · passed away · funeral | loss events are almost always L3-worthy regardless of session length | override trivial gate
⚠️ Crisis | 撑不住 · 不想活 · 活不下去 · 自杀 · 崩溃 | can't go on · suicide · breakdown | safety-critical · never silently dropped | override trivial gate
🎭 Major milestones | 分手 · 离婚 · 被裁 | breakup · divorce · fired | large identity shifts · persona should remember even from a one-liner | override trivial gate
Logic: is_trivial(session, messages) returns True only when BOTH below-threshold AND no strong emotion keyword. _has_strong_emotion(messages) is a case-insensitive substring match — optimized for recall, not precision. False positives occasionally push a mundane sentence through extraction; that's an acceptable cost vs. losing a late-night single line about a breakup.
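The gate logic above fits in a few lines. The sketch below uses the keyword lists from the table and the documented defaults (3 messages / 200 tokens); the function signatures are simplified assumptions (plain strings and a precomputed token count rather than session objects).

```python
STRONG_EMOTION_KEYWORDS = (
    # bereavement / loss
    "走了", "去世", "死了", "离世", "葬礼", "没了", "died", "passed away", "funeral",
    # crisis
    "撑不住", "不想活", "活不下去", "自杀", "崩溃", "can't go on", "suicide", "breakdown",
    # major milestones
    "分手", "离婚", "被裁", "breakup", "divorce", "fired",
)


def has_strong_emotion(messages):
    """Case-insensitive substring match: tuned for recall, not precision."""
    text = "\n".join(messages).lower()
    return any(keyword in text for keyword in STRONG_EMOTION_KEYWORDS)


def is_trivial(messages, token_count, trivial_message_count=3, trivial_token_count=200):
    """True only when the session is below threshold AND has no strong-emotion keyword."""
    below_threshold = (len(messages) < trivial_message_count
                       or token_count < trivial_token_count)
    return below_threshold and not has_strong_emotion(messages)
```

The recall-over-precision trade-off shows up immediately: a one-line "I got fired up about the game" matches "fired" and is sent to extraction, which is the accepted cost of never dropping a one-line "我们分手了".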

Algorithm C · Reflection triggers (shock · timer · hard gate)

Once a session clears extraction (L3 events exist), the next question is whether to run reflection (L4 thoughts). Three decision rules converge.

SHOCK_IMPACT_THRESHOLD
If any freshly-extracted L3 event has |emotional_impact| ≥ 8 · force reflection NOW even if the timer hasn't elapsed. Rationale: peak moments should reshape persona's impression immediately, not wait 24 hours.
SHOCK_IMPACT_THRESHOLD = 8
memory/consolidate.py
TIMER_REFLECTION_HOURS
Even without a shock event · if > 24 hours have passed since the last reflection · run one. Keeps persona's long-term impressions slowly updating even during routine chats.
TIMER_REFLECTION_HOURS = 24
memory/consolidate.py
REFLECTION_HARD_LIMIT_24H
Regardless of shock or timer · no more than 3 reflections per rolling 24-hour window. Prevents L4 explosion on chatty days or debugging sessions. The hardest of the three gates — wins over shock and timer.
REFLECTION_HARD_LIMIT_24H = 3
(configurable via consolidate.reflection_hard_gate_24h)
Decision order:
  1. count reflections in last 24h → if ≥ 3, skip (hard gate wins)
  2. any fresh event with |impact| ≥ 8? → reflect (shock path)
  3. last reflection > 24h ago? → reflect (timer path)
  4. otherwise → skip this session
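The decision order can be expressed as a single function that returns which path fired. reflection_decision and its string return values are hypothetical; treating "no reflection has ever run" as an elapsed timer is also an assumption, chosen because the story trace in §5 reflects on a cold database with a mild +2 event.

```python
from datetime import datetime, timedelta

SHOCK_IMPACT_THRESHOLD = 8
TIMER_REFLECTION_HOURS = 24
REFLECTION_HARD_LIMIT_24H = 3


def reflection_decision(fresh_impacts, reflections_last_24h, last_reflection_at, now):
    """Apply the three rules in the documented order; the hard gate always wins."""
    if reflections_last_24h >= REFLECTION_HARD_LIMIT_24H:
        return "skip:hard_gate"
    if any(abs(impact) >= SHOCK_IMPACT_THRESHOLD for impact in fresh_impacts):
        return "reflect:shock"
    # Assumption: a persona that has never reflected counts as timer-elapsed.
    if last_reflection_at is None or (
        now - last_reflection_at > timedelta(hours=TIMER_REFLECTION_HOURS)
    ):
        return "reflect:timer"
    return "skip"
```

Checking the hard gate first is what makes it the strongest rule: even a -10 event cannot trigger a fourth reflection inside the same rolling 24-hour window.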

Where does emotional_impact come from?

The emotional_impact signed integer on each L3 event isn't computed by a formula; it's produced by the extraction LLM itself. The extraction prompt (prompts.extraction.EXTRACTION_SYSTEM_PROMPT, see §1) instructs the model to rate from -10 (catastrophic loss / grief) to +10 (peak joy / breakthrough), with 0 for mood-neutral facts. The algorithm layer here is the prompt engineering rather than a numerical rule, which is why tweaking it requires editing prompt text, not code.

9. How retrieval actually ranks · inside the L3+L4 read of step 4

Ranked fusion · vector similarity + relational-graph bonus
memory.retrieve() · lives in src/echovessel/memory/retrieve.py
DETAIL
Phase
Source
What
Formula / knob
Result
A
Query
Embed the new user message with sentence-transformers.
all-MiniLM-L6-v2 · 384-d
query vector q
B
ANN search
sqlite-vec returns top-K nodes by cosine similarity.
K = memory.retrieve_k (default 10)
candidates[] with cos_sim
C
Graph walk
For each candidate, sum relational edges from concept_node_filling weighted by recency and type.
bonus = edges × memory.relational_bonus_weight (default 1.0)
candidates[] now have bonus
D
Score
Final score = cos_sim + relational_bonus_weight · graph_bonus. Sort DESC.
_score_node() · pure function
ranked[] for prompt
E
Trim
Cut to top N by prompt budget. Favour higher emotional_impact on ties.
prompt budget = remaining context tokens
selected[] attached to prompt
Why the graph bonus? Vector similarity catches semantically-close memories but misses causal chains. "小黑 跳科目三" ("小黑 doing the 'Subject Three' dance") and "买了新的跳舞游戏" ("bought a new dancing game") aren't semantically close, but they share a relational edge via the L4 thought "user likes playful things at home". The bonus pulls in graph-connected nodes even when cosine similarity alone wouldn't rank them.
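Phases D and E of the table can be sketched as one function. This follows the Phase D formula as written in this section; rank_and_trim, the candidate dict keys, and the per-candidate tokens field are hypothetical, and breaking ties on |emotional_impact| (rather than the signed value) is an assumption.

```python
def rank_and_trim(candidates, token_budget, relational_bonus_weight=1.0):
    """Score, sort descending, then keep the top candidates that fit the budget."""
    def score(c):
        return c["cos_sim"] + relational_bonus_weight * c["graph_bonus"]

    # Ties on score fall back to the larger absolute emotional impact.
    ranked = sorted(
        candidates,
        key=lambda c: (score(c), abs(c["emotional_impact"])),
        reverse=True,
    )
    selected, used = [], 0
    for candidate in ranked:
        if used + candidate["tokens"] > token_budget:
            break  # cut at the first candidate that would overflow the budget
        selected.append(candidate)
        used += candidate["tokens"]
    return selected
```

A candidate with modest similarity but a graph edge (0.25 + 0.5) outranks a closer but unconnected one (0.5 + 0), which is the effect the graph bonus is meant to have.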

10. Policy gates · the checks that can stop a write

Five gates guard memory writes & proactive outputs
each gate has a config knob and a test · fail-closed by design
GUARDS
trivial_session
Session's total messages < consolidate.trivial_message_count OR tokens < consolidate.trivial_token_count → skip extraction. Rationale: "hi / hey / how's it going" sessions don't earn a permanent L3 event.
memory.consolidate.is_trivial()
reflection_hard_gate_24h
Persona has already written ≥ consolidate.reflection_hard_gate_24h thoughts in last 24 hours → skip reflection for this session. Rationale: prevents L4 explosion on chatty days.
memory.consolidate.consolidate_session()
no_in_flight_turn
Proactive scheduler wants to send an autonomous message, but channel has in_flight_turn_id set → defer. Rationale: don't interrupt an active conversation.
proactive.policy.PolicyEngine
quiet_hours
Proactive attempts between proactive.quiet_hours_start and quiet_hours_end → drop. Respects user's sleep.
proactive.policy.quiet_hours_check()
rate_limit
Persona has already sent ≥ proactive.max_per_24h autonomous messages in last 24h → drop. Rationale: bound the dial.
proactive.policy.rate_limit_check()
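The three proactive gates can be chained into one decision. This is a sketch only: proactive_decision, the evaluation order, and the default values (23:00 to 07:00 quiet hours, 3 messages per 24h) are assumptions for illustration; the source names the config knobs but not their defaults.

```python
from datetime import time


def in_quiet_hours(now_t, start, end):
    """True inside [start, end), including windows that wrap past midnight."""
    if start <= end:
        return start <= now_t < end
    return now_t >= start or now_t < end


def proactive_decision(now_t, in_flight_turn_id, sent_last_24h,
                       quiet_start=time(23, 0), quiet_end=time(7, 0), max_per_24h=3):
    """Run the gates in sequence; anything other than 'send' means no message goes out now."""
    if in_flight_turn_id is not None:
        return "defer"  # don't interrupt an active conversation
    if in_quiet_hours(now_t, quiet_start, quiet_end):
        return "drop:quiet_hours"
    if sent_last_24h >= max_per_24h:
        return "drop:rate_limit"
    return "send"
```

The wrap-around branch matters in practice: a quiet window such as 23:00 to 07:00 has start > end, and a naive start <= now < end check would never match it.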

11. SSE mirror as the nervous system · what events flow, in both directions

Runtime broadcaster carries both turn events and lifecycle events
every SSE subscriber is a god-view observer
NERVOUS SYSTEM
Event
Origin
When it fires
Carries
UI reacts
chat.message.user_appended
New user message ingested (per message inside a turn)
user_id, content, turn_id, source_channel_id
append user bubble to timeline
chat.message.token
LLM emits one streaming token
delta, turn_id, source_channel_id
concatenate into persona bubble
chat.message.done
Channel.send() succeeds, turn complete
content, delivery, source_channel_id
finalise bubble · render channel pill
chat.message.voice_ready
TTS finishes, voice artifact cached
audio_url, duration
show ▶ voice play button
chat.mood.update
L6 episodic_state refreshed (post session-close, via extraction's session_mood_signal)
persona_id, user_id, mood_summary
re-render chat header mood · prepend mood-shift row into Memory Timeline panel
chat.session.boundary
Session transitions open → closed · or a new session starts
closed_session_id, new_session_id, events_count, thoughts_count (close edge)
timeline session-close summary row + timestamped horizontal rule in bubble column
memory.event.created
Consolidate or import commits a new L3 event (RuntimeMemoryObserver → on_event_created)
event_id, description, emotional_impact, session_id
prepend event row into Memory Timeline panel
memory.thought.created
Fast-loop reflection, slow_tick G phase, or import commits a new L4 thought / intention / expectation
thought_id, type, subject, description, source ∈ {reflection, slow_tick, import}, filling_event_ids
prepend thought / intention / expectation row into Memory Timeline panel (subject='persona' feeds # How you see yourself lately on next turn)
memory.entity.confirmed
L5 entity resolves to confirmed (new row, alias-matched, or operator-merged · uncertain entities are filtered at source)
entity_id, canonical_name, kind, merge_status
prepend entity row into Memory Timeline panel
memory.entity.description_updated
slow_tick synthesizes a description over threshold, or owner PATCHes via admin API
entity_id, canonical_name, description, source ∈ {slow_tick, owner}
prepend entity-description row into Memory Timeline panel
chat.settings.updated
voice_enabled / provider / etc flipped (e.g. SIGHUP)
changed_fields
cross-tab config sync
chat.connection.ready
SSE handshake complete
green dot in status pill
chat.connection.heartbeat
Every 30s · NAT keepalive
no visual
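Each row of the table above would reach the browser as one Server-Sent Events frame. The sketch below shows the standard SSE wire format; whether the runtime places the event name in the event: field (as assumed here) or inside the JSON payload is not stated in the source, and sse_frame is a hypothetical helper.

```python
import json


def sse_frame(event, data):
    """Encode one event as an SSE frame: an event line, a data line, a blank line."""
    # json.dumps escapes newlines inside strings, so the payload stays on one data line.
    payload = json.dumps(data, ensure_ascii=False)
    return f"event: {event}\ndata: {payload}\n\n"


frame = sse_frame(
    "chat.message.token",
    {"delta": "你好", "turn_id": "turn-33bec7", "source_channel_id": "discord"},
)
```

Carrying source_channel_id in every token frame is what would let a Web subscriber render a reply that originated on Discord.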
Companion: the static-view equivalent of this page is architecture.html — layout, tables, endpoints, iron rules. If this page is the nervous system, that one is the anatomy.