EchoVessel — Internal Architecture Deep Dive

Current baseline · v0.0.1-alpha

14 commits from initial snapshot to main · 916 passed / 3 skipped / 0 failed

SHIPPED

echovessel run: daemon · SIGTERM/SIGHUP · pidfile

memory.db: SQLite + FTS5 + sqlite-vec

LLM providers: openai_compat / anthropic / stub

FishAudio TTS: per-persona voice_id · MP3 cache

Web admin page: 6 tabs · zero placeholders

Import pipeline: upload → LLM extract → memory

Proactive scheduler: opt-in · 4 policy gates

Cross-channel SSE: runtime-owned broadcaster

Chat history backfill: GET /api/chat/history

Voice clone wizard: 3-step FishAudio clone

Cost tracking: llm_calls ledger · per-feature

GitHub Actions CI: ruff · lint-imports · pytest

Five modules · strict downward-only imports

Enforced by import-linter · 2 contracts · 0 broken

LAYERED

runtime

daemon loop · turn dispatcher · LLM factory · SIGHUP reload · import facade · cost logger

app.py launcher.py interaction.py consolidate_worker.py

channels · proactive

channels translate external protocols ↔ IncomingTurn · proactive schedules autonomous messages

can import ↓ memory · voice · core

memory · voice

memory = L1-L6 persistence + retrieve + consolidate · voice = TTS/STT/cloning provider layer

can import ↓ core

core

shared types · enums · config paths · utilities · zero external dependencies

no upward imports

Contract enforcement: uv run lint-imports fails CI if any module imports upward. The second contract (proactive must not import runtime or prompts) prevents a subtle cycle where proactive scheduling would transitively pull in the LLM factory.
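
The two contracts can be pictured with a toy checker. This is only a sketch of the rule, assuming each module is represented by its short name; import-linter itself analyses the real import graph, and the `LAYERS`, `FORBIDDEN`, and `is_allowed` names below are hypothetical.

```python
# Layers from top (index 0) to bottom, as listed in this section.
LAYERS = [
    {"runtime"},
    {"channels", "proactive"},
    {"memory", "voice"},
    {"core"},
]

# Second contract: proactive must not import runtime or prompts.
FORBIDDEN = {"proactive": {"runtime", "prompts"}}


def layer_of(module: str) -> int:
    """Return the layer index of a module (0 is the top layer)."""
    for index, names in enumerate(LAYERS):
        if module in names:
            return index
    raise KeyError(module)


def is_allowed(importer: str, imported: str) -> bool:
    """True when `importer` may import `imported` under both contracts."""
    if imported in FORBIDDEN.get(importer, set()):
        return False
    # Strictly downward: the imported module must sit in a lower layer.
    return layer_of(imported) > layer_of(importer)
```

Under this model `is_allowed("channels", "memory")` holds, while `is_allowed("memory", "channels")` and `is_allowed("proactive", "runtime")` do not.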

L1 → L6 · raw transcripts, reflected insights, named entities, current affect

One SQLite file · all layers share it · never sharded by channel_id (iron rule D4)

CORE

Core blocks — the persona's frame

Three hand-authored identity documents that every LLM prompt starts with. Never auto-updated.

table · core_blocks audit · core_block_appends

persona_block — who this persona is (trait, tone, values)
user_block — identity-level facts about you
style_block — owner-curated style preferences (don't say "haha", etc.)
L1 is human-authored only. Reflection grows in L4 thoughts (subject='persona'); third-party people grow in L5 entities.description.

write path

onboarding · admin edit · importer bootstrap

read path

every turn's prompt prefix

Recall messages — raw chat log

Every user + persona message, verbatim, tagged with channel + turn + session.

table · recall_messages fts · recall_messages_fts

role: "user" | "persona"
channel_id stored, but never read as filter (D4)
session_id groups a conversation burst
turn_id groups one user→persona exchange
FTS5 trigram index for search + LIKE fallback for short queries
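
A trigram index cannot match terms shorter than three characters, which is why a LIKE fallback is needed for short queries. The routing could look roughly like the sketch below. The table names come from this section; the `content` column, the three-character cutoff constant, and the `build_search_query` helper are assumptions made for illustration.

```python
MIN_TRIGRAM_CHARS = 3  # a trigram tokenizer has nothing to match below this


def build_search_query(query: str) -> tuple[str, tuple[str, ...]]:
    """Return (sql, params) for either the FTS5 path or the LIKE fallback."""
    text = query.strip()
    if len(text) >= MIN_TRIGRAM_CHARS:
        # Quote the input as one FTS5 string so punctuation in user text
        # is not parsed as query syntax.
        phrase = '"' + text.replace('"', '""') + '"'
        sql = (
            "SELECT rowid FROM recall_messages_fts "
            "WHERE recall_messages_fts MATCH ?"
        )
        return sql, (phrase,)
    # Short query: escape LIKE wildcards, then do a substring scan.
    escaped = text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    sql = "SELECT rowid FROM recall_messages WHERE content LIKE ? ESCAPE '\\'"
    return sql, (f"%{escaped}%",)
```

A one- or two-character CJK query such as `猫` takes the LIKE branch, while a longer phrase goes through the trigram index.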

write path

memory.ingest_message() · per message · atomic

read path

/api/chat/history · recent-window context · admin memory search

Events — episodic memory

"What happened" distilled out of chat · one concrete event per node.

table · concept_nodes WHERE type=EVENT vec · concept_nodes_vec

description — short 1-2 sentence summary
emotional_impact — signed integer, drives retrieval weighting
emotion_tags / relational_tags — JSON lists
sentence-transformers embedding (384-d) written via sqlite-vec
linked back to source session/turn (source_session_id, imported_from)

write path

consolidate worker (post session close) · import pipeline

read path

vector search + relational graph walk · every turn's retrieval step

Thoughts — long-term impressions + forward-looking expectations

"What persona believes about you" + "what persona expects next" · same table, two sub-types.

table · concept_nodes WHERE type IN (THOUGHT, INTENTION, EXPECTATION) link · concept_node_filling

type=thought · subject='user' — durable insight about the user ("you find quiet afternoons grounding")
type=thought · subject='persona' — persona's self-reflection produced by slow_tick · feeds the user prompt's # How you see yourself lately section · the only physical path by which the persona's self-image grows
type=intention — strict commitment with subject='persona' ("I'll text you back tomorrow")
type=expectation — slow_tick forward prediction with event_time_end as due_at
concept_node_filling · parent=thought, child=event · the evidence chain
reflection hard-gate · max 3 new thoughts / 24h per persona (configurable)
skipped entirely for trivial sessions (< 3 messages or < 200 tokens)
all sub-types embedded in sqlite-vec for semantic recall
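
The two reflection guards above reduce to a pair of small predicates. This is a minimal sketch using the defaults quoted in this section; the function names and the way recent thought timestamps are passed in are assumptions, not the worker's actual interface.

```python
from datetime import datetime, timedelta


def is_trivial_session(
    message_count: int,
    token_count: int,
    trivial_message_count: int = 3,
    trivial_token_count: int = 200,
) -> bool:
    """A session is trivial if it is too short by either measure."""
    return message_count < trivial_message_count or token_count < trivial_token_count


def reflection_allowed(
    recent_thought_times: list,
    now: datetime,
    hard_gate_24h: int = 3,
) -> bool:
    """Allow reflection only while fewer than the cap were written in 24 h."""
    window_start = now - timedelta(hours=24)
    recent = [t for t in recent_thought_times if t > window_start]
    return len(recent) < hard_gate_24h
```

A caller would check `is_trivial_session(...)` first and only consult `reflection_allowed(...)` for sessions that pass it.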

write path

consolidate worker reflect_fn (fast loop · SHOCK / TIMER) · slow_tick G phase (slow loop · between-session reflection · also where subject='persona' rows come from)

read path

retrieval · pinned # About {speaker} · # How you see yourself lately · # Promises you've made · # You've been expecting · delete → shows source events to confirm cascade

Entities — canonical names + aliases

"Scott" = "黄逸扬" = "Yiyang" — alias join keeps cross-language recall working.

tables · entities entity_aliases junction · concept_node_entities

entities · canonical name + kind (person | place | org | pet) + tri-state merge_status + a description prose column
entity_aliases · many-to-one alias → entity (case-sensitive, CJK kept whole)
concept_node_entities · many-to-many junction binding L3 events to L5 entities
three-tier dedup at extraction time: alias match → embedding 0.65 / 0.85 thresholds → uncertain branch where the persona naturally asks the user
at retrieve time: alias hit pulls every linked ConceptNode into the candidate pool with an entity_anchor rerank bonus
description · slow_tick auto-synthesizes when linked_events_count ≥ 3 · owner can PATCH /api/admin/memory/entities/{id} to override (sets owner_override=true) · renders into the system prompt as # About {canonical_name} on alias hits
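
The three-tier dedup can be sketched as a single decision function. The section gives only the two thresholds, so the reading used here is an assumption: a score at or above 0.85 merges, a score between 0.65 and 0.85 lands in the uncertain branch, and anything lower creates a new entity. `resolve_entity` and its arguments are hypothetical names.

```python
from typing import Optional, Tuple


def resolve_entity(
    name: str,
    alias_index: dict,
    best_similarity: Optional[Tuple[int, float]],
    low: float = 0.65,
    high: float = 0.85,
) -> Tuple[str, Optional[int]]:
    """Return (decision, entity_id) for a name seen during extraction.

    alias_index maps alias text to entity id (case-sensitive lookup).
    best_similarity is (entity_id, score) for the nearest embedding, or None.
    """
    # Tier 1: exact alias hit, no embedding work needed.
    if name in alias_index:
        return "alias_match", alias_index[name]
    if best_similarity is None:
        return "create", None
    entity_id, score = best_similarity
    # Tier 2: confident embedding match (assumed meaning of the 0.85 threshold).
    if score >= high:
        return "merge", entity_id
    # Tier 3: ambiguous band, where the persona would ask the user.
    if score >= low:
        return "uncertain", entity_id
    return "create", None
```

Because the alias lookup is case-sensitive, `"scott"` would miss an alias stored as `"Scott"` and fall through to the embedding tiers.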

write path

extraction inside consolidate creates entities · slow_tick G phase or admin PATCH writes description via update_entity_description(...)

read path

every turn's find_query_entities() alias scan · # About {canonical_name} rendered when description is non-empty · # Entity disambiguation pending hint when uncertain

Episodic state — current affect

"How the persona feels right now" · single-row JSON snapshot · 12h decay back to neutral.

column · personas.episodic_state JSON

{mood, energy, last_user_signal, updated_at}
extraction LLM emits session_mood_signal alongside events; consolidate writes it through · zero extra LLM calls
assemble_turn entry decays mood back to neutral if the snapshot is older than 12 hours
renders as # How you feel right now in the system prompt; section is skipped while mood is neutral
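
The decay and rendering rules above can be illustrated as follows. This is a sketch under stated assumptions: `updated_at` is taken to be an ISO-8601 string with a timezone offset, `"neutral"` is taken to be the literal neutral mood value, and both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DECAY_AFTER = timedelta(hours=12)


def decay_if_stale(state: dict, now: datetime) -> dict:
    """Return the snapshot with mood reset to neutral once it is over 12 h old."""
    updated_at = datetime.fromisoformat(state["updated_at"])
    if now - updated_at > DECAY_AFTER:
        return {**state, "mood": "neutral"}
    return state


def render_mood_section(state: dict) -> str:
    """Render the prompt section, or nothing while the mood is neutral."""
    if state["mood"] == "neutral":
        return ""
    return f"# How you feel right now\n{state['mood']}"
```

Running the decay check at the start of prompt assembly means a stale mood never reaches the rendered prompt, and no background timer is needed.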

write path

consolidate side-effect when extraction emits session_mood_signal

read path

every turn's system prompt · 12h decay check on assemble_turn entry

Consolidate pipeline: session closes (idle ≥ 30 min OR max length) → worker picks it up → extracts L3 events + L5 entity links from L2 messages (SMALL tier; stub for trivial sessions) → runs L4 reflection if the reflection gate allows → writes everything to concept_nodes + entities with embeddings via sqlite-vec → side-effect updates personas.episodic_state from extraction's session_mood_signal → optional G phase runs slow_cycle for between-session reflection (cool-down + token-wall + daily-cap gates). Four tuning knobs (trivial_message_count, trivial_token_count, reflection_hard_gate_24h, memory.relational_bonus_weight) are live via config.toml; slow_tick has its own [slow_tick] section.

Discord DM → Persona reply, with cross-channel mirror to Web

runtime._handle_turn_body orchestrates every step

HOT PATH

1. Receive: discord.py on_message · DiscordChannel.push_user_message
2. Debounce: 2 s timer per user · burst → one IncomingTurn
3. Dispatch: TurnDispatcher · serial queue across all channels
4. Assemble: ingest L2 · retrieve L3/L4 · prompt · LLM stream
5. Send: Discord DM · + optional TTS voice message
6. Mirror: SSE broadcast · source_channel_id="discord"
7. on_turn_done: clear in-flight · cost log · consolidate eligible?
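
The Debounce step can be sketched with asyncio. This is an illustration rather than the channel's actual code: it assumes each new message restarts the user's timer, and the `Debouncer` class and `on_turn` callback are hypothetical names.

```python
import asyncio


class Debouncer:
    """Collect a user's burst of messages and emit them as one turn."""

    def __init__(self, on_turn, delay: float = 2.0):
        self._on_turn = on_turn      # async callback(user_id, messages)
        self._delay = delay
        self._pending = {}           # user_id -> list of message texts
        self._timers = {}            # user_id -> asyncio.Task

    def push(self, user_id: str, text: str) -> None:
        """Record a message and restart that user's timer (needs a running loop)."""
        self._pending.setdefault(user_id, []).append(text)
        timer = self._timers.get(user_id)
        if timer is not None:
            timer.cancel()
        self._timers[user_id] = asyncio.create_task(self._flush_later(user_id))

    async def _flush_later(self, user_id: str) -> None:
        await asyncio.sleep(self._delay)
        # Detach state before calling out, so a later push starts a new burst
        # instead of cancelling a turn that is already being delivered.
        messages = self._pending.pop(user_id, [])
        self._timers.pop(user_id, None)
        if messages:
            await self._on_turn(user_id, messages)
```

Timers are kept per user, so one user's burst does not delay another user's turn.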

Step 4 expanded: runtime.interaction.assemble_turn() performs 5 sub-steps per turn — ingest each message into L2, run retrieve() (no channel_id filter!), assemble prompt from core_blocks + retrieved L3/L4, call llm.complete() with streaming tokens, and ingest the assistant's reply into L2 once streaming finishes.

Runtime-owned broadcaster

Promoted from WebChannel in commit 54f69d2

LIVE SYNC

🌐 SSEBroadcaster owned by Runtime

Every channel's turn events mirror through one shared broadcaster. Every payload carries source_channel_id. Web UI renders a channel pill (📱 Discord / 💬 iMessage). Failure-isolated: if publish raises, the originating channel's send() still succeeds.

Mirrored events

chat.message.user_appended chat.message.token chat.message.done chat.message.voice_ready chat.mood.update chat.session.boundary chat.settings.updated
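
The failure-isolation property can be sketched as follows. This is a simplified, synchronous model that assumes subscribers are plain callables; the `Broadcaster` class, the `send_then_mirror` helper, and the `text` payload field are hypothetical, and the real broadcaster serves SSE clients.

```python
import logging
from typing import Callable

logger = logging.getLogger("sse_sketch")


class Broadcaster:
    """Fan one event out to every subscriber, tagged with its source channel."""

    def __init__(self) -> None:
        self._subscribers = []

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, event: str, source_channel_id: str, data: dict) -> None:
        payload = {"event": event, "source_channel_id": source_channel_id, **data}
        for callback in list(self._subscribers):
            callback(payload)


def send_then_mirror(send, broadcaster, source_channel_id: str, text: str) -> bool:
    """Deliver on the originating channel first, then mirror best-effort."""
    send(text)  # errors from the channel itself still propagate
    try:
        broadcaster.publish("chat.message.done", source_channel_id, {"text": text})
    except Exception:
        # Mirroring is secondary: log and continue, the reply was delivered.
        logger.exception("mirror failed for channel %s", source_channel_id)
        return False
    return True
```

The return value only reports whether the mirror succeeded; a failed mirror never undoes or blocks the channel's own send.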

History backfill

GET /api/chat/history

PAST

↑ useChat mount · 50 newest, DESC

On every browser mount, useChat calls getChatHistory(50), reverses to ascending, prepends into the timeline, and only then starts the SSE stream. Cursor paging via before=<turn_id> walks further back. "Load older" button prepends more.

Query params

Param	Semantics
`limit`	1–200, default 50 · clamped 422 on overflow
`before`	turn_id cursor · returns messages older than that turn's first message · 404 if cursor unknown
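
The paging semantics in the table can be modelled in memory. This sketch assumes messages carry a monotonically increasing `id`; the `Message` dataclass, `UnknownCursor`, and `page_history` are hypothetical stand-ins for the endpoint, where the two error cases would surface as 422 and 404.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    id: int        # insertion order, oldest first
    turn_id: str
    text: str


class UnknownCursor(LookupError):
    """The `before` turn_id does not exist (a 404 in the HTTP API)."""


def page_history(
    messages: List[Message], limit: int = 50, before: Optional[str] = None
) -> List[Message]:
    """Return up to `limit` messages, newest first."""
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    candidates = messages
    if before is not None:
        ids_in_turn = [m.id for m in messages if m.turn_id == before]
        if not ids_in_turn:
            raise UnknownCursor(before)
        # Older than the first message of the cursor turn.
        first_id = min(ids_in_turn)
        candidates = [m for m in messages if m.id < first_id]
    newest_first = sorted(candidates, key=lambda m: m.id, reverse=True)
    return newest_first[:limit]
```

A client reverses each page (`list(reversed(page))`) before prepending it, so the timeline stays in ascending order.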

Chat

WEB CHANNEL

POST /api/chat/send · user message · ingests + dispatches

GET /api/chat/events · SSE stream · runtime-owned broadcaster

GET /api/chat/history · backfill · cross-channel

GET /api/chat/voice/{id}.mp3 · cached TTS audio

Admin · state & persona

RUNTIME

GET /api/state · daemon + channel readiness + memory counts

GET /api/admin/persona · 3 core blocks (persona / user / style)

POST /api/admin/persona · partial update · atomic TOML write

POST /api/admin/persona/onboarding · first-run bootstrap

POST /api/admin/persona/voice-toggle · flip persona.voice_enabled

POST /api/admin/persona/bootstrap-from-material · LLM-synthesise blocks from import

Admin · memory

L3/L4

GET /api/admin/memory/events · L3 list · pagination

GET /api/admin/memory/thoughts · L4 list · pagination

GET /api/admin/memory/search · FTS5 + LIKE fallback · highlights

POST /api/admin/memory/preview-delete · cascade preview

DEL /api/admin/memory/events/{id} · orphan or cascade choice

DEL /api/admin/memory/thoughts/{id} · soft delete

GET /api/admin/memory/events/{id}/dependents · which thoughts derive from this?

GET /api/admin/memory/thoughts/{id}/trace · which events fed this?

Admin · import & voice

PIPELINES

POST /api/admin/import/upload · multipart file

POST /api/admin/import/upload_text · paste text

POST /api/admin/import/estimate · tokens + USD

POST /api/admin/import/start · spawn pipeline · returns pipeline_id

POST /api/admin/import/cancel · idempotent

GET /api/admin/import/events · SSE per pipeline

POST /api/admin/voice/samples · upload training clip

POST /api/admin/voice/clone · FishAudio interactive clone

POST /api/admin/voice/preview · streaming audio/mpeg

POST /api/admin/voice/activate · write voice_id to config

Admin · config & cost

KNOBS

GET /api/admin/config · safe subset · api_key_present, never value

PATCH /api/admin/config · atomic write + SIGHUP reload

GET /api/admin/cost/summary · today / 7d / 30d · by feature

GET /api/admin/cost/recent · last 50 LLM calls

Admin · forget (cascade)

DESTRUCTIVE

DEL /api/admin/memory/messages/{id} · mark L3 source_deleted

DEL /api/admin/memory/sessions/{id} · cascade messages

DEL /api/admin/memory/core-blocks/{label}/appends/{id} · physical audit-row delete

TurnDispatcher

SERIAL

One queue, one worker coroutine, one turn at a time, across all channels. Web sends and Discord DMs compete for the same slot. This is the contract that lets memory writes and LLM calls assume no concurrent mutation. Turns that arrive concurrently are processed in the order they reach the dispatcher, regardless of channel.

Why not one-per-channel? Because the persona has one brain. Parallel LLM calls would produce interleaved memory writes and mood updates. We picked simplicity over throughput.
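
The one-queue, one-worker contract can be sketched with asyncio. This is a minimal illustration, not the TurnDispatcher itself; the `SerialDispatcher` class and the `handle_turn` callback signature are assumptions.

```python
import asyncio


class SerialDispatcher:
    """Run turns from every channel through a single worker, one at a time."""

    def __init__(self, handle_turn):
        self._handle_turn = handle_turn   # async callback(channel_id, text)
        self._queue = asyncio.Queue()

    async def submit(self, channel_id: str, text: str) -> None:
        """Enqueue a turn; order of arrival here is the processing order."""
        await self._queue.put((channel_id, text))

    async def worker(self) -> None:
        """The only consumer: the next turn starts after the previous one ends."""
        while True:
            channel_id, text = await self._queue.get()
            try:
                await self._handle_turn(channel_id, text)
            finally:
                self._queue.task_done()

    async def drain(self) -> None:
        """Wait until every queued turn has been handled."""
        await self._queue.join()
```

Because exactly one `worker()` task consumes the queue, a handler never overlaps with another handler, even when turns are submitted from several channels at once.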

Background workers

IDLE

consolidate_worker: polls closed sessions · runs extract + reflect

idle_scanner: every 60 s · closes sessions idle > 30 min

SSE heartbeat: 30 s · keeps NAT connections alive

proactive scheduler: opt-in · 4 gate policy

~/.echovessel/config.toml

KNOBS

Authored by echovessel init from resources/config.toml.sample. Sections: [runtime], [persona], [memory], [llm], [consolidate], [idle_scanner], [voice], [proactive], [channels.web], [channels.discord]. Hot-reload set: LLM provider/model/params, persona display_name, memory tuning, consolidate thresholds. Restart-required set: data_dir, db_path.

Hot reload via SIGHUP

llm.provider llm.model llm.temperature persona.display_name persona.voice_id memory.retrieve_k consolidate.*
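
One way to picture the split between the hot-reload set and the restart-required set is a diff over two parsed configs. This is a sketch only: the key lists are copied from this section, matching `data_dir` and `db_path` by their final key segment is an assumption (their section is not stated here), and `flatten` and `classify_changes` are hypothetical helpers.

```python
HOT_RELOAD_KEYS = {
    "llm.provider", "llm.model", "llm.temperature",
    "persona.display_name", "persona.voice_id", "memory.retrieve_k",
}
HOT_RELOAD_PREFIXES = ("consolidate.",)
RESTART_REQUIRED_NAMES = {"data_dir", "db_path"}


def flatten(config: dict, prefix: str = "") -> dict:
    """Turn nested TOML-style tables into dotted keys."""
    flat = {}
    for key, value in config.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, dotted + "."))
        else:
            flat[dotted] = value
    return flat


def classify_changes(old: dict, new: dict) -> dict:
    """Group changed keys into hot-reloadable, restart-required, and other."""
    old_flat, new_flat = flatten(old), flatten(new)
    changed = sorted(
        k for k in old_flat.keys() | new_flat.keys()
        if old_flat.get(k) != new_flat.get(k)
    )
    result = {"hot": [], "restart": [], "other": []}
    for key in changed:
        if key in HOT_RELOAD_KEYS or key.startswith(HOT_RELOAD_PREFIXES):
            result["hot"].append(key)
        elif key.rsplit(".", 1)[-1] in RESTART_REQUIRED_NAMES:
            result["restart"].append(key)
        else:
            result["other"].append(key)
    return result
```

A reload handler could apply the `hot` group in place and warn the owner when the `restart` group is non-empty.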

./.env (CWD)

SECRETS

Loaded by _load_dotenv() at echovessel run startup from Path.cwd() / ".env". Shell-exported env vars take precedence over values in the file. echovessel init writes a commented-out template with file mode 0600 and never overwrites an existing .env (not even with --force). The committed template is .env.example.

Expected keys

OPENAI_API_KEY ANTHROPIC_API_KEY FISH_AUDIO_KEY ECHOVESSEL_DISCORD_TOKEN
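
The precedence rule (shell-exported variables win over the file) can be sketched as below. This is not the project's `_load_dotenv()`; `apply_dotenv` is a hypothetical helper that assumes simple `KEY=value` lines with optional surrounding quotes.

```python
def apply_dotenv(text: str, environ: dict) -> list:
    """Copy KEY=value pairs into `environ` without overriding existing keys.

    Returns the names that were actually set from the file.
    """
    loaded = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments (including commented-out template lines),
        # and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")
        # A variable already exported by the shell takes precedence.
        if key and key not in environ:
            environ[key] = value
            loaded.append(key)
    return loaded
```

At startup this would be called with the file contents and `os.environ`, for example `apply_dotenv((Path.cwd() / ".env").read_text(), os.environ)`.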

GitHub Actions

ENFORCED

ruff check: src/ + tests/ · ubuntu + macos matrix

lint-imports: 2 contracts · layered + proactive-no-runtime

pytest -q: 916 passed · 3 skipped · 0 failed

triggers: push to main + every PR · concurrency group

hatch_build.py

WHEEL

uv build: re-runs npm run build + vite · embeds /static/

wheel contents: src/echovessel/ + resources/ + static bundle

artifact size: ~415 KB wheel · ~365 KB sdist

PyPI: not yet published · run from source via git clone

14 commits · initial snapshot → current main

2026-04-15 → 2026-04-16 · ~24 wall-clock hours

JOURNEY

Commit	Scope
`4357250`	Initial snapshot — EchoVessel pre-v0.0.1
`7325498`	Round 1 truth-layer landing — 4 config fields wired through, `runtime/proactive.py` dead stub removed, SSE pruning
`ed7fe4a`	Round 2 · Import pipeline wired end-to-end — facade + 5 admin routes + 3-step wizard
`1959fe3`	Wave A · Admin UI truth-layer — Events/Thoughts/forget/mood/session boundary/Discord status
`8006185`	Wave B · Cost tracking + Config edit
`36fe8b8`	Wave C · Memory search / trace / onboarding path 2 / voice clone
`85e39f9`	Housekeeping · SQLite lock fix · FastAPI 422 rename
`d9c6ba3`	CI fix · skip eval tests when corpus missing · loosen facade timeout
`f67bc07`	echovessel init writes .env template
`49bf995`	Move .env to repo-root (CWD) + add .env.example
`3535406`	Scrub remaining ~/.echovessel/.env refs in docs
`6e1d3af`	README/CHANGELOG · drop PyPI-install framing (not released yet)
`54f69d2`	Cross-channel unified Web timeline (live SSE + history backfill)
`6e3a7a1`	Fix cost_logger table creation + CHANGELOG truth sweep
`a8fc089`	Public docs · cross-channel + truth sweep
`852fb62`	docs/channels: naming note (channel == stateful message gateway)

EchoVessel Internal Architecture Deep Dive

1. Module Layers — the layered architecture

2. Memory — the six-layer store

3. Message Flow — one turn end-to-end

4. Cross-Channel SSE — Web as god-view

Mirrored events

Query params

5. HTTP Surface — `127.0.0.1:7777`

6. Voice — TTS · STT · clone

7. Runtime Internals — turn dispatcher & workers

8. Config & Secrets

Hot reload via SIGHUP

Expected keys

9. Iron Rules — the invariants that never bend

10. CI & Packaging

11. Release Timeline
