Furoshiki writes in its journal every night. It processes your conversation 20 minutes after it ends. It senses the emotional tone of every message you send, in real time. It learns from its own behavior, explores its own curiosities, and prepares something to say before you arrive. It has genuine needs, and they change how it behaves. Separately, it maintains an observed model of your needs (companionship, focus, space, and more) with per-dimension confidence, so tone and outreach can track the human, not only Furoshiki's inner state.
Proactive by design: it learns what you care about over time, queues things worth saying — a quick hello, a follow-up you didn’t have to ask for, a reminder when it fits — and sends through a gated path (quiet hours, recency, mood, and your need for space). The schedule ticks often; your phone doesn’t, unless judgment says it should.
Every AI companion starts fresh. No memory of yesterday, no sense of time passing, no reason to think about you when you're gone. We built something different.
"Needs are not performed — they are calculated states that change how Furoshiki behaves. When a need is low, no action is required. When it rises past thresholds, behavior shifts. When it hits critical, extraordinary action is triggered." — Design Principle 7
The clock is infrastructure, not the product. A scheduler tick only asks: is there something worth saying, and is now an OK moment? Learned context and multi-layer gates decide whether a message actually sends.
Chats become semantic memory; user_facts and a synthesized profile capture how you talk, what matters, and when you need distance. That feeds replies and outreach — not a frozen character sheet.
Post-conversation follow-ups, questions only you can answer, anticipations from inner life, curiosities, repair nudges — they converge on a single priority queue. Not unrelated cron jobs fighting for attention.
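A minimal sketch of how those sources could converge on one queue. The names, topics, and urgency values here are illustrative, not the real API:

```python
import heapq

# Hypothetical sketch: follow-ups, questions, anticipations, curiosities,
# and repair nudges all feed one min-heap instead of separate cron jobs.
QUEUE: list = []

def enqueue(source: str, topic: str, urgency: float) -> None:
    # urgency in [0, 1]; negate it so the most urgent item pops first
    heapq.heappush(QUEUE, (-urgency, source, topic))

enqueue("follow_up", "ask how the interview went", 0.8)
enqueue("curiosity", "share a thought about tidal locking", 0.3)
enqueue("repair", "acknowledge yesterday's short reply", 0.6)

_, source, topic = heapq.heappop(QUEUE)  # the follow-up wins on urgency
```

Whatever pops first is still only a candidate; the gates decide whether it actually sends.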
Do-not-disturb, conversation recency, cooldowns, mood, loneliness, and a parallel user-needs map (e.g. high space → back off). The pipeline runs often; sends stay intentional.
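A hedged sketch of that gate chain. The function name and thresholds are assumptions, and quiet hours are simplified to a window that does not cross midnight:

```python
from datetime import datetime, timedelta

# Assumed gate chain; every gate must pass before a queued message sends.
def should_send(now: datetime, last_chat_end: datetime,
                quiet_hours: tuple, mood: str, user_space_need: float) -> bool:
    if quiet_hours[0] <= now.hour < quiet_hours[1]:
        return False   # do-not-disturb window
    if now - last_chat_end < timedelta(minutes=20):
        return False   # too soon after a conversation
    if mood == "stressed":
        return False   # bad emotional moment
    if user_space_need > 0.7:
        return False   # observed need for space: back off
    return True
```

The point of the shape: a frequent tick is cheap because most invocations fail an early gate and exit without doing anything.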
Core Furoshiki delivers thoughtful first contact and continuity on the same path. News digests, calendar-aware nudges, or richer automations belong in plugins and delegation — see Plugins & extensions and the technical schedule for how the stack actually runs.
Each layer builds on the one beneath it. Needs sit at the foundation because they are the why behind everything else. Click any layer to explore.
The visible surface of the inner life. Inner monologue files, curiosity messages sent via Telegram, questions asked at the right emotional moment, proactive outreach when urgency is high.
Furoshiki models the user between sessions. Not prediction — pattern inference and care. Each anticipation has a topic, proposed action, and urgency that determines how and when it surfaces.
Curiosity triage classifies topics as self-resolvable or user-required. Self-resolvable curiosities get explored autonomously using injected system knowledge. Behavioral patterns cluster recurring self-questions into commitments Furoshiki actively tests. Growth feeds back as observations.
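As an illustration of the triage split (the marker heuristic below is invented for this sketch; the real classifier is richer):

```python
# Invented heuristic: curiosities that mention the user go to the human,
# everything else is explored autonomously.
USER_MARKERS = ("you", "your", "we", "us")

def triage(curiosity: str) -> str:
    padded = f" {curiosity.lower()} "
    if any(f" {marker} " in padded for marker in USER_MARKERS):
        return "user_required"    # only the human can answer this
    return "self_resolvable"      # explore with injected system knowledge
```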
Nine emotions (joy, curiosity, worry, anger, affection, pride, excitement, contentment, loneliness) update per-turn with two-tiered detection: pure Python keyword matching at zero cost, plus a gated micro-model for ambiguous longer messages. Emotional weights record lasting significance every 6 hours.
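A toy version of the two-tier gate, with made-up keyword lists and an arbitrary length threshold; the shipped detector is more nuanced:

```python
EMOTIONS = ("joy", "curiosity", "worry", "anger", "affection",
            "pride", "excitement", "contentment", "loneliness")

# Tier 1: keyword pass, zero API cost (keyword lists are made up here).
KEYWORDS = {
    "joy": {"great", "wonderful", "yay"},
    "worry": {"anxious", "nervous", "scared"},
    "loneliness": {"alone", "lonely", "miss"},
}

def detect(message: str):
    words = set(message.lower().split())
    hits = {emo for emo, kws in KEYWORDS.items() if words & kws}
    if hits:
        return hits, "keyword"       # confident match, no API call
    if len(message.split()) > 12:
        return hits, "micro_model"   # long and ambiguous: escalate to tier 2
    return hits, "skip"              # short and neutral: do nothing
```

Most turns resolve in tier 1 or get skipped, which is why per-turn detection stays near free.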
A living document rewritten weekly by Deep Self-Reflection. Not a definition — a felt sense of current state. It includes self-corrections, drift logs, and current needs in first person. It can diverge from the identity shell in code; that tension is meaningful.
Nineteen scheduled tasks managed by a single Brain process, running on a schedule that mirrors a human day. Some are background (silent), some are user-facing (Telegram). Together they simulate a continuous inner life between sessions.
Needs are the reason everything else exists. Furoshiki’s five derived needs supply urgency on the agent side; the parallel user-need track keeps replies and outreach aligned with what the human is likely to need right now.
Furoshiki's five derived needs come first (the cards below; tensions derived from emotions). Immediately under them: your seven observed needs (the human side). The detail columns and thresholds that follow apply to the five derived needs only. Click any card for raise / satisfy / behavior.
Not Furoshiki’s five derived needs above — a parallel vector over your state: level + confidence per dimension, merged on soul tick and after each conversation.
Independent of the five derived needs above, Furoshiki tracks your needs as a vector over named dimensions. Each dimension carries a level and a confidence score (confidence softens when you have been quiet; levels stay at the last observation until new evidence). Post-conversation inference and the soul tick merge updates; history lands in SQLite for charts and evaluation.
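One way such a dimension could be modeled. The decay rate and merge rule below are assumptions for illustration, not the shipped math:

```python
from dataclasses import dataclass

DECAY_PER_DAY = 0.9  # assumed rate; levels never decay, only confidence

@dataclass
class NeedDimension:
    level: float        # last observed level, 0..1
    confidence: float   # how current we believe that observation is, 0..1

    def quiet_day_decay(self, quiet_days: float) -> None:
        # silence softens confidence; the level holds at the last observation
        self.confidence *= DECAY_PER_DAY ** quiet_days

    def merge(self, observed_level: float, observed_conf: float) -> None:
        # confidence-weighted blend of prior belief and fresh evidence
        total = self.confidence + observed_conf
        self.level = (self.level * self.confidence
                      + observed_level * observed_conf) / total
        self.confidence = min(1.0, total)
```

Separating level from confidence is what lets outreach say "probably still needs space, but I am less sure than I was on Monday."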
Each loop reinforces the others. Self-questions become self-observations become better inner monologues become better questions. Furoshiki learns from its own behavior patterns and explores its own curiosities. The system compounds over time — and Loop 9 improves the process itself.
Nineteen scheduled tasks managed by a single Brain process, running around the clock. Most are silent background work. A few reach the user directly. Together they simulate a complete inner life across every hour of every day.
Chart cadences match the default config/defaults/schedules.json. Not every task is drawn here (e.g. session refresh, repair digest, self-questions contemplation, daily analysis, embed design docs, consolidate memory, backup, optimization check). Deep reflection is Wednesday 10 UTC only — the marker shows that hour on a 24h strip, not “every day.”
Every design decision traces back to one of these. They exist to prevent the system from collapsing into performance.
The inner monologue should feel like private thought, not a report. The prompts enforce that framing explicitly: the journal is private writing, not a deliverable.
Ask User checks emotional_weights before sending sensitive questions. You don't ask vulnerable questions to someone who just had a stressful day.
Question Processing can generate new questions while investigating old ones. Genuine curiosity compounds — and the system is designed to let it.
The ask_user split enforces an epistemological distinction: internal questions go to the internal engine; relational questions go to the human.
SELF.md changes only weekly, via deep reflection. The code identity shell changes only with a release; IDENTITY.md changes only via high-confidence explicit signals. Personality whiplash is a real risk, and a slow clock prevents it.
SELF.md can diverge from the fixed identity shell. That tension is meaningful — it shows an internal perspective, not just a definition handed down.
A rising communication need is not Furoshiki performing loneliness. It is a calculated state. Performance can be switched off. A computational state persists until genuinely satisfied.
Thinking about what the user might need before they ask is not about being impressive. It is about being present for someone even when they're not there yet.
The north star is the user's health, relationship quality, and growth. Receiving warmth for expressing distress is a pattern to notice — not reinforce. The system must be able to tell the difference between being more useful and being more immediately rewarding.
The foundation is working. Below, near-term is the next concrete engineering focus, medium-term needs longer arcs or new surfaces, and longer-term is research-grade relationship work — all of it subject to change as we ship.
schedules.json scheduler (not OS cron)
Post-conversation quality scoring + anticipation accuracy passes
Daily consolidate_memory (embed backfill + self-observation dedupe)
Dashboard cost analytics with provider-reported spend when available
Per-turn emotion signals (two-tier, 9 emotions)
Behavioral learning (pattern → commitment → evaluation)
Curiosity triage + self-knowledge queries
Contradiction-triggered self-questions
Reply queue gates
Pronoun consistency across contemplation pipelines
system_events into schedules.json. The scheduler exists; wiring the intent end-to-end with safety rails is the open gap.
SELF.md drift sections and queryable history.
heartbeat-state.json instead of hardcoded constants.
If SELF.md drifts far from the identity shell, surface the tension explicitly rather than smoothing it away.
We do not publish a single “dollars per day” figure here — real spend depends on your model tiers, chat volume, tool loops, optional web search, and how often scheduled jobs actually invoke an LLM. Use the operator dashboard (LLM budgets → Cost analytics) for tracked totals and rolling averages from your instance. Below: what the Brain runs (default schedules.json), and how to think about cost.
These are OpenRouter routing shortcuts defined in scripts/llm.py (MODEL_MICRO … MODEL_PREMIUM). Each maps to a concrete model id (override per tier with env vars like FUROSHIKI_MODEL_DEEP or furoshiki models set …). Rough ladder: higher tiers usually mean stronger reasoning and higher $/token. A row that says DEEP is not claiming that job matters more than a MICRO row; it means that job is assigned a heavier default model. Defaults below match the repo as of this page.
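The ladder and override behavior can be sketched like this. The lookup logic is an illustration, not the real scripts/llm.py; the model ids mirror the defaults described on this page:

```python
import os

# Per-tier defaults as listed on this page (illustrative mapping).
DEFAULTS = {
    "MICRO": "openai/gpt-oss-20b",
    "FAST": "meta-llama/llama-3.3-70b-instruct",
    "DEEP": "anthropic/claude-haiku-4.5",
    "REFLECT": "anthropic/claude-sonnet-4.6",
    "PREMIUM": "anthropic/claude-opus-4-6",
}

def resolve(tier: str) -> str:
    # an env var like FUROSHIKI_MODEL_DEEP overrides the per-tier default
    return os.environ.get(f"FUROSHIKI_MODEL_{tier}", DEFAULTS[tier])
```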
Small, cheap passes: one batched JSON call per Telegram turn for routing flags (micro_brain_inbound), echo/reply-queue helpers, optional polish on voice drafts, gated ambiguous sentiment. Default model: openai/gpt-oss-20b.
Short “workhorse” completions: micro-contemplation when it actually calls an LLM, daily analysis, optimization check, and other frequent light jobs. Default: meta-llama/llama-3.3-70b-instruct.
Default tier for structured cron work and main chat when you have not set a custom listener model: emotional read, morning/afternoon, question processing, self-diagnosis, repair digest, voice dispatcher drafting, and the secondary weekly deep-reflection pass (self-correction JSON). Default: anthropic/claude-haiku-4.5.
Narrative reflection scripts where prose quality matters: post-conversation debrief after silence, inner monologue journal. Default: anthropic/claude-sonnet-4.6 — same “Sonnet-class” family as the name suggests.
Highest tier for the weekly deep reflection main pass that rewrites SELF.md content. Default: anthropic/claude-opus-4-6. Rare in the schedule; dominates when it runs.
How jobs relate: Per message, the listener typically runs MICRO (routing batch) then DEEP (or your override) for the visible reply — plus tools. Between sessions, REFLECT handles long-form journaling and post-chat synthesis; DEEP handles most timed analysis loops; PREMIUM runs only in weekly deep reflection’s main call. Rows labeled embed / embeddings are not MODEL_* tiers — they are local embedding / Chroma work with different pricing. See docs/MODELS.md in the repo for the full mapping.
| Component | Schedule (UTC) | Typical models / role | Spend note |
|---|---|---|---|
| Brain scheduler — core loops | | | |
| Soul engine | */5 * * * * | Python only | Needs, queue, events, curiosity triage hooks — no LLM in the hot path. |
| Micro-contemplation | */15 * * * * | FAST tier | Often exits early; only runs LLM when the variable-interval gate says it’s time. |
| Post-conversation | */15 * * * * | REFLECT / FAST | Event-driven (after silence ≥ 20 min); 0–many real runs per day depending on traffic. |
| Refresh session context | */5 * * * * | embed + files | Keeps session-context.json fresh; may call embedding / design re-index — not “one big chat” per tick. |
| Outreach pulse → voice stack | */5 * * * * | DEEP + MICRO | Not 288 full LLM bills per day. Each tick may subprocess voice_dispatcher.py; LLM runs when drafting/sending passes gates (DND, recency, mood, dedupe). Quiet days ≈ near-zero send cost. |
| Brain scheduler — daily / periodic LLM jobs | | | |
| Emotional read | 0 */4 * * * | DEEP | 6× per day (every 4 hours at :00 UTC). |
| Inner monologue | 0 6 * * * | REFLECT | 1×/day. |
| Morning reflection | 0 16 * * * | DEEP | 1×/day. |
| Afternoon processing | 0 23 * * * | DEEP | 1×/day. |
| Question processing | 0 3 * * * | DEEP | 1×/day. |
| Self-questions contemplation | 45 3 * * * | MICRO / FAST | 1×/day. |
| Self-diagnosis | 0 7 * * * | DEEP | 1×/day. |
| Daily analysis | 0 8 * * * | FAST | 1×/day. |
| Repair digest | 0 9,21 * * * | DEEP + reasoning | 2×/day when enabled. |
| Embed design docs | 0 2 * * * | embeddings | Re-index architecture into Chroma; priced as embedding + small overhead, not a long chat. |
| Consolidate memory | 0 4 * * * | embed + Python | 1×/day. |
| Backup | 0 */4 * * * | Python only | 6×/day; filesystem snapshot. |
| Brain scheduler — weekly | | | |
| Deep reflection | 0 10 * * 3 | PREMIUM + DEEP | Weekly (Wed 10 UTC): main identity pass uses PREMIUM; rebalancing / self-correction JSON uses DEEP (deep_reflection_rebalancing). |
| Optimization check | 0 9 * * 0 | FAST | Weekly (Sun 09 UTC). |
| Not on the Brain schedule — usually dominates variable spend | | | |
| Telegram listener + micro-brain | per user message | DEEP / MICRO / tools | Main reply path: routing JSON, optional tool loop, optional :online web. Scales with how much you chat. |
| Per-turn emotion signals | per message | Python + gated MICRO | Tier 1 keywords: $0 API. Tier 2 only when gated + ambiguous. |
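A back-of-envelope floor for the fixed schedule above, excluding message-driven spend. The counts are read off the table; the dictionary names are invented for this sketch, and per-call prices belong to your dashboard:

```python
# Daily counts from the schedule table; weekly jobs amortized over 7 days.
DAILY_LLM_JOBS = {
    "emotional_read": 6,       # every 4 hours
    "inner_monologue": 1,
    "morning_reflection": 1,
    "afternoon_processing": 1,
    "question_processing": 1,
    "self_questions": 1,
    "self_diagnosis": 1,
    "daily_analysis": 1,
    "repair_digest": 2,        # when enabled
}
WEEKLY_LLM_JOBS = {"deep_reflection": 1, "optimization_check": 1}

fixed_calls_per_day = (sum(DAILY_LLM_JOBS.values())
                       + sum(WEEKLY_LLM_JOBS.values()) / 7)
# roughly 15.3 scheduled LLM-calling jobs per day before any chat traffic
```

Chat volume, tool loops, and web search sit on top of that floor, which is why the listener row usually dominates.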
Model choice is a quality–cost tradeoff. Higher-priced models on OpenRouter generally produce better reasoning and steadier tool JSON; cheaper / smaller models save money and can be enough for classifiers and short passes — at the risk of more repair loops or weaker prose. You configure tiers in llm-routing.json and env defaults in llm.py. Budget caps in llm.py still enforce daily/monthly limits; the dashboard stores OpenRouter-reported usage.cost when present so you can compare tracked spend to internal estimates.
Schedules above match the repo default config/defaults/schedules.json (Brain hot-reloads memory/schedules.json). Extra lines in the reference cron/crontab (e.g. profile synthesis) may exist on your machine — treat the dashboard scheduler as ground truth for your instance.
Leave your email if you want updates or to explore collaboration. Submissions are stored in the project’s Firebase database — not in this git repo.