On the architecture · 7 min read · 2026-05-01

How Koda decides when to interrupt.

Most kid-tutoring software talks too much. Koda has explicit rules for staying quiet — and explicit rules for when to break the silence.

The default is silence.

When a child sits down to work, Koda's default behavior is to do nothing. Not "do nothing for a moment while the model thinks." Actually nothing. The session UI is on, the overhead camera is trained on the worksheet, the LLM is loaded — and the speakers are quiet. The avatar is idle. Koda is watching.

This is a deliberate choice. The market norm for AI tutors is to fill space — narrate every step, celebrate every right answer, ask "are you stuck?" at the first sign of a pause. We've watched a lot of kids interact with that pattern, and the failure mode is consistent: the kid stops trying to figure things out and starts waiting for the next prompt. The tutor is teaching them to wait.

We don't want that. So Koda's job, most of the time, is to keep its mouth shut while a child works.

A note on status. This post describes the interruption design — the rules the supervisor is built around. Some of it runs in today's build (the default silence, the worksheet watcher); some is still on the workbench (handwriting-read answers, the gaze-and-motion stall model, the trained "Hi Koda" wake word, and spoken hints). For the shipped-today vs next-release split, see the first 15 minutes. Below is the model we're building toward.

Three things break the silence.

Trigger 1 — A slip the math verifier caught.

When the child writes an answer, the handwriting model reads it, a small parser turns it into an expression, and the deterministic math verifier compares it against the target. If the answer is wrong, Koda knows it's wrong with certainty (we wrote about why we don't trust the LLM to grade arithmetic in a separate note). At that point Koda speaks, but at the lowest rung of the hint ladder, not the highest. (Today this runs on the child's typed answer; the handwriting-read path is the workbench piece flagged above.)
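A minimal sketch of the verifier's comparison step follows. It assumes the parser hands back a plain numeric answer; the real parser produces an expression, and that part isn't shown. `Verdict`, `parse_answer`, and `verify` are hypothetical names. Python's `fractions.Fraction` stands in for whatever exact arithmetic the production verifier uses. The point is that the check is exact, and the signed error says which way the answer is off:

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    correct: bool
    parsed: Optional[Fraction]  # None when the answer text could not be parsed
    error: Optional[Fraction]   # parsed - target; None when unparsable


def parse_answer(text: str) -> Optional[Fraction]:
    """Parse a typed answer such as '5', '-3', '3/4' or '2.5' into an exact number."""
    try:
        return Fraction(text.strip())
    except (ValueError, ZeroDivisionError):
        return None


def verify(answer_text: str, target: Fraction) -> Verdict:
    """Exact comparison against the target: no floating point, no LLM in the loop."""
    parsed = parse_answer(answer_text)
    if parsed is None:
        return Verdict(correct=False, parsed=None, error=None)
    return Verdict(correct=(parsed == target), parsed=parsed, error=parsed - target)


# 42 - 17 = 25; a child who forgets to regroup writes 35.
verdict = verify("35", Fraction(25))
print(verdict.correct, verdict.error)  # False 10
```

A difference of exactly 10 suggests a regrouping slip rather than a random guess. That is the kind of detail a rung-1 hint can point at.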

Trigger 2 — A stall the gaze + motion model caught.

If the child's pencil hasn't moved in 15 seconds and their eyes are on the worksheet (not wandering), the supervisor reads that as productive thinking. We don't interrupt. If the pencil hasn't moved in 30 seconds and their eyes have left the page, the supervisor reads that as a stall. Koda asks one question — not "are you stuck?" but a specific noticing question pointed at the next relevant detail in the work. (This gaze-and-motion stall model is designed, not yet in the shipping supervisor; for now the worksheet watcher and the math verifier drive when Koda speaks.)
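The two thresholds can be sketched as a small classifier. This is a hypothetical illustration, not the supervisor's code. `classify` and `Reading` are made-up names, and the gaze and pencil signals are assumed to arrive already computed. The post doesn't name every combination, so two cases here are our own reading. A pause past 30 seconds with eyes still on the page stays "thinking." A 15-to-30-second pause with eyes off the page means "keep watching":

```python
from enum import Enum

THINKING_AFTER_S = 15.0
STALL_AFTER_S = 30.0  # default only; some kids need closer to 20 s, some 45 s


class Reading(Enum):
    WATCHING = "watching"  # not enough evidence yet; stay quiet
    THINKING = "thinking"  # pencil still, eyes on the worksheet; do not interrupt
    STALL = "stall"        # pencil still a long time, eyes off the page


def classify(pencil_idle_s: float, eyes_on_page: bool,
             stall_after_s: float = STALL_AFTER_S) -> Reading:
    """Map pencil-idle time and a gaze flag to a reading. Only STALL breaks the silence."""
    if pencil_idle_s < THINKING_AFTER_S:
        return Reading.WATCHING
    if eyes_on_page:
        return Reading.THINKING
    if pencil_idle_s >= stall_after_s:
        return Reading.STALL
    return Reading.WATCHING


for idle_s, eyes in [(10, True), (16, True), (20, False), (31, False)]:
    print(idle_s, eyes, classify(idle_s, eyes).value)
```

The stall threshold is a parameter rather than a constant, because the right value differs from child to child.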

Trigger 3 — An explicit ask.

"Hi Koda, help me with this" is the third trigger. The wake-word listener runs locally and fires only on that phrase; it never does always-on transcription. (Today it uses a generic preset model; the trained "Hi Koda" wake word is still on the workbench.) When it fires, Koda joins the conversation at the rung the asker's tone implies, usually rung 2, because a kid who asks for help has already noticed they're stuck.
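Here is a sketch of how each trigger might pick its starting rung. `Rung`, `Trigger`, and `entry_rung` are hypothetical names. `tone_hint` stands in for whatever reads the asker's tone, which the post doesn't describe. Slips and stalls both open with a rung-1 noticing question, and an explicit ask defaults to rung 2:

```python
from enum import Enum, IntEnum
from typing import Optional


class Rung(IntEnum):
    NOTICE = 1
    ASK = 2
    SMALLER_PROBLEM = 3
    VIDEO = 4


class Trigger(Enum):
    VERIFIER_SLIP = "verifier_slip"  # trigger 1
    STALL = "stall"                  # trigger 2
    EXPLICIT_ASK = "explicit_ask"    # trigger 3


def entry_rung(trigger: Trigger, tone_hint: Optional[Rung] = None) -> Rung:
    """Pick the first rung Koda delivers for a given trigger."""
    if trigger is Trigger.EXPLICIT_ASK:
        # "Usually rung 2": default there unless a (hypothetical) tone reading says otherwise.
        return tone_hint if tone_hint is not None else Rung.ASK
    # Slips and stalls both start with the lowest rung: a noticing question.
    return Rung.NOTICE
```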

The hint ladder.

When Koda does speak, it climbs a four-rung ladder, one rung at a time, with at least 15 seconds of silence between each one (the kid needs time to think after each rung, and Koda would defeat its own purpose by piling them on).

Rung 1 — Notice.

Koda points at the place in the work where the slip happened. Not the answer, not the fix — just the location. "Look at the ones column." Most slips at this age get caught at rung 1. The child sees what Koda is pointing at, recognizes their own miss, and corrects it.

Rung 2 — Ask.

If rung 1 didn't land, Koda asks a question that hands back the next move. "What does 12 minus 7 give you?" The question is targeted; it's the specific micro-skill the slip implies the child is missing or rushed through. The answer to the question puts the child back on the path without supplying the path.

Rung 3 — A smaller, similar problem.

If rung 2 also didn't land, Koda offers a smaller version of the same problem. "Try 12 minus 5 first. Then 12 minus 7." With smaller numbers there's less arithmetic to trip over, and the kid usually sees the structure once the digit-juggling is out of the way. We're stealing this from how good human tutors work.

Rung 4 — A short explainer video.

If the kid is still stuck after rung 3, the slip isn't about this problem — it's about a missing concept anchor. Koda offers one of the 188 short explainer videos that ship with the device, and lets the child pick from whichever of the five teaching angles (concrete blocks, number line, area grid, standard algorithm, word problem) are rendered for that topic — most topics ship with one or two today, more arrive in software updates. Then back to the worksheet.
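Choosing which angles to offer comes down to a lookup, sketched below under stated assumptions. The five angles come from the post. `Angle`, `CATALOG`, `angle_choices`, and the topic ids are invented for illustration, and the real catalog of 188 videos isn't reproduced. The child only ever sees angles that are actually rendered for the topic:

```python
from enum import Enum
from typing import Dict, FrozenSet, List


class Angle(Enum):
    CONCRETE_BLOCKS = "concrete blocks"
    NUMBER_LINE = "number line"
    AREA_GRID = "area grid"
    STANDARD_ALGORITHM = "standard algorithm"
    WORD_PROBLEM = "word problem"


# Hypothetical catalog: topic id -> the angles actually rendered for it today.
CATALOG: Dict[str, FrozenSet[Angle]] = {
    "subtraction-with-regrouping": frozenset({Angle.CONCRETE_BLOCKS, Angle.NUMBER_LINE}),
    "fractions-of-a-set": frozenset({Angle.AREA_GRID}),
}


def angle_choices(topic: str) -> List[Angle]:
    """Angles the child may pick from, in a stable display order (enum order)."""
    rendered = CATALOG.get(topic, frozenset())
    return [angle for angle in Angle if angle in rendered]
```

Because the options come from the catalog rather than a fixed list, a software update that renders a new angle widens the choice without changing the picker.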

The hint ladder, walked through · 45s

The escalation rule.

Two specific guardrails on the ladder.

One rung at a time. Koda does not climb to rung 3 if rung 2 might have landed. Each rung gets at least 15 seconds of silence after delivery; we want the kid to actually try the move the rung implies. The most common mistake AI tutors make is delivering rungs 1, 2, and 3 in a single paragraph because the model wants to "be helpful." That's not helpful — that's monologuing.

Reset on motion. If the kid starts writing while Koda is on the ladder, Koda stops climbing. The pencil moving is the signal that the kid is back on the path; further interruption would be in the way. The state machine resets to default-silent and waits to see what the kid produces next.
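The two guardrails amount to a small state machine. The sketch below is a hypothetical illustration, not Koda's supervisor. `Ladder`, `maybe_climb`, and `on_pencil_motion` are made-up names, `still_stuck` is assumed to come from the verifier and stall model, and timestamps are plain seconds. The machine delivers at most one rung per call and refuses to climb until 15 seconds have passed since the last rung. It drops back to silent the moment the pencil moves:

```python
from dataclasses import dataclass
from typing import Optional

MIN_GAP_S = 15.0  # minimum silence after each rung
TOP_RUNG = 4


@dataclass
class Ladder:
    rung: int = 0                         # 0 = default-silent; 1..4 = last rung delivered
    last_spoken_at: Optional[float] = None

    def on_pencil_motion(self) -> None:
        """Writing means the kid is back on the path: stop climbing, go silent."""
        self.rung = 0
        self.last_spoken_at = None

    def maybe_climb(self, now: float, still_stuck: bool, start_rung: int = 1) -> Optional[int]:
        """Return the rung to deliver now, or None to stay quiet."""
        if not still_stuck:
            return None
        if self.rung == 0:
            self.rung = start_rung  # first hint for this slip
        else:
            if self.last_spoken_at is not None and now - self.last_spoken_at < MIN_GAP_S:
                return None  # give the last rung time to land
            if self.rung >= TOP_RUNG:
                return None  # top of the ladder; nothing left to add
            self.rung += 1   # exactly one rung, never several at once
        self.last_spoken_at = now
        return self.rung


ladder = Ladder()
print([ladder.maybe_climb(t, still_stuck=True) for t in (0, 5, 15, 30, 45, 60)])
# [1, None, 2, 3, 4, None]
```

The call at 5 seconds returns nothing, because rung 1 hasn't had its 15 seconds yet. Any pencil motion wipes the state, so the next hint for this slip starts at the bottom again.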

What we won't do.

We don't interrupt during writing motion. The kid is mid-thought; the cost of breaking that flow is higher than any rung's expected benefit. The supervisor explicitly waits for the pencil to stop before evaluating whether to speak.

We don't fire celebrations. When the kid gets a problem right, Koda doesn't say "great job!" or "you're so smart!"; both are banned by the in-product voice linter, for separate reasons (iter `11`). What Koda might say, on a meaningful right answer: "You showed every step that time." Naming the behavior, not the verdict.
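The linter checks each line before Koda speaks it. Below is a minimal sketch, assuming a plain phrase list. `BANNED_PHRASES` holds only the two examples named above, since the real linter's rules aren't published here, and `lint_utterance` is a hypothetical name. Matching is case-insensitive and tolerates curly apostrophes and doubled spaces:

```python
import re
from typing import List

# Only the two phrases named in this post; the real linter's list is not shown here.
BANNED_PHRASES = ("great job", "you're so smart")


def _normalize(text: str) -> str:
    """Lowercase, straighten curly apostrophes, and collapse runs of whitespace."""
    text = text.lower().replace("\u2019", "'")
    return re.sub(r"\s+", " ", text)


def lint_utterance(text: str) -> List[str]:
    """Return the banned phrases found in a candidate line; empty means it may be spoken."""
    normalized = _normalize(text)
    return [phrase for phrase in BANNED_PHRASES if phrase in normalized]
```

Under this sketch, "You showed every step that time." passes cleanly, and "Great job!" is flagged.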

We don't ask "are you stuck?" The question is dead by definition — kids who are stuck already know they're stuck, and asking adds shame on top. The replacement is the rung-1 noticing question: "Look at the ones column." The location of the slip is more useful than the confirmation that there is one.

We don't fire any of this in exam mode. When the parent or the kid starts an exam, every trigger above is muted. Koda is silent for the full 20 minutes. The walk-back happens after the timer.
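Both rules are gates that sit in front of every trigger. In this hypothetical sketch, `SessionState` and `may_speak` are invented names, `exam_started_at` is assumed to be a timestamp in seconds, and the 20-minute length is the figure given above. Any trigger, whether a slip, a stall, or an explicit ask, would still have to pass this check before Koda says anything:

```python
from dataclasses import dataclass
from typing import Optional

EXAM_LENGTH_S = 20 * 60  # exam mode mutes every trigger for this long


@dataclass
class SessionState:
    exam_started_at: Optional[float]  # None when no exam is running
    pencil_moving: bool


def may_speak(state: SessionState, now: float) -> bool:
    """Return False whenever speaking would break an exam or a child mid-stroke."""
    if state.exam_started_at is not None and now - state.exam_started_at < EXAM_LENGTH_S:
        return False  # exam mode: silent until the timer runs out; walk-back comes after
    if state.pencil_moving:
        return False  # never interrupt writing motion; wait for the pencil to stop
    return True
```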

Why this matters.

A tutor that interrupts too easily teaches kids that the next nudge is coming. They stop thinking independently because thinking-while-quiet is no longer the productive state — waiting is. We're trying to build the opposite: a tutor that stays quiet long enough for the kid to recognize their own stuck-ness, and that, when it does speak, says less than seems polite. The quietest version of help that still works is what we're aiming at.

We don't always get this right yet. The 30-second stall threshold is a default; some kids need 45, some need 20. The rung-1 noticing-question generation depends on the LLM and occasionally produces something blunter than we'd like. The reset-on-motion is reliable in tests but fragile in some lighting. The short version: this is the system, and we keep tuning it.

If you want to know when Koda ships, the waitlist is here. If you want to read the related architecture notes, the research notes index collects them.