On the architecture · 9 min read · 2026-05-01

Why we run on-device instead of in the cloud.

The local-only architecture, the trade-offs we accept, and the thing we will never sell.

The decision, in one paragraph.

Koda is an on-device AI tutor for kids. It is a Pre-K-8, multi-subject and multi-skill tutor, launching with math for grades 2 to 5. It uses two cameras: one looking down at the paper worksheet, one looking at the child. It has to recognize handwriting, recognize a face, and hold a real-time tutoring conversation. We could have built that in the cloud. Most things are built in the cloud. We built ours so that every frame the camera sees, every word your child says, and every problem they work on stays on the device in your home. Nothing leaves your house. We made that decision for a specific reason, and we'd like to walk through what it actually means.

What "on-device" actually means.

Modern AI products usually work like this: a small device captures something — your voice, your face, what you typed — and sends it to a server somewhere. The server does the hard work and sends back a result. Most of the time the captured data stays on the server too, sometimes for "quality improvement," sometimes for "model training," sometimes just because nobody bothered to delete it.

Koda doesn't do that. The device in your home runs all of it:

The vision model that reads your child's handwriting from the overhead camera. Frames are processed and discarded.
The face-matching model that recognizes which kid in the family is sitting down. Face vectors are encrypted in the local database; the original enrollment photos are stored as image files in the local profiles directory on the device (not in the cloud, not synced anywhere).
The language model that decides what to say next, picks the next teaching move, and writes the hint copy. Local, every session.
The voice that speaks. Local, deterministic; no recordings of your child's voice are sent anywhere.
The math verifier that checks every numerical answer deterministically — no statistical guessing, no LLM voting on whether the math is right.

All of it on the same device on your counter. The cloud is not in the loop. The only network connection Koda needs after setup is the software updater, and you control when that runs.

A note on scope: this list is about where the computation happens — on the device, never the cloud. Some of these pieces, like handwriting-read answers and spoken audio, are still being wired into the live session; the build today uses typed input and on-screen text hints (see the first 15 minutes).

The risk model: biometric data about minors.

We're going to be specific about why this matters. Koda's cameras see a child's face. The face camera captures a face embedding — a numerical fingerprint of that face. That embedding is biometric data, and it's biometric data about a minor.

In the standard cloud architecture, that embedding would live on a server. It would be backed up. It would be replicated to a disaster-recovery region. It would be available to any engineer with the right credentials, in a database table somewhere, forever, until somebody got around to writing the deletion job. It would be a target — for a competitor's recruiter, for a state actor, for a casual data breach. And in 5 years, when the kid is 15, that embedding would still match their face.

We don't think any of that should be true for a child's face. So we built the architecture so it can't be. The face vector lives only on the device in your home. There is no copy on a server because no server of ours ever receives it. When you delete a profile, every row tied to that profile is removed in a single local cascade. There is no other place to chase it down.
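For readers who want to see what a single local cascade can look like, here is a minimal sketch using SQLite foreign keys with ON DELETE CASCADE. The table names (profiles, face_vectors, session_events), the column names, and the helper functions are all hypothetical, chosen for illustration; this is not Koda's actual schema, and the post does not say which database the device uses.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open a local database with a hypothetical three-table schema."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys when this is enabled on the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS profiles (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS face_vectors (
            id         INTEGER PRIMARY KEY,
            profile_id INTEGER NOT NULL
                       REFERENCES profiles(id) ON DELETE CASCADE,
            vector     BLOB NOT NULL  -- assumed to be encrypted before storage
        );
        CREATE TABLE IF NOT EXISTS session_events (
            id         INTEGER PRIMARY KEY,
            profile_id INTEGER NOT NULL
                       REFERENCES profiles(id) ON DELETE CASCADE,
            kind       TEXT NOT NULL
        );
        """
    )
    return conn


def delete_profile(conn, profile_id):
    """Delete one profile; dependent rows are removed by the cascade."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM profiles WHERE id = ?", (profile_id,))


def rows_for(conn, profile_id):
    """Count the remaining dependent rows for a profile, per table."""
    counts = {}
    for table in ("face_vectors", "session_events"):
        (n,) = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE profile_id = ?",
            (profile_id,),
        ).fetchone()
        counts[table] = n
    return counts
```

In this sketch the caller issues one DELETE against profiles and the database removes the dependent rows itself. Anything stored outside the database, such as image files on disk, would need its own removal step, which the sketch does not cover.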

What you give up by going local.

We're going to be honest about the trade-offs.

It costs more upfront. A real device that runs models locally is a real piece of hardware and it ships in a box. A cloud-only product can be a $5/month subscription. Ours can't.

The model on your machine doesn't auto-improve every week. Cloud-only AI products can swap in a smarter model overnight, often without telling you. Ours can't change without an update you've opted into. We think that's actually correct — your child shouldn't get a different tutor on Tuesday than on Monday without anyone telling you — but it does mean we ship updates the way real software used to: in tested, versioned releases.

It's a real object on the counter. A pure-cloud product is invisible. Ours isn't. We picked this hardware because it's the smallest unit that can run the models Koda needs in real time. We may shrink the form factor over time; today it's a real box.

What you don't give up.

Tutoring quality. The model on the device is sized for the kind of step-by-step elementary and middle-school tutoring Koda does; a bigger model wouldn't help with that kind of tutoring and would only make the box hotter. The math itself is checked by a deterministic verifier, which is exact in a way no language model is. So when your child gets the arithmetic right, Koda knows it for certain; the language model doesn't get to vote.
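To make "deterministic" concrete, here is a minimal sketch of an exact arithmetic check built on Python's standard fractions module. The function names (parse_number, verify) and the four-operator scope are assumptions made for illustration; this is not Koda's verifier, only an example of checking an answer with exact rational arithmetic instead of floating point or a model's judgment.

```python
import operator
from fractions import Fraction

# Supported operators, each mapped to an exact operation on Fractions.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def parse_number(text):
    """Parse '7', '-3', '0.25', or '3/4' into an exact Fraction.

    Returns None when the text is not a number.
    """
    try:
        return Fraction(text.strip())
    except (ValueError, ZeroDivisionError):
        return None


def verify(left, op, right, answer):
    """Return True only if `answer` equals `left op right` exactly."""
    a = parse_number(left)
    b = parse_number(right)
    given = parse_number(answer)
    if a is None or b is None or given is None or op not in OPS:
        return False
    if op == "/" and b == 0:
        return False  # division by zero has no correct answer
    return OPS[op](a, b) == given
```

With exact rationals, verify("0.1", "+", "0.2", "0.3") is True, whereas the same comparison in floating point fails because 0.1 + 0.2 is not exactly 0.3. Equivalent forms are also accepted: "2/4" and "0.5" both match one half. The same inputs always give the same verdict, which is the property the paragraph above relies on.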

Latency. Local is fast. The hint your child sees comes back in tens of milliseconds, not the half-second you get from a cloud round-trip. That matters when a child is mid-thought.

Curriculum freshness. The math curriculum doesn't change weekly. The explainer videos and lesson content ship pre-rendered with the device; new content arrives in software updates.

What we do — and don't — train on.

We don't use your child's work to train anything. Not the handwriting samples, not the audio, not the face embeddings, not the chat transcripts, not the worksheet photos. We don't aggregate them. We don't share them with researchers. We don't even have a place for them to go — they live on the device in your home and only there.

What we do train on, when we improve the tutor: synthetic problem sets we generate, public curriculum standards, open-source educational corpora, and curriculum content we author ourselves or license from contributors who've consented to that use. That's a smaller training corpus than a cloud-only product can build. It's the trade-off we accept for not asking your kid to be training data.

Things we won't do.

We're going to put a few things in writing because future-us will be tempted, and we want present-us to make it harder.

We won't run telemetry on session content. The events Koda records (a problem attempted, an XP delta, a hint given) stay on the device.
We won't insert ads. Not in the avatar, not in the explainers, not in the parent emails.
We won't sell the face embeddings. Not anonymized, not aggregated, not "for research." There is no version of "selling minors' biometric data" that we can defend.
We won't use your child's work to train models that other families will benefit from. The model we ship is the model we shipped; future versions train on data we sourced ourselves.
We won't ship a "cloud sync" toggle as a casual default. If we ever offer cloud features, they'll require a parent to turn them on, after a 30-day notice in the parent portal — and the local-only version will keep working.

If we change our minds on any of these, we'll tell you in a post like this one before we ship the change.

What this lets us do.

Single-cascade delete. "Forget my child" is one operation in the local database, and it's done. We can't lose track of a copy because there is no copy.

No data breach to disclose. The thing breaches happen to is a server holding the data. We don't have one. (Technically: we run a small infrastructure for waitlist signup and software updates. Neither of those holds your child's data.)

An honest privacy page. A lot of edtech privacy pages have to use careful language — "we do not sell your data" while leaving "share" undefined; "we may use your data to improve our services" without specifying which services. Ours doesn't have to do that, because the data isn't there to share or use. The page just says where things are. Read it.

A note on this post being marketing.

Yes, this post is also marketing. It would be slightly dishonest to pretend otherwise. Building a product this way costs us — in unit economics, in iteration speed, in the addressable market of parents who would have happily paid $9.99 a month and not asked. We picked the architecture first because we couldn't make ourselves comfortable with the cloud version when we sat down to draw it. The marketing came after.

If you want to know when Koda ships, the waitlist is here. The companion note on why we use a deterministic verifier instead of asking the language model to grade the math is now published: we don't trust the LLM to grade arithmetic.