For parents · 6 min read · 2026-05-20

What “mastery” means at home (and what it doesn't).

“Mastery” is one of the more loaded words in elementary math. Most of the time, the thing being labeled isn't mastery — it's accuracy, recall, or speed wearing a mastery costume. Here's what the word actually points at, and the four things parents most often mistake for it.

The short version.

The cleanest working definition we know is the National Research Council's Adding It Up (Kilpatrick, Swafford & Findell, 2001), which describes mathematical proficiency as a braid of five strands: conceptual understanding, procedural fluency, strategic competence, adaptive reasoning, and productive disposition. Mastery lives in the overlap, not in any single strand. A kitchen-table version has three properties:

Transfer. The child can solve a problem they haven't seen before in this skill area: same idea, different surface.
Justification. They can explain why their answer is right, not just what it is. “Because that's how you do it” doesn't count.
Durability. They can come back to it cold two to four weeks later and still do it — not perfectly, but recognizably.

That triple draws on two separate bodies of research. Transfer and justification are the heart of what Hatano & Inagaki (1986) call adaptive expertise: the kid can do the procedure, knows what it's for, and can flex it. Compare that with routine expertise, which is fast and accurate inside the trained box and fragile the moment the box changes shape. That is what a lot of well-meaning practice produces. Durability comes from the spacing-and-retrieval literature (Cepeda et al., 2006; Dunlosky et al., 2013), not from Hatano & Inagaki. Stitched together, the three are a useful working definition for what mastery looks like at home.

Four things that look like mastery but aren't.

Most confusion comes from mistaking mastery for one of its neighbors. Each is real and useful — but none is the thing.

1. 100% on this week's test. That's accuracy, measured on a sample the child just studied for. The same child, given the same skill in a word problem on a different topic in October, may or may not recognize it. A perfect score tells you the child cleared the assessment; it doesn't tell you the skill is portable.

2. “I memorized 7 × 8 = 56.” That's recall. Recall matters: the Common Core State Standards (CCSS-M, 2010) ask 3rd-graders to know from memory all products of two single-digit numbers. But a child who can recite 7 × 8 and can't tell you it's 7 groups of 8 (or 8 groups of 7) has the fact without the structure around it. It stays useful only as long as the question stays in the shape of a flashcard.

3. “I can do it fast.” That's fluency: accuracy, efficiency, and appropriate strategy together, in the Common Core sense (we wrote about what to expect at age 9). Fluency is about how smoothly the skill executes inside its known context. Mastery is about whether it survives when the context changes. A child can be fluent without being a master, and a master without yet being fluent (slow, careful, correct, and able to transfer, which is the typical shape of a thoughtful new learner).

4. “I never get it wrong.” That is often a sign of a brittle representation. A child who never misses has often just never been pushed past the edges of where they were taught. Real mastery has been tested at its edges: the problem where the numbers don't come out clean, the word problem where the language doesn't cue the operation. The kid who sometimes misses, debugs the miss, and recovers is usually further along than the kid with the spotless worksheet.

Why durability is the part most adults underestimate.

Cepeda and colleagues' (2006) meta-analysis: massed practice and distributed practice (same minutes, spread across days) can look similar on an immediate test, and diverge sharply on a delayed one — the difference shows up days later, not on the same-day quiz. Dunlosky and colleagues (2013) ranked distributed practice and practice testing as the two highest-utility study techniques precisely because their benefits show up on the delayed measurement — the measurement that matters for whether a skill is load-bearing for next year. The implication is uncomfortable: the moment that feels like mastery — a kid powering through a worksheet after a long drill — has the weakest predictive value. The moment that matters is two weeks later (more on this in how a 10-minute review beats a 30-minute drill).

How to actually check at home (without making it a thing).

Three small probes, used quietly, beat any single test score. These aren't screening tools — they're informal checks. Persistent gaps are reasons to talk to your child's teacher or a learning specialist, not a verdict.

The unfamiliar surface probe. Take a skill the school says is mastered and find a problem your child has demonstrably not seen: same operation, different wrapper. If fractions of a whole are mastered, ask “if a pizza is cut into six pieces and we eat four, what fraction is left?” rather than 6/6 − 4/6. If your child can do the worksheet version but not the kitchen version, that's accuracy, not transfer.

The “why” probe. After a correct answer, ask “how would you explain that to a kid who didn't get it?” This works because it's a question about teaching, not being quizzed. Kids' ability to verbalize lags their ability to do, so watch this across several attempts, not one.

The cold two-week probe. Wait. Move on. Two to four weeks later, drop a single problem from the “mastered” skill into a mixed page, with no warm-up and no warning. Some forgetting is normal. A complete reset isn't necessarily a setback; it's a signal that the skill needs more rehearsal in different contexts. The measurement is of the rehearsal, not the child.

How Koda thinks about mastery (and what we don't claim).

Koda has an explicit mastery layer with named thresholds. From src/koda/intelligence/mastery.py: 70% in-session first-try success is the practice floor (Learning → Practice), 85% is exam-ready, and 65% is a defined relapse threshold. These are mastery-readiness thresholds: they determine whether a skill is ready to be assessed, not how hard today's problems should be. A separate, session-level difficulty dial in mastery.py:SUCCESS_RATE_THRESHOLDS adjusts problem difficulty mid-session: a rolling rate below 70% nudges things easier, above 90% nudges them harder. The two systems are distinct. The 85% exam-readiness threshold and the 90% difficulty nudge live in different parts of the code and measure different things. (For more on the difficulty-dial thresholds, see the when-not-to-interrupt note.)

The thresholds themselves are mastery-learning-inspired product heuristics. Rosenshine's (2012) evidence supports the general ~80% high-success-rate band, not these specific numbers. They're recommendations, not gates. Under the hood, the session keeps a rolling buffer of recent first-try outcomes: the last ten (DEFAULT_WINDOW = 10) feed the mid-session difficulty dial, while the practice and exam-ready thresholds read overall first-try accuracy across the session (at least eight attempts), not an all-time per-skill number.

Honest caveat: the 65% relapse threshold is a constant in the code, but the cross-session pathway that would consume it is defined but not yet wired. And none of these numbers, on their own, is mastery. They're a coarse in-product signal over problems the child has done with Koda. The dashboard is an input; the cold two-week probe is the test.
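
For readers who think in code, here is a minimal Python sketch of how the two systems described above could sit side by side. It is an illustration under stated assumptions, not Koda's actual implementation: only the percentages, the eight-attempt minimum, DEFAULT_WINDOW, and the name SUCCESS_RATE_THRESHOLDS come from the description above. The class, its methods, the dictionary layout, the other constant names, and the choice of inclusive comparisons for the readiness thresholds are all hypothetical.

```python
from collections import deque

# Values quoted in the article. Apart from DEFAULT_WINDOW and
# SUCCESS_RATE_THRESHOLDS, every name below is hypothetical.
DEFAULT_WINDOW = 10
SUCCESS_RATE_THRESHOLDS = {"easier_below": 0.70, "harder_above": 0.90}  # assumed layout
PRACTICE_FLOOR = 0.70  # Learning -> Practice
EXAM_READY = 0.85
MIN_ATTEMPTS = 8       # readiness is not judged on fewer attempts


class SessionTracker:
    """Tracks first-try outcomes for one session (hypothetical class)."""

    def __init__(self, window: int = DEFAULT_WINDOW) -> None:
        self.recent = deque(maxlen=window)  # rolling buffer for the difficulty dial
        self.attempts = 0                   # whole-session counters for readiness
        self.first_try_correct = 0

    def record(self, first_try_ok: bool) -> None:
        self.recent.append(first_try_ok)
        self.attempts += 1
        self.first_try_correct += int(first_try_ok)

    def difficulty_nudge(self) -> str:
        """Mid-session dial: looks only at the rolling window."""
        if not self.recent:
            return "hold"
        rate = sum(self.recent) / len(self.recent)
        if rate < SUCCESS_RATE_THRESHOLDS["easier_below"]:
            return "easier"
        if rate > SUCCESS_RATE_THRESHOLDS["harder_above"]:
            return "harder"
        return "hold"

    def readiness(self) -> str:
        """Readiness recommendation: looks at the whole session."""
        if self.attempts < MIN_ATTEMPTS:
            return "not enough data"
        accuracy = self.first_try_correct / self.attempts
        if accuracy >= EXAM_READY:  # inclusive comparison is an assumption
            return "exam-ready"
        if accuracy >= PRACTICE_FLOOR:
            return "practice"
        return "learning"


# Two early misses followed by ten correct first tries.
tracker = SessionTracker()
for ok in [False, False] + [True] * 10:
    tracker.record(ok)

print(tracker.difficulty_nudge())  # harder  (last ten are all correct)
print(tracker.readiness())         # practice (10 of 12 overall, about 83%)
```

The example session shows why the two signals are kept apart: the rolling window sees ten straight successes and nudges difficulty up, while whole-session accuracy is still below the 85% exam-ready line.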

The honest counter-arguments.

“You're raising the bar so high nobody passes it.” Defining mastery as transfer + justification + durability risks declaring no 9-year-old to have mastered anything, which is unhelpful. The pragmatic move at home is to treat the three as a direction, not a checkpoint — the binary framing is for report cards; at the kitchen table you're tracking a slope.

“The school's definition is the one that counts.” Operationally, yes. But the school's mastery label is, in most districts, “the child cleared this assessment under these conditions on this date,” which is the accuracy definition above. It's a narrower question, not a wrong one. A child can be reported as having mastered a skill at school and still benefit from the cold two-week probe at home.

“Memorization comes first; structure comes from the facts.” A serious line in cognitive-load theory (Kirschner, Sweller & Clark, 2006) argues that retrieval has to be automatic before structure becomes teachable: derivation burns the working memory you're trying to free, and minimally-guided discovery underperforms direct instruction in controlled comparisons. We think this is more right than wrong for the single-digit times table; less right for conceptual skills like fractions and place value, where the structure holds the procedure up. The design problem is sequencing, not picking a side.

One last thing.

This vocabulary matters because the word shows up on every progress report and on the marketing page of every tool you're going to be sold this year — and it does not mean the same thing in each place. When the school says “mastered,” ask quietly what was being measured. When a vendor says it, ask whether they tested transfer or only accuracy on a familiar surface. When your child says “I get this now,” cheer the moment, and set a quiet reminder for two weeks from now. The thing that comes back two weeks later is the thing they actually have.

If you'd like to know when Koda ships, the waitlist is here. Related notes: math fluency at age 9, how a 10-minute review beats a 30-minute drill, and why we reward effort, not just correct answers.