Production notes
How this exhibit was made
Response Option Fit Lab is a static interactive lab about a narrow problem: how survey answer choices can fail before analysis begins. The home page is a single SQLBolt-style practice page — twelve compact, hands-on exercises, each isolating one recurring way response options distort what gets measured. You tinker, build, classify, and diagnose against a fixed authored cast of respondents, then read the consequence; the active loop keeps the task, controls, and primary consequence together while longer guidance opens as shelves. Wrong moves are part of the practice. A closing knowledge map organizes the failures under four house lenses — SLOT, RULER, PUSH, and BOUNDARY — and names honestly what the lab does not cover. Each solved exercise opens a source drawer with the real field terms, an evidence-strength label, and named public sources. Every question is drawn from one coffee shop's surveys, so the domain stays fixed while the design problem changes from exercise to exercise; finishing all twelve unlocks a self-issued completion keepsake, not a credential. It is not a guide to validated replacement wording, a model of respondent distributions, or a substitute for cognitive testing, expert review, or field evidence.
An earlier version organized the same material differently: a guided "walk" through twelve source-anchored specimens, opening from an overview hub, alongside a build-and-break exercise, a reviewer field guide, and a reference shelf. Those routes were retired on 2026-05-31; they remain in the project's source history, and their old URL hashes now resolve to the lab so older links still land on the current home. The decisions log below records how that architecture evolved into the current lab.
For people who write survey questions and answer choices and need a sharper prefielding check.
For researchers, editors, and product teams who inherit instruments and need language for diagnosing option-level problems.
For students and instructors who want interactive examples in response process, not a general survey-methods textbook.
A large share of survey criticism arrives too late. By the time a draft is challenged, the conversation is often already about estimates, models, or dashboards, even though many failures begin earlier, when a person has to fit a real situation into answer choices that are ambiguous, overbroad, overlapping, or more precise than the form can justify. Public cognitive-testing work repeatedly shows that apparently small category and wording decisions change how people interpret what is being asked. This exhibit exists to make that pre-analytic layer visible enough to inspect, discuss, and revise before the error hardens into data.
Response Option Fit Lab was written, edited, verified, and published by Ben Lei.
The active lab is author-first. Its scenarios and the small casts inside them are authored teaching material, labeled as illustrative — a fixed set to reason about, not observed respondent behavior or population estimates. Any count shown in an exercise is a count over that exercise’s named fictional cast, never a survey statistic.
Where the lab states a design principle, the solved exercise opens a source drawer that names standard survey-methodology references — among them Pew Research Center, Krosnick and Presser, Schuman and Presser, AAPOR, and the CDC/NCHS Q-Bank — and carries an honest evidence-strength label, so a textbook-consensus claim is not dressed the same as a directionally-supported or contested one. The references support the design principle behind an exercise, not the specific authored cast; the lab never presents invented numbers as empirical findings, and never presents a repair as validated wording.
Counterexamples and “looks fine” decoys are chosen for scope control, not for victory: their job is to show a nearby form that does not obviously trigger the same diagnosis, so the four lenses never become a hammer that hits every question. An earlier, Census-anchored version of this project — a guided walk through cognitive-testing specimens for instruments like the ACS, AHS, CPS, and NTIA Internet Use Survey — was retired on 2026-05-31 and remains in the git history; those public reports are historical provenance for that retired version, not the source model for the current lab. Nothing here implies endorsement by the U.S. Census Bureau or any other agency.
This exhibit uses a small, declared set of learning ideas rather than a generic "science of learning" halo. Each puzzle asks for a small decision before an explanation because prequestions and pretests can increase later learning when learners then study the answer; the pattern is promising, but the size and generality of the benefit vary by procedure and assessment. Steven Pan and Shana Carpenter are the main grounding here. Their 2023 review supports pre-instruction questioning as a useful design move. Later synthesis also makes clear that the benefits are often stronger for the prompted material than for broad untargeted transfer.
The lab revisits the same small vocabulary — the SLOT / RULER / PUSH / BOUNDARY lenses and the field terms inside them — across twelve different exercises, because interleaving helps people learn discriminations between confusable categories better than long blocked runs often do. That is the reason to meet the same label set in varied cases instead of exhausting one pattern before moving to the next. The relevant grounding is Matthias Brunmair and Tobias Richter's meta-analysis: encouraging overall, but clearly moderated by the type of material.
The loop is segmented because dense multimedia can overload attention. Each exercise gives the visitor a bounded check or decision, visible feedback, and a short reveal rather than a wall of prose, and the closing ship-review exercise asks the visitor to apply the whole set in context. That follows the segmenting and pre-training principles associated with Richard Mayer and colleagues.
The exhibit also accepts a mild amount of productive friction. Prediction usually feels worse than rereading, and learners often misread fluent performance as durable learning. That is the useful warning from Nicholas Soderstrom and Robert Bjork, and it is one reason the puzzles do not skip straight to diagnosis.
For learning category-like distinctions, adjacent comparison matters. That is why the exercises stay short and contrastive. The most direct support is Nate Kornell and Robert Bjork's work on spaced and interleaved induction. The retrieval rationale for asking learners to commit before the reveal comes from Henry Roediger and Jeffrey Karpicke.
None of this should be oversold. Prequestioning research still contains live debates about why benefits happen and about how far benefits generalize beyond the specific prompted material, and retrieval-based transfer gains appear stronger under some conditions than others. A good colophon should say that plainly.
Accessibility is treated here as release work, not as a certification claim. The page is built to common WCAG 2.2 AA expectations, but coverage is bounded by what is actually tested rather than asserted in the abstract.
Tested in CI on every build:
forced-colors;Known gaps and pending work. Manual screen-reader walkthroughs in NVDA, JAWS, and VoiceOver are still pending; the automated checks above approximate but do not replace them. Manual zoom audits with real browser magnification at 200% and 400% are pending, though automated reflow at a 320 px viewport passes. The automated Playwright suite runs in Chromium, Firefox, and WebKit (Safari's engine). The exhibit is not yet certified against WCAG 2.2 AA; what is published here is the test surface a build cannot ship without.
Accessibility issues are taken seriously. Report bugs at contact@benlei.org or open an issue at github.com/leibenjamin/response-option-fit/issues, and please include browser, device, assistive technology, and reproduction steps.
This exhibit runs entirely in your browser. With the Remember toggle on — its default — it keeps your progress (the exercises you have finished and your completion count) on this device so they resume across visits; turn it off in Settings and your progress is cleared and kept in memory only, lost on reload — the only thing then stored is your choice not to remember, so a reload will not quietly resume saving. That on-device data is yours: the Settings drawer shows exactly what is stored and lets you export, import, or clear it, and the lab does not upload anything you do (there is no freeform text input, and nothing is sent to a server-side analyzer). The posture on measurement is privacy-respecting rather than absolutist: there is no ad tech, no cookies for cross-site tracking, and no selling of data. Page-traffic analytics use Cloudflare Web Analytics — cookieless and aggregate, counting page views without personal data or cross-site tracking, the kind a privacy-conscious visitor would not mind. It is the one third-party request the site makes on load: a small beacon fetched from Cloudflare.
The performance budget is a spacious release-review guardrail, not a hard law: initial JavaScript at or below 200 KB gzip, lazy route chunks at or below 75 KB gzip each, and CSS at or below 90 KB gzip. Current release: initial JS about 52 KB gzip, the lab route chunk about 61 KB gzip, CSS about 17 KB gzip, measured 2026-06-09. All runtime interaction stays client-side; lazy route chunks may load from the same origin when opened, and the only third-party runtime request is the cookieless Cloudflare Web Analytics beacon noted above. Useful teaching content should be measured and route-split where appropriate, not compressed out merely to satisfy an arbitrary old byte ceiling. If motion is reduced, the exhibit keeps it reduced. If forced-colors is active, the user's contrast choices win.
A chronological record of past decisions, kept as history rather than edited away. Several early entries describe architecture the project has since moved past — the current site is the single twelve-exercise lab plus this colophon, as the last entry records. Read top to bottom to see how it got there.
Context: the exhibit needs a taxonomy small enough to learn and large enough to feel real. Decision: keep the scope to six recurrent option-level failures that can be named quickly and contrasted across puzzles. Consequences: the page gets a reusable vocabulary and stronger transfer; it also excludes many real survey problems that live elsewhere, including sampling, question flow, mode, translation, and estimation issues. Alternatives considered: a broader survey-defects atlas, or a looser scrapbook of examples without named categories. The chosen approach is narrower, but teachably narrow.
Context: once a taxonomy exists, there is pressure to turn it into a checklist or score. Decision: keep the diagnoses hand-authored and static. Consequences: every diagnosis remains inspectable, arguable, and tied to a source, but the exhibit scales more slowly and cannot pretend to automate review. Alternatives considered: heuristic scoring, automated diagnostics, or interactive rubrics. Those options would add speed and false authority at the same time. This exhibit should prefer explicit judgment over synthetic precision.
Context: the exhibit wants a transparent local settings surface without collecting accounts or behavioral data. Decision: save the settings opt-in state and walk visit history only after explicit consent through the Remember toggle. Consequences: the default user gets in-memory progress only; opted-in users get a compact map drawer that fills in across visits; export/import remains possible for stored local data. Alternatives considered: automatic local saving, cloud sync, account-based persistence, and analytics-backed progress tracking. Those would buy convenience at the cost of a very different data story.
Context: each failure pattern benefits from a slightly different review task, but the app also needs consistent navigation, source boundaries, and accessibility behavior. Decision: render every walk route as an authored interactive puzzle while sharing primitives for framing, progress, reveals, source details, and route state. Consequences: each page starts with a concrete user action and visible consequence, while the source posture and test surface remain consistent. Alternatives considered: a generic workspace renderer, mostly expository examples, one custom page per puzzle with no shared primitives, or a survey-analyzer style freeform tool.
Context: early taxonomies often drift toward every familiar survey pathology, whether or not it belongs at the response-option layer. Decision: remove scale anchoring from the core six and replace it with forced precision. Consequences: the taxonomy gets sharper about response-option fit and less diffuse about general scale design. Alternatives considered: keeping both, merging them, or leaving scale anchoring in as a "classic." Forced precision earns the slot because it is easier to show in worked option sets and closer to the exhibit's central claim about failure before analysis.
Context: rendering twelve puzzle routes plus a four-section reference shelf as one continuous scroll produced a page roughly thirty viewport heights tall on landing, which buried the editorial richness behind a long flat list. Decision: split into hash-routed views. The overview keeps a tightened hero, the six-pattern knowledge map, and one fully working embedded commute puzzle as the share-target. The walk paginates the twelve puzzles one at a time, with compact progress, a knowledge-map drawer open by default, and recap interstitials at thresholds four and eight. The field guide carries a reviewer console, portable tests, and prompt templates. The reference shelf moves to its own route. Consequences: the casual visitor grasps the gist in a few screens without forced clicking; the long-form reader gets a mode that paces twelve puzzles without any one page becoming overwhelming; the appendix material stops adding to every casual visit. Alternatives considered: a sticky table of contents on a still-flat scroll, beat-level pagination inside one workbench, a hub-only model with drill-in detail pages, and a tabs-by-pattern structure. The hybrid was chosen because it preserved the embedded "show, do not tell" surface while keeping the canonical interleaved order recommended by the cited learning literature.
Context: the puzzles were useful but not yet portable enough for someone reviewing their own questionnaire. Decision: add a static field-guide route with a reviewer console, reusable tests, pattern-specific checklists, and copyable prompts for visitors to use in their own survey-review workflows. Consequences: the exhibit now teaches both recognition and application while preserving its privacy boundary: no freeform survey input, no automated rewriting, no scoring, and no validation claim. Alternatives considered: an in-browser analyzer, an automated rewrite box, downloadable worksheets only, or more examples without a guide. The static guide was chosen because it helps analysts, product teams, and reviewers apply the framework without giving the page false authority.
Context: by mid-2026 the project had two parallel things — the multi-route exhibit (overview hub, paginated walk, build-and-break, field guide, reference shelf) built around an earlier six-pattern model, and the newer single-page lab built around the SLOT / RULER / PUSH / BOUNDARY lenses. The exhibit had become roughly half the codebase, was unlinked from the front door, and taught a taxonomy the lab had moved past, which made the production notes hard to keep coherent. Decision: retire the routed exhibit and keep the focused twelve-exercise lab plus this colophon. Consequences: the site is smaller and reads as one coherent thing; the initial JavaScript and CSS shrank; the older routes and their source-anchored Census specimens remain in the project's git history (and an external backup) rather than shipping unused, and their old URL hashes now resolve to the lab. Alternatives considered: keep everything archived-but-reachable (the prior state — maintenance drag and a divergent second model), or delete the old exhibit entirely with no preserved copy (loses the build-and-break export-trap and the verbatim-quote layer as references for future work). Retiring while preserving history was chosen as the balance between a clean public surface and not destroying genuinely good earlier material.
This exhibit does not:
The label 'forced_precision' is editorially synthesized; the underlying wording and memory problems are documented separately in QAS-99 §4b and Bradburn, Rips & Shevell (1987).
Body copy is set in Iowan Old Style with Charter, Source Serif Pro, and Georgia as fallbacks. Eyebrows, receipts, and utility text use JetBrains Mono with IBM Plex Mono and the platform monospace as fallbacks. Navigation, controls, and short labels use Inter with the platform sans-serif as a fallback. All faces are loaded from the user's system; no web fonts are fetched at runtime.
Built as a static React 18 + TypeScript site with Vite and tested with Playwright in Chromium, Firefox, and WebKit. No backend services. The one third-party runtime origin is Cloudflare Web Analytics — a cookieless, aggregate page-view beacon.
Repository: github.com/leibenjamin/response-option-fit
License: MIT for source code; CC BY 4.0 for exhibit text, diagrams, and authored scenarios; quoted source material remains under its original terms. U.S. Census Bureau working papers are public-domain U.S. Government works.