CaReReRe — Careless Respondent Recognition · careless responding detector

1. Load your data

Drop a CSV or Excel file here — or click to browse

Rows = respondents, columns = questionnaire items (Likert-type). Accepts .csv / .tsv (comma, semicolon or tab) and .xlsx / .xls (you'll be asked which sheet to use). ID columns are detected and set aside.

Advanced settings

Expected carelessness

Controls how far the automatic cut reaches into the attentive tail (a false-positive-rate knob). Standard (≈0.6% false-positive rate among attentive respondents) is the validated default and works best whenever the careless rate is up to about 20%; it favours precision (fewest false positives). Switch to High (≈7%) only for heavily contaminated samples above ~20% careless, where it recovers more careless respondents at the cost of more false positives.

Custom sensitivity cut (z) — advanced

Power-user override: set the cut directly as a z-score on the attentive cluster, bypassing the preset above. A lower z flags more respondents (and produces more false positives); a higher z flags fewer. The two validated settings are z = 2.5 (Standard) and z = 1.5 (High); any value outside the 1.5–2.5 range is unvalidated and raises an out-of-limits warning on the results.

Expected careless rate (%) — override

Leave blank for automatic calibration (a two-groups model estimates the rate and flags the careless cluster). Enter a value only if you have a prior (pilot, earlier wave). The tool then flags exactly that top share of respondents, so apply it once, to raw data: re-running it on already-cleaned data would flag the same share again.

Coupled-pair proportion (corProp)

Share of the highest-|r| item pairs the rr index uses as its coherence signal. Default 0.03.

Permutation iterations

Random pair-sets for each respondent's rr baseline. More = more stable, slower. 200 is plenty (per-respondent SD ≈ 0.01).

Minimum coupled pairs

Floor on the number of coupled pairs (protects short questionnaires).

Random seed

Fixed seed → identical results on re-run.

Delimiter

Override only if auto-detection fails. Pick a preset above, or type any single character in the box for an unusual delimiter (pipe, tilde, caret, …) — the custom box wins when filled.

2. Results

respondents · items used · careless rate (est.) · flagged careless

Each dot is one respondent, ranked left→right by CaReReRe ensemble careless probability (higher = more careless). Red dots are flagged; the dashed line is the cutoff. The rate is set automatically — a two-groups model estimates whether a careless subpopulation exists and where it separates, so no rate has to be guessed. Override it with a fixed expected rate under Advanced settings if you want.

Columns added: careless_prob, flagged, rr, longstring, person_total, irv, d2, n_missing

How it works

CaReReRe is an ensemble. It combines three complementary careless-detection signals into a single per-respondent probability, calibrated on real data with a known careless/attentive ground truth. The distinctive ingredient — the one we built and the reason the method exists — is the rr index; the other two are established detectors that catch the failure modes rr is deliberately blind to. Together they cover careless responding far better than any one signal alone.

The rr index — the core signal

Respondents who answer attentively are internally consistent: on items that tap the same construct they give compatible answers, so those items covary within the person just as they do across the sample. Careless respondents break that internal covariance. The rr index asks: did this person answer the questionnaire's strongly-related item pairs more coherently than they would by chance?

1. Find the coupled pairs. Correlate every pair of items across your sample and keep the top fraction with the largest |r| (default 3%): the pairs that genuinely move together, typically items from the same scale.
2. Score each person's coherence. For one respondent, measure how tightly their answers track across those coupled pairs (an individual-level |correlation|). Attentive people track them closely; careless ones produce near-noise.
3. Build a personal chance baseline. Recompute the same coherence many times on random item pairs (a permutation done separately for each respondent), which absorbs their own response style and scale usage.
4. Turn it into a z-score. rr = (coupled coherence − mean of random coherences) / SD of random coherences. A low rr means their answers to items that should agree are no more coherent than random pairs, which is the signature of inconsistent carelessness.

Comparing each person to their own random baseline is what makes rr self-calibrating: no assumption of multivariate normality, no external cut-off table, no per-dataset tuning. That is the crucial difference from distance-based outlier detectors such as Mahalanobis distance, whose chi-square thresholds assume a normality that Likert data violate and which, applied by the book, flag almost no one.
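
The rr steps can be sketched in a few lines of JavaScript. This is a minimal illustration under stated assumptions, not the code shipped in rerere.js: the function names, the mulberry32 seeded generator, the default floor of 10 coupled pairs, and the choice to measure coherence as the absolute Pearson correlation between the two sides of the pair list are all ours for the example. It also assumes reverse-keyed items have already been aligned.

```javascript
// Small seeded PRNG (mulberry32) so the permutation baseline is reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Pearson correlation; returns 0 when either vector has no variance.
function pearson(x, y) {
  const n = x.length;
  const mx = x.reduce((s, v) => s + v, 0) / n;
  const my = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let k = 0; k < n; k++) {
    const dx = x[k] - mx, dy = y[k] - my;
    sxy += dx * dy; sxx += dx * dx; syy += dy * dy;
  }
  const denom = Math.sqrt(sxx * syy);
  return denom === 0 ? 0 : sxy / denom;
}

// Step 1: keep the top corProp share of item pairs by |r| (with a floor).
// data is an array of respondent rows; minPairs = 10 is an assumed default.
function coupledPairs(data, corProp = 0.03, minPairs = 10) {
  const nItems = data[0].length;
  const cols = Array.from({ length: nItems }, (_, j) => data.map((row) => row[j]));
  const pairs = [];
  for (let i = 0; i < nItems; i++)
    for (let j = i + 1; j < nItems; j++)
      pairs.push({ i, j, r: pearson(cols[i], cols[j]) });
  pairs.sort((a, b) => Math.abs(b.r) - Math.abs(a.r));
  const keep = Math.min(pairs.length, Math.max(minPairs, Math.round(corProp * pairs.length)));
  return pairs.slice(0, keep);
}

// Step 2: one respondent's coherence across a set of item pairs.
function coherence(row, pairs) {
  return Math.abs(pearson(pairs.map((p) => row[p.i]), pairs.map((p) => row[p.j])));
}

// Steps 3 and 4: personal random-pair baseline, then a z-score.
function rrIndex(row, pairs, iterations = 200, rng = mulberry32(1)) {
  const nItems = row.length;
  const observed = coherence(row, pairs);
  const baseline = [];
  for (let b = 0; b < iterations; b++) {
    const randomPairs = pairs.map(() => {
      const i = Math.floor(rng() * nItems);
      let j = Math.floor(rng() * (nItems - 1));
      if (j >= i) j++; // guarantees j differs from i
      return { i, j };
    });
    baseline.push(coherence(row, randomPairs));
  }
  const mean = baseline.reduce((s, v) => s + v, 0) / iterations;
  const sd = Math.sqrt(baseline.reduce((s, v) => s + (v - mean) ** 2, 0) / (iterations - 1));
  return sd === 0 ? 0 : (observed - mean) / sd;
}
```

Because the baseline is drawn from the respondent's own answers, a constant respondent yields zero coherence on every draw, and this sketch returns 0 for them rather than dividing by zero.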

The two partners in the ensemble

rr targets inconsistent carelessness (random or intermittent responding). It is deliberately blind to consistent carelessness — someone who answers “3” to everything is trivially coherent. Two established detectors fill that gap:

LongString — the longest run of identical consecutive answers, catching straight-lining that rr cannot see.
Person-Total correlation — how well a respondent's profile aligns with the sample's average profile, catching people who answer against the grain.
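
Both partner detectors are simple to compute. The sketch below shows one plausible reading of each; the function names are illustrative, and using the item means over all respondents (rather than, say, leaving the respondent out) as the "average profile" is an assumption.

```javascript
// LongString: length of the longest run of identical consecutive answers.
function longString(row) {
  let best = 0, run = 0;
  for (let k = 0; k < row.length; k++) {
    run = k > 0 && row[k] === row[k - 1] ? run + 1 : 1;
    if (run > best) best = run;
  }
  return best;
}

// Pearson correlation; returns 0 when either vector has no variance.
function pearson(x, y) {
  const n = x.length;
  const mx = x.reduce((s, v) => s + v, 0) / n;
  const my = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let k = 0; k < n; k++) {
    const dx = x[k] - mx, dy = y[k] - my;
    sxy += dx * dy; sxx += dx * dx; syy += dy * dy;
  }
  const denom = Math.sqrt(sxx * syy);
  return denom === 0 ? 0 : sxy / denom;
}

// Person-total: correlation of each respondent's answers with the item means.
// Assumption: the "average profile" is the plain per-item mean over all rows.
function personTotal(data) {
  const nItems = data[0].length;
  const itemMeans = Array.from({ length: nItems }, (_, j) =>
    data.reduce((s, row) => s + row[j], 0) / data.length
  );
  return data.map((row) => pearson(row, itemMeans));
}
```

A straight-liner scores high on longString, while a respondent whose profile runs opposite to the sample's gets a negative person-total value.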

A logistic model fitted on our labelled validation study combines the three into one probability. On that data the ensemble reaches AUC 0.985 under leave-one-out validation — matching a far heavier random forest and clearly beating rr alone (AUC 0.90). The two partners are automatically down-weighted when your data's item-mean profile is too flat to support them (a case rr handles alone), so the ensemble never does worse than rr — the results panel tells you when this happens. Two further indicators, IRV (within-person response variability) and Mahalanobis D², are computed and shown for transparency but kept out of the score: their careless direction is not stable across questionnaires (IRV) or degrades when items outnumber respondents (D²).
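
The combination step has the shape of an ordinary logistic model over standardised features. The sketch below illustrates that shape only: the weights are placeholders invented for the example, not the coefficients fitted on our validation study, and the field and function names are hypothetical. The signs reflect the directions described above (low rr, long runs, and low person-total correlation all point toward carelessness).

```javascript
// Standardise one feature within the dataset (sample SD, n - 1 denominator).
function standardise(values) {
  const n = values.length;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  const sd = Math.sqrt(values.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1));
  return values.map((v) => (sd === 0 ? 0 : (v - mean) / sd));
}

// PLACEHOLDER weights for illustration only; these are NOT the fitted values.
const HYPOTHETICAL_WEIGHTS = { intercept: -2.0, rr: -1.5, longstring: 1.0, personTotal: -1.0 };

// features: array of { rr, longstring, personTotal }, one entry per respondent.
function carelessProb(features, w = HYPOTHETICAL_WEIGHTS) {
  const zRr = standardise(features.map((f) => f.rr));
  const zLs = standardise(features.map((f) => f.longstring));
  const zPt = standardise(features.map((f) => f.personTotal));
  return features.map((_, k) => {
    const eta = w.intercept + w.rr * zRr[k] + w.longstring * zLs[k] + w.personTotal * zPt[k];
    return 1 / (1 + Math.exp(-eta)); // logistic link
  });
}
```

Standardising inside the dataset is what lets one fixed set of weights be reused across questionnaires of different length and scale.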

From probabilities to flags — no rate to guess

The ensemble ranks everyone by careless probability, but turning a ranking into a yes/no decision usually needs a threshold — and asking the analyst for the “expected careless rate” is asking for exactly the unknown they are trying to measure. Worse, a fixed top-X% rule flags X% even on perfectly clean data. CaReReRe avoids this with an automatic two-groups calibration: it fits a mixture of two distributions to the scores — an attentive cluster and a careless cluster — and asks whether a careless cluster is actually there (via BIC). If it is, the tool estimates the careless rate and flags that cluster; if the scores form a single cluster, it flags nobody. So a clean or already-cleaned dataset yields ~0% (no false-positive cascade), and a contaminated one yields a data-driven rate — with nothing to guess.
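
To make the two-groups idea concrete, here is a minimal sketch that compares a one-cluster and a two-cluster fit by BIC and flags only when the two-cluster model wins. Several things here are assumptions for illustration: both clusters are modelled as Gaussians, the fit uses a plain EM loop with a fixed starting point, the flagging cut is placed at z standard deviations above the attentive cluster's mean, and all names (fitOne, fitTwo, autoFlag, MIN_SD) are hypothetical.

```javascript
const MIN_SD = 1e-6; // floor that keeps degenerate clusters from producing infinite densities

function normalPdf(x, mu, sd) {
  const z = (x - mu) / sd;
  return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math.PI));
}

// Weighted mean and SD (maximum-likelihood form) of the scores.
function weightedFit(scores, weights) {
  const total = weights.reduce((s, w) => s + w, 0);
  const mu = scores.reduce((s, x, k) => s + weights[k] * x, 0) / total;
  const variance = scores.reduce((s, x, k) => s + weights[k] * (x - mu) ** 2, 0) / total;
  return { mu, sd: Math.max(MIN_SD, Math.sqrt(variance)), total };
}

// One-cluster model: 2 free parameters (mean, SD).
function fitOne(scores) {
  const { mu, sd } = weightedFit(scores, scores.map(() => 1));
  const logLik = scores.reduce((s, x) => s + Math.log(normalPdf(x, mu, sd) + 1e-300), 0);
  return { mu, sd, bic: 2 * Math.log(scores.length) - 2 * logLik };
}

// Two-cluster model fitted by EM: 5 free parameters (2 means, 2 SDs, 1 weight).
function fitTwo(scores, iterations = 200) {
  const n = scores.length;
  const sorted = [...scores].sort((a, b) => a - b);
  const start = fitOne(scores);
  let att = { mu: sorted[Math.floor(0.25 * n)], sd: start.sd };
  let car = { mu: sorted[Math.floor(0.9 * n)], sd: start.sd };
  let pi = 0.2; // starting guess for the careless share
  for (let it = 0; it < iterations; it++) {
    // E-step: posterior probability that each score belongs to the careless cluster.
    const resp = scores.map((x) => {
      const a = (1 - pi) * normalPdf(x, att.mu, att.sd);
      const c = pi * normalPdf(x, car.mu, car.sd);
      return a + c > 0 ? c / (a + c) : 0.5;
    });
    // M-step: re-estimate both clusters and the mixing weight.
    const fitCar = weightedFit(scores, resp);
    const fitAtt = weightedFit(scores, resp.map((r) => 1 - r));
    if (fitCar.total < 1e-9 || fitAtt.total < 1e-9) break;
    att = fitAtt; car = fitCar; pi = fitCar.total / n;
  }
  const logLik = scores.reduce((s, x) =>
    s + Math.log((1 - pi) * normalPdf(x, att.mu, att.sd) + pi * normalPdf(x, car.mu, car.sd) + 1e-300), 0);
  return { att, car, pi, bic: 5 * Math.log(n) - 2 * logLik };
}

// Flag only if BIC prefers two clusters and the second one sits above the first.
function autoFlag(scores, z = 2.5) {
  const one = fitOne(scores);
  const two = fitTwo(scores);
  const separated = two.bic < one.bic && two.car.mu > two.att.mu;
  if (!separated) return { rate: 0, flagged: scores.map(() => false) };
  const cut = two.att.mu + z * two.att.sd;
  return { rate: two.pi, cut, flagged: scores.map((x) => x > cut) };
}
```

The key design point is the early return: when the data give no evidence of a second cluster, nobody is flagged, which is what prevents the false-positive cascade that a fixed top-X% rule produces on clean data.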

The automatic mode is deliberately conservative: it confidently flags the clearly separated careless respondents and reports them as the estimated rate; mild or partial inattention that overlaps the attentive cluster is intrinsically ambiguous and is left for you to judge from the dot plot and the per-respondent columns. If you do have a prior on the rate (a pilot, an earlier wave), you can override the automatic cut with a fixed expected rate under Advanced settings.
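
For comparison, the fixed-rate override amounts to flagging the top share of the ranking. A minimal sketch follows; the function name and the rounding rule are assumptions.

```javascript
// Flag the top ratePercent of respondents by careless probability.
function flagTopShare(probs, ratePercent) {
  const n = probs.length;
  const k = Math.round((ratePercent / 100) * n); // number of respondents to flag
  const order = probs.map((_, idx) => idx).sort((a, b) => probs[b] - probs[a]);
  const flagged = new Array(n).fill(false);
  for (const idx of order.slice(0, k)) flagged[idx] = true;
  return flagged;
}
```

Note that this rule always flags k respondents regardless of how the probabilities are distributed, which is why a fixed rate should be applied only once, to raw data.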

How deep the automatic cut reaches is governed by an Expected carelessness setting (Advanced settings) — a false-positive-rate knob on the attentive cluster. Standard (≈0.6% of attentive respondents) is the validated default and is best whenever the careless rate is up to about 20%, minimising false positives. High (≈7%) reaches further into the attentive tail and should be used only for heavily contaminated samples (above ~20% careless), where it recovers more of the careless at the cost of more false positives.
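
The two presets line up with the one-sided tail of a normal distribution at z = 2.5 and z = 1.5. The short check below assumes the attentive cluster is approximately normal and uses a standard polynomial approximation to the error function, since JavaScript has no built-in erf; the function names are ours.

```javascript
// Abramowitz & Stegun 7.1.26 approximation to erf (absolute error about 1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Share of a normal attentive cluster that lies above a cut at z standard deviations.
function upperTail(z) {
  return 0.5 * (1 - erf(z / Math.SQRT2));
}

console.log((100 * upperTail(2.5)).toFixed(2) + "%"); // Standard: about 0.62%
console.log((100 * upperTail(1.5)).toFixed(2) + "%"); // High: about 6.68%
```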

Validated on real data. On our collected study (real attentive respondents with real careless respondents mixed in at controlled rates from 5% to 50%, 100 resamples per point), the ensemble ranks careless vs attentive essentially perfectly at every prevalence (AUC 0.98–0.99), so with the right cut the detection ceiling (oracle MCC) sits at ≈0.9 throughout. The automatic estimate of the careless rate is well calibrated up to ~15%, landing within a point or two of the truth. Beyond that it turns deliberately conservative: once careless respondents exceed a fifth of the sample they stop being a clear minority, so the estimate plateaus around 20% rather than tracking higher. Detection is also unreliable below ~5%, where there are simply too few careless respondents to separate. Because the ranking stays excellent, if you expect heavy contamination, set Expected carelessness to High to push the cut further and recover the cases the conservative default leaves out. In the realistic 10–25% band the default reaches MCC ≈ 0.82–0.85, close to the oracle ceiling.

Handling real questionnaires

Reverse-keyed items are aligned automatically from the sign of the sample correlation, so you needn't recode anything. Items on different Likert ranges are rescaled to a common proportion, so a 1–7 scale can't drown out a 1–4 one. Constant (zero-variance) respondents are handled explicitly. Every feature is standardised within your dataset before the ensemble combines them, so the fixed weights transfer across questionnaires of different length and scale. A structure diagnostic, calibrated against pure sampling noise, warns you when the data lack the cross-item structure the method needs.
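
The rescaling and reverse-key alignment can be pictured as two small transforms. This sketch assumes each item's scale minimum and maximum are known and takes the alignment sign as an input, since how that sign is derived from the sample correlation is not detailed here; the function names are illustrative.

```javascript
// Map an answer on a min..max Likert range to a 0..1 proportion.
function toProportion(value, min, max) {
  return (value - min) / (max - min);
}

// Flip an item's proportions when its alignment sign is negative (reverse-keyed).
function alignItem(proportions, sign) {
  return sign < 0 ? proportions.map((p) => 1 - p) : proportions;
}

console.log(toProportion(4, 1, 7)); // midpoint of a 1–7 scale: 0.5
console.log(toProportion(4, 1, 4)); // top of a 1–4 scale: 1
```

After this step a "4" on a 1–7 scale and a "4" on a 1–4 scale no longer look alike, so neither range can dominate the other.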

When it works best

The signal grows with the number of correlated item pairs, so CaReReRe is strongest on multi-construct batteries. In our validation study the ensemble is reliable (AUC ≈ 0.88–0.95) from about 45 items and 50 respondents upward — and already usable (AUC ≈ 0.80) from ~20 items — which is a wider envelope than rr alone. The applicability guide above reports where your specific dataset falls.

Privacy: your data never leaves your device — by design

Questionnaire data is often sensitive (personality, mental health, workplace surveys). This tool is built so that uploading your data is not just avoided — it is impossible: there is no server to receive it.

All computation happens in your browser. Your file is read locally (FileReader) and analysed by JavaScript running on your own machine (a Web Worker). Nothing is transmitted, stored, or logged anywhere.
The only network activity is downloading this page itself. The app makes zero outgoing requests with your data: no upload endpoint, no cookies, no analytics, no trackers, no runtime CDN calls. Every script (including the Excel reader, SheetJS) is bundled and served from this same origin — nothing is fetched from elsewhere and no library ever phones home.
Don't take our word for it — verify. Open your browser's developer tools (F12 → Network) and run an analysis: you will see no request leaving your machine. The source code is unminified and human-readable (rerere.js, app.js, worker.js). Once the page has loaded you can even switch off your connection and the tool keeps working.
GDPR standpoint: since no personal data ever reaches us, we perform no processing at all — you remain the sole data controller, no data-processing agreement is needed, and there is nothing for us to retain or erase. Using this tool does not constitute a data transfer to a third party.
The flip side: we cannot recover anything for you. Results exist only in your browser tab — download the results CSV before closing it.

The static files are served by Cloudflare Pages; like any web host, Cloudflare sees the ordinary page request (your IP requesting the site) — but never your data, which is opened only after the page is already on your machine. Your institution's policies for handling data on your own computer still apply.