
Detection explained

How AI detectors actually work.

AI detectors don't read minds — they measure statistical patterns. Most look at perplexity (how predictable the word choices are) and burstiness (how much sentence complexity varies), often paired with a classifier trained on human and AI examples, and from those signals they estimate a likelihood. Understanding the mechanics also explains the limits: why edited or paraphrased text shifts a score, why short passages are unreliable, and why detectors both flag human writing and miss AI writing.

Signal 1: Perplexity
Signal 2: Burstiness
Signal 3: Trained classifier
Output: A likelihood, not proof

The signals

What detectors measure — and where it breaks.

Each signal is a useful proxy for AI authorship, and each has a failure mode. That combination is exactly why a detector score is an estimate to weigh, not a verdict to trust.

Signal | What it measures | Why it hints at AI | Why it misfires | What shifts it
Perplexity | How predictable word choices are to a model | Very predictable text can read as machine-like | Clear human writing is also predictable | Less expected phrasing, concrete detail
Burstiness | Variation in sentence length and complexity | Uniform rhythm can look automated | Careful human writing can be uniform | Mixing long and short sentences
Trained classifier | Pattern match vs. labelled examples | Learns features common in AI text | Weak on new models, styles, languages | Newer model output, unusual style
Text length | How much signal is available | Longer text gives more to measure | Short snippets are unreliable | Adding or trimming context
Editing / paraphrase | Changes to wording and rhythm | Reveals or hides surface patterns | Same ideas, different score | Rewriting, humanizing, light edits
Honest note: these signals are proxies, not proof. Because perplexity, burstiness and classifier outputs only estimate authorship, every detector produces false positives (flagging human writing) and false negatives (missing AI writing), and tools weight the signals differently. Cross-check important text on more than one tool, verify current behavior on each official site, and never treat a score as a verdict.

After detection

Fix the tells, not just the score.

Knowing what detectors measure points to the real work: add concrete detail, vary sentence rhythm and cut generic phrasing so the writing reads naturally to people first. MultipleChat helps by rewriting with several models, critiquing the result and preserving meaning — then you re-check with a detector and read it yourself, instead of blindly paraphrasing to chase a number.

Private by design: MultipleChat doesn't save your chats to memory and doesn't share your data with model providers or let them train on it.

Measure: Perplexity / burstiness
Diagnose: Find AI tells
Humanize: Multiple models
Re-check: Detector + you
Output: Natural, reviewed writing


FAQ

How AI detection works — FAQ.

Short answers — see the full detect & humanize FAQ.

How do AI detectors work?

Most AI detectors look for statistical patterns that tend to differ between human and machine writing. Common signals include perplexity (how surprising or predictable the word choices are) and burstiness (how much sentence length and complexity vary), often combined with a trained classifier that has learned from many human and AI examples. From these signals the tool estimates a likelihood that text was AI-generated.

What is perplexity in AI detection?

Perplexity is a measure of how predictable a text is to a language model. Writing that a model finds very unsurprising — where each word is an expected continuation — tends to score as more machine-like, while less predictable phrasing reads as more human. It is only a signal, though: plenty of clear human writing is also predictable, which is one reason detectors can misfire.
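Perplexity has a standard definition: the exponential of the average negative log-probability a model assigns to each token, so lower values mean more predictable text. The sketch below assumes you already have per-token probabilities from some language model; the two probability lists are invented for illustration and do not come from any real detector.

```python
import math


def perplexity(token_probs: list[float]) -> float:
    """Return exp(-(1/N) * sum(log p_i)) for per-token probabilities p_i."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# Hypothetical probabilities a model might assign to each next token.
predictable = [0.5, 0.5, 0.5, 0.5]  # every word is an expected continuation
surprising = [0.1, 0.05, 0.2, 0.1]  # less expected word choices

print(perplexity(predictable))  # about 2.0
print(perplexity(surprising))   # about 10.0
```

Lower perplexity is what reads as "machine-like", but as noted above, plain and clear human prose can score low too.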

What is burstiness?

Burstiness describes the variation in a piece of writing — the mix of long and short sentences and the rise and fall of complexity. Human writing often varies more, while some AI text is more uniform. Detectors may treat low variation as a sign of AI, but this is a tendency, not a rule, so it can mislabel both careful human writing and edited AI writing.
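There is no single agreed formula for burstiness. One simple proxy, used here purely as an assumption, is the coefficient of variation of sentence length: the standard deviation of words per sentence divided by the mean. The naive sentence splitter and the two example texts are illustrative only.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    # Naive split on runs of ., ! or ?; real tools segment sentences more carefully.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length: std dev / mean."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # a single sentence has no variation to measure
    return statistics.pstdev(lengths) / statistics.mean(lengths)


uniform = "The cat sat on the mat. The dog lay on the rug. The bird sat on the branch."
varied = "Stop. The storm rolled in over the hills long before anyone had noticed the sky. We ran."

print(burstiness(uniform))  # 0.0: every sentence has six words
print(burstiness(varied))   # about 1.04: lengths of 1, 14 and 2 words
```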

Do AI detectors use machine learning classifiers?

Many do. Alongside statistical signals like perplexity and burstiness, detectors often use a classifier trained on labelled human and AI examples to predict which a new text resembles. The classifier is only as good as its training data, so it can struggle with newer models, unusual writing styles or languages it saw less of during training.
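To make the idea of a trained classifier concrete, here is a minimal logistic regression fitted by gradient descent on two features, a scaled perplexity and a burstiness value. The feature choice, the six labelled examples and the training settings are all hypothetical; real detectors may use many more features and different model types.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=1.0, epochs=2000):
    """Fit weights and a bias by batch gradient descent on log loss."""
    n_features = len(features[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            # Prediction error for one example: predicted probability minus label.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_ai_probability(w, b, x) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical labelled examples: [scaled perplexity, burstiness]; 1 = AI, 0 = human.
train_x = [[0.2, 0.1], [0.25, 0.15], [0.3, 0.2], [0.7, 0.6], [0.8, 0.5], [0.9, 0.7]]
train_y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(train_x, train_y)
print(predict_ai_probability(w, b, [0.22, 0.12]))  # near the AI examples: above 0.5
print(predict_ai_probability(w, b, [0.85, 0.65]))  # near the human examples: below 0.5
```

Because the weights come entirely from the labelled examples, text unlike anything in the training set (a newer model, an unusual style) lands wherever the learned boundary happens to put it, which is the weakness described above.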

Why does paraphrasing change a detector's score?

Paraphrasing or editing changes the exact word choices and sentence rhythm that detectors measure. Because the signals shift, the estimated likelihood can move up or down even when the underlying ideas are unchanged. This is also why a humanized rewrite can read very differently to a detector — and why a score should never be treated as definitive.
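A toy bigram language model shows why rewording moves a perplexity-based score. The three-sentence training corpus, the add-one smoothing and the two phrasings are assumptions for illustration; a real detector uses a far larger model, but the principle is similar: word sequences the model has seen score as predictable, and unfamiliar ones do not.

```python
import math
from collections import Counter


def train_bigram(corpus: list[str]):
    """Count word pairs and how often each word appears as a context."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()  # <s> marks the sentence start
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
        vocab.update(tokens[1:])
    return contexts, bigrams, len(vocab) + 1  # +1 leaves room for unseen words


def bigram_perplexity(text: str, contexts, bigrams, vocab_size: int) -> float:
    tokens = ["<s>"] + text.lower().split()
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one (Laplace) smoothing: unseen pairs get a small nonzero probability.
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))


# Hypothetical training corpus and two ways of saying roughly the same thing.
corpus = [
    "the results are very important",
    "the results are very clear",
    "these results are very important",
]
contexts, bigrams, vocab_size = train_bigram(corpus)

familiar = bigram_perplexity("the results are very important", contexts, bigrams, vocab_size)
reworded = bigram_perplexity("our findings matter a lot", contexts, bigrams, vocab_size)
print(familiar, reworded)  # the familiar wording scores as far more predictable
```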

Why do AI detectors get fooled or misfire?

Detectors rely on patterns, and patterns are imperfect proxies for authorship. Short text gives them little to measure, edited or paraphrased text changes the signals, non-native English can look unusual, and AI models keep changing. As a result detectors produce both false positives, flagging human writing, and false negatives, missing AI writing.
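False positives and false negatives can be counted directly once you know the true authorship of a set of texts. This sketch uses invented scores for eight texts and shows how moving a hypothetical flagging threshold trades one kind of error for the other.

```python
def error_rates(scores, is_ai, threshold=0.5):
    """Return (false positive rate, false negative rate) at a flagging threshold."""
    pairs = list(zip(scores, is_ai))
    false_pos = sum(1 for s, ai in pairs if s >= threshold and not ai)  # human text flagged
    false_neg = sum(1 for s, ai in pairs if s < threshold and ai)  # AI text missed
    n_human = sum(1 for _, ai in pairs if not ai)
    n_ai = sum(1 for _, ai in pairs if ai)
    return false_pos / n_human, false_neg / n_ai


# Hypothetical detector scores for eight texts whose true authorship is known.
scores = [0.92, 0.81, 0.40, 0.66, 0.12, 0.55, 0.30, 0.08]
is_ai = [True, True, True, True, False, False, False, False]

print(error_rates(scores, is_ai, threshold=0.5))  # (0.25, 0.25)
print(error_rates(scores, is_ai, threshold=0.7))  # (0.0, 0.5)
```

At 0.5 the made-up detector wrongly flags one of four human texts and misses one of four AI texts; at 0.7 it flags no human text but misses half the AI text. When the two groups' scores overlap, no threshold removes both kinds of error.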

Can AI detectors read minds or prove who wrote something?

No. A detector cannot observe how a text was actually produced; it can only compare the text to statistical patterns and estimate a probability. That estimate is evidence to weigh, not proof of authorship, so it should support human judgment rather than replace it.
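Bayes' rule shows why a flag is evidence rather than proof: the chance that a flagged text really is AI-written depends on how common AI text is in the pool being checked, not only on the detector. The base rates and error rates below are hypothetical.

```python
def prob_ai_given_flag(prior_ai: float, true_positive_rate: float, false_positive_rate: float) -> float:
    """P(AI | flagged) by Bayes' rule."""
    flagged_ai = prior_ai * true_positive_rate
    flagged_human = (1 - prior_ai) * false_positive_rate
    return flagged_ai / (flagged_ai + flagged_human)


# Hypothetical detector: catches 90% of AI text, wrongly flags 5% of human text.
print(prob_ai_given_flag(0.10, 0.90, 0.05))  # about 0.667 when 10% of texts are AI
print(prob_ai_given_flag(0.02, 0.90, 0.05))  # about 0.269 when 2% of texts are AI
```

With these made-up numbers, the same detector is right about two thirds of its flags when 10% of submissions are AI, and only about 27% of the time when 2% are.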

Does text length affect AI detection?

Yes. Most detectors are more confident with longer passages because there is more signal to measure. Very short snippets give too little information, so results on a sentence or two are especially unreliable. This is one reason to cross-check and to be cautious with brief text.
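If a detector averages a per-token signal such as surprisal, the uncertainty in that average (its standard error: the standard deviation divided by the square root of the number of tokens) shrinks as the text gets longer. A sketch with hypothetical surprisal values; treating detection as a simple average is an assumption made for illustration.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Mean of per-token values and its standard error, stdev / sqrt(n)."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate spread")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical per-token surprisal values (negative log-probabilities).
short_text = [2.0, 4.0]
long_text = [2.0, 4.0] * 20  # same spread, twenty times as many tokens

print(mean_and_standard_error(short_text))  # mean 3.0, standard error about 1.0
print(mean_and_standard_error(long_text))   # mean 3.0, standard error about 0.16
```

The long text has the same spread per token, but its average is pinned down about six times more tightly, which is why a sentence or two gives a detector so little to go on.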

How can I make AI text read more naturally after detection?

Address the underlying tells rather than chasing a score: add concrete detail, vary sentence rhythm, cut generic phrasing and write for the real reader. A multi-model humanizer like MultipleChat helps here: one model rewrites, another critiques, and you verify that the meaning is preserved. Then re-check with a detector and read the result yourself.

Are detector signals the same across every tool?

No. Tools weight perplexity, burstiness and classifier outputs differently and set different thresholds, so two detectors can disagree on the same text. That disagreement is a useful reminder that scores are estimates, and that important text deserves a second tool and a human read.
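Two hypothetical detectors can look at identical signals and reach opposite verdicts simply because they weight the signals and set thresholds differently. The weights, biases, thresholds and the logistic combination below are invented for illustration and do not describe any real product.

```python
import math
from dataclasses import dataclass


@dataclass
class Detector:
    """A hypothetical detector: a weighted, logistic combination of two signals."""
    name: str
    w_perplexity: float
    w_burstiness: float
    bias: float
    threshold: float

    def score(self, perplexity: float, burstiness: float) -> float:
        # Negative weights mean lower perplexity and burstiness push the score up.
        z = self.bias + self.w_perplexity * perplexity + self.w_burstiness * burstiness
        return 1.0 / (1.0 + math.exp(-z))

    def flags(self, perplexity: float, burstiness: float) -> bool:
        return self.score(perplexity, burstiness) >= self.threshold


tool_a = Detector("Tool A", w_perplexity=-0.10, w_burstiness=-2.0, bias=3.0, threshold=0.5)
tool_b = Detector("Tool B", w_perplexity=-0.05, w_burstiness=-4.0, bias=2.5, threshold=0.7)

# The same measured signals for one piece of text.
ppl, burst = 20.0, 0.4
for tool in (tool_a, tool_b):
    print(tool.name, round(tool.score(ppl, burst), 3), tool.flags(ppl, burst))
# Tool A flags the text (score about 0.55); Tool B does not (score about 0.475).
```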