Detection Guide

How to Identify AI-Written Text

Understanding the rhythm, predictability, and stylometric markers of machine-generated writing.

The Need for Text Authentication

As Large Language Models become increasingly proficient at mimicking human communication, the ability to verify authorship grows more important. From teachers evaluating student essays to publishers reviewing submissions, determining whether a piece of writing was genuinely crafted by a person protects academic integrity and authentic discourse. But identifying machine text is fundamentally different from spotting manipulated photos: it requires analyzing the invisible math behind the words.

For a deeper look into how fast synthetic media is spreading and current detection metrics, explore our Research & Statistics page.

The Hallmarks of Machine Writing

Language models operate by predicting the most statistically likely next word. Knowing this mechanic reveals the distinct fingerprint they leave behind:

1. Lack of Burstiness. Humans write with "burstiness": we combine short, punchy statements with long, meandering, complex sentences. Machine text tends to be highly uniform in sentence length and structure, creating a monotonous reading rhythm.

2. High Predictability (Low Perplexity). If a text uses consistently common vocabulary and highly expected phrasing, its "perplexity" is low. Language models inherently avoid taking creative risks with unusual word choices unless explicitly prompted to do so.

3. Hedging and Over-Explanation. To avoid making factual errors, generated text often relies heavily on hedging phrases ("It is important to note," "There are many factors to consider"). It also tends to restate its own points at the end of paragraphs.
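The burstiness signal described above can be sketched in a few lines. This is an illustrative simplification, not our production pipeline: it splits text on sentence-ending punctuation, counts words per sentence, and reports the coefficient of variation (standard deviation divided by mean), so uniform machine-like text scores near zero.

```python
import re
import statistics

def burstiness(text: str) -> float:
    """Score sentence-length variation: higher suggests a more
    human-like rhythm. A toy sketch, not a production detector."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    # Coefficient of variation: std / mean of words-per-sentence.
    return statistics.stdev(lengths) / statistics.mean(lengths)

human = ("Stop. The rain came down in sheets, flooding the gutters and "
         "drumming on the tin roof for hours. We waited.")
uniform = ("The model writes a sentence here. The model writes a sentence "
           "there. The model writes a sentence again.")
print(burstiness(human) > burstiness(uniform))  # True: uniform text scores lower
```

The exact threshold that separates "human" from "machine" rhythm is a tuning decision; real systems combine this with many other signals.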

How Our Engine Reads Between the Lines

Our detector relies on stylometric algorithms rather than gut feeling. We look for the mathematical signatures of an LLM. Importantly, these are probabilistic indicators, not absolute determinations.

Perplexity Scoring

We run the text through a baseline language model to evaluate how "surprised" it is by the vocabulary. Texts that exactly match the model's highest probability predictions receive a higher AI score.
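The math behind this score can be illustrated with a toy model. Our engine uses a full baseline language model, but the formula is the same for any model: perplexity is the exponential of the average negative log-probability per token. The sketch below stands in a unigram model with Laplace smoothing for the real thing; the corpus and phrases are made up for demonstration.

```python
import math
from collections import Counter

def unigram_perplexity(text: str, corpus: str) -> float:
    """Toy perplexity: how 'surprised' a unigram model trained on
    `corpus` is by `text`. Lower means more predictable text."""
    counts = Counter(corpus.lower().split())
    total = sum(counts.values())
    vocab = len(counts)
    log_prob = 0.0
    tokens = text.lower().split()
    for tok in tokens:
        # Laplace smoothing: unseen words get a small nonzero probability.
        p = (counts[tok] + 1) / (total + vocab + 1)
        log_prob += math.log(p)
    # exp of the average negative log-probability per token.
    return math.exp(-log_prob / len(tokens))

corpus = "the cat sat on the mat the dog sat on the rug"
print(unigram_perplexity("the cat sat", corpus) <
      unigram_perplexity("quantum flux capacitor", corpus))  # True
```

A phrase the model has seen before is "unsurprising" and scores low; rare or unseen vocabulary drives perplexity up, which is why highly predictable text earns a higher AI score.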

Syntactic Variance

The analyzer breaks down grammar trees and sentence lengths, plotting them on a timeline to measure structural diversity. A perfectly flat, consistent rhythm is a strong indicator of automation.
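The "timeline" idea can be sketched as follows. This is a hypothetical simplification of the real analyzer (which also inspects grammar trees): it records sentence lengths in document order and flags the sequence as suspiciously flat when every sentence sits within a few words of the median.

```python
import re
import statistics

def length_timeline(text: str) -> list[int]:
    """Sentence lengths in document order: the 'timeline' to plot."""
    return [len(s.split()) for s in re.split(r"[.!?]+", text) if s.strip()]

def is_flat(lengths: list[int], tolerance: int = 2) -> bool:
    """Flag a suspiciously uniform rhythm: every sentence within
    `tolerance` words of the median length."""
    med = statistics.median(lengths)
    return all(abs(n - med) <= tolerance for n in lengths)

doc = ("This is a short test sentence. Here is another short test "
       "sentence. And one more short test one.")
print(length_timeline(doc))           # [6, 6, 6]
print(is_flat(length_timeline(doc)))  # True: a perfectly flat rhythm
```

The `tolerance` parameter is an assumption for this sketch; in practice the flatness threshold would be calibrated against large samples of known human and machine text.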

Frequently Asked Questions (Quick Answers)

What does perplexity mean in AI detection?
Perplexity is a measurement of how statistically predictable a string of text is to a Large Language Model. A low perplexity score means the text uses highly expected vocabulary and structures, which is a strong indicator that the text was machine-generated.
What is burstiness in writing?
Burstiness refers to the variance in sentence length and structure within a document. Human writers naturally alternate between short, sudden statements and long, complex paragraphs. AI models tend to produce highly uniform, non-bursty text.
Can prompting an AI to write like a human beat the detector?
Advanced prompting (like instructing an AI to use high burstiness or unique vocabulary) can artificially inflate perplexity and lower the AI detection score. However, deep stylometric analysis can often still detect the underlying mathematical predictability.

Interpreting the Results

Text analysis is inherently more challenging than image analysis. A human who writes heavily structured, highly professional, or formulaic content (like a legal contract or a standard resume) might trigger false positives. Conversely, heavily edited machine text might pass as human. Always treat our analysis as one piece of the puzzle, combining our statistical signals with your own contextual knowledge of the author.

Try the Text Detector