# State of AI Generation
An analytical overview of trends in synthetic media, based on aggregated, anonymized detection data.
## Image Detection Trends
As generative image tools such as Midjourney, Stable Diffusion, and DALL-E continue to improve, the traces they leave behind in image metadata provide key insights into the adoption of provenance standards.
| Metric | Current Value | Change (QoQ) |
|---|---|---|
| Images flagged as >90% synthetic | 41.2% | +5.4% |
| Valid C2PA credentials present | 12.8% | +2.1% |
| Missing or synthesized EXIF data | 78.5% | Unchanged |
| High Error Level Analysis (ELA) variance (>0.15) | 34.1% | +1.2% |
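The ELA metric above can be sketched in a few lines. Error Level Analysis re-saves an image at a known JPEG quality and measures how unevenly pixels change; the recompression step is assumed to happen upstream here, and the exact normalization and the 0.15 threshold are assumptions for illustration, not the report's published methodology.

```python
from statistics import pvariance

def ela_variance(original, recompressed):
    """Illustrative ELA score: variance of per-pixel absolute
    differences between an image and a recompressed copy,
    with pixel values (0-255) normalized to [0, 1].

    `original` and `recompressed` are flat sequences of pixel
    intensities; producing them (decode, re-encode at fixed JPEG
    quality, decode again) is assumed to happen elsewhere.
    """
    diffs = [abs(a - b) / 255.0 for a, b in zip(original, recompressed)]
    return pvariance(diffs)
```

A uniformly edited image changes evenly under recompression (low variance), while spliced or generated regions tend to respond unevenly, pushing the variance up.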
### Key Takeaway
Despite the industry push behind the Coalition for Content Provenance and Authenticity (C2PA) standard, nearly 80% of generated images circulating on the web carry entirely stripped or artificially fabricated EXIF metadata, leaving no basis for cryptographic provenance verification.
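Detecting stripped metadata can be as simple as checking whether a JPEG still carries an Exif APP1 segment at all. The sketch below is a minimal heuristic over raw JPEG bytes, not the detection pipeline behind the figures above; it only reports presence or absence, saying nothing about whether surviving metadata is authentic.

```python
def has_exif_segment(jpeg_bytes: bytes) -> bool:
    """Walk a JPEG's marker segments looking for an APP1 (0xFFE1)
    segment that begins with the 'Exif\\x00\\x00' identifier.
    Returns False for files whose metadata was stripped."""
    i = 2  # skip the SOI marker (0xFFD8)
    while i + 4 <= len(jpeg_bytes):
        if jpeg_bytes[i] != 0xFF:
            break  # malformed stream; stop scanning
        marker = jpeg_bytes[i + 1]
        if marker == 0xDA:  # start-of-scan: no metadata segments follow
            break
        length = int.from_bytes(jpeg_bytes[i + 2:i + 4], "big")
        if marker == 0xE1 and jpeg_bytes[i + 4:i + 10] == b"Exif\x00\x00":
            return True
        i += 2 + length  # jump to the next marker
    return False
```

In practice a real checker would also look for C2PA manifests (carried separately from EXIF) before declaring an image unverifiable.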
## LLM Text Fingerprints
Large Language Models tend to construct text from a predictable, statistically average vocabulary. This leaves a quantifiable fingerprint in the text's "perplexity": roughly, how surprised a language model is by each successive token.
- **68%** flagged documents: submitted texts identified as predominantly AI-generated
- **< 15** average perplexity score of flagged documents (human-written text typically scores > 40)
- **92%** vocabulary predictability: the frequency of highly expected next tokens in AI-generated text
The data reveals that while human authors routinely inject low-probability vocabulary and structural irregularity ("burstiness") into their prose, generative models overwhelmingly favor safe, uniform phrasing, which drives their perplexity scores significantly lower.
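The two signals described above can be sketched directly. This assumes you already have per-token probabilities from some scoring model (obtaining them is outside this sketch), and uses sentence-length variance as a simple burstiness proxy; the report's <15 vs. >40 thresholds depend on the specific model used and are not reproduced here.

```python
import math
from statistics import pvariance

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-probability the
    scoring model assigned to each observed token. Text built from
    consistently 'expected' tokens drives this toward 1."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

def burstiness(sentence_lengths):
    """A simple burstiness proxy: variance of sentence lengths.
    Human prose tends to alternate short and long sentences,
    producing higher variance than uniformly paced model output."""
    return pvariance(sentence_lengths)
```

For example, text where every token was assigned probability 0.5 has perplexity exactly 2, while alternating 5- and 30-word sentences scores far burstier than steady 12-word ones.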