# State of AI Generation
An analytical overview of trends in synthetic media, based on aggregated, anonymized detection data.
## Image Detection Trends
As generative image tools such as Midjourney, Stable Diffusion, and DALL-E continue to improve, the traces they leave behind in image metadata provide key insights into the adoption of provenance standards.
| Metric | Current Value | Change (QoQ) |
|---|---|---|
| Images flagged as >90% synthetic | 41.2% | +5.4% |
| Valid C2PA credentials present | 12.8% | +2.1% |
| Missing or synthesized EXIF data | 78.5% | Unchanged |
| High Error Level Analysis (ELA) variance (>0.15) | 34.1% | +1.2% |
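The ELA metric above can be sketched in a few lines. Error Level Analysis re-saves an image at a known JPEG quality and measures how unevenly pixels change; the recompression step is assumed to happen upstream here, and the exact normalization and the 0.15 threshold are assumptions for illustration, not the report's published methodology.

```python
from statistics import pvariance

def ela_variance(original, recompressed):
    """Illustrative ELA score: variance of per-pixel absolute
    differences between an image and a recompressed copy,
    with pixel values (0-255) normalized to [0, 1].

    `original` and `recompressed` are flat sequences of pixel
    intensities; producing them (decode, re-encode at fixed JPEG
    quality, decode again) is assumed to happen elsewhere.
    """
    diffs = [abs(a - b) / 255.0 for a, b in zip(original, recompressed)]
    return pvariance(diffs)
```

A uniformly edited image changes evenly under recompression (low variance), while spliced or generated regions tend to respond unevenly, pushing the variance up.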
### Key Takeaway
Despite the industry push behind the Coalition for Content Provenance and Authenticity (C2PA) standard, nearly 80% of generated images circulating on the web carry entirely stripped or artificially fabricated EXIF metadata, leaving no basis for cryptographic provenance verification.
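Detecting stripped metadata can be as simple as checking whether a JPEG still carries an Exif APP1 segment at all. The sketch below is a minimal heuristic over raw JPEG bytes, not the detection pipeline behind the figures above; it only reports presence or absence, saying nothing about whether surviving metadata is authentic.

```python
def has_exif_segment(jpeg_bytes: bytes) -> bool:
    """Walk a JPEG's marker segments looking for an APP1 (0xFFE1)
    segment that begins with the 'Exif\\x00\\x00' identifier.
    Returns False for files whose metadata was stripped."""
    i = 2  # skip the SOI marker (0xFFD8)
    while i + 4 <= len(jpeg_bytes):
        if jpeg_bytes[i] != 0xFF:
            break  # malformed stream; stop scanning
        marker = jpeg_bytes[i + 1]
        if marker == 0xDA:  # start-of-scan: no metadata segments follow
            break
        length = int.from_bytes(jpeg_bytes[i + 2:i + 4], "big")
        if marker == 0xE1 and jpeg_bytes[i + 4:i + 10] == b"Exif\x00\x00":
            return True
        i += 2 + length  # jump to the next marker
    return False
```

In practice a real checker would also look for C2PA manifests (carried separately from EXIF) before declaring an image unverifiable.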
## LLM Text Fingerprints
Large Language Models tend to construct text from a predictable, statistically average vocabulary. This leaves a quantifiable fingerprint in the text's "perplexity": roughly, how surprised a language model is by each successive token.
- **68%** flagged documents: submitted texts identified as predominantly AI-generated
- **< 15** average perplexity score of flagged documents (human-written text typically scores > 40)
- **92%** vocabulary predictability: the frequency of highly expected next tokens in AI-generated text
The data reveals that while human authors routinely inject low-probability vocabulary and structural irregularity ("burstiness") into their prose, generative models overwhelmingly favor safe, uniform phrasing, which drives their perplexity scores significantly lower.
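The two signals described above can be sketched directly. This assumes you already have per-token probabilities from some scoring model (obtaining them is outside this sketch), and uses sentence-length variance as a simple burstiness proxy; the report's <15 vs. >40 thresholds depend on the specific model used and are not reproduced here.

```python
import math
from statistics import pvariance

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-probability the
    scoring model assigned to each observed token. Text built from
    consistently 'expected' tokens drives this toward 1."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

def burstiness(sentence_lengths):
    """A simple burstiness proxy: variance of sentence lengths.
    Human prose tends to alternate short and long sentences,
    producing higher variance than uniformly paced model output."""
    return pvariance(sentence_lengths)
```

For example, text where every token was assigned probability 0.5 has perplexity exactly 2, while alternating 5- and 30-word sentences scores far burstier than steady 12-word ones.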