AI Detection Benchmark

9 open-source forensic models × 182 images from the NL Local Elections 2026

Model Performance vs. Human Screener Ground Truth

Classification threshold: 0.5. Ground truth is the human screener's verdict: images marked keep count as AI-generated; images marked kick_out or low_score count as non-AI.

Model | Type | Accuracy | Precision | Recall | F1 | Scored
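The metrics in the table can be reproduced from a model's raw scores and the screener verdicts. The sketch below is illustrative only. It assumes a score greater than or equal to the threshold means "predicted AI" (the page does not say whether the cutoff is inclusive), and it reads "Scored" as the number of images the model actually returned a score for, skipping missing scores (None). The function name `evaluate` is hypothetical.

```python
from typing import Optional

# Verdict-to-label mapping as described above: keep = AI (1), kick_out/low_score = non-AI (0).
VERDICT_TO_LABEL = {"keep": 1, "kick_out": 0, "low_score": 0}


def evaluate(scores: list[Optional[float]], verdicts: list[str], threshold: float = 0.5) -> dict:
    """Hypothetical helper: compare one model's scores with screener verdicts.

    Assumption: score >= threshold is treated as an AI prediction.
    Assumption: images with no score (None) are excluded from every metric.
    """
    tp = fp = tn = fn = 0
    for score, verdict in zip(scores, verdicts):
        if score is None:
            continue  # model produced no score for this image
        truth = VERDICT_TO_LABEL[verdict]
        pred = 1 if score >= threshold else 0
        if pred and truth:
            tp += 1
        elif pred:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    scored = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / scored if scored else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1, "scored": scored}
```

For example, `evaluate([0.9, 0.2, 0.7, None], ["keep", "kick_out", "low_score", "keep"])` counts one true positive, one true negative and one false positive, skips the unscored image, and reports 3 scored images with precision 0.5 and recall 1.0.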

Model Agreement

How many of the AI-detection models (B-Free, SPAI, CommFor) agree on each image?
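One way to compute this agreement view is to binarize each model's score at the same 0.5 threshold, count how many of the three models flag each image as AI, and tally images per agreement level (0 to 3). This is a minimal sketch, assuming every model scored every image and the score lists share the same image order; `agreement_counts` is a hypothetical name, and the scores in the usage note are toy values, not benchmark data.

```python
from collections import Counter


def agreement_counts(scores_by_model: dict[str, list[float]], threshold: float = 0.5) -> Counter:
    """Hypothetical helper: tally images by how many models flag them as AI.

    Assumption: all score lists are aligned by image and contain no gaps.
    Assumption: score >= threshold counts as an AI flag.
    """
    per_image = zip(*scores_by_model.values())  # one tuple of model scores per image
    return Counter(sum(s >= threshold for s in image_scores) for image_scores in per_image)
```

With toy scores `{"B-Free": [0.9, 0.1, 0.6], "SPAI": [0.8, 0.2, 0.4], "CommFor": [0.7, 0.3, 0.3]}`, the three images are flagged by 3, 0 and 1 models respectively, so each of those agreement levels holds one image and level 2 holds none.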

Score Distributions

Score distribution for each AI-detection model, split by ground-truth class (AI vs. non-AI).
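A per-class distribution like this can be built by binning one model's scores separately for AI and non-AI images. The sketch below assumes scores lie in [0, 1] and labels are already mapped to 1 (AI) and 0 (non-AI); the 10-bin layout and the name `score_histograms` are illustrative choices, not necessarily how the dashboard renders its plots.

```python
def score_histograms(scores: list[float], labels: list[int], bins: int = 10) -> dict[str, list[int]]:
    """Hypothetical helper: count scores per equal-width bin, split by ground-truth class.

    Assumption: scores are in [0, 1]; labels use 1 = AI, 0 = non-AI.
    """
    hist = {"AI": [0] * bins, "non-AI": [0] * bins}
    for score, label in zip(scores, labels):
        idx = min(int(score * bins), bins - 1)  # a score of exactly 1.0 goes into the last bin
        hist["AI" if label == 1 else "non-AI"][idx] += 1
    return hist
```

Well-separated classes show up as non-AI counts piling into the low bins and AI counts into the high bins; heavy overlap around the 0.5 bin points to a model that struggles on this image set.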

About the Models
