Model Performance vs. Human Screener Ground Truth
Threshold: 0.5. Ground truth = post-screener verdict (keep=AI, kick_out/low_score=non-AI).
| Model | Type | Accuracy | Precision | Recall | F1 | Scored |
|---|
Model Agreement
How many AI detection models (B-Free, SPAI, CommFor) agree per image?
Score Distributions
Distribution of scores by ground truth class for each AI detection model.
About the Models
Click an image in the Gallery to see detailed model results.