AI Detection Benchmark — CampAIgn Tracker

Model Performance vs. Human Screener Ground Truth

Threshold: 0.5. Ground truth = post-screener verdict (keep=AI, kick_out/low_score=non-AI).

Model	Type	Accuracy	Precision	Recall	F1	Scored

How many AI detection models (B-Free, SPAI, CommFor) agree per image?

Distribution of scores by ground truth class for each AI detection model.

Click an image in the Gallery to see detailed model results.