Position: Let's develop data probes to fundamentally understand how data affects LLM performance

Read the full paper as a hosted PDF: Download PDF.

Also available on arXiv: https://arxiv.org/abs/2605.18801