#agiEvaluation: AI Benchmarks Beyond ARC-AGI, MMMU, MLE-bench, and the FrontierMath Test
stephen, Jan 15, 2025