Sharing on Mastodon:
https://howtonotcode.com/story/309-anthropic-benchmark-pushes-task-based-evals-over-leaderboards
Save
Home
About