I Tried Making my Own (Bad) LLM Benchmark to Cheat in Escape Rooms

Recently, DeepSeek announced their latest model, R1, and article after article came out praising its performance relative to its cost, and how the release of such open-source models could genuinely change the course of LLMs forever. That is really exciting! It is also too big a scope to write about… but when a model like DeepSeek […]

The post I Tried Making my Own (Bad) LLM Benchmark to Cheat in Escape Rooms appeared first on Towards Data Science.
