The Quirky World of Unofficial AI Benchmarks: Will Smith's Spaghetti Adventure
Eulerpool Research Systems • Jan 1, 2025
Takeaways
- Classical benchmarks are often irrelevant for average users and applications.
- Unconventional AI benchmarks like the Will Smith spaghetti test are causing a stir.
In the constantly evolving AI landscape, unusual challenges are making waves alongside academic tests. A particularly curious trend is the "Will Smith Spaghetti Benchmark." It tests whether a video generator can realistically depict the well-known actor eating a bowl of spaghetti, a task that has become a popular meme and that Smith himself has humorously parodied.
The spaghetti test was not the only offbeat benchmark to captivate the AI community in 2024. A 16-year-old developer created an app that lets AI models build structures in the Minecraft world, while a British programmer developed platforms where AIs compete against each other in games like Pictionary and Connect Four.
Alongside these unconventional tests, classical scientific benchmarks remain in use. However, many of these standardized tests tell the average user very little. Common performance demonstrations, such as solving Math Olympiad problems or tackling doctoral-level tasks, seem irrelevant to the everyday use of chatbots, such as answering emails.
Even publicly accessible benchmarks like the "Chatbot Arena" have limitations. There, users can evaluate the performance of AIs on specific tasks. However, the assessments often come from a narrow circle of AI and technology enthusiasts and are subjective.
As Ethan Mollick, a professor of management at the Wharton School, recently emphasized, many benchmarks lack realistic comparisons with average human performance in various fields such as medicine or law.
Quirky AI benchmarks such as the spaghetti test, Minecraft creations, or games like Connect Four are entertaining, but they are neither empirical nor universally applicable. Passing the Will Smith test does not mean that an AI will automatically excel at other creative tasks.