While a variety of benchmarks are used to measure the abilities of artificial intelligence models, a new approach has recently drawn attention: playing Super Mario Bros. Hao AI Lab, a research organization at the University of California, tested popular artificial intelligence models by putting them into the game, and the results were striking.
In the experiment, Anthropic’s Claude 3.7 model performed best, followed by Claude 3.5. Google’s Gemini 1.5 Pro and OpenAI’s GPT-4o performed worse than expected.
Reasoning models fell victim to “thinking” too much
However, the test did not use exactly the same game as the 1985 classic. The game ran in an emulator and was integrated with a special framework called GamingAgent, which let the artificial intelligence control Mario. The framework supplied the model with in-game screenshots and basic instructions, such as jumping to avoid obstacles or enemies, and the model then made its moves. The models steered Mario by generating Python code.
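The setup described above, in which a model receives a screenshot and a basic instruction and replies with Python code that presses buttons, can be sketched roughly as follows. This is only an illustration under stated assumptions, not GamingAgent's actual interface: `Observation`, `run_generated_code`, `agent_step`, `fake_model`, and `press` are all hypothetical names, and the "model" is a fixed stand-in rather than a real API call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Observation:
    """What the model sees on each turn (hypothetical structure)."""
    screenshot: bytes   # image bytes captured from the emulator
    instructions: str   # basic hint, e.g. "jump to avoid obstacles or enemies"


def run_generated_code(code: str, press: Callable[[str, float], None]) -> None:
    """Run model-written Python with only `press` exposed.

    Emptying `__builtins__` limits what the snippet can call, but this is
    NOT a real sandbox; it only keeps the illustration small.
    """
    exec(code, {"__builtins__": {}, "press": press})


def agent_step(obs: Observation,
               model: Callable[[Observation], str],
               press: Callable[[str, float], None]) -> None:
    """One turn: show the observation to the model, then run its reply."""
    code = model(obs)
    run_generated_code(code, press)


def fake_model(obs: Observation) -> str:
    """Stand-in for a real model: always runs right, then jumps."""
    return "press('right', 0.5)\npress('A', 0.2)"


# Record button presses instead of sending them to an emulator.
pressed: List[Tuple[str, float]] = []
agent_step(
    Observation(screenshot=b"", instructions="Jump to avoid obstacles or enemies."),
    fake_model,
    lambda button, seconds: pressed.append((button, seconds)),
)
```

In a real run, `press` would forward the button and hold time to the emulator, and `fake_model` would be replaced by a call to the model under test.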
According to the Hao AI Lab researchers, the test matters because it measures an artificial intelligence's ability to plan complex maneuvers and develop gameplay strategies. Interestingly, the “reasoning” models, which work through problems step by step, did worse than models that respond without an explicit reasoning phase. OpenAI’s o1 model, which performs strongly on many benchmarks, failed here.
The main reason is that decision-making speed is critical in real-time games. Models like o1 need to “think” for a certain amount of time before making a move, but in Super Mario Bros. even a one-second delay can cost the character his life.
Artificial intelligence has, of course, been tested on games for decades. However, some experts question whether gaming skill gives an accurate picture of an artificial intelligence's general intelligence or of technological progress, because games tend to be more abstract than the real world, run on fixed rules, and in theory provide an unlimited amount of data.
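To make the timing argument concrete, the short sketch below converts a model's response delay into game frames. It assumes the game runs at roughly 60 frames per second, which is the approximate rate of the original console. The latency and hazard figures are hypothetical values chosen for illustration, not measurements from the experiment.

```python
# Approximate frame rate of the original game; an assumption for illustration.
FRAMES_PER_SECOND = 60


def frames_elapsed(latency_s: float, fps: int = FRAMES_PER_SECOND) -> int:
    """Number of game frames that pass while the model is still deciding."""
    return round(latency_s * fps)


def reacts_in_time(latency_s: float, time_to_hazard_s: float) -> bool:
    """True if the model's answer arrives before Mario reaches the hazard."""
    return latency_s < time_to_hazard_s


# Hypothetical numbers: an enemy reaches Mario in half a second.
time_to_enemy = 0.5
fast_model_latency = 0.2   # hypothetical quick responder
slow_model_latency = 1.0   # hypothetical model that reasons for a full second

print(frames_elapsed(slow_model_latency))                 # 60
print(reacts_in_time(fast_model_latency, time_to_enemy))  # True
print(reacts_in_time(slow_model_latency, time_to_enemy))  # False
```

Under these assumptions, a one-second pause means about 60 frames go by with no new input, which is more than enough time for an enemy to reach Mario.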