In this post: Google’s Gemini 2.5 Pro AI has completed Pokémon Blue, outperforming Anthropic’s Claude, which is still playing Pokémon Red. The AI navigated the game using visual input and agent tools, with minimal but strategic human intervention from developer Joel Z. While the achievement is notable, the developer cautioned against using it as a strict benchmark due to differing tools and frameworks across models.
Google’s flagship AI model, Gemini 2.5 Pro, has completed the 1996 Game Boy classic “Pokémon Blue”.
Last night, Google chief executive Sundar Pichai shared the news on X, writing, “What a finish! Gemini 2.5 Pro just completed Pokémon Blue!”
TechCrunch reported that Joel Z, a 30-year-old software engineer who says he is “unaffiliated with Google,” streamed the run on Twitch.
Even so, executives at the search giant have been rooting for the project. Logan Kilpatrick, product lead for Google AI Studio, posted last month that Gemini was “making great progress at completing Pokémon” and had “earned its 5th badge (next best model only has 3 so far, though with a different agent harness).” Pichai replied with a joke, “We are working on API, Artificial Pokémon Intelligence :)”
Gemini beats the Anthropic AI model Claude, which is still working on Pokémon Red
The choice of game is no accident. In February, rival firm Anthropic spotlighted steady gains made by its Claude models while playing “Pokémon Red.” The company said Claude’s “extended thinking and agent training” gave it a “major boost” on unexpected tasks such as a classic role-playing game.
Joel Z cited the Claude Plays Pokémon Twitch feed as one of his inspirations.
So far, Claude has not finished “Pokémon Red,” so by that measure alone Gemini is ahead.
However, Joel Z warned viewers against reading too much into the comparison. “Please don’t consider this a benchmark for how well an LLM can play Pokémon,” he wrote on his Twitch page. “You can’t really make direct comparisons — Gemini and Claude have different tools and receive different information.”
Google’s Gemini, like other AI models, requires help from prompts or so-called agent harnesses
The agent harness feeds the LLM updated screenshots annotated with extra on-screen data. Gemini then reasons about the situation, may call sub-agents for specialized tasks, and finally tells the harness which button to press in the Game Boy emulator.
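To make that loop concrete, here is a minimal Python sketch of a harness of this general shape. It is not Joel Z's code: the emulator, the model, and the "pathfinder" sub-agent are all hypothetical stubs, and a real setup would call an actual emulator and an LLM API instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

BUTTONS = ("A", "B", "UP", "DOWN", "LEFT", "RIGHT", "START", "SELECT")
Position = Tuple[int, int]
SubAgent = Callable[[Position, Position], str]


@dataclass
class Observation:
    screenshot: bytes              # frame from the emulator (empty in this stub)
    overlay: Dict[str, Position]   # extra on-screen data added by the harness


class StubEmulator:
    """Hypothetical stand-in for a Game Boy emulator."""

    def __init__(self) -> None:
        self.position: Position = (0, 0)
        self.pressed: List[str] = []

    def capture(self) -> Observation:
        return Observation(screenshot=b"", overlay={"player": self.position})

    def press(self, button: str) -> None:
        if button not in BUTTONS:
            raise ValueError(f"unknown button: {button}")
        self.pressed.append(button)
        x, y = self.position
        if button == "RIGHT":
            self.position = (x + 1, y)
        elif button == "LEFT":
            self.position = (x - 1, y)


def pathfinder(start: Position, goal: Position) -> str:
    """Hypothetical sub-agent: choose a horizontal step toward the goal."""
    return "RIGHT" if start[0] < goal[0] else "LEFT"


def stub_model(obs: Observation, subagents: Dict[str, SubAgent]) -> str:
    """Stands in for the LLM: read the overlay, maybe delegate, return a button."""
    goal: Position = (2, 0)  # assumed target tile, for illustration only
    player = obs.overlay["player"]
    if player != goal:
        return subagents["pathfinder"](player, goal)
    return "A"  # at the goal, interact


def run_harness(emulator: StubEmulator,
                model: Callable[[Observation, Dict[str, SubAgent]], str],
                subagents: Dict[str, SubAgent],
                steps: int) -> List[str]:
    """Observe, let the model decide, press the button, repeat."""
    for _ in range(steps):
        obs = emulator.capture()
        button = model(obs, subagents)
        emulator.press(button)
    return emulator.pressed


if __name__ == "__main__":
    emu = StubEmulator()
    print(run_harness(emu, stub_model, {"pathfinder": pathfinder}, steps=3))
```

The point of the sketch is the division of labor: the harness decides what the model sees and which helpers it can call, which is why two models running under different harnesses are hard to compare directly.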
Joel Z livestreamed “Gemini Plays Pokémon” on Twitch
Joel Z admitted that he occasionally steps in, but argued that the help stays within fair limits. “My interventions improve Gemini’s overall decision-making and reasoning abilities,” he explained. He said he does not give the model specific hints, walkthroughs, or direct instructions for particular challenges like Mt. Moon.
He added, “The only thing that comes even close is letting Gemini know that it needs to talk to a Rocket Grunt twice to obtain the Lift Key, which was a bug that was later fixed in Pokémon Yellow.”
“Gemini Plays Pokémon is still actively being developed,” said Joel, noting that the framework behind the project “continues to evolve.”