Google upgrades its Gemini image generator to Gemini 2.5 Flash Image, which handles complex edits without distorting faces.
The Google Vids app adds AI-powered tools, including the Veo AI model for converting images to video clips, along with a free tier to attract more users.
The upgrades aim to close the gap with OpenAI's ChatGPT, which has significantly more users.
DeepMind's Genie 3 model can generate interactive 3D worlds from text prompts, expanding Google's AI capabilities beyond images and videos.
Google is escalating its battle with OpenAI, rolling out a major upgrade to its Gemini image generator and simultaneously boosting its Google Vids app with new AI-powered creation tools and a free tier for consumers.
A 'bananas' upgrade: The new image model, Gemini 2.5 Flash Image, is designed to execute complex edits, such as changing an article of clothing, without distorting faces, a common flaw in many AI tools. The model gained attention on the evaluation platform LMArena, where it impressed users while being tested anonymously as “nano-banana.”
Chasing ChatGPT: The move is a clear shot at OpenAI, as Google is still playing catch-up on user numbers. According to TechCrunch, Google’s Gemini has 450 million monthly users, a figure that pales in comparison to ChatGPT's reported 700 million weekly users.
From pictures to motion: On the video front, Google Vids now integrates the Veo AI model, which turns a still image into an eight-second video clip guided by a text prompt. The company is also rolling out AI avatars and an automatic transcript trimmer, and it is making the basic Vids editor free for all consumers to attract more users.
Google is fighting on two fronts, pushing the technical capabilities of its consumer-facing AI while making its creation tools more accessible for businesses. It's an aggressive, bundled play to close the user gap with its rivals by emphasizing power and safety in equal measure.
Meanwhile, Google's ambitions extend beyond images and short videos. Its DeepMind lab recently unveiled Genie 3, an AI model that can generate entire interactive, video-game-like 3D worlds from a simple text prompt.