“The model is innately more capable,” Sundar Pichai, the CEO of Google and its parent company Alphabet, told MIT Technology Review. “It’s a platform. AI is a profound platform shift, bigger than web or mobile. And so it represents a big step for us.”
It’s a big step for Google, but not necessarily a giant leap for the field as a whole. Google DeepMind claims that Gemini outmatches GPT-4 on 30 out of 32 standard measures of performance. And yet the margins between them are thin. What Google DeepMind has done is pull AI’s best current capabilities into one powerful package. To judge from demos, it does many things very well—but few things that we haven’t seen before. For all the buzz about the next big thing, Gemini could be a sign that we’ve reached peak AI hype. At least for now.
Chirag Shah, a professor at the University of Washington who specializes in online search, compares the launch to Apple’s introduction of a new iPhone every year. “Maybe we just have risen to a different threshold now, where this doesn’t impress us as much because we’ve just seen so much,” he says.
Like GPT-4, Gemini is multimodal, meaning it is trained to handle multiple kinds of input: text, images, and audio. It can combine these different formats to answer questions about everything from household chores to college math to economics.
In a demo for journalists yesterday, Google showed Gemini’s ability to take an existing screenshot of a chart, analyze hundreds of pages of research with new data, and then update the chart with that new information. In another example, Gemini is shown pictures of an omelet cooking in a pan and asked (using speech, not text) if the omelet is cooked yet. “It’s not ready because the eggs are still runny,” it replies.
Most people will have to wait for the full experience, however. The version launched today is a back end to Bard, Google’s text-based search chatbot; the company says Gemini will give Bard more advanced reasoning, planning, and understanding capabilities. Gemini’s full release will be staggered over the coming months. The new Gemini-boosted Bard will initially be available in English in more than 170 countries, not including the EU and the UK. Those regions are excluded for now to let the company “engage” with local regulators, says Sissie Hsiao, a Google vice president in charge of Bard.
Gemini also comes in three sizes: Ultra, Pro, and Nano. Ultra is the full-powered version; Pro and Nano are tailored to applications that run with more limited computing resources. Nano is designed to run on devices, such as Google’s new Pixel phones. Developers and businesses will be able to access Gemini Pro starting December 13. Gemini Ultra, the most powerful model, will be available “early next year” following “extensive trust and safety checks,” Google executives told reporters on a press call.
“I think of it as the Gemini era of models,” Pichai told us. “This is how Google DeepMind is going to build and make progress on AI. So it will always represent the frontier of where we are making progress on AI technology.”