Did Google deceive people with its Gemini video?
In the six-minute demo video, Gemini recognizes images, responds within seconds, accurately tracks a ball of paper hidden under a cup in a shuffling trick, and more. But the video was a little too good to be true, and Google itself admits as much. While the video spread quickly around the world, the disclaimer beneath it seems to have gone largely unnoticed: “For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity.” No such disclaimer appears on Google’s other videos.
That said, this is hardly cause for outrage; companies routinely polish their demo videos this way. “All user commands and output in the video are real and have been truncated for brevity,” Oriol Vinyals, vice president of research and deep learning lead at Google DeepMind, said in a post on X. So, according to Google, the capabilities shown in the video are real; the model just doesn’t respond that quickly. Vinyals adds that the team made the video to show what multimodal user experiences built with Gemini could look like and to inspire developers.
Google calls Gemini its most advanced AI model, and perhaps it really is; we don’t know yet. What matters most is that the model is natively “multimodal”: it can take photos, video, audio, and text as input directly. ChatGPT and others handle some of these through plugins, so they are not truly multimodal at the core. Beyond that, it might serve Google well to launch a small beta version so that people can challenge the model under real-world conditions and see for themselves how powerful it is.
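To make “natively multimodal” concrete, here is a minimal sketch of a single request that mixes an image with a text prompt, assuming the google-generativeai Python SDK and the “gemini-pro-vision” model name from Gemini’s launch; the file name and API key are placeholders, and names may differ in later SDK versions.

```python
# Minimal sketch: one natively multimodal request combining an image
# and a text prompt. Assumes the google-generativeai SDK and the
# "gemini-pro-vision" model name from Gemini's launch.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

model = genai.GenerativeModel("gemini-pro-vision")
frame = Image.open("cup_trick_frame.png")  # hypothetical video frame

# Image and text go into the same call; there is no plugin layer
# translating between separate vision and language systems.
response = model.generate_content([frame, "Which cup hides the paper ball?"])
print(response.text)
```

The point is not the specific SDK but the shape of the call: one model accepts mixed media in a single prompt, which is what separates a natively multimodal system from a text model with bolted-on plugins.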