ChatGPT, one of the most prominent generative artificial intelligences, unfortunately cannot analyze images and videos in the version available to the public. So what could it do if it could?
The answer to this question came from artificial intelligence developer Mckay Wrigley. Using an iPhone and a MacBook, Wrigley gave ChatGPT an ‘eye’ with software he wrote himself, and ChatGPT then made suggestions based on the objects around him.
Video of ChatGPT with ‘eyes’:
The objects around Wrigley, as well as the food and drink in his refrigerator, are recognized by a separate artificial intelligence model, and this data is then passed to ChatGPT. With one question, ChatGPT first learns which objects are present; a follow-up question then asks it about those objects.
With this method, Wrigley shows ChatGPT the objects in his refrigerator and then asks it for a recipe that matches the refrigerator’s contents. After a short internet search, ChatGPT suggests a suitable recipe.
The voice conversation in the video is, of course, not something ChatGPT can do on its own. Wrigley uses OpenAI’s Whisper to instantly convert his speech to text. The artificial intelligence tools he uses, and their roles, are as follows:
- GPT-4: The language model behind ChatGPT
- YOLOv8: The model that identifies objects visible to the camera
- Whisper: The model that converts spoken audio to text
- Google Custom Search Engine: The tool that lets ChatGPT search the internet
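The glue logic connecting these tools can be imagined roughly as follows. This is a minimal sketch, not Wrigley’s actual code: the model calls (YOLOv8 detection, Whisper transcription, the GPT-4 request) are left out, and only the hypothetical step that turns detected object labels into a prompt for the language model is shown concretely. The function name and prompt wording are assumptions.

```python
def build_recipe_prompt(detected_objects):
    """Turn object-detector labels (e.g. from YOLOv8) into a recipe
    request for a language model such as GPT-4. Hypothetical helper."""
    # Deduplicate labels while keeping detection order, since a detector
    # may report the same class several times (e.g. multiple "egg" boxes).
    unique = []
    for obj in detected_objects:
        if obj not in unique:
            unique.append(obj)
    items = ", ".join(unique)
    return (
        f"My refrigerator contains: {items}. "
        "Suggest a recipe I can make with these ingredients."
    )

# Example: labels as a detector might emit them for a refrigerator shot.
detections = ["egg", "milk", "egg", "cheese", "broccoli"]
print(build_recipe_prompt(detections))
```

In the full pipeline, the string this produces would be sent to GPT-4, and the reply could be searched against the internet via the Custom Search Engine before being read back to the user.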
Wrigley runs these models together with Python code he wrote, producing the result above. He also says he is preparing to develop tools for the augmented reality glasses that Apple is preparing to launch.
In other words, the video above is a concrete demonstration of the already well-known potential of augmented reality glasses.