OpenAI is rolling out a new version of ChatGPT that lets you direct the AI bot not just by typing sentences into a text box, but by speaking out loud or by uploading an image. According to OpenAI, the new features will reach ChatGPT Plus subscribers within the next two weeks; free users and everyone else will get them “soon.”
Voice chat
The voice chat part will feel quite familiar: ChatGPT will work much like Alexa, Cortana, Google Assistant, or Siri. You tell the AI what you want at the touch of a button; ChatGPT converts your speech into text, feeds it into the large language model, gets a response, converts that back into speech, and reads the answer aloud.
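To make that loop concrete, here is a minimal sketch of the same speech-to-text, language-model, text-to-speech pipeline, written against OpenAI's public Python SDK. The model names, the "alloy" voice, and the file paths are illustrative API-level assumptions, not confirmed details of how the ChatGPT app itself is wired.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Speech to text: transcribe the user's recorded audio with Whisper.
with open("user_request.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# 2. Feed the transcribed text into the large language model.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
answer = reply.choices[0].message.content

# 3. Text back to speech: synthesize the reply for playback.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",  # one of the SDK's preset voices (assumed here)
    input=answer,
)
speech.write_to_file("assistant_reply.mp3")
```

In the app, steps 1 through 3 happen behind a single button press; the sketch just makes the hand-offs between the three models visible.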
Expect to see this pattern everywhere before long; OpenAI is simply an early mover. Most virtual assistants appear to be getting rebuilt around large language models, and soon we will all be carrying ChatGPT-style assistants on our phones.
OpenAI’s excellent Whisper model does most of the speech-to-text work, and the company is rolling out a new text-to-speech model that it says can produce “human-like voice from just text and a few seconds of sample speech.” You’ll be able to choose ChatGPT’s voice from five options, but OpenAI thinks the model has far more potential than that. OpenAI is working with Spotify to translate podcasts into other languages, for example, while preserving the original speakers’ voices. There are many interesting uses for synthetic voices, and OpenAI could become a big part of that industry.
Voice synthesis comes with risks
Being able to create a convincing synthetic voice from just a few seconds of audio opens the door to all kinds of anxiety-provoking situations. “These capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud,” the company says in the blog post announcing the new features. For that very reason, OpenAI says the model is not suited to broad release: it will be tightly controlled and limited to specific use cases and partnerships.
Visual support
The new visual search works a bit like Google Lens. You take a photo of whatever you’re interested in, and ChatGPT tries to work out what you’re asking about and responds accordingly. You can also use the app’s drawing tool to highlight part of the image, or pair the image with written questions to clarify your query.
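The image-plus-question pattern maps onto OpenAI's chat API as well. The sketch below assumes a vision-capable model is available through the chat completions endpoint; the model name, the question, and the image URL are all placeholders, not details from the announcement.

```python
from openai import OpenAI

client = OpenAI()

# Send a written question together with a photo in one user message.
response = client.chat.completions.create(
    model="gpt-4o",  # assumed vision-capable model
    messages=[
        {
            "role": "user",
            "content": [
                # The written question that clarifies the query...
                {"type": "text", "text": "What kind of plant is this?"},
                # ...paired with the photo the user took.
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```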
Visual search has its own potential problems. One of them is what happens when you ask the chatbot about a person: OpenAI says ChatGPT deliberately limits its “ability to analyze and make direct statements about people,” both for accuracy and for privacy reasons.
With this release, OpenAI is trying to build safer artificial intelligence by deliberately limiting what its new models can do. But that approach won’t work forever. As more people use voice control and visual search, and as ChatGPT moves closer to being a truly multimodal, genuinely useful virtual assistant, keeping those guardrails in place will only get harder.