Highlights:

  • The voice feature uses a new text-to-speech AI model capable of producing human-like audio from a brief speech sample.
  • The voice feature will roll out to ChatGPT Plus and Enterprise users on iOS and Android as an opt-in feature over the next two weeks.

People have been able to hold text-based conversations with ChatGPT, OpenAI LP’s AI-powered chatbot, for quite some time, but the company recently announced that the chatbot will soon be able to hold spoken conversations as well.

Users will also be able to take photographs and engage in back-and-forth dialogues with the chatbot in order to learn more about the subject of the image.

Using a model similar to OpenAI’s open-source Whisper model, which can transcribe human speech into text, the voice chat feature is designed to pick up what a person is saying to the AI chatbot and convert it into text that the system can understand. To respond, it employs a new text-to-speech artificial intelligence model that can synthesize human-sounding audio from just a few seconds of sample speech.

OpenAI reported that the company’s developers worked with professional voice actors to create a variety of voices for the new experience. OpenAI provides five distinct voices with natural-sounding names: “Juniper,” “Ember,” “Sky,” “Cove,” and “Breeze.” The voices include both male and female options, have exceptional clarity and intonation, and are therefore suitable for storytelling, reciting the news, and general conversation.

OpenAI added that it is also collaborating with Spotify on the pilot of Spotify’s new Voice Translation feature, which will enable podcasters to translate their podcasts into other languages in their own voices using the new voice model.

The new voice feature will roll out as an opt-in feature to ChatGPT Plus and Enterprise users on iOS and Android within the next two weeks. Users can locate it in the New Features section of the mobile app’s settings and activate it by tapping the headphones button.

Conversations Concerning Images

With images, users will be able to get even more out of ChatGPT by photographing a scene, an object, or anything else and then asking the AI about it. They will then be able to converse with the chatbot about what it sees in order to solve a difficult math problem, construct a crèche, learn about a landmark, or get directions to a distant location.

For instance, a user could take a photograph of the contents of their refrigerator and, if enough potential ingredients are visible, ask what they could prepare for dinner. They could stroll down a store aisle and photograph items to obtain product information from ChatGPT for comparison shopping. A user who cannot light a grill that has sat in the garage all winter could also take a picture of it and ask for help, and ChatGPT could look up the manual and help get it working again.

This new capability is an improvement over existing tools such as Google Lens, which provides a powerful image search that can identify what is in a photograph. Google DeepMind, the artificial intelligence (AI) division of Google LLC, has also developed an AI-powered Android app for vision-impaired users called Lookout. It employs an AI model to describe photographs and allows users to ask follow-up questions.

OpenAI explained that its experience with Be My Eyes, a free mobile app powered by GPT-4, informed the company’s approach to developing the new image capabilities integrated into ChatGPT.

With the ability to connect real-world images to internet queries and to converse with the chatbot about them, users will have access to brand-new capabilities, and it is evident that OpenAI is testing the limits of what ChatGPT can do.

Additionally, the company emphasized that there are privacy implications when individuals may be in view. What happens, for instance, if someone photographs a person about whom the AI has information that is public but presumably should not be disclosed? OpenAI stated that it has taken measures to restrict the model’s ability to analyze individuals and make direct statements about them, in order to respect their privacy, especially given that ChatGPT is not always accurate.