OpenAI multimodal digital assistant could launch soon


  • On Monday, OpenAI is holding an event that could see an announcement about a new multimodal digital assistant.
  • Being multimodal would allow the assistant to use images for prompts, such as identifying and translating a sign in the real world.
  • This would be a direct threat to Google’s digital assistants, namely Google Assistant and the newer Gemini.

Over the past few weeks, the rumor mill has been churning, suggesting that OpenAI — the company responsible for ChatGPT — could soon launch an AI-powered search engine, which would be a direct threat to Google’s core business. Given how prominent ChatGPT has become in such a short time, this would represent the first real threat to Google Search in decades.

However, it’s looking less likely that OpenAI has a search engine on the way (via The Information). Instead, new rumors suggest that the company could use its scheduled event on Monday to announce a multimodal digital assistant. While not a traditional search engine, it would let people search for things using the power of AI, so it would still be a significant threat to Google.

Multimodal means the AI can handle multiple input forms, not just text. In the case of this rumored digital assistant, it would be able to link to a camera, process real-world information, and then speak back to you with more information on what it sees. For example, you could point a camera at a sign in a different language and ask ChatGPT to both identify and translate the sign for you, and the AI would speak to you in response.

If this sounds familiar, that’s because it’s something Google Lens, Google Assistant, and, most recently, Google Gemini already do. In fact, ChatGPT can already do this, too, but not through one interface. Monday’s launch could see the company announce an upgraded GPT model that offers faster, more accurate responses, with both image input and spoken responses packaged into a single app. In other words, it would be a direct competitor to Gemini (and, by extension, Google Assistant and Apple’s Siri).

To be clear, this would almost certainly not be GPT-5, the long-awaited follow-up to GPT-4 and GPT-4 Turbo. The company has indicated that GPT-5 won’t appear at this event. The Information suggests it will land no earlier than late 2024.


C. Scott Brown