OpenAI, the company behind the popular ChatGPT, has announced the launch of its new language model, GPT-4o. The “o” in GPT-4o stands for “omni,” signifying the model’s ability to handle text, speech, and video. This new model is an improvement over its predecessor, GPT-4 Turbo, offering enhanced capabilities, faster processing, and cost savings for users.
GPT-4o is set to power OpenAI’s ChatGPT chatbot and API, giving developers access to the model’s capabilities. The model is available to both free and paid users, with some features available immediately and others rolling out over the following weeks.
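For developers, GPT-4o is reached through the same Chat Completions endpoint used for earlier GPT-4 models. The sketch below is a minimal illustration, assuming the official OpenAI Python SDK is installed and an OPENAI_API_KEY is set in the environment; the prompt text is only an example.

```python
# Minimal sketch: calling GPT-4o through the OpenAI Python SDK.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set
# in the environment; "gpt-4o" is OpenAI's published model identifier.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "In one sentence, what does the 'o' in GPT-4o stand for?"},
    ],
)

print(response.choices[0].message.content)
```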
The new model brings a significant improvement in processing speed, a 50% reduction in cost, five times higher rate limits, and support for over 50 languages. OpenAI plans to gradually roll out the new model to ChatGPT Plus and Team users, with enterprise availability “coming soon.” The company also began rolling out the new model to ChatGPT Free users, albeit with usage limits, on Monday.
In the coming weeks, OpenAI will introduce improved voice and video features for ChatGPT. ChatGPT’s voice capabilities may intensify competition with other voice assistants, such as Apple’s Siri, Alphabet’s Google Assistant, and Amazon’s Alexa. Users can now interrupt ChatGPT mid-response, making conversations feel more natural.
GPT-4o greatly improves the experience in OpenAI’s AI-powered chatbot, ChatGPT. The platform has long offered a voice mode that reads the chatbot’s responses aloud using a separate text-to-speech model, but GPT-4o supercharges this, allowing users to interact with ChatGPT more like an assistant. The model delivers “real-time” responsiveness and can even pick up on nuances in a user’s voice, responding with voices in “a range of different emotive styles” (including singing).
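For context, the earlier voice mode was effectively a chain of separate models rather than a single speech-native one. The sketch below illustrates that kind of pipeline using OpenAI’s existing transcription and text-to-speech endpoints; the file names are placeholders, and this is an approximation of the old flow, not OpenAI’s exact implementation.

```python
# Sketch of the older, pipelined voice mode: speech-to-text, then a chat
# model, then text-to-speech. GPT-4o folds this chain into a single model.
# Endpoints and model names below are OpenAI's existing ones; file paths
# are illustrative.
from openai import OpenAI

client = OpenAI()

# 1. Transcribe the user's spoken question (Whisper).
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# 2. Generate a text reply with a chat model.
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": transcript.text}],
)
answer = reply.choices[0].message.content

# 3. Convert the reply back to speech with a text-to-speech model.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
with open("answer.mp3", "wb") as out:
    out.write(speech.content)
```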
GPT-4o also upgrades ChatGPT’s vision capabilities. Given a photo or a desktop screenshot, ChatGPT can now quickly answer related questions, on topics ranging from “What’s going on in this software code?” to “What brand of shirt is this person wearing?” These capabilities are expected to evolve further, with the model potentially allowing ChatGPT to, for instance, “watch” a live sports game and explain the rules.
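As a rough illustration of the image-understanding path, the Chat Completions API accepts mixed text and image content in a single message. The sketch below assumes the same Python SDK as above and a publicly reachable image URL, which is a placeholder.

```python
# Sketch: asking a question about an image via the Chat Completions API.
# The image URL is a placeholder; any publicly reachable image works.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What brand of shirt is this person wearing?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```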
GPT-4o is more multilingual as well, with enhanced performance in around 50 languages. And in OpenAI’s API and Microsoft’s Azure OpenAI Service, GPT-4o is twice as fast as GPT-4 Turbo, costs half as much, and has higher rate limits.
During the demonstration, GPT-4o showed it could gauge users’ emotions from cues such as their breathing: when it noticed a user was stressed, it offered advice to help them relax. The model also showed it could converse in multiple languages, translating and answering questions on the fly.
OpenAI’s announcements show just how quickly the world of AI is advancing. The improvements in the models and the speed at which they work, along with the ability to bring multimodal capabilities together into one omni-modal interface, are set to change how people interact with these tools.