ChatGPT Got a Secret Update Last Week, And It’s Performing At Its Best
Increasingly, AI companies are testing new and experimental models under strange names on the LMSYS Chatbot Arena and quietly deploying them without any release notes. Case in point: since last week, X users have been discussing improved performance on ChatGPT, whether for coding or creative tasks. Many believed it was a new OpenAI model, likely related to Project Strawberry, a new advanced reasoning engine.
Something might be going on w/ GPT-4o
For the first time in a long time, it provided better “vibes” on an output than 3.5 Sonnet
Really surprised… will keep using it today to see if it continues
- Matt Shumer (@mattshumer_), August 12, 2024
Finally, OpenAI let the cat out of the bag and revealed that ChatGPT is indeed running a new model. It’s not a new frontier-class model but an improved GPT-4o model. The release note says that it is an updated GPT-4o model optimized for chat, and its name is chatgpt-4o-latest. Based on qualitative feedback and experiment results, OpenAI has tuned the GPT-4o model for better performance.
there’s a new GPT-4o model out in ChatGPT since last week. hope you all are enjoying it and check it out if you haven’t! we think you’ll like it 😃
- ChatGPT (@ChatGPTapp), August 12, 2024
OpenAI further says that it continues to remove bad data from the training dataset and add good data, along with “experimenting with new research methods.” This is where the intrigue begins. Project Strawberry is supposed to bring a new post-training method to improve reasoning. Is the new ChatGPT model already running the Strawberry engine?
Wow, GPT-4o now uses multi-step reasoning. impressive to see this in action. Turns out the update wasn’t a new model, but a new method. pic.twitter.com/kVF0ndA21T
- Ra (@misaligned_agi), August 13, 2024
I can’t say for sure, but many X users noticed that ChatGPT now uses multi-step reasoning to give correct answers. In this method, the model improves itself by generating several step-by-step reasoning rationales and ultimately arriving at a correct conclusion.
By the way, OpenAI also tested the new ChatGPT model on LMSYS under the name “anonymous-chatbot” and it received more than 11,000 votes. The new “chatgpt-4o-latest” model has again taken the first spot, outranking other AI models from Google, Anthropic, and Meta. It has become the first model to score 1314 points in LMSYS Arena.
Exciting Update from Chatbot Arena!
The latest @OpenAI ChatGPT-4o (20240808) API has been tested under “anonymous-chatbot” for the past week with over 11,000 community votes.
OpenAI has now successfully re-claimed the #1 position, surpassing Google’s Gemini-1.5-Pro-Exp with an… https://t.co/9lJlASI9UW pic.twitter.com/gxCDuBOi9N
- lmsys.org (@lmsysorg), August 14, 2024
Does the New ChatGPT Model Pass the Vibe Test?
To test the updated ChatGPT model, I tried a few reasoning prompts, and I did not find much difference between the older and the latest model. I asked it which number is bigger, 9.11 or 9.9, and it gave a correct response, just like before. I also ran other commonsense reasoning questions, and its answers were in line with the older model’s.
However, in some prompts, it still fails to get the answer right. For example, in response to the below prompt, it tells me to stack 9 eggs on top of the bottle, which is impossible.
In another test, it says that there are only two “R”s in the word strawberry, which is again incorrect.
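For reference, the expected answers to two of these prompts are easy to verify outside the chatbot. The snippet below is a minimal Python sketch of that check. The helper names `larger_decimal` and `count_letter` are my own, chosen for illustration, and are not part of any OpenAI tooling.

```python
from decimal import Decimal


def larger_decimal(a: str, b: str) -> str:
    """Return whichever of two decimal strings is numerically larger."""
    # Decimal compares by numeric value, so "9.9" (9.90) beats "9.11".
    return a if Decimal(a) > Decimal(b) else b


def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


if __name__ == "__main__":
    print(larger_decimal("9.11", "9.9"))     # 9.9
    print(count_letter("strawberry", "r"))   # 3
```

So a correct response should pick 9.9 as the bigger number and report three “R”s in strawberry, not two.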
It might be the case that the new ChatGPT model has not been rolled out widely. Either way, with OpenAI’s new model, we can expect improvements in other key areas. If you have any queries, let us know in the comments below.
Arjun Sha
Passionate about Windows, ChromeOS, Android, security and privacy issues. Have a penchant to solve everyday computing problems.