Elon Musk’s Grok-2 Beta Launched; Outperforms ChatGPT, Claude, and Gemini
Elon Musk’s AI venture, xAI, has released an early preview of the Grok-2 model, and it has surprisingly outperformed Claude, Gemini, and even ChatGPT. The earlier Grok-1.5 model was not well received, but Grok-2 has delivered strong results on the LMSYS leaderboard. xAI has released two new models: Grok-2 and a smaller Grok-2 mini.
xAI says Grok-2 has been significantly improved in key areas, including reasoning, instruction following, and providing accurate, factual information. On traditional AI benchmarks, Grok-2 scored a whopping 87.5% on MMLU and 88.4% on HumanEval. The MMLU result is particularly interesting because it was obtained using 0-shot chain-of-thought (CoT) prompting.
Grok-2 was tested on LMSYS under the name “sus-column-r”. With around 12,000 votes, it holds third place, behind ChatGPT-4o-latest and Gemini-1.5-Pro-Experimental and level with GPT-4o-2024-05-13. It also ranks above GPT-4o-mini, Claude 3.5 Sonnet, Gemini 1.5 Pro, and Llama 3.1 405B.
Woah, another exciting update from Chatbot Arena❤️🔥 The results for @xAI’s sus-column-r (Grok 2 early version) are now public! With over 12,000 community votes, sus-column-r has secured the #3 spot on the overall leaderboard, even matching GPT-4o! It excels in Coding (#2),… https://t.co/gqSWSwYN0z pic.twitter.com/j9UYDBYNt4 — lmsys.org (@lmsysorg) August 14, 2024
Grok-2 ranks second in the coding and math categories and fourth in hard prompts. xAI says a multimodal version of Grok-2 will be released soon. The company has not disclosed the parameter count of either model. You can start using Grok-2 on x.com, and developers can get started with the API as well.
Arjun Sha
Passionate about Windows, ChromeOS, Android, security, and privacy issues. Has a penchant for solving everyday computing problems.