Anthropic Releases Claude 3 Generative AI Models Claiming to Beat GPT-4

on March 5, 2024 at 12:00 pm

Generative AI startup Anthropic has released the next iteration of its large language models (LLMs), called Claude 3. The new set of models includes, in ascending power, the Haiku, Sonnet, and Opus variants, which Anthropic claims are not only an improvement over the previous Claude models but could set an industry standard as they are comparable and, in some ways, even better than OpenAI’s GPT-4 models.

Claude 3

The Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus all have various upgrades over Claude 2 and Claude 2.1. Opus and Sonnet are currently available through the Claude API and on claude.ai in 159 countries, with Haiku slated to arrive soon. Opus, the flagship model of the family, boasts superior performance across various common AI benchmarks, including those that test undergraduate and graduate-level knowledge and reasoning, basic mathematics, and more.

The Claude 3 models are designed for instant response applications, such as live customer chats and data extraction, where real-time interaction is crucial. Haiku stands out as the fastest and most cost-efficient option in its class, promising to process extensive and complex documents in under three seconds. Sonnet offers double the speed of its predecessors while maintaining higher intelligence, and Opus matches the speed of earlier models but with enhanced cognitive abilities. The new models have also made improvements in reducing unnecessary refusals to answer prompts, indicating a more nuanced understanding of user requests. Anthropic’s eye on the enterprise market is also evident in how it pitches the way some of Claude 3’s improvements over its predecessors might be employed.

“The Claude 3 models are better at following complex, multi-step instructions. They are particularly adept at adhering to brand voice and response guidelines, and developing customer-facing experiences our users can trust,” Anthropic wrote in a blog post. “In addition, the Claude 3 models are better at producing popular structured output in formats like JSON—making it simpler to instruct Claude for use cases like natural language classification and sentiment analysis.”

Like Claude 2.1, the new models support a context window of 200,000 tokens but can rise to over a million tokens for certain applications. The Claude 3 Opus model, in particular, has demonstrated near-perfect recall in the ‘Needle In A Haystack’ (NIAH) evaluation. Anthropic’s claims about Claude 3 beating GPT-4 can be seen in the chart on the side. The new models don’t beat GPT-4 everywhere, but the Opus version is a winner in the MMLU, GPQA, GSM8K, MATH, MGSM, HumanEval, Drop, Big-Bench-Hard, ARC-Challenge, and Hellaswag tests. That includes a lot of knowledge exams, logic puzzles, and computer code composition. The sheer number of tests and the fact that Claude 3 Opus doesn’t always beat GPT-4 does lend some credence to Anthropic’s claims too.

“Opus, our most intelligent model, outperforms its peers on most of the common evaluation benchmarks for AI systems, including undergraduate level expert knowledge (MMLU), graduate level expert reasoning (GPQA), basic mathematics (GSM8K), and more. It exhibits near-human levels of comprehension and fluency on complex tasks, leading the frontier of general intelligence,” Anthropic explained. “All Claude 3 models show increased capabilities in analysis and forecasting, nuanced content creation, code generation, and conversing in non-English languages like Spanish, Japanese, and French.”

Claude 3’s models also bring multimodal operations to the table with vision capabilities that can process different visual formats like PDFs and images, which all beat OpenAI’s GPT-4V. The attempt to outperform GPT-4 is becoming a familiar refrain. Google made similar boasts about its Gemini Ultra model in at least some benchmarks. GPT-4 is likely to maintain its role as the model others are compared to for a while to come.

Anthropic’s new set of models comes just a few months after the debut of Claude 2.1 in November. The startup raised $100 million from SK Telecom in August, which came after a $450 million round in May. In between, it released the now-superseded Claude 2.0 as well as the smaller and faster Claude Instant 1.2. The startup also began taking in revenue through a paid version of Claude called Claude Pro. The $20 per month chatbot offers quintupled usage and faster access compared to Claude’s free service. A new and better LLM is obviously of interest, especially one that is unconnected to OpenAI, but Voicebot founder Bret Kinsella pointed out in our Synthedia newsletter that the implications go beyond just OpneAI and Anthropic.

“The other winner here is Amazon Bedrock. Azure is winning many enterprise generative AI deals today because it is the only public cloud option for accessing OpenAI’s foundation models. When OpenAI had the two most performant models in GPT-3.5 and GPT-4, the other cloud ecosystems were at a competitive disadvantage. Now AWS has a true GPT-4 rival in Claude 3, while Google Cloud has that model coming soon to sit alongside the Gemini model family,” Kinsella wrote. “Competition among LLM providers is about to get more interesting. The biggest unknown at the moment is when GPT-5 will be released (or GPT-4.5) and how much it will raise the performance bar. Epoch 2 of the LLM wars has arrived.”

Follow @voicebotai Follow @erichschwartz

Anthropic Releases Claude 2.1 Generative AI Chatbot With 200K Context Window and Reduced Errors

Google Bard (and Duet and Assistant Mobile App) No More – Gemini Now The Star of Generative AI Show

Anthropic Introduces Upgraded Generative AI Chatbot ‘Claude 2’

Anthropic Releases Claude 3 Generative AI Models Claiming to Beat GPT-4

Claude 3

Subscribe to Voicebot Weekly

Latest Posts

McDonald’s Abandons Drive Through AI for Order Taking

Apple Debuts ‘Apple Intelligence’ Generative AI Features Across All Devices

Stability AI Shares Open-Source Generative AI Audio Model for Creative Sound Design

Fable Studio Launches Generative AI TV Show Production Platform for Custom Streaming Content