Anthropic PBC has unveiled the Claude 3 series, a new family of large language models that it claims can outperform OpenAI's GPT-4 and Google LLC's Gemini Ultra.

The series comprises three models that vary in price and sophistication. Claude 3 Opus, the most advanced of the three, exhibits “near-human levels of comprehension and fluency on complex tasks,” according to Anthropic. It is joined by two other models, Claude 3 Sonnet and Claude 3 Haiku, which trade some response quality for lower inference costs.

All three models include significant upgrades over Claude 2.1, Anthropic's previous flagship LLM. They are less likely than Claude 2.1 to produce biased responses or to refuse harmless prompts that don't violate the company's terms of service. Another notable difference is that the Claude 3 series accepts images, technical diagrams and other visual input in addition to text.

Each Claude 3 model can accept prompts of up to 200,000 tokens, units of data that each consist of a few letters or numbers. Anthropic says all three models are technically capable of processing prompts with one million tokens or more. The company said it “may make this available to select customers who need enhanced processing power.”
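To give a rough sense of what a 200,000-token limit means in practice, the short Python sketch below estimates whether a piece of text would fit. It assumes an average of four characters per token, a common rule of thumb for English text and not a figure from Anthropic. The function names are hypothetical, and a real tokenizer would give different counts.

```python
import math

CONTEXT_LIMIT_TOKENS = 200_000  # prompt limit cited in the article
CHARS_PER_TOKEN = 4             # assumption: rough average, not Anthropic's tokenizer


def estimate_tokens(text: str) -> int:
    """Return a crude token estimate based on character count alone."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(text: str, limit: int = CONTEXT_LIMIT_TOKENS) -> bool:
    """Return True if the estimated token count does not exceed the limit."""
    return estimate_tokens(text) <= limit
```

Under that assumption, a 200,000-token prompt corresponds to roughly 800,000 characters of text.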

The star of the Claude 3 series is Opus, which Anthropic says can answer complicated questions twice as accurately as Claude 2.1. According to the company, that gain in accuracy allows it to beat GPT-4 and Gemini Ultra on a number of well-known artificial intelligence benchmarks.

One of the benchmarks the company used, GSM8K, consists of grade-school math problems. According to Anthropic, Opus answered 95% of the questions correctly, while Gemini Ultra and GPT-4 scored 94.4% and 92%, respectively. Opus also outperformed its competitors on two other benchmarks, MMLU and GPQA, which gauge how well-versed AI models are in subjects like physics.

The other two models in the Claude 3 series have more limited reasoning capabilities but are offered at a lower price. They also generate responses faster.

According to Anthropic, Haiku, the fastest and least expensive Claude 3 model, can read a research article with 10,000 tokens' worth of data in less than three seconds. Sonnet, the third model, sits between Haiku and Opus. While it's not as fast as Haiku, it still answers prompts roughly twice as quickly as Anthropic's previous flagship LLM and delivers higher response quality.

Sonnet and Opus are currently available through Anthropic's free Claude.ai chatbot and an application programming interface. Haiku is set to launch soon. Anthropic intends to enhance the Claude 3 series in the future with features such as the ability to perform actions in third-party applications.

Competitors may soon break Anthropic's new AI benchmark records. OpenAI revealed in November that it had started working on a more sophisticated successor to GPT-4. Google recently released details about a new version of Gemini, which is said to be significantly better than the existing version and can process prompts with up to 10 million tokens.