Highlights:

  • The company claims that Grok-1.5V, which specializes in what it terms “multidisciplinary reasoning,” can compete with current multimodal models across a range of fields.
  • According to benchmark data provided by xAI, Grok-1.5V performs better than industry competitors including GPT-4V, Claude 3 Sonnet, Claude 3 Opus, and Gemini Pro 1.5.

Elon Musk-led xAI Corp. recently launched its first multimodal model. The development adds to an AI arms race that shows no sign of ending.

Grok-1.5 Vision, also known as Grok-1.5V, is a considerably more advanced large language model than the original Grok-1 because it can comprehend both text and visuals, including documents, images, screenshots, charts, diagrams, and more.

The company claims that Grok-1.5V, which specializes in what it terms “multidisciplinary reasoning,” can compete with current multimodal models across a range of fields. It has spatiotemporal perception capabilities, known in the AI community as real-world spatial understanding, which enable it to reason over complex text, analyze scientific images, and engage with visual content in a manner akin to that of a human.

The developer provided several real-world applications for Grok-1.5V. For example, it can be used to convert drawings into kid-friendly stories, determine which object in a group is the largest, help drivers navigate obstacles by checking whether there is enough room, convert tables into CSV files, and determine whether a wooden deck is decaying and needs to be replaced. It can even explain the context of internet memes that the user is unfamiliar with.

According to benchmark data provided by xAI, Grok-1.5V performs better than industry competitors including GPT-4V, Claude 3 Sonnet, Claude 3 Opus, and Gemini Pro 1.5. Grok-1.5V outperformed its competitors by a significant margin on RealWorldQA, a new benchmark the company developed to assess real-world spatial comprehension.

Less than a month has passed since Musk’s team debuted the regular Grok-1.5 LLM, which outperformed Grok-1 in math and coding. Grok-1.5 also demonstrated that it could handle far longer contexts than the original, allowing it to verify information from other sources and improve answer accuracy. Now, Grok is available in multimodal form.

xAI says that Grok-1.5V will soon be made available to early testers, beginning with those who have enrolled in X’s Premium service, which offers extra benefits to users of the social media platform formerly known as Twitter.

The startup, which debuted in July 2023, has advanced rapidly. Musk stated at the time that he was starting the business in response to AI developers like OpenAI and Google, which are very secretive about the inner workings of their AI models. According to Musk, the objective is to develop AI that is more accountable and transparent than that of its competitors.