Grok Visual

X.AI Offers a Look at New Multimodal Grok-1.5V with Vision Capabilities

X.AI has introduced a new, multimodal member of the Grok family of large language models. Grok-1.5V can interpret both text and visual information, with the “V” standing for “vision.” The Elon Musk-founded company notably timed the reveal to come only days after the launch of OpenAI’s GPT-4 Turbo with Vision model.


Grok-1.5V is the first multimodal model from X.AI and boasts enhanced text processing abilities along with the capacity to interpret a diverse array of visual data. The company says the model can absorb and respond to information in the form of drawings, photos, charts, and diagrams, among others. For now, it is available only to early testers and existing Grok users, but the startup is eager to show how Grok-1.5V matches or beats rivals in the multimodal space across multiple benchmarks. The company claims that the model’s capabilities are on par with OpenAI’s GPT-4 and Anthropic’s Claude 3 Opus.

“Grok-1.5V is competitive with existing frontier multimodal models in a number of domains, ranging from multi-disciplinary reasoning to understanding documents, science diagrams, charts, screenshots, and photographs,” the company explained in announcing the model. “We are particularly excited about Grok’s capabilities in understanding our physical world. Grok outperforms its peers in our new RealWorldQA benchmark that measures real-world spatial understanding.”

Alongside the Grok-1.5V announcement, X.AI introduced the RealWorldQA benchmark, a dataset of over 700 images, questions, and answers designed to evaluate a model’s spatial understanding in real-world scenarios. The dataset includes anonymized images collected from vehicles, potentially leveraging data from Tesla’s self-driving AI models, given Musk’s ties to both companies.

X.AI has moved rapidly to catch up with, and try to beat, other generative AI model developers in a relatively short time. The company only just debuted Grok-1.5, a new iteration that improves on the Grok-1 model. Grok-1.5 also arrived soon after X.AI made Grok-1 open-source, albeit without the fine-tuning and training data used on the model. Musk has also announced plans to widen access to the Grok chatbot to more X Premium subscribers. More is likely to come thanks to advantages like X.AI’s integration of Grok technology into platforms like X and Tesla’s self-driving AI.

