Mistral Unveils Its First Multimodal AI Model

Why Trust Techopedia
Key Takeaways

  • Mistral's new Pixtral 12B model supports both text and image processing.
  • Pixtral 12B is built on Mistral's previously released Nemo 12B text model that now has the added benefit of a vision adapter.
  • Mistral Pixtral 12B can be downloaded through GitHub and Hugging Face.

Mistral, a French AI startup, has released Pixtral 12B, its first model that can handle both images and text. 

Pixtral 12B is based on Nemo 12B, a text model developed by Mistral. The new model includes a 400-million-parameter vision adapter, allowing users to input images alongside text for tasks such as image captioning, counting objects in an image, and image classification—similar to other multimodal models like Anthropic’s Claude and OpenAI’s GPT-4. Images can be provided either through URLs or encoded via base64.

When processing images, Pixtral 12B divides them into 16 x 16 pixel patches, enabling it to handle high-resolution images more effectively. The model uses 2D RoPE (Rotary Position Embeddings) for the vision encoder, allowing it to better understand spatial relationships within the provided images.

Pixtral 12B features 12 billion parameters that essentially reflect the model’s problem-solving ability. The more parameters, the better the model typically is at solving complex problems. For comparison, GPT-3 has over 175 billion parameters, highlighting that Pixtral 12B still has a long way to go to compete with OpenAI’s more than year-old model.

Mistral Pixtral 12B is available for download via a torrent link on GitHub and Hugging Face. Mistral hasn’t clarified under which license Pixtral 12B is released, but some of Mistral’s previous models were released under Apache 2.0, so it’s possible Pixtral 12B follows the same licensing.

As of now, the model is free to use for research and academic purposes but requires a paid license for commercial use. Additionally, Mistral’s Head of Developer Relations, Sophia Yang, said the model will soon be available for testing on Mistral’s chatbot and API platforms, Le Chat and Le Platform.