Microsoft Releases a Small Phi-3 Vision Multimodal Model

Earlier in April, Microsoft released its first AI model in the open-source Phi-3 family: Phi-3 Mini. Now, after almost a month, the Redmond giant has released a small multimodal model called Phi-3 Vision. At Build 2024, Microsoft also unveiled two more Phi-3 family models: Phi-3 Small (7B) and Phi-3 Medium (14B). All of these models are open-source under the MIT license.

As for the Phi-3 Vision model, it has 4.2 billion parameters, which makes it fairly lightweight. This is the first time a mega-corporation like Microsoft has open-sourced a multimodal model. It supports a context length of 128K tokens, and you can feed it images as well. Google did release the PaliGemma model, but it's not meant for conversational use.

Apart from that, Microsoft says that the Phi-3 Vision model was trained on publicly available, high-quality educational and code data. Microsoft also generated synthetic data for math, reasoning, general knowledge, charts, tables, diagrams, and slides.

Image Courtesy: Microsoft

Despite its small size, the Phi-3 Vision model performs better than Claude 3 Haiku, LLaVA, and Gemini 1.0 Pro on many multimodal benchmarks. It even comes pretty close to OpenAI's GPT-4V model. Microsoft says that developers can use the Phi-3 Vision model for OCR, chart and table understanding, general image understanding, and more.

If you want to check out the Phi-3 Vision model, head over to Azure AI Studio.

Arjun Sha

Passionate about Windows, ChromeOS, Android, and security and privacy issues. Has a penchant for solving everyday computing problems.
