Gemini Nano to Get Multimodal Capabilities; Coming to Pixel Later This Year

Google’s ongoing I/O 2024 event has become a hotbed of AI announcements. Along with several updates, including an AI video generator dubbed Veo to rival OpenAI’s Sora and Gemini 1.5 Flash, Google has announced that it is bringing multimodal capabilities to Gemini Nano, its on-device LLM. This means Gemini Nano will be able to accept audio, images, and files in addition to text inputs.

“Soon we’re adding multimodal capabilities to Gemini Nano. That means your phone can understand the world the way you understand it, through text, sights, sounds and spoken language. #GoogleIO pic.twitter.com/9QOPmbX98V” — Google (@Google), May 14, 2024

For those who are unaware, Gemini Nano is a lightweight LLM designed to run AI tasks entirely on-device. Google announced Gemini Nano in December last year alongside Gemini Ultra and Gemini Pro. As of now, Gemini Nano is only available on the Google Pixel 8 series and the Samsung Galaxy S24 series. However, in its current state, Gemini Nano accepts only text input.

With multimodal capabilities, Gemini Nano will be able to draw contextual information not just from text but also from sounds, images, and spoken language. As for availability, Google says it will roll out multimodal capabilities to Gemini Nano starting with Pixel devices later this year.
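Google has not yet shared developer details for multimodal Gemini Nano, but conceptually, a multimodal on-device model accepts a mix of input “parts” (text, image, audio) in a single prompt and processes them locally. The Kotlin sketch below is purely illustrative; every type in it (Part, OnDeviceModel, generate) is hypothetical and only shows the general shape such an API could take, not any real SDK.

```kotlin
// Hypothetical sketch only: Google has not published a developer API for
// multimodal Gemini Nano. All types below are illustrative stand-ins,
// loosely modeled on common on-device inference patterns.

import android.graphics.Bitmap

// An input part of a multimodal prompt: text, an image, or raw audio.
sealed interface Part
data class TextPart(val text: String) : Part
data class ImagePart(val bitmap: Bitmap) : Part
data class AudioPart(val pcmSamples: ShortArray, val sampleRateHz: Int) : Part

// Stand-in for the on-device model runtime.
class OnDeviceModel {
    // Would run inference entirely on-device and return the model's reply.
    suspend fun generate(parts: List<Part>): String {
        TODO("Backed by the device's AI runtime in a real implementation")
    }
}

// Example call site: pair a photo with a typed (or transcribed) question.
suspend fun describeScene(model: OnDeviceModel, photo: Bitmap, question: String): String =
    model.generate(
        listOf(
            ImagePart(photo),
            TextPart(question),
        )
    )
```

The key point the sketch illustrates is that every input part is handled locally on the phone, which is what distinguishes Gemini Nano from its larger, cloud-hosted siblings, Gemini Pro and Gemini Ultra.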

Anmol Sachdeva

With 6 years of experience as a writer and editor in the tech media industry, Anmol is an enigmatic savant in all kinds of tech. He loves to scour the internet for new information. When not conjuring words, Anmol can be found watching Manchester United matches or glued to his MacBook, watching re-runs of his favorite TV shows for the umpteenth time.
