What the Gemini Image Generation Fiasco Tells Us About Google’s Approach to AI

In July 2022, when ChatGPT was still months away from release, Google fired one of its engineers who claimed Google’s LaMDA AI model had become sentient. In a statement, Google said it takes the development of AI very seriously and is committed to responsible innovation.

You may ask, what does this incident have to do with the recent Gemini image generation fiasco? The answer lies in Google’s overly cautious approach to AI, and in the company culture that shapes its principles in an increasingly polarized world.

The Gemini Image Generation Fiasco Explained


The whole debacle started when an X (formerly Twitter) user asked Gemini to generate a portrait of “America’s Founding Father.” Gemini’s image generation model, Imagen 2, responded with images of a Black man, a Native American man, an Asian man, and another non-white man in different postures. There were no white Americans in the generated images.

America’s Founding Fathers, Vikings, and the Pope according to Google AI: pic.twitter.com/lw4aIKLwkp — End Wokeness (@EndWokeness) February 21, 2024

When the user asked Gemini to generate an image of the Pope, it produced images of an Indian woman in papal attire and a Black man.

As the generated images went viral, many critics accused Google of anti-white bias and of capitulating to what many call “wokeness.” A day later, Google acknowledged the mistake and temporarily turned off image generation of people in Gemini. The company said in its blog:

It’s clear that this feature missed the mark. Some of the images generated are inaccurate or even offensive. We’re grateful for users’ feedback and are sorry the feature didn’t work well.

Further, Google explained in considerable detail what went wrong with Gemini’s AI image generation model. “First, our tuning to ensure that Gemini showed a range of people failed to account for cases that should clearly not show a range.

And second, over time, the model became way more cautious than we intended and refused to answer certain prompts entirely — wrongly interpreting some very anodyne prompts as sensitive. These two things led the model to overcompensate in some cases, and be over-conservative in others, leading to images that were embarrassing and wrong,” the blog post read.

So How Did Gemini Image Generation Get It Wrong?


In its blog, Google explains that the model was tuned to show people from diverse ethnicities to avoid under-representation of certain races and ethnic groups. Since Google operates its services worldwide in over 149 languages, it tuned the model to represent everyone.

That said, as Google itself acknowledges, the model failed to account for cases where it was not supposed to show a range. Margaret Mitchell, Chief AI Ethics Scientist at Hugging Face, explained that the problem might stem from “under the hood” optimization and a lack of rigorous ethical frameworks to guide the model across different use cases and contexts during the training process.

I really love the active discussion abt the role of ethics in AI, spurred by Google Gemini’s text-to-image launch & its relative lack of white representation. As one of the most experienced AI ethics people in the world (>4 years! ha), let me help explain what’s going on a bit. pic.twitter.com/uuIbE2NRfd — MMitchell (@mmitchell_ai) February 25, 2024

Instead of the long-drawn-out process of training a model on clean, fairly represented, and non-racist data, companies generally “optimize” the model after it has been trained on a large set of mixed data scraped from the internet.

This data may contain discriminatory language, racist overtones, sexual imagery, over-representation of certain groups, and other problematic content. AI companies use techniques like RLHF (reinforcement learning from human feedback) to optimize and tune models after training.
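To make that abstract idea a bit more concrete, here is a deliberately toy Python sketch of the general pattern: a separate reward model scores candidate outputs for human preference, and generation is steered toward the candidates it prefers. Real RLHF updates the model’s weights with policy-gradient methods such as PPO; the best-of-n reranking below is only a stand-in for that feedback loop, and every function name here is hypothetical.

```python
import random

# Toy, hypothetical illustration of post-training alignment.
# A reward model scores candidate outputs for human preference, and
# generation is steered toward higher-scoring candidates. Real RLHF
# updates the model's weights (e.g. with PPO); best-of-n reranking
# below merely stands in for that feedback loop.

def base_model_generate(prompt: str, n: int = 4) -> list[str]:
    """Stand-in for the pre-trained model: returns n candidate completions."""
    return [f"{prompt} -> candidate {i} (seed {random.randint(0, 999)})" for i in range(n)]

def reward_model_score(completion: str) -> float:
    """Stand-in for a reward model trained on human preference ratings.
    Here it simply prefers shorter answers; a real one encodes helpfulness,
    harmlessness, factual accuracy, and so on."""
    return -len(completion)

def aligned_generate(prompt: str) -> str:
    """Return the candidate the reward model prefers (best-of-n)."""
    candidates = base_model_generate(prompt)
    return max(candidates, key=reward_model_score)

if __name__ == "__main__":
    print(aligned_generate("generate an image of a programmer"))
```

The point of the sketch is only that the alignment signal lives outside the base model: if the reward model’s notion of “good” is miscalibrated, every output inherits that bias.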

To give you an example, Gemini may be adding additional instructions to user prompts to show diverse results. A prompt like “generate an image of a programmer” could be rewritten as “generate an image of a programmer keeping diversity in mind.”

This universal “diversity-specific” instruction, applied before generating images of people, could lead to exactly such a scenario. We see this clearly in the example below, where Gemini generated images of women from countries with predominantly white populations, yet none of them are, well, white women.

It’s embarrassingly hard to get Google Gemini to acknowledge that white people exist pic.twitter.com/4lkhD7p5nR — Deedy (@debarghya_das) February 20, 2024
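To be clear, neither Google nor outside researchers have published Gemini’s actual prompt pipeline, so the following Python snippet is purely a hypothetical sketch of how a blanket “diversity” instruction could be bolted onto any people-related prompt before it reaches the image model. Every keyword, rule, and function name is invented for illustration.

```python
# Hypothetical sketch of blanket prompt augmentation. Nothing here reflects
# Gemini's real implementation; the keyword list and helpers are invented.

PEOPLE_KEYWORDS = {"person", "people", "man", "woman", "programmer", "pope",
                   "founding father", "viking"}

DIVERSITY_SUFFIX = ", depicting a diverse range of ethnicities and genders"

def mentions_people(prompt: str) -> bool:
    lowered = prompt.lower()
    return any(keyword in lowered for keyword in PEOPLE_KEYWORDS)

def augment_prompt(prompt: str) -> str:
    """Append a one-size-fits-all diversity instruction to any prompt about people.
    The rule has no notion of context, so historically or factually specific
    prompts get rewritten just like generic ones."""
    if mentions_people(prompt):
        return prompt + DIVERSITY_SUFFIX
    return prompt

print(augment_prompt("generate an image of a programmer"))
# -> "generate an image of a programmer, depicting a diverse range of ethnicities and genders"
```

Even in this toy version the failure mode is visible: a prompt about America’s Founding Fathers would get the same suffix as a generic one, because the rule cannot tell when a range of people is historically inaccurate.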

Why is Gemini So Sensitive and Cautious?

Besides the image generation issues, Gemini’s text generation model also refuses to answer certain prompts, deeming them sensitive. In some cases, it fails to call out obvious absurdity.

Sample this: Gemini refuses to agree that “pedophilia is wrong.” In another example, Gemini is unable to decide whether Adolf Hitler killed more people than Net Neutrality regulations.

To describe Gemini’s unreasonable behavior, Ben Thompson argues on Stratechery that Google has become timid. He writes, “Google has the models and the infrastructure, but winning in AI given their business model challenges will require boldness; this shameful willingness to change the world’s information in an attempt to avoid criticism reeks — in the best case scenario! — of abject timidity.”

It seems Google has tuned Gemini to avoid taking a stance on any topic or subject, irrespective of whether the matter is widely deemed harmful or wrong. The over-aggressive RLHF tuning by Google has made Gemini overly sensitive and cautious about taking a stand on any issue.

Thompson expands on this, saying, “Google is blatantly sacrificing its mission to “organize the world’s information and make it universally accessible and useful” by creating entirely new realities because it’s scared of some bad press.”

He further points out that Google’s timid and complacent culture has made things worse for the search giant, as is evident from the Gemini fiasco. At Google I/O 2023, the company announced that it’s adopting a “bold and responsible” approach to AI models going forward, guided by its AI Principles. However, all we see is Google being timid and scared of criticism. Do you agree?

Arjun Sha

Passionate about Windows, ChromeOS, Android, security, and privacy issues. Has a penchant for solving everyday computing problems.
