Whose “Human Values” Will AI Express?

by Dr. Scott Allen Cambo & Liz O’Sullivan

It’s been a few weeks now since OpenAI released ChatGPT, the first wide-access version of their very large and powerful GPT-3 family of language models. GPT models have been around since 2018, enamoring and enraging various communities, depending on whose social media you happen to follow. Opinions diverge significantly, exposing a major issue that’s unlikely to be resolved any time soon, even as companies and nations race to develop Artificial General Intelligence (AGI). This issue poses perhaps one of the thorniest problems ahead of us as we seek to integrate AI into society: whose values are represented in artificial intelligence? And who gets to decide what those values are?

By releasing ChatGPT to the public, OpenAI has generated millions in free PR, but also something much more valuable: a preview of a possible future world where AI and humans interact seamlessly. It’s a world that’s currently unfolding in front of us, where both model and users become part of a broader socio-technical system that is, as of this moment, very difficult to predict. Every user query generates one more batch of training data to be studied, and every viral tweet reaction to the machine’s output gets OpenAI a little closer to answering the burning commercial question: what do people truly want to see in response to the questions they ask? They’re discovering, through the lens of public opinion, just how much the AI can get away with, and where the line sits beyond which its risks outweigh its usefulness in the public’s eyes. Using the public as a source of training data isn’t new, but with so many public releases of untested, bleeding-edge AI in recent months, it’s telling that these teams need information to advance their businesses that they can’t gather on their own, in isolation.

So, what happened when ChatGPT went public? AI researchers got straight to work testing the boundaries of GPT’s moderation policies, revealing reams of racist, sexist, and discriminatory behavior that could surface spontaneously, or with just a little nuance in human requests. In the process, they brought much-needed attention to some of the very real risks bots like this one pose to our society. Scholars demonstrated that GPT’s knowledge of science, math, and facts about our world is often limited, even while those limited responses are stated with the conviction of absolute truth. Some of these scholars found ways to break through GPT’s moderation and ask questions that could endanger real people, such as how to effectively commit crimes or produce illegal substances. As AI gets better, these risks will grow, and only through ironclad moderation can they be mitigated, if never fully defeated.

But this moderation itself has exposed an interesting split in the public perception of AI: some users were far less enthusiastic about the critical work of AI safety in this context. When Sam Altman, CEO of OpenAI, took to Twitter to defend his decision to mitigate fairness risks in ChatGPT, one Twitter user said they would “pay extra for a non-woke version” of the bot. This is just one of many examples of social reactions decrying the need for moderation, often personifying the model in the process.

In fact, when Stable Diffusion became the first widely available AI-powered image generator to ship without the typical moderating forces, fans celebrated that they could now produce sexual art. They did this despite the very clearly documented risks that the technology could be used to create realistic portrayals of public figures in compromising positions. Stability AI later reversed this decision, leading to much commotion. Edward Snowden famously decried the removal of NSFW content from Stable Diffusion as “censorship”, despite the obvious need to eliminate the risk of AI-generated child exploitation.

And yet, even today, TikTok users are hard at work trying to trick image transformation models in various ways. One such model, the “AI Manga” filter, transforms selfies into anime counterparts, and users immediately attempted to make sexualized, gender-bending photos by holding up pairs of circular objects suggestively and making a certain face. The results are equal parts amusing and terribly gender-biased. But perhaps we can find it encouraging that younger generations seem to be growing up in an environment that understands how to adversarially interact with technology. This division over sexual content moderation is just one facet of the very wide and deep divisions in the public’s perception of the values with which we should imbue our artificial “friends”.

Should the models allow sexual content, even with the risks of political deepfakes or child exploitation? Should they prohibit insensitive comments, or would this adversely impact the quality of fiction the models could create? If users want violent content, how graphic should the responses be? Should these models be allowed to imitate humans, even if this could be a huge boon to hackers? When it comes to sexism, racism, ableism, and the like, who gets to decide where the line sits? Should it be the companies making the products? Or the communities who stand to be hurt the most? Perhaps most importantly… will these companies agree to make these decisions transparently, with input from outside experts and stakeholders? Or will we all simply be subject to their whims?

OpenAI, despite their name, seems to have come down firmly on one side of the discussion, dictatorially, with primary regard for commercial opportunity over the collective public good. To his credit, Altman described an AI that “needs to do whatever i ask” as one that should still fail to produce sexist or racist behavior. But there’s no guarantee his position will hold, or that anyone can agree on what those loaded terms mean. And when the paywall slams down on open access to ChatGPT, they’re likely to abide by the requests of customers first, with no guarantee that democratic input and the public interest will sway their intent.
