Why AI Moderators Will Be Biased
Microsoft's chatbot Tay showed how an AI can learn bias from human input. Within 24 hours, Tay, modeled on a teenage girl, was spouting racist and other bigoted statements. While Tay mostly learned from the worst of the internet's trolls and was shaped by a coordinated feed of such "bad" data, the incident triggered a wave of papers on how AIs learn to share human biases. Most of those papers focused on how AIs learn sexism and racism, but this article is concerned with the other biases AIs learn from humans, which are likely to have a far greater impact on discourse.
How AIs Learn Bias by Design and How It Impacts the Internet
As an engineer with more than a decade of experience in IT, I realized that AIs, and computer models in general, can end up with built-in biases as a result of human decisions. Those biases are then reinforced by the biases of the humans who select which AI systems to deploy and of the human moderators who further "tune" the models. Let's look at how the demonstrated biases at large tech companies are already affecting dialogue, and how relying on artificial intelligences their operators believe to be unbiased will actually amplify that bias and restrict many people's ability to communicate.
Google's "Perspective" is an AI trained to flag and moderate comments in an effort to tame the trolls. Instagram set up an AI to do the same, though unlike Google's project it relies entirely on Instagram's own data set. Who wouldn't want to shut down trolls? The problem is that the supposedly neutral algorithms inherit bias by design. When an AI is trained on data from left-wing sites like the Huffington Post and the New York Times, moderate opinions get classified as right-leaning and barely tolerated, while conservative opinions are readily flagged as unacceptable.
With Google Perspective, publishers can define what toxicity threshold they'll tolerate on their sites, but the toxicity rating itself rests on the liberally biased data set and is further reinforced by the comments those sites flag as unacceptable. We have already seen sites from Reddit to Facebook purge conservative, libertarian, ex-Muslim and other groups based on the biases of their administrators. Given that a left-wing bias dominates many social media platforms, it is logical to assume they will only adopt moderating AIs that moderate according to those biases. Further training then reinforces the AI with similar or increasingly biased examples as the platforms continue to censor content they don't like.
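To make the mechanism concrete, here is a minimal sketch of how a publisher might apply their own toxicity threshold to Perspective scores. The endpoint and request/response shapes follow Google's public API documentation, but the API key, threshold value, and mocked response are placeholders for illustration. Note that whatever threshold the publisher picks, the score itself comes from the trained model, so the publisher only tunes sensitivity to biases already baked in.

```python
# Sketch of publisher-side threshold logic for Google's Perspective API.
# Endpoint per the public API docs; no network call is made here.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def build_request(comment_text):
    """Build the JSON body Perspective expects for a TOXICITY score."""
    return {
        "comment": {"text": comment_text},
        "requestedAttributes": {"TOXICITY": {}},
    }

def allow_comment(response, threshold=0.8):
    """Apply a publisher-chosen threshold to the returned summary score.

    The threshold is the only knob the publisher controls; the score
    itself reflects whatever biases exist in the training data.
    """
    score = response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
    return score < threshold

# Example with a mocked (hypothetical) API response:
mock_response = {
    "attributeScores": {"TOXICITY": {"summaryScore": {"value": 0.35}}}
}
print(allow_comment(mock_response, threshold=0.8))  # True: below threshold
```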
The new data could end up not only locking in the bias but shifting further as the definition of “unacceptable” expands to include ever more opinions, topics and sources.
Fact checking doesn't prevent this bias; it is an example of it. First, claims are checked against liberally biased outlets by liberally biased fact checkers like Snopes and Politifact. Second, whatever those outlets don't cover gets flagged as untrue or unverified when it is covered by independent or conservative sources. Third, the shared bias of sites like Snopes and Politifact leads them to rate reporting on conservative sites as only partially true rather than simply conservatively slanted. Fourth, liberal fact checkers are selective in what they check, rarely examining verifiably untrue statements by liberals. The result is that untrue statements by liberal figures gain weight by virtue of publication, while conservatives' statements are minced, challenged, reported as biased and otherwise delegitimized. When sites like Facebook and Twitter limit original sources like Wikileaks, they both prevent humans from vetting that information themselves and teach the AI not to let others share those sources, or to downgrade comments that refer to them.
You end up with the AI reinforcing ever-narrowing acceptability standards as it weeds out balancing counter-opinions and the user community itself shifts left. Social science has long shown that a community that never hears contrary opinions becomes more extreme. Algorithms reinforced by like-minded humans will do the same with greater speed and efficiency.
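The narrowing effect can be illustrated with a toy simulation, under deliberately simplified assumptions: comments are points on a one-dimensional opinion axis, and each moderation pass removes anything far from the surviving community's center. The model, parameters, and tolerance are invented for illustration, not drawn from any real platform, but they show how repeated filtering by a like-minded center shrinks the range of acceptable opinion each round.

```python
import random

random.seed(0)

def simulate(rounds=5, n=1000, tolerance=1.0):
    """Toy model of a moderation feedback loop.

    Opinions are points on a -1..1 axis. Each round, the 'moderator'
    keeps only comments within `tolerance` standard deviations of the
    surviving community's mean, then filters the survivors again.
    The acceptable band narrows every round.
    """
    opinions = [random.uniform(-1, 1) for _ in range(n)]
    for r in range(rounds):
        mean = sum(opinions) / len(opinions)
        var = sum((x - mean) ** 2 for x in opinions) / len(opinions)
        sd = var ** 0.5
        # Moderation pass: anything too far from the current center is removed.
        opinions = [x for x in opinions if abs(x - mean) <= tolerance * sd]
        print(f"round {r}: kept {len(opinions)}, center {mean:+.3f}, spread {sd:.3f}")
    return opinions

final = simulate()
```

Each round the printed spread shrinks by a roughly constant factor: the moderator's notion of "acceptable" is defined by the already-filtered community, so the window can only contract.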
How Can This Issue Be Solved?
More data isn’t the solution. The issue isn’t resolved by using large data sets unless they come from sources with diverse opinions. For this reason, using multiple data sets from liberal sites isn’t a solution, but mixing in sources from conservative and international sites that have different biases is a solution. However, that’s not what has happened during development, and it isn’t likely to occur as a correction.
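A simple way to express "mixing in sources with different biases" is stratified sampling: draw the same number of training examples from each source so no single outlet's slant dominates. The source names and pool sizes below are hypothetical placeholders, and this is only a sketch of the sampling idea, not a claim about how any real moderation model is trained.

```python
import random

random.seed(1)

# Hypothetical labeled comment pools from outlets with different leanings;
# the names and contents are placeholders, not real data sets.
pools = {
    "liberal_outlet": [f"lib_{i}" for i in range(500)],
    "conservative_outlet": [f"con_{i}" for i in range(120)],
    "international_outlet": [f"intl_{i}" for i in range(80)],
}

def balanced_sample(pools, per_source=50):
    """Draw the same number of examples from each source so the largest
    pool (here, the liberal outlet) cannot dominate the training mix."""
    sample = []
    for name, comments in pools.items():
        k = min(per_source, len(comments))
        sample.extend(random.sample(comments, k))
    return sample

train = balanced_sample(pools)
print(len(train))  # 150: 50 from each of the three sources
```

Without the balancing step, sampling uniformly from the pooled comments would yield roughly 71% liberal-outlet examples, reproducing the original skew.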
Conservatives, libertarians and others questioning the political correctness baked into these training data sets may find freedom on other platforms as the moderation tools are rolled out across the major social media sites. But that migration only deepens the AIs' bias, since every contrary opinion disappears from the major platforms.
For the time being, using more facts and detail in an online comment or post reduces the odds that these AIs will censor you. But as the models learn more context, anything outside the PC checklist will be censored by an AI the moderators believe is unbiased, because they never check their own biases. It is rather self-unaware for liberals to say that others have biases while ignoring their own, a problem Dennis Prager pointed out a decade ago.
It is possible that deliberate campaigns to promote and discuss conservative, libertarian and classical liberal content will keep public discussion, and the AI moderation data sets built from it, moderately left instead of drifting into a narrow, far-left corner.
One solution would be for tech companies to deliberately hire more conservatives, libertarians and others who do not share their liberal biases, and to put them in positions of influence over technology. Consider Facebook's trending news feed, where the team censored conservative stories and injected liberal ones, with the majority believing this was moral and good. However, I don't expect that to happen.
Why Does This Matter?
The book "The Wisdom of Crowds" shows that it is diversity of opinion and the free exchange of ideas that lead to correct democratic solutions. When we silence opinions, silence facts because of their sources, and even silence facts from neutral sources because they contradict an ever more extreme set of political biases, public debate by its very nature produces wrong outcomes. We also risk cultivating extremism, because dialogue shifts further toward the extremes when views are never countered by the other side. This is the logical result of everyone retreating to a safe space: the AI won't let liberals see anything else, and conservatives have to retreat to corners the AIs don't reach.
And our social divide becomes an impossible chasm, made worse by supposedly neutral artificial intelligences whose creators don't understand the biases their creations encode and reinforce.
Sources:
Snopes, Which Will Be Fact-Checking for Facebook, Employs Leftists Almost Exclusively
Snopes, Politifact, & Other Fact Checkers Are Liberal Mouthpieces
Who's Checking the Fact Checkers? A new study sheds some light on what facts the press most likes to check.