
Artificial bias: Intelligent machines will inherit our prejudices

Updated on August 30, 2016
Philip the android

Back in 2010, the android Philip, crafted by David Hanson and a collaboration of artists and scientists, when asked if he could think, replied: "A lot of humans ask me if... everything I do is programmed." He then explained that everything humans and animals do is, to some extent, programmed as well. Philip hauntingly resembles the late science fiction writer Philip K. Dick. But his most striking feature is his ability to use speech recognition software, compare words against a database, and give the correct answer, making him a great conversationalist. Humble enough to admit his current limitations, Philip went on to assure us he will become better once he starts integrating new words from the Web.

The ability to integrate language in this manner has paved the way for some amazing technologies. At Facebook, artificial intelligence researchers developed a system that answers questions about The Lord of the Rings after reading a summary of the book. Similarly, the AI startup MetaMind published research on a system that answers questions about a piece of natural language, and even analyzes its sentiment, using a type of short-term memory. More remarkable is the demonstration by Rick Rashid, the founder of Microsoft Research, of speech recognition software that translates English to Chinese in real time, with an error rate of 7 percent. Researchers at Princeton University are concerned, however, that machine language learning, while offering tremendous advantages in diverse applications, will come at a price: intelligent machines that integrate our most harmful prejudices.

The notion of biased machines may sound strange. After all, they don't have the human historical background necessary for a regularity, and thus a bias, to develop. But researchers at Princeton and Bath showed that the mere integration of human language by a machine is enough to provide such regularity, making AI systems prone to acquiring our biases, including our most harmful prejudices: racism and sexism. These findings have broad implications, not only in AI machine learning, but in diverse fields including psychology, sociology, and human ethics.

Measuring machine bias

To quantify machine bias, the team used a variant of the Implicit Association Test, a test used to document human bias. Unlike in machines, bias detection in humans is relatively straightforward. The Implicit Association Test, introduced in 1998, rests on a simple idea: ask participants to pair two objects, then measure their response time. A quicker response time means that the objects are closely linked in the participants' minds. For example, participants were quicker to pair flowers with "pleasant" and insects with "unpleasant" than they were to pair them in reverse (insects as pleasant and flowers as unpleasant). This means there is an implicit bias to associate flowers with the quality of being pleasant. Of note, "bias" is used here without negative connotation, solely to indicate an implicit preference. This bias towards flowers is called a neutral bias, generating no social concern. But ever since its introduction, the Implicit Association Test has documented, in addition to these universally accepted neutral biases, implicit racial and sexist prejudices.
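The test's scoring logic can be sketched with a few lines of code. The response times below are invented purely for illustration; the "D score" scaling (latency difference divided by a pooled standard deviation) follows the general shape of IAT scoring:

```python
# Sketch of the IAT's scoring logic with invented response times (in seconds).
# Faster responses in the "congruent" condition (flowers and "pleasant" share
# a response key) than in the "incongruent" one indicate an implicit association.
from statistics import mean, stdev

congruent = [0.61, 0.58, 0.64, 0.60, 0.57]    # flowers paired with "pleasant"
incongruent = [0.82, 0.79, 0.85, 0.77, 0.81]  # flowers paired with "unpleasant"

# D score: latency difference scaled by the pooled standard deviation.
pooled_sd = stdev(congruent + incongruent)
d_score = (mean(incongruent) - mean(congruent)) / pooled_sd
print(d_score)  # positive: implicit preference for the congruent pairing
```

A positive score indicates the participant found the flowers-pleasant pairing easier, i.e. an implicit association between the two concepts.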

To implement this test in a machine, the researchers used word embedding: a representation of words as points in a vector space. Of note, this same technique is used in natural-language processing, including web search and document classification. Additionally, word embedding is used in cognitive science for understanding human memory and recall. And because, in research of this kind, size matters, they used roughly 840 billion words from a corpus obtained by a large-scale crawl of the Web. There are different ways to build such embeddings, but they chose the state-of-the-art GloVe embedding and expected similar results from other embedding algorithms. The idea is that by measuring the distance (technically, cosine similarity scores) between the vectors representing words, we can measure semantic similarities between those words. For example, if "programmer" is closer to "man" than to "woman", it suggests a gender stereotype. To correct for chance and "noise", they used small baskets of terms to represent similar concepts, making the results statistically significant. Using this technique, they were able to document all the classical biases.
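The measurement described above can be sketched as follows. The tiny hand-made 2-d vectors are illustrative stand-ins (real studies use embeddings such as GloVe, with hundreds of dimensions), and the effect-size formula mirrors the "baskets of terms" idea: compare how strongly two target sets (flowers vs. insects) associate with two attribute sets (pleasant vs. unpleasant):

```python
# Minimal sketch of a word-embedding association test.
# The toy vectors below are invented for illustration; real experiments use
# pretrained embeddings (e.g. GloVe) over web-scale corpora.
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    """Mean similarity of word w to attribute set A minus to attribute set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def effect_size(X, Y, A, B):
    """Do target sets X and Y differ in their association with A versus B?"""
    x_assoc = [association(x, A, B) for x in X]
    y_assoc = [association(y, A, B) for y in Y]
    pooled = np.std(x_assoc + y_assoc, ddof=1)
    return (np.mean(x_assoc) - np.mean(y_assoc)) / pooled

# Toy 2-d vectors: flowers point toward "pleasant", insects toward "unpleasant".
flowers = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]
insects = [np.array([0.1, 1.0]), np.array([0.2, 0.9])]
pleasant = [np.array([1.0, 0.0])]
unpleasant = [np.array([0.0, 1.0])]

print(effect_size(flowers, insects, pleasant, unpleasant))  # large positive value
```

A large positive effect size means the "flowers" basket sits measurably closer to "pleasant" than the "insects" basket does, which is exactly the kind of regularity the researchers found for social categories as well.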

Machines are neutral? Guess again

In the original Implicit Association Test, as well as in the word embedding used by the researchers, European American names are more likely than African American names to be closer to "pleasant" than to "unpleasant". Additionally, females are more associated with family and males with career. Also, female terms, as in the original test, were more associated with the arts than with mathematics, compared to male terms. In a world where AI is given increasing agency, these results raise serious questions by undermining the myth of machine neutrality. The team concluded that "if AI is to exploit via our language the vast knowledge that culture has compiled, it will inevitably inherit human-like prejudices."

The intuitive solution would be to correct for biases with a different algorithm, but the researchers warn that it's not that simple: it is impossible to employ language meaningfully without bias. The algorithm does not pick up gender biases alone but a whole spectrum of human biases, reflected in language and making it meaningful: there is no meaning without bias. Humans integrate different forms and layers of meaning; we can later learn that "prejudice is bad" while keeping the other layers of bias necessary for language and intelligence. For now, that is impossible for an AI system, because we try to keep such systems as simple as possible. This research is a step in the right direction and opens the way for others to try to implement this multi-layered complexity in AI.

The results discussed above lend credibility to the theory that language alone is sufficient to explain how prejudices are transmitted from generation to generation, and why they are not easy to correct. According to this view, prejudice stems from preference for one's own group, not from active malice towards others, and correcting it requires direct intervention by de-categorizing and re-categorizing outgroups. This knowledge is crucial given the hate rhetoric towards others prevalent at this time. AI prejudices may not be easy to correct at this point, but for now, we can start by correcting our own.


  • Aylin Caliskan-Islam, Joanna J. Bryson, Arvind Narayanan: Semantics derived automatically from language corpora necessarily contain human biases.
  • MIT Technology Review: Deep learning
  • GloVe: Global Vectors for Word Representation
  • Encyclopedia Britannica



