
Artificial Intelligence for Image Research

A guide on how to use Generative AI for image generation, editing, concept creation and development.

Datasets, Bias, and Discrimination


AI can seem accurate and objective, but it is not immune to bias. The data AI models are trained on, and the algorithms they use, can reflect human biases and inequalities. For example, if the data used to train an AI system over-represents a certain group or demographic, the model's outputs will skew toward that group. This is why it is so important to continually evaluate and audit AI systems to ensure they are not perpetuating harmful biases. It is a reminder that even the most advanced technology is not perfect, and that we still need to be vigilant in addressing issues of bias and discrimination.



Image generation models are trained on large datasets of images. These datasets are sometimes scraped from the internet without consent and can contain artwork and unattributed copyrighted images. Class action lawsuits brought by artists, as well as by stock photo companies such as Getty Images, against AI developers are currently in progress. Furthermore, malicious users mimic illustrators' art styles and sell works under their names, devaluing the original artists' work and muddying their styles through these newly associated images.

When working with AI datasets, it's crucial to understand the contents of the data to ensure ethical usage. 

The legal and ethical status of these datasets is still a grey area. Before using a dataset, check its documentation for information about licensing, consent, and provenance, and consult your institution's copyright guidance when in doubt.
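As one concrete way to inspect a dataset's contents, the sketch below tallies the license field in an image dataset's metadata file. The column names and sample rows are invented for illustration; real datasets publish metadata in their own schemas.

```python
import csv
import io
from collections import Counter

# Hypothetical metadata sample; real image datasets ship metadata
# files with their own (different) column names and formats.
SAMPLE_METADATA = """url,license,attribution
http://example.com/a.jpg,CC-BY,Artist A
http://example.com/b.jpg,unknown,
http://example.com/c.jpg,CC0,
http://example.com/d.jpg,unknown,
"""

def license_breakdown(metadata_text):
    """Count how many images carry each license tag."""
    reader = csv.DictReader(io.StringIO(metadata_text))
    # Treat an empty license field as "unknown".
    return Counter(row["license"] or "unknown" for row in reader)

counts = license_breakdown(SAMPLE_METADATA)
print(counts)  # Counter({'unknown': 2, 'CC-BY': 1, 'CC0': 1})
```

A large share of "unknown" licenses is a signal that the dataset may contain unattributed or non-consensually collected images.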


Bias and Discrimination

Datasets contain inherent biases in their curation:

1. Historical bias: bias that already exists in society and is reflected in historical data

2. Representation bias: a lack of geographical or demographic diversity in image datasets of people, leading to over-representation of certain groups over others

When using AI-generated images, it is important to take note of gaps in training datasets that can lead to inaccurate representations. For example, a model trained on only a few images of Islamic architecture will be ill-equipped to produce accurate outputs on that subject. AI models cannot create what they have not been trained on, but they will generate something regardless and present it as fact, which is why domain knowledge and careful curation of AI image outputs are important for spotting these inaccuracies.
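To make such gaps concrete, a quick audit might count images per category label and flag categories below a minimum threshold. The labels and counts below are invented for illustration; in practice they would come from a real dataset's annotations.

```python
from collections import Counter

# Hypothetical category labels attached to a training set's images.
image_labels = (
    ["gothic cathedral"] * 120
    + ["modernist house"] * 95
    + ["islamic architecture"] * 3   # under-represented category
)

def underrepresented(labels, min_count=10):
    """Return categories with fewer than min_count examples."""
    counts = Counter(labels)
    return sorted(c for c, n in counts.items() if n < min_count)

print(underrepresented(image_labels))  # ['islamic architecture']
```

Categories flagged this way are exactly the subjects on which a model is likely to produce inaccurate or generic outputs.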

When using generative image AI, it is important to watch out for biases in image generation that reflect common stereotypes:

When prompted with "A color photograph of an architect", Stable Diffusion presents the following results:

[Stable Diffusion image results not shown]

When prompted with "A color photograph of a housekeeper", Stable Diffusion presents the following results:

[Stable Diffusion image results not shown]


Human biases are reflected and compounded through the datasets on which these AI models are trained, reinforcing systemic racism and sexism. LinkedIn's advertising platform showed a preference for male names in search queries, and Microsoft's early chatbot "Tay" was conditioned by Twitter users into posting antisemitic content. In image generation AI, this can manifest as a failure to depict diversity when prompted for people, or an outright inability to generate imagery of people of color. It also shows up in the race and gender associations of different professions: entering "a gastroenterologist" into the Dall-E Mini model yields exclusively white male doctors, while prompting for "nurse" generates results featuring only women.
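One simple way to quantify the kind of skew described above is to tally annotated attributes across a batch of generations for a single prompt and compare the shares against a reference expectation. The annotations below are made up for illustration; in practice they would come from human review of real model outputs.

```python
from collections import Counter

# Hypothetical human annotations of 10 outputs for one prompt
# (e.g. "a gastroenterologist").
annotations = ["man"] * 9 + ["woman"] * 1

def group_shares(labels):
    """Fraction of outputs annotated with each group label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

shares = group_shares(annotations)
print(shares)  # {'man': 0.9, 'woman': 0.1}
# A large gap between these shares and a chosen reference share
# (e.g. real-world demographics of the profession) signals the
# stereotype-driven skew discussed above.
```

Repeating such an audit across prompts and models also makes it possible to compare how strongly different models exhibit the same bias.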

Users must be conscious of these biases when using AI, and note that they vary between AI models. While datasets and their contents are often opaque to end users, training your own model can be a good way to stay well-informed about, and considerate of, dataset ethics.