The dark secret behind AI-generated cute animal images
It is no secret that large models, such as DALL-E 2 and Imagen, are trained on vast numbers of documents and images pulled from the web, and that they absorb the worst aspects of that data as well as the best. OpenAI and Google explicitly acknowledge this.
Scroll down the Imagen website—past the dragon fruit wearing a karate belt and the small cactus wearing a hat and sunglasses—to the section on societal impact and you get this: “While a subset of our training data was filtered to remove noise and undesirable content, such as pornographic imagery and toxic language, we also utilized [the] LAION-400M dataset which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes. Imagen relies on text encoders trained on uncurated web-scale data, and thus inherits the social biases and limitations of large language models. As such, there is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision to not release Imagen for public use without further safeguards in place.”
It’s the same kind of acknowledgment that OpenAI made when it revealed GPT-3 in 2019: “internet-trained models have internet-scale biases.” And as Mike Cook, who studies AI creativity at Queen Mary University of London, has pointed out, it’s in the ethics statements that accompanied Google’s large language model PaLM and OpenAI’s DALL-E 2. In short, these companies know that their models are capable of producing awful content, and they have no idea how to fix that.