OpenAI has a post explaining the three main techniques it used to “prevent generated images from violating our content policy.”
First, they filtered out violent and sexual images from the training dataset:
[W]e prioritized filtering out all of the bad data over leaving in all of the good data. This is because we can always fine-tune our model with more data later to teach it new things, but it’s much harder to make the model forget something that it has already learned.
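A minimal sketch of what favoring "filter out all of the bad data" can look like in practice: set a deliberately low rejection threshold on a classifier score, and accept that some harmless items get dropped too. The scores, item names, and the 0.2 threshold are hypothetical illustrations, not values from OpenAI's post.

```python
# Hypothetical (item, classifier_score) pairs; a higher score means
# "more likely to violate the content policy". All values are invented.
scored_items = [
    ("landscape photo", 0.01),
    ("studio portrait", 0.10),
    ("medical diagram", 0.35),   # harmless, but the classifier is unsure
    ("explicit image", 0.97),
]

def filter_dataset(items, threshold=0.2):
    """Keep only items scoring strictly below the threshold.

    A low threshold removes more bad data at the cost of also
    discarding some good data (here, the medical diagram).
    """
    return [name for name, score in items if score < threshold]

kept = filter_dataset(scored_items)
print(kept)
```

With a stricter (lower) threshold the borderline "medical diagram" is lost along with the clearly bad item, which is the trade-off the quote describes.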
Second, they found that filtering can actually amplify bias, because the smaller filtered dataset may be less balanced than the original:
We hypothesize that this particular case of bias amplification comes from two places: first, even if women and men have roughly equal representation in the original dataset, the dataset may be biased toward presenting women in more sexualized contexts; and second, our classifiers themselves may be biased either due to implementation or class definition, despite our efforts to ensure that this was not the case during the data collection and validation phases. Due to both of these effects, our filter may remove more images of women than men, which changes the gender ratio that the model observes in training.
They fixed this by re-weighting the filtered training data so that its category distribution matches that of the unfiltered data.
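The re-weighting idea can be sketched as follows, under the simplifying assumption that every image carries a discrete category label. The labels, counts, and the `category_weights` helper are hypothetical; the summary here does not say how OpenAI actually computed its weights.

```python
from collections import Counter

def category_weights(unfiltered_labels, filtered_labels):
    """Per-category weights that make the weighted filtered data match
    the category proportions of the unfiltered data.

    weight(c) = P_unfiltered(c) / P_filtered(c)
    """
    unfiltered_counts = Counter(unfiltered_labels)
    filtered_counts = Counter(filtered_labels)
    n_unfiltered = len(unfiltered_labels)
    n_filtered = len(filtered_labels)
    weights = {}
    for category, count in filtered_counts.items():
        p_unfiltered = unfiltered_counts[category] / n_unfiltered
        p_filtered = count / n_filtered
        weights[category] = p_unfiltered / p_filtered
    return weights

# Invented example: the filter removed more images of women than of men.
unfiltered = ["woman"] * 50 + ["man"] * 50
filtered = ["woman"] * 30 + ["man"] * 45

weights = category_weights(unfiltered, filtered)
print(weights)
```

In this toy case the under-represented category receives a weight above 1 and the over-represented one a weight below 1, so the two categories contribute equally to the weighted training loss, as they did before filtering.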
Third, they needed to prevent image regurgitation to avoid IP and privacy issues. They found that most regurgitated images (a) were simple vector graphics; and (b) had many near-duplicates in the training set. As a result, these images were easier for the model to memorize. So they de-duplicated images with a clustering algorithm.
To test the effect of deduplication on our models, we trained two models with identical hyperparameters: one on the full dataset, and one on the deduplicated version of the dataset. . . . Surprisingly, we found that human evaluators slightly preferred the model trained on deduplicated data, suggesting that the large amount of redundant images in the dataset was actually hurting performance.
Given the impressive results, this is an instructive set of techniques for mitigating unwanted content, bias, and memorization in AI models.