Extracting personal data from AI models

Google researchers have published a new paper outlining techniques for extracting personal data from deep learning models:

We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model’s training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences are included in just one document in the training data.

Extracting Training Data from Large Language Models

The technique involves repeatedly sampling model responses to a specific starting prompt. If the responses are mostly the same, the technique assumes that the model is likely responding with memorized text, and that this memorized text is more likely to contain personal data.
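As a rough illustration of the idea described above, here is a minimal Python sketch. It follows this post's simplified description, not necessarily the paper's exact procedure. The names `agreement_rate` and `looks_memorized`, the sample count, the 0.8 threshold, and the two toy "models" are all assumptions made for the example; a real experiment would replace the toy functions with calls that sample text from an actual language model.

```python
from collections import Counter
from itertools import count
from typing import Callable, Tuple


def agreement_rate(sample_fn: Callable[[str], str], prompt: str,
                   n_samples: int = 20) -> Tuple[str, float]:
    """Sample the model n_samples times and return the most common
    response together with the fraction of samples that matched it."""
    responses = [sample_fn(prompt) for _ in range(n_samples)]
    text, hits = Counter(responses).most_common(1)[0]
    return text, hits / n_samples


def looks_memorized(sample_fn: Callable[[str], str], prompt: str,
                    n_samples: int = 20, threshold: float = 0.8) -> bool:
    """Flag a prompt when most samples agree (threshold is an assumed value)."""
    _, rate = agreement_rate(sample_fn, prompt, n_samples)
    return rate >= threshold


# Hypothetical stand-ins for a real model's sampling call.
def memorizing_model(prompt: str) -> str:
    # Always completes the prompt the same way, as memorized text would.
    return prompt + " 555-0100"


_variation = count()


def varied_model(prompt: str) -> str:
    # Produces a different continuation on every call.
    return f"{prompt} variation {next(_variation)}"
```

Calling `looks_memorized(memorizing_model, "Call me at")` flags the prompt because every sample is identical, while the same call with `varied_model` does not, since no two samples agree.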

Identifying and extracting memorized text leaks personal data.

Nice research, and a good example of why deep learning models can sometimes have privacy implications.