Google researchers have published a new paper outlining techniques for extracting personal data from deep learning models:
We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model’s training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences are included in just one document in the training data.

Extracting Training Data from Large Language Models
The technique involves repeatedly sampling model responses to a specific starting prompt. If the responses are mostly the same, the technique assumes the model is likely responding with memorized text, and that memorized text is more likely to contain personal data.
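The agreement heuristic can be sketched in a few lines. This is a simplified illustration, not the paper's actual method; `sample_fn` is a hypothetical stand-in for any stochastic language-model sampling call, and the 0.8 threshold is an arbitrary choice for the example.

```python
from collections import Counter

def likely_memorized(sample_fn, prompt, n_samples=20, threshold=0.8):
    """Sample the model repeatedly with the same prompt; if most
    continuations are identical, the text is probably memorized
    rather than freshly generated.

    sample_fn(prompt) -> str is a hypothetical sampling call
    (temperature > 0, so a non-memorizing model varies its output).
    """
    samples = [sample_fn(prompt) for _ in range(n_samples)]
    # Fraction of samples that agree on the single most common continuation.
    top_text, count = Counter(samples).most_common(1)[0]
    agreement = count / n_samples
    return agreement >= threshold, top_text, agreement
```

A model that returns the same continuation every time is flagged; one that varies on every sample is not.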
Identifying and extracting memorized text is what leaks the personal data.
Nice research, and a good example of why deep learning models can (sometimes) have privacy implications.
Cade Metz, writing for the NYT:
When the Chula Vista police receive a 911 call, they can dispatch a flying drone with the press of a button.

Police Drones Are Starting to Think for Themselves
Some interesting civil-liberties issues are being raised here. But no doubt very useful.
Remarkable example of AI that could scale and protect:
In a paper published recently in the IEEE Journal of Engineering in Medicine and Biology, the team reports on an AI model that distinguishes asymptomatic people from healthy individuals through forced-cough recordings, which people voluntarily submitted through web browsers and devices such as cellphones and laptops.
The researchers trained the model on tens of thousands of samples of coughs, as well as spoken words. When they fed the model new cough recordings, it accurately identified 98.5 percent of coughs from people who were confirmed to have Covid-19, including 100 percent of coughs from asymptomatics — who reported they did not have symptoms but had tested positive for the virus.

Artificial intelligence model detects asymptomatic Covid-19 infections through cellphone-recorded coughs
Is it ok to be surveilled at all times if it keeps you super safe? I expect this is exactly the trade-off most people will accept.
Security engineer Peter Gasper:
What I found is that I can ask Waze API for data on a location by sending my latitude and longitude coordinates. Except the essential traffic information, Waze also sends me coordinates of other drivers who are nearby. What caught my eyes was that identification numbers (ID) associated with the icons were not changing over time. I decided to track one driver and after some time she really appeared in a different place on the same road.

Waze: How I Tracked Your Mother (via Schneier on Security)
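The flaw Gasper describes is that the per-driver IDs stay constant between requests, so two snapshots taken minutes apart can be joined on the ID, turning anonymous map icons into movement tracks. A minimal sketch of that join, using a hypothetical response shape (the real Waze API's format is not shown here):

```python
def link_sightings(poll_a, poll_b):
    """Join two nearby-driver snapshots on their stable IDs.

    Each poll is a list of dicts like {"id": ..., "lat": ..., "lon": ...}
    (a hypothetical shape for illustration). Any ID present in both
    polls yields a (id, earlier position, later position) track.
    """
    by_id = {d["id"]: d for d in poll_a}
    tracks = []
    for d in poll_b:
        if d["id"] in by_id:
            prev = by_id[d["id"]]
            tracks.append((d["id"], (prev["lat"], prev["lon"]),
                           (d["lat"], d["lon"])))
    return tracks
```

Rotating the IDs on every response would break this join, which is the obvious mitigation.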
The task of proper anonymization is harder than it looks. Yet another example:
It turns out, though, that those redactions are possible to crack. That’s because the deposition—which you can read in full here—includes a complete alphabetized index of the redacted and unredacted words that appear in the document.

We Cracked the Redactions in the Ghislaine Maxwell Deposition (via Schneier on Security)
This seems to be a corollary of Schneier’s Law: any person can anonymize data in a way that he or she can’t imagine breaking.
Although the truth is most don’t even try to break their own work.
Kashmir Hill, writing for the NYT:
“I am involved with developing facial recognition to in fact use on Portland police officers, since they are not identifying themselves to the public,” Mr. Howell said. Over the summer, with the city seized by demonstrations against police violence, leaders of the department had told uniformed officers that they could tape over their name. Mr. Howell wanted to know: Would his use of facial recognition technology become illegal?
Portland’s mayor, Ted Wheeler, told Mr. Howell that his project was “a little creepy,” but a lawyer for the city clarified that the bills would not apply to individuals. The Council then passed the legislation in a unanimous vote.

Activists Turn Facial Recognition Tools Against the Police
The democratization of surveillance continues, and it will not slow down.
The FBI does not conduct mass surveillance. But many US corporations do, as a normal part of their business model. And the FBI uses that surveillance infrastructure to conduct its own “about” searches.

Google Responds to Warrants for “About” Searches
If you Google someone’s home address, and then something bad happens to that home, Google knows. And so does the FBI.
Kashmir Hill for the NYT:
Floyd Abrams, one of the most prominent First Amendment lawyers in the country, has a new client: the facial recognition company Clearview AI.
Litigation against the start-up “has the potential of leading to a major decision about the interrelationship between privacy claims and First Amendment defenses in the 21st century,” Mr. Abrams said in a phone interview. He said the underlying legal questions could one day reach the Supreme Court.

Facial Recognition Start-Up Mounts a First Amendment Defense
Is everything known to the public truly available for any use whatsoever? We are trending away from that view, and this will be a battle to watch closely.
James Vincent, reporting for The Verge:
As reported by The Philadelphia Inquirer, at the start of their investigation, FBI agents only had access to helicopter footage from a local news station. This showed a woman wearing a bandana throwing flaming debris into the smashed window of a police sedan.
By searching for videos of the protests uploaded to Instagram and Vimeo, the agents were able to find additional footage of the incident, and spotted a peace sign tattoo on the woman’s right forearm. After finding a set of 500 pictures of the protests shared by an amateur photographer, they were able to clearly see what the woman was wearing, including a T-shirt with the slogan: “Keep the Immigrants. Deport the Racists.”
The only place to buy this exact T-shirt was an Etsy store, where a user calling themselves “alleycatlore” had left a five-star review for the seller just a few days before the protest. Using Google to search for this username, agents then found a matching profile at the online fashion marketplace Poshmark which listed the user’s name as “Lore-elisabeth.”
A search for “Lore-elisabeth” led to a LinkedIn profile for one Lore Elisabeth Blumenthal, employed as a massage therapist at a Philadelphia massage studio. Videos hosted by the studio showed an individual with the same distinctive peace tattoo on their arm. A phone number listed for Blumenthal led to an address. As reported by NBC Philadelphia, a subpoena served to the Etsy seller showed a “Keep the Immigrants. Deport the Racists.” T-shirt had recently been delivered to that same address.

FBI used Instagram, an Etsy review, and LinkedIn to identify a protestor accused of arson