Google researchers have published a new paper outlining techniques for extracting personal data from deep learning models:
We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model’s training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences are included in just one document in the training data.
Extracting Training Data from Large Language Models
The technique involves repeatedly sampling model responses to a specific starting prompt. If the responses are mostly the same, the technique assumes that the model is likely responding with memorized text, and that this memorized text is more likely to contain personal data.
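The heuristic described above can be sketched in a few lines. This is only an illustration of the "sample repeatedly and check agreement" idea as summarized here, not the paper's actual pipeline. The names `agreement`, `looks_memorized`, and `toy_model`, the prompts, and the threshold of 0.8 are all assumptions made up for the example; `toy_model` is a stand-in for a real language model.

```python
from collections import Counter
from itertools import cycle


def agreement(samples):
    """Return the fraction of samples that equal the most common sample."""
    if not samples:
        return 0.0
    _, count = Counter(samples).most_common(1)[0]
    return count / len(samples)


def looks_memorized(sample_fn, prompt, n=20, threshold=0.8):
    """Sample n continuations and flag the prompt if they mostly agree.

    The threshold is an arbitrary assumption, not a value from the paper.
    """
    samples = [sample_fn(prompt) for _ in range(n)]
    return agreement(samples) >= threshold


# Hypothetical stand-in for a language model: one prompt always yields the
# same continuation (as if memorized), every other prompt yields varied text.
_varied = cycle(["sunny", "rainy", "cloudy", "windy"])


def toy_model(prompt):
    if prompt == "My email address is":
        return "alice@example.com"
    return next(_varied)


memorized_flag = looks_memorized(toy_model, "My email address is")
varied_flag = looks_memorized(toy_model, "The weather today is")
print(memorized_flag, varied_flag)
```

With a real model, `sample_fn` would draw stochastic samples, and exact string equality would likely be too strict; a similarity measure over the continuations would be a more forgiving choice.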
Identifying and extracting memorized text leaks personal data.
Nice research, and a good example of how deep learning models can (sometimes) have privacy implications.
Cade Metz, writing for the NYT:
When the Chula Vista police receive a 911 call, they can dispatch a flying drone with the press of button.
Police Drones Are Starting to Think for Themselves
Some interesting civil liberty issues being raised. But no doubt very useful.
Lots of news on the DeepMind announcement that it has solved the protein folding problem. From the NYT:
Computer scientists have struggled to build such a system for more than 50 years. For the last 25, they have measured and compared their efforts through a global competition called the Critical Assessment of Structure Prediction, or C.A.S.P. Until now, no contestant had even come close to solving the problem.
DeepMind solved the problem with a wide range of proteins, reaching an accuracy level that rivaled physical experiments. Many scientists had assumed that moment was still years, if not decades, away.
“I always hoped I would live to see this day,” said John Moult, a professor at the University of Maryland who helped create C.A.S.P. in 1994 and continues to oversee the biennial contest. “But it wasn’t always obvious I was going to make it.”
London A.I. Lab Claims Breakthrough That Could Accelerate Drug Discovery
This is phenomenal and wonderful, but it is also an oracle into which we have limited insight. To quote a 2018 essay on AlphaZero:
Suppose that deeper patterns exist to be discovered — in the ways genes are regulated or cancer progresses; in the orchestration of the immune system; in the dance of subatomic particles. And suppose that these patterns can be predicted, but only by an intelligence far superior to ours. If AlphaInfinity could identify and understand them, it would seem to us like an oracle.
We would sit at its feet and listen intently. We would not understand why the oracle was always right, but we could check its calculations and predictions against experiments and observations, and confirm its revelations. Science, that signal human endeavor, would reduce our role to that of spectators, gaping in wonder and confusion.
Maybe eventually our lack of insight would no longer bother us. After all, AlphaInfinity could cure all our diseases, solve all our scientific problems and make all our other intellectual trains run on time. We did pretty well without much insight for the first 300,000 years or so of our existence as Homo sapiens. And we’ll have no shortage of memory: we will recall with pride the golden era of human insight, this glorious interlude, a few thousand years long, between our uncomprehending past and our incomprehensible future.
Remarkable example of AI that could scale and protect:
In a paper published recently in the IEEE Journal of Engineering in Medicine and Biology, the team reports on an AI model that distinguishes asymptomatic people from healthy individuals through forced-cough recordings, which people voluntarily submitted through web browsers and devices such as cellphones and laptops.
The researchers trained the model on tens of thousands of samples of coughs, as well as spoken words. When they fed the model new cough recordings, it accurately identified 98.5 percent of coughs from people who were confirmed to have Covid-19, including 100 percent of coughs from asymptomatics — who reported they did not have symptoms but had tested positive for the virus.
Artificial intelligence model detects asymptomatic Covid-19 infections through cellphone-recorded coughs
Is it ok to be surveilled at all times if it keeps you super safe? I expect this is exactly the trade-off most people will accept.
Security engineer Peter Gasper:
What I found is that I can ask Waze API for data on a location by sending my latitude and longitude coordinates. Except the essential traffic information, Waze also sends me coordinates of other drivers who are nearby. What caught my eyes was that identification numbers (ID) associated with the icons were not changing over time. I decided to track one driver and after some time she really appeared in a different place on the same road.
Waze: How I Tracked Your Mother (via Schneier on Security)
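To see why identifiers that never change are the core of the problem, here is a minimal sketch that links sightings across successive responses. It does not call any real API: the snapshot format, the `link_sightings` function, the IDs, and the coordinates are all invented for illustration.

```python
from collections import defaultdict


def link_sightings(snapshots):
    """Group (id, lat, lon) sightings by id across time-ordered snapshots.

    Returns only the ids seen more than once, i.e. the ones that can be
    followed from one snapshot to the next.
    """
    tracks = defaultdict(list)
    for t, snapshot in enumerate(snapshots):
        for driver_id, lat, lon in snapshot:
            tracks[driver_id].append((t, lat, lon))
    return {d: points for d, points in tracks.items() if len(points) > 1}


# Made-up data: three successive responses for nearby locations.
snapshots = [
    [("id-17", 48.10, 17.10), ("id-42", 48.20, 17.30)],
    [("id-17", 48.11, 17.12)],
    [("id-17", 48.12, 17.14), ("id-99", 48.50, 17.90)],
]

tracks = link_sightings(snapshots)
for driver_id, points in tracks.items():
    print(driver_id, points)
```

If the service issued a fresh random identifier in every response, each ID would appear only once and this grouping would return nothing, which is why rotating identifiers is a common mitigation.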
The task of proper anonymization is harder than it looks. Yet another example:
It turns out, though, that those redactions are possible to crack. That’s because the deposition—which you can read in full here—includes a complete alphabetized index of the redacted and unredacted words that appear in the document.
We Cracked the Redactions in the Ghislaine Maxwell Deposition (via Schneier on Security)
This seems to be a corollary of Schneier’s Law: Any person can anonymize data in a way that he or she can’t imagine breaking.
Although the truth is most don’t even try to break their own work.
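A rough sketch of why an alphabetized index undermines redaction: a blacked-out entry still has to sort between its visible neighbors, which narrows the candidates sharply. The function name, the index entries, and the vocabulary below are hypothetical, and the sketch assumes lowercase words compared with plain string ordering; it is not the method the article's authors used.

```python
def candidates_for_redaction(index_entries, position, vocabulary):
    """List vocabulary words that could fill a redacted index slot.

    index_entries is in alphabetical order, with None marking a redacted
    entry. A candidate must sort strictly between the nearest visible
    entries before and after the given position.
    """
    before = index_entries[:position]
    after = index_entries[position + 1:]
    lower = next((w for w in reversed(before) if w is not None), None)
    upper = next((w for w in after if w is not None), None)
    return sorted(
        w for w in vocabulary
        if (lower is None or w > lower) and (upper is None or w < upper)
    )


# Made-up index with one redacted entry between "garden" and "gate".
index = ["garden", None, "gate", "gift"]
vocabulary = ["garlic", "gas", "gather", "giraffe", "gap"]

matches = candidates_for_redaction(index, 1, vocabulary)
print(matches)
```

In a real document the index also gives page and line references for each entry, so the surviving candidates can be checked against the surrounding unredacted text.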
Kashmir Hill, writing for the NYT:
“I am involved with developing facial recognition to in fact use on Portland police officers, since they are not identifying themselves to the public,” Mr. Howell said. Over the summer, with the city seized by demonstrations against police violence, leaders of the department had told uniformed officers that they could tape over their name. Mr. Howell wanted to know: Would his use of facial recognition technology become illegal?
Portland’s mayor, Ted Wheeler, told Mr. Howell that his project was “a little creepy,” but a lawyer for the city clarified that the bills would not apply to individuals. The Council then passed the legislation in a unanimous vote.
Activists Turn Facial Recognition Tools Against the Police
The democratization of surveillance continues, and it will not slow down.
The FBI does not conduct mass surveillance. But many US corporations do, as a normal part of their business model. And the FBI uses that surveillance infrastructure to conduct its own about searches.
Google Responds to Warrants for “About” Searches
If you Google someone’s home address, and then something bad happens to that home, Google knows. And so does the FBI.
Most ethicists are concerned that AIs are wrong, and that we harm people by deferring to them. But they can be right and ignored too:
NURSE DINA SARRO didn’t know much about artificial intelligence when Duke University Hospital installed machine learning software to raise an alarm when a person was at risk of developing sepsis, a complication of infection that is the number one killer in US hospitals. The software, called Sepsis Watch, passed alerts from an algorithm Duke researchers had tuned with 32 million data points from past patients to the hospital’s team of rapid response nurses, co-led by Sarro.
But when nurses relayed those warnings to doctors, they sometimes encountered indifference or even suspicion. When docs questioned why the AI thought a patient needed extra attention, Sarro found herself in a tough spot. “I wouldn’t have a good answer because it’s based on an algorithm,” she says.
AI Can Help Patients—but Only If Doctors Understand It
One college student went viral on TikTok after posting a video in which she said that a test proctoring program had flagged her behavior as suspicious because she was reading the question aloud, resulting in her professor assigning her a failing grade.
A student says test proctoring AI flagged her as cheating when she read a question out loud. Others say the software could have more dire consequences.
This is basic ethics: if your AI has real consequences, you’d better get it right.