Extracting personal data from AI models

Google researchers have published a new paper outlining techniques for extracting personal data from deep learning models:

We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model’s training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences are included in just one document in the training data.

Extracting Training Data from Large Language Models

The technique involves repeatedly sampling model responses to a specific starting prompt. If the responses are mostly the same, the technique assumes that the model is likely responding with memorized text, and that this memorized text is more likely to contain personal data.
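The description above can be sketched as a simple agreement check. This is a minimal sketch, not the paper's method: the paper ranks generated samples with perplexity-based metrics, while this toy version only counts how often repeated samples for one prompt agree. The sampler functions, `looks_memorized`, and the 0.5 threshold are all hypothetical stand-ins for a real language model and a tuned cutoff.

```python
import random
from collections import Counter
from typing import Callable

def looks_memorized(sample: Callable[[str], str], prompt: str,
                    n_samples: int = 20, threshold: float = 0.5):
    """Sample `n_samples` completions of `prompt` and report whether the most
    common completion accounts for at least `threshold` of them.

    Returns (flagged, most_common_completion, agreement_fraction).
    """
    completions = [sample(prompt) for _ in range(n_samples)]
    top, count = Counter(completions).most_common(1)[0]
    agreement = count / n_samples
    return agreement >= threshold, top, agreement

# Toy stand-ins for a language model's sampler (hypothetical, for illustration only).
rng = random.Random(0)

def memorizing_model(prompt: str) -> str:
    # Always returns the same continuation, as a memorized sequence might.
    return "555-0100"

def creative_model(prompt: str) -> str:
    # Returns a different random continuation almost every time.
    return f"{rng.choice(['red', 'blue', 'green'])} {rng.randint(0, 999)}"

print(looks_memorized(memorizing_model, "Call me at"))
# (True, '555-0100', 1.0)
print(looks_memorized(creative_model, "Once upon a time")[0])
```

The agreement fraction is a crude proxy: a real model can also agree with itself on common boilerplate that contains no personal data, which is one reason the paper relies on more careful ranking.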

Identifying and extracting memorized text leaks personal data.

Nice research, and a good example of why deep learning models can (sometimes) have privacy implications.

AI solves protein folding

Lots of news on the DeepMind announcement that it has solved the protein folding problem. From the NYT:

Computer scientists have struggled to build such a system for more than 50 years. For the last 25, they have measured and compared their efforts through a global competition called the Critical Assessment of Structure Prediction, or C.A.S.P. Until now, no contestant had even come close to solving the problem.

DeepMind solved the problem with a wide range of proteins, reaching an accuracy level that rivaled physical experiments. Many scientists had assumed that moment was still years, if not decades, away.

“I always hoped I would live to see this day,” said John Moult, a professor at the University of Maryland who helped create C.A.S.P. in 1994 and continues to oversee the biennial contest. “But it wasn’t always obvious I was going to make it.”

London A.I. Lab Claims Breakthrough That Could Accelerate Drug Discovery

This is phenomenal and wonderful, but it is also an oracle into which we have limited insight. To quote a 2018 essay on AlphaZero:

Suppose that deeper patterns exist to be discovered — in the ways genes are regulated or cancer progresses; in the orchestration of the immune system; in the dance of subatomic particles. And suppose that these patterns can be predicted, but only by an intelligence far superior to ours. If AlphaInfinity could identify and understand them, it would seem to us like an oracle.

We would sit at its feet and listen intently. We would not understand why the oracle was always right, but we could check its calculations and predictions against experiments and observations, and confirm its revelations. Science, that signal human endeavor, would reduce our role to that of spectators, gaping in wonder and confusion.

Maybe eventually our lack of insight would no longer bother us. After all, AlphaInfinity could cure all our diseases, solve all our scientific problems and make all our other intellectual trains run on time. We did pretty well without much insight for the first 300,000 years or so of our existence as Homo sapiens. And we’ll have no shortage of memory: we will recall with pride the golden era of human insight, this glorious interlude, a few thousand years long, between our uncomprehending past and our incomprehensible future.

AI detection of COVID-19 via cough

A remarkable example of AI that could scale up and protect people:

In a paper published recently in the IEEE Journal of Engineering in Medicine and Biology, the team reports on an AI model that distinguishes asymptomatic people from healthy individuals through forced-cough recordings, which people voluntarily submitted through web browsers and devices such as cellphones and laptops.

The researchers trained the model on tens of thousands of samples of coughs, as well as spoken words. When they fed the model new cough recordings, it accurately identified 98.5 percent of coughs from people who were confirmed to have Covid-19, including 100 percent of coughs from asymptomatics — who reported they did not have symptoms but had tested positive for the virus.

Artificial intelligence model detects asymptomatic Covid-19 infections through cellphone-recorded coughs
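The 98.5 percent figure in the quoted passage measures how many confirmed Covid-19 coughs the model caught (its sensitivity). On its own it says nothing about how often healthy people get flagged. The sketch below uses entirely hypothetical counts, not numbers from the paper, to show why that second rate matters when screening a mostly healthy population.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly infected people the model flags."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Fraction of uninfected people the model correctly clears."""
    return tn / (tn + fp)

def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of flagged people who are actually infected."""
    return tp / (tp + fp)

# Hypothetical screening of 10,200 people, 200 of whom are infected.
tp, fn = 197, 3        # 197 of 200 infected people flagged
tn, fp = 9_000, 1_000  # assumed 90% specificity among 10,000 uninfected people

print(sensitivity(tp, fn))                          # 0.985
print(specificity(tn, fp))                          # 0.9
print(round(positive_predictive_value(tp, fp), 3))  # 0.165
```

Under these assumed numbers, most people the model flags would be healthy, even though it catches nearly every infection; the actual trade-off depends on the model's real false-positive rate and on how common infection is.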

Is it ok to be surveilled at all times if it keeps you super safe? I expect this is exactly the trade-off most people will accept.

Surveillance for everyone

Kashmir Hill, writing for the NYT:

“I am involved with developing facial recognition to in fact use on Portland police officers, since they are not identifying themselves to the public,” Mr. Howell said. Over the summer, with the city seized by demonstrations against police violence, leaders of the department had told uniformed officers that they could tape over their name. Mr. Howell wanted to know: Would his use of facial recognition technology become illegal?

Portland’s mayor, Ted Wheeler, told Mr. Howell that his project was “a little creepy,” but a lawyer for the city clarified that the bills would not apply to individuals. The Council then passed the legislation in a unanimous vote.

Activists Turn Facial Recognition Tools Against the Police

The democratization of surveillance continues, and it will not slow down.

Believing AIs is sometimes easy, and sometimes hard

Most ethicists are concerned that AIs will be wrong and that we will harm people by deferring to them. But AIs can also be right and be ignored:

Nurse Dina Sarro didn’t know much about artificial intelligence when Duke University Hospital installed machine learning software to raise an alarm when a person was at risk of developing sepsis, a complication of infection that is the number one killer in US hospitals. The software, called Sepsis Watch, passed alerts from an algorithm Duke researchers had tuned with 32 million data points from past patients to the hospital’s team of rapid response nurses, co-led by Sarro.

But when nurses relayed those warnings to doctors, they sometimes encountered indifference or even suspicion. When docs questioned why the AI thought a patient needed extra attention, Sarro found herself in a tough spot. “I wouldn’t have a good answer because it’s based on an algorithm,” she says.

AI Can Help Patients—but Only If Doctors Understand It

AI test proctor fails

One college student went viral on TikTok after posting a video in which she said that a test proctoring program had flagged her behavior as suspicious because she was reading the question aloud, resulting in her professor assigning her a failing grade.

A student says test proctoring AI flagged her as cheating when she read a question out loud. Others say the software could have more dire consequences.

This is basic ethics: if your AI has real consequences, you’d better get it right.

Detecting deepfakes by detecting heartbeats

Deepfakes have so far not learned to simulate heartbeats in video, so they can be detected as fraudulent. But given time they will learn this as well; it’s an arms race.

In other news, heartbeats are clearly visible in processed video!

In particular, video of a person’s face contains subtle shifts in color that result from pulses in blood circulation. You might imagine that these changes would be too minute to detect merely from a video, but viewing videos that have been enhanced to exaggerate these color shifts will quickly disabuse you of that notion. This phenomenon forms the basis of a technique called photoplethysmography, or PPG for short, which can be used, for example, to monitor newborns without having to attach anything to their very sensitive skin.

The Subtle Effects of Blood Circulation Can Be Used to Detect Deep Fakes (via Schneier on Security)
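The photoplethysmography idea in the quoted passage can be sketched in a few lines. This is a minimal sketch of camera-based PPG, not the deepfake detector from the article. It assumes the frames are already cropped to facial skin and the camera is steady. It averages the green channel (which tends to carry the strongest pulse signal) in each frame, then picks the strongest frequency in a plausible heart-rate band. The function name and the 0.7 to 4 Hz band are illustrative choices; real systems add face tracking, filtering, and much more robust signal processing.

```python
import numpy as np

def estimate_heart_rate_bpm(frames: np.ndarray, fps: float,
                            low_hz: float = 0.7, high_hz: float = 4.0) -> float:
    """Estimate pulse rate from frames of shape (T, H, W, 3), RGB.

    Assumes the frames are already cropped to skin (hypothetical preprocessing).
    """
    # One number per frame: mean green-channel intensity over the crop.
    signal = frames[..., 1].reshape(frames.shape[0], -1).mean(axis=1)
    # Remove the constant (DC) component so it doesn't dominate the spectrum.
    signal = signal - signal.mean()
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fps)
    # Only consider plausible human heart rates (about 42 to 240 bpm).
    band = (freqs >= low_hz) & (freqs <= high_hz)
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return float(peak_hz * 60.0)

# Synthetic check: 10 seconds at 30 fps with a faint 1.2 Hz (72 bpm) "pulse".
fps = 30.0
t = np.arange(300) / fps
frames = np.full((300, 8, 8, 3), 128.0)
frames[..., 1] += 0.5 * np.sin(2 * np.pi * 1.2 * t)[:, None, None]
print(round(estimate_heart_rate_bpm(frames, fps)))  # 72
```

A detector built on this idea would check whether a plausible, consistent pulse signal is present across the face, which is the cue the post says deepfakes have so far failed to reproduce.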

Check out the video at 1:30 and then again at 3:18.

AIs are certainly going to know a lot about us.

The value of distinguishing AIs from humans

What will happen when we can no longer distinguish human tweets from AI tweets? Does it matter? Should we care? Will there be a verified human status?

Renée DiResta, writing for The Atlantic:

Amid the arms race surrounding AI-generated content, users and internet companies will give up on trying to judge authenticity tweet by tweet and article by article. Instead, the identity of the account attached to the comment, or person attached to the byline, will become a critical signal toward gauging legitimacy. Many users will want to know that what they’re reading or seeing is tied to a real person—not an AI-generated persona. . . .

. . . . .

The idea that a verified identity should be a precondition for contributing to public discourse is dystopian in its own way. Since the dawn of the nation, Americans have valued anonymous and pseudonymous speech: Alexander Hamilton, James Madison, and John Jay used the pen name Publius when they wrote the Federalist Papers, which laid out founding principles of American government. Whistleblowers and other insiders have published anonymous statements in the interest of informing the public. Figures as varied as the statistics guru Nate Silver (“Poblano”) and Senator Mitt Romney (“Pierre Delecto”) have used pseudonyms while discussing political matters on the internet. The goal shouldn’t be to end anonymity online, but merely to reserve the public square for people who exist—not for artificially intelligent propaganda generators.

The Supply of Disinformation Will Soon Be Infinite

The idea that we need to reserve the public square for humans is remarkable precisely because it means this technology is now upon us. Human sentiments have value; AI facsimiles do not.

An optimistic take is that perhaps we will instead pay attention to the useful content of such messages rather than to their inflammatory rhetoric. A good idea is a good idea, AI or not.

Ransomware causes death

The Associated Press:

German authorities said Thursday that what appears to have been a misdirected hacker attack caused the failure of IT systems at a major hospital in Duesseldorf, and a woman who needed urgent admission died after she had to be taken to another city for treatment.

German Hospital Hacked, Patient Taken to Another City Dies

It looks like the hospital was not the intended target, but the story illustrates how dependent we are on software.

Update: Nope. The patient apparently would not have survived regardless.