Google researchers have published a new paper outlining techniques for extracting personal data from deep learning models:
We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model’s training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences are included in just one document in the training data.
The technique involves repeatedly sampling model responses to a specific starting prompt. If the responses are mostly the same, the technique assumes that the model is likely to be responding with memorized text, and that this memorized text is more likely to contain personal data.
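As a rough illustration of the sampling heuristic described above, here is a minimal Python sketch. It is not the paper's actual pipeline: the similarity measure (`difflib.SequenceMatcher`), the 0.9 threshold, the sample count, and the two toy "models" are all assumptions made up for the example, and `sample_fn` stands in for whatever call draws one completion from a real model.

```python
import random
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, List


def mean_pairwise_similarity(samples: List[str]) -> float:
    """Average similarity ratio over all pairs of samples (1.0 means identical)."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 1.0  # zero or one sample: nothing to disagree with
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


def looks_memorized(
    sample_fn: Callable[[str], str],
    prompt: str,
    n_samples: int = 8,
    threshold: float = 0.9,  # assumed cutoff, not taken from the paper
) -> bool:
    """Sample the same prompt repeatedly and flag near-identical completions."""
    samples = [sample_fn(prompt) for _ in range(n_samples)]
    return mean_pairwise_similarity(samples) >= threshold


# Hypothetical stand-ins for a real language model's sampling call.
_rng = random.Random(0)
_WORDS = ["river", "copper", "seven", "orbit", "maple", "quiet", "signal", "thread"]


def toy_memorizing_model(prompt: str) -> str:
    # Always regurgitates the same made-up string, like memorized training text.
    return "contact: jane.doe@example.com, 555-0100"


def toy_generalizing_model(prompt: str) -> str:
    # Produces a different random word sequence on each call.
    return " ".join(_rng.choice(_WORDS) for _ in range(10))


print(looks_memorized(toy_memorizing_model, "My contact details are"))
print(looks_memorized(toy_generalizing_model, "My contact details are"))
```

The point of the sketch is only that agreement across samples is cheap to measure; which prompts to try and how to confirm that a flagged string really came from the training data are separate problems.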
Identifying and extracting memorized text leaks personal data.
Nice research, and a good example of why (sometimes) deep learning models can have privacy implications.
Computer scientists have struggled to build such a system for more than 50 years. For the last 25, they have measured and compared their efforts through a global competition called the Critical Assessment of Structure Prediction, or C.A.S.P. Until now, no contestant had even come close to solving the problem.
DeepMind solved the problem with a wide range of proteins, reaching an accuracy level that rivaled physical experiments. Many scientists had assumed that moment was still years, if not decades, away.
“I always hoped I would live to see this day,” said John Moult, a professor at the University of Maryland who helped create C.A.S.P. in 1994 and continues to oversee the biennial contest. “But it wasn’t always obvious I was going to make it.”
This is phenomenal and wonderful, but it is also an oracle into which we have limited insight. To quote a 2018 essay on AlphaZero:
Suppose that deeper patterns exist to be discovered — in the ways genes are regulated or cancer progresses; in the orchestration of the immune system; in the dance of subatomic particles. And suppose that these patterns can be predicted, but only by an intelligence far superior to ours. If AlphaInfinity could identify and understand them, it would seem to us like an oracle.
We would sit at its feet and listen intently. We would not understand why the oracle was always right, but we could check its calculations and predictions against experiments and observations, and confirm its revelations. Science, that signal human endeavor, would reduce our role to that of spectators, gaping in wonder and confusion.
Maybe eventually our lack of insight would no longer bother us. After all, AlphaInfinity could cure all our diseases, solve all our scientific problems and make all our other intellectual trains run on time. We did pretty well without much insight for the first 300,000 years or so of our existence as Homo sapiens. And we’ll have no shortage of memory: we will recall with pride the golden era of human insight, this glorious interlude, a few thousand years long, between our uncomprehending past and our incomprehensible future.
Remarkable example of AI that could scale and protect:
In a paper published recently in the IEEE Journal of Engineering in Medicine and Biology, the team reports on an AI model that distinguishes asymptomatic people from healthy individuals through forced-cough recordings, which people voluntarily submitted through web browsers and devices such as cellphones and laptops.
The researchers trained the model on tens of thousands of samples of coughs, as well as spoken words. When they fed the model new cough recordings, it accurately identified 98.5 percent of coughs from people who were confirmed to have Covid-19, including 100 percent of coughs from asymptomatics — who reported they did not have symptoms but had tested positive for the virus.
“I am involved with developing facial recognition to in fact use on Portland police officers, since they are not identifying themselves to the public,” Mr. Howell said. Over the summer, with the city seized by demonstrations against police violence, leaders of the department had told uniformed officers that they could tape over their name. Mr. Howell wanted to know: Would his use of facial recognition technology become illegal?
Portland’s mayor, Ted Wheeler, told Mr. Howell that his project was “a little creepy,” but a lawyer for the city clarified that the bills would not apply to individuals. The Council then passed the legislation in a unanimous vote.
Most ethicists are concerned that AIs are wrong, and that we harm people by deferring to them. But they can be right and ignored too:
NURSE DINA SARRO didn’t know much about artificial intelligence when Duke University Hospital installed machine learning software to raise an alarm when a person was at risk of developing sepsis, a complication of infection that is the number one killer in US hospitals. The software, called Sepsis Watch, passed alerts from an algorithm Duke researchers had tuned with 32 million data points from past patients to the hospital’s team of rapid response nurses, co-led by Sarro.
But when nurses relayed those warnings to doctors, they sometimes encountered indifference or even suspicion. When docs questioned why the AI thought a patient needed extra attention, Sarro found herself in a tough spot. “I wouldn’t have a good answer because it’s based on an algorithm,” she says.
One college student went viral on TikTok after posting a video in which she said that a test proctoring program had flagged her behavior as suspicious because she was reading the question aloud, resulting in her professor assigning her a failing grade.
Deepfakes have so far not learned to simulate heartbeats in images, and so they can be detected as fraudulent. But given time they will learn this as well; it’s an arms race.
In other news, heartbeats are clearly visible in processed images!
In particular, video of a person’s face contains subtle shifts in color that result from pulses in blood circulation. You might imagine that these changes would be too minute to detect merely from a video, but viewing videos that have been enhanced to exaggerate these color shifts will quickly disabuse you of that notion. This phenomenon forms the basis of a technique called photoplethysmography, or PPG for short, which can be used, for example, to monitor newborns without having to attach anything to their very sensitive skin.
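To make the idea concrete, here is a small Python sketch of how a pulse rate could be read out of those color shifts. It assumes the hard parts are already done: someone has located the face and reduced each video frame to one number, the mean green value of the skin pixels (using green is an assumption for this example). The data below is synthetic, and the brute-force frequency scan is an illustrative choice rather than the method of any particular PPG system.

```python
import math
from typing import List


def estimate_bpm(green_means: List[float], fps: float,
                 min_hz: float = 0.7, max_hz: float = 4.0,
                 step_hz: float = 0.01) -> float:
    """Return the strongest frequency in [min_hz, max_hz], in beats per minute."""
    mean = sum(green_means) / len(green_means)
    centered = [v - mean for v in green_means]  # remove the constant skin-tone level

    best_hz, best_power = min_hz, -1.0
    n_steps = int(round((max_hz - min_hz) / step_hz))
    for k in range(n_steps + 1):
        hz = min_hz + k * step_hz
        # Correlate the signal with a cosine and a sine at this candidate frequency.
        re = sum(v * math.cos(2 * math.pi * hz * i / fps) for i, v in enumerate(centered))
        im = sum(v * math.sin(2 * math.pi * hz * i / fps) for i, v in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_hz, best_power = hz, power
    return best_hz * 60.0


# Synthetic stand-in for per-frame mean green values: 10 s of video at 30 fps,
# a constant brightness of 120 plus a faint 1.2 Hz (72 bpm) pulse.
FPS = 30.0
synthetic = [120.0 + 0.5 * math.sin(2 * math.pi * 1.2 * i / FPS) for i in range(300)]
print(round(estimate_bpm(synthetic, FPS)))
```

Real footage adds motion, lighting changes, and compression noise, so a practical version would need filtering and a longer window. The sketch only shows that a periodic color shift far too small to see by eye is easy to pick out once the signal is averaged and scanned by frequency.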
What will happen when we can no longer distinguish human tweets from AI tweets? Does it matter? Should we care? Will there be a verified human status?
Renée DiResta, writing for The Atlantic:
Amid the arms race surrounding AI-generated content, users and internet companies will give up on trying to judge authenticity tweet by tweet and article by article. Instead, the identity of the account attached to the comment, or person attached to the byline, will become a critical signal toward gauging legitimacy. Many users will want to know that what they’re reading or seeing is tied to a real person—not an AI-generated persona. . . .
. . . . .
The idea that a verified identity should be a precondition for contributing to public discourse is dystopian in its own way. Since the dawn of the nation, Americans have valued anonymous and pseudonymous speech: Alexander Hamilton, James Madison, and John Jay used the pen name Publius when they wrote the Federalist Papers, which laid out founding principles of American government. Whistleblowers and other insiders have published anonymous statements in the interest of informing the public. Figures as varied as the statistics guru Nate Silver (“Poblano”) and Senator Mitt Romney (“Pierre Delecto”) have used pseudonyms while discussing political matters on the internet. The goal shouldn’t be to end anonymity online, but merely to reserve the public square for people who exist—not for artificially intelligent propaganda generators.
Microsoft Corp. has released a new version of its open-source DeepSpeed tool that it says will enable the creation of deep learning models with a trillion parameters, more than five times as many as in the world’s current largest model.
Microsoft AI tool enables ‘extremely large’ models with a trillion parameters
That’s a lot of transformations. If there’s a pattern, a trillion parameters should be able to find and store it.
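For a sense of scale, here is some back-of-the-envelope arithmetic in Python. The bytes-per-parameter figures (2 for 16-bit weights, 4 for 32-bit) are assumptions about storage format, and the totals cover the weights alone, ignoring gradients, optimizer state, and activations, which add considerably more during training.

```python
def weight_memory_tb(n_params: int, bytes_per_param: int) -> float:
    """Decimal terabytes needed to hold the weights alone."""
    return n_params * bytes_per_param / 1e12


TRILLION = 10**12

# "More than five times as many" implies the previous largest model
# has fewer than this many parameters.
implied_previous_max = TRILLION / 5

print(weight_memory_tb(TRILLION, 2))  # assuming 16-bit weights
print(weight_memory_tb(TRILLION, 4))  # assuming 32-bit weights
print(implied_previous_max)
```

Either way the weights run to terabytes, far more than a single accelerator holds, which is why tooling that spreads a model across many devices matters at this size.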