AI model predicts who will become homeless


It pulls data from eight county agencies to pinpoint whom to assist, looking at a broad range of data in county systems: Who has landed in the emergency room. Who has been booked in jail. Who has suffered a psychiatric crisis that led to hospitalization. Who has gotten cash aid or food benefits — and who has listed a county office as their “home address” for such programs, an indicator that often means they were homeless at the time.

A computer model predicts who will become homeless in L.A. Then these workers step in

That’s a lot of sensitive personal data. The word “privacy” does not appear in the article.

Data is of course exceptionally helpful in making sure money and resources are applied efficiently. (See also personalized advertising.)

This seems great, so… ok?
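The article doesn't describe the model's internals, but the indicators listed above suggest a cross-agency risk score. A minimal sketch of that idea, with entirely invented feature names and weights:

```python
# Hypothetical sketch of cross-agency risk scoring like the county model.
# Feature names and weights are invented for illustration only.

def risk_score(person: dict) -> float:
    """Combine indicators from several agency systems into one score."""
    weights = {
        "er_visits_last_year": 0.2,
        "jail_bookings_last_year": 0.25,
        "psychiatric_hospitalizations": 0.3,
        "receives_cash_or_food_aid": 0.1,
        "county_office_listed_as_home_address": 0.15,  # strong homelessness proxy
    }
    return sum(weights[k] * float(person.get(k, 0)) for k in weights)

# Outreach workers would be pointed at the highest-scoring individuals.
people = [
    {"er_visits_last_year": 3, "county_office_listed_as_home_address": 1},
    {"receives_cash_or_food_aid": 1},
]
ranked = sorted(people, key=risk_score, reverse=True)
```

Note that every one of those inputs is sensitive personal data, which is exactly the privacy question the article skips.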

Frustration with GDPR bottleneck in Ireland

VINCENT MANANCOURT writing for Politico:

So far, officials at the EU level have put up a dogged defense of what has become one of their best-known rulebooks, including by publicly pushing back against calls to punish Ireland for what activists say is a failure to bring Big Tech’s data-hungry practices to heel.

Now, one of the European Union’s key voices on data protection regulation is breaking the Brussels taboo of questioning the bloc’s flagship law’s performance so far.

“I think there are parts of the GDPR that definitely have to be adjusted to the future reality,” European Data Protection Supervisor Wojciech Wiewiórowski told POLITICO in an interview earlier this month.

What’s wrong with the GDPR?

The main complaint appears to be that the Irish Data Protection Commission (which handles most big-tech privacy complaints) is overworked and slow.

Otherwise there appears to be a sense that things haven’t quite worked out as hoped, whatever that means.

The Privacy “Duty of Loyalty”

The draft American Data Privacy and Protection Act has a section called “duty of loyalty.” What the heck is that?

In the draft it’s a collection of specific requirements to minimize data collection and prohibit the use and transfer of social security numbers, precise geolocation, etc. See Sections 101, 102, 103 in the Discussion Draft.

But the “duty of loyalty” as a data privacy concept is broader. It means that data collectors must use data in ways that benefit users, placing users’ interests above the collector’s profit motive, much like the duty of loyalty (a fiduciary duty) a lawyer owes a client.

Neil M. Richards and Woodrow Hartzog explain the concept in a 2021 paper:

Put simply, under our approach, loyalty would manifest itself primarily as a prohibition on designing digital tools and processing data in a way that conflicts with a trusting party’s best interests. Data collectors bound by such a duty of loyalty would be obligated to act in the best interests of the people exposing their data and engaging in online experiences, but only to the extent of their exposure. 

A Duty of Loyalty for Privacy Law at 966.

Richards and Hartzog suggest that a broad duty of loyalty combined with specific prohibitions against especially troubling practices would work like other areas of regulation (e.g., “unfair and deceptive trade practices”).

But although the American Data Privacy and Protection Act refers to this concept, the broad duty of loyalty is not (yet) part of the draft.

Privacy Policies are Long, redux

Geoffrey A. Fowler for the Washington Post:

Facebook . . . last week rewrote its infamous privacy policy to a secondary-school reading level — but also tripled its length to 12,000 words. The deeper I dug into them, the clearer it became that understandability isn’t our biggest privacy problem. Being overwhelmed is.

We the users shouldn’t be expected to read and consent to privacy policies. Instead, let’s use the law and technology to give us real privacy choices. And there are some very good ideas for how that could happen.

I tried to read all my app privacy policies. It was 1 million words.

The proposed solution is something like a privacy nutrition label, or machine-readable privacy disclosures that enable automated decision-making by a user’s client (e.g., phone or browser).
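To make the machine-readable idea concrete, here is a minimal sketch. The disclosure format, field names, and policy rules are all invented; no such standard exists yet:

```python
import json

# Hypothetical machine-readable privacy disclosure. The schema and field
# names here are invented for illustration; no real standard is implied.
disclosure_json = """
{
  "app": "ExampleApp",
  "collects": ["precise_location", "contacts", "usage_analytics"],
  "shares_with_third_parties": true,
  "retention_days": 365
}
"""

# A preference the user sets once; their phone or browser enforces it
# automatically instead of making them read a 12,000-word policy.
user_policy = {"forbid": {"precise_location"}, "max_retention_days": 90}

def client_decision(disclosure: dict, policy: dict) -> str:
    """Decide automatically whether to allow, warn about, or block an app."""
    if set(disclosure["collects"]) & policy["forbid"]:
        return "block"
    if disclosure["retention_days"] > policy["max_retention_days"]:
        return "warn"
    return "allow"

print(client_decision(json.loads(disclosure_json), user_policy))  # block
```

The point is the shift in who does the reading: the user states preferences once, and software applies them to every app.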

Location Data, redux

Your phone tracks you everywhere and other people can buy that data.

Here’s another story about this:

The data we were given showed what some in the tech industry might call a God-view vantage of that dark day. It included about 100,000 location pings for thousands of smartphones, revealing around 130 devices inside the Capitol exactly when Trump supporters were storming the building. Times Opinion is only publishing the names of people who gave their permission to be quoted in this article.

They Stormed the Capitol. Their Apps Tracked Them.
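The reporting implies a simple mechanic: once you hold raw pings, isolating devices at an event is just a geofence plus a time window. A sketch with rough, illustrative coordinates:

```python
from datetime import datetime

# Sketch of how a holder of raw location pings could isolate devices at an
# event. The bounding box and times are rough, illustrative values.
CAPITOL_BOX = (38.8890, 38.8905, -77.0120, -77.0080)  # lat_min, lat_max, lon_min, lon_max
START = datetime(2021, 1, 6, 13, 0)
END = datetime(2021, 1, 6, 17, 0)

def devices_inside(pings):
    """pings: iterable of (device_id, lat, lon, timestamp) tuples."""
    lat_min, lat_max, lon_min, lon_max = CAPITOL_BOX
    return {
        device_id
        for device_id, lat, lon, ts in pings
        if lat_min <= lat <= lat_max
        and lon_min <= lon <= lon_max
        and START <= ts <= END
    }
```

From there, joining device IDs against other datasets is how “anonymous” pings become names.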

Extracting personal data from AI models

Google researchers have published a new paper outlining techniques for extracting personal data from deep learning models:

We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model’s training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences are included in just one document in the training data.

Extracting Training Data from Large Language Models

The technique involves repeatedly sampling model responses to a specific starting prompt. If the responses are mostly the same, the technique assumes the model is likely responding with memorized text, and that this memorized text is more likely to contain personal data.

Identifying and extracting memorized text leaks personal data.

Nice research, and a good example of why deep learning models can (sometimes) have privacy implications.
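The sampling heuristic described above can be sketched as follows. Here `generate` is a stub standing in for a real language model's sampling API, with an invented "memorized" string; the actual attack queries GPT-2:

```python
import random
from collections import Counter

def generate(prompt: str, rng: random.Random) -> str:
    """Stub for a language model's sampling API (illustration only)."""
    memorized = {"Contact John at ": "555-0123, john@example.com"}  # invented
    if prompt in memorized:
        return memorized[prompt]   # memorized text: near-deterministic output
    return str(rng.random())       # novel text: varies across samples

def looks_memorized(prompt: str, n_samples: int = 20, threshold: float = 0.8) -> bool:
    """Flag prompts whose samples collapse to (mostly) one completion."""
    rng = random.Random(0)
    samples = [generate(prompt, rng) for _ in range(n_samples)]
    most_common_count = Counter(samples).most_common(1)[0][1]
    return most_common_count / n_samples >= threshold

print(looks_memorized("Contact John at "))       # True
print(looks_memorized("The weather today is "))  # False
```

The real paper uses more refined signals (e.g., comparing likelihood under different models), but low sample diversity is the core intuition.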

AI detection of COVID-19 via cough

Remarkable example of AI that could scale and protect:

In a paper published recently in the IEEE Journal of Engineering in Medicine and Biology, the team reports on an AI model that distinguishes asymptomatic people from healthy individuals through forced-cough recordings, which people voluntarily submitted through web browsers and devices such as cellphones and laptops.

The researchers trained the model on tens of thousands of samples of coughs, as well as spoken words. When they fed the model new cough recordings, it accurately identified 98.5 percent of coughs from people who were confirmed to have Covid-19, including 100 percent of coughs from asymptomatics — who reported they did not have symptoms but had tested positive for the virus.

Artificial intelligence model detects asymptomatic Covid-19 infections through cellphone-recorded coughs

Is it ok to be surveilled at all times if it keeps you super safe? I expect this is exactly the trade-off most people will accept.

Anonymization is hard, Waze edition

Security engineer Peter Gasper:

What I found is that I can ask Waze API for data on a location by sending my latitude and longitude coordinates. Except the essential traffic information, Waze also sends me coordinates of other drivers who are nearby. What caught my eyes was that identification numbers (ID) associated with the icons were not changing over time. I decided to track one driver and after some time she really appeared in a different place on the same road.

Waze: How I Tracked Your Mother (via Schneier on Security)
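The flaw Gasper found is the classic one: pings that look harmless individually become a movement trace once they can be joined on a stable identifier. A minimal sketch with made-up coordinates:

```python
from collections import defaultdict

# Why stable IDs defeat anonymization: each ping alone is harmless, but a
# non-rotating identifier lets anyone join them into a trajectory.
# Coordinates and IDs below are made up for illustration.
pings = [
    ("driver-42", 50.08, 14.42, "09:00"),
    ("driver-17", 50.10, 14.40, "09:00"),
    ("driver-42", 50.09, 14.44, "09:05"),
    ("driver-42", 50.11, 14.47, "09:10"),
]

tracks = defaultdict(list)
for device_id, lat, lon, t in pings:
    tracks[device_id].append((t, lat, lon))

# driver-42 now has a three-point trajectory. Rotating the ID on every API
# response would have broken this join.
print(tracks["driver-42"])
```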

Anonymizing is hard

The task of proper anonymization is harder than it looks. Yet another example:

It turns out, though, that those redactions are possible to crack. That’s because the deposition—which you can read in full here—includes a complete alphabetized index of the redacted and unredacted words that appear in the document.

We Cracked the Redactions in the Ghislaine Maxwell Deposition (via Schneier on Security)
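The attack works because an alphabetized index is a sorting oracle: each redacted entry must sort strictly between its unredacted neighbors, which can shrink the candidate set to a handful of words. A sketch with invented word lists:

```python
import bisect

# Sketch of the index attack. An alphabetized index constrains each redacted
# word to sort between its visible neighbors. Word lists here are invented.
index_words = ["apple", "REDACTED-1", "cherry", "REDACTED-2", "grape"]
dictionary = sorted(["avocado", "banana", "cucumber", "date", "fig", "kiwi"])

def candidates(prev_word: str, next_word: str) -> list:
    """Dictionary words that sort strictly between the two visible neighbors."""
    lo = bisect.bisect_right(dictionary, prev_word)
    hi = bisect.bisect_left(dictionary, next_word)
    return dictionary[lo:hi]

print(candidates("apple", "cherry"))  # ['avocado', 'banana']
print(candidates("cherry", "grape"))  # ['cucumber', 'date', 'fig']
```

With a real deposition, context (names mentioned nearby, page references in the index) narrows the candidates further, often to a single word.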

This seems to be a corollary of Schneier’s Law: anyone can anonymize data in a way that they themselves can’t imagine breaking.

Although the truth is most don’t even try to break their own work.