Getty Images bans upload of AI-generated content

James Vincent, writing for The Verge:

Getty Images has banned the upload and sale of illustrations generated using AI art tools like DALL-E, Midjourney, and Stable Diffusion. It’s the latest and largest user-generated content platform to introduce such a ban, following similar decisions by sites including Newgrounds, PurplePort, and FurAffinity.

Getty Images CEO Craig Peters told The Verge that the ban was prompted by concerns about the legality of AI-generated content and a desire to protect the site’s customers.

Getty Images bans AI-generated content over fears of legal challenges

Getty Images is being appropriately cautious. AI image synthesis tools, being trained on the open internet, can be easily prompted into copyright violations.

Creative Commons raises questions about use of CC-licensed works to train AIs

Creative Commons licenses typically put few constraints on the re-use of copyrighted material. That flexibility has allowed AIs to be trained on CC-licensed material, which sometimes surprises copyright holders.

In a new blog post, Creative Commons outlines the issue and states that it will “examine, throughout the year, the intersection of AI and open content.”

155 votes in a Twitter poll where the plurality selects “Depends” is… not a lot of guidance.

AI image synthesis models may struggle with copyright

James Vincent, writing for The Verge:

Like most modern AI systems, Stable Diffusion is trained on a vast dataset that it mines for patterns and learns to replicate. In this case, that core of the training data is a huge package of 5 billion-plus pairs of images and text tags known as LAION-5B, all of which have been scraped from the public web. . . .

We know for certain that LAION-5B contains a lot of copyrighted content. An independent analysis of a 12 million-strong sample of the dataset found that nearly half the pictures contained were taken from just 100 domains. The most popular was Pinterest, constituting around 8.5 percent of the pictures sampled, while the next-biggest sources were sites known for hosting user-generated content (like Flickr, DeviantArt, and Tumblr) and stock photo sites like Getty Images and Shutterstock. In other words: sources that contain copyrighted content, whether from independent artists or professional photographers.

Anyone can use this AI art generator — that’s the risk

Vincent points out that Stable Diffusion even sometimes inserts the “Getty Images” watermark in its generated imagery. Not a good look.
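The “nearly half from just 100 domains” figure is essentially a frequency tally over the sampled image URLs. Here is a minimal sketch of that kind of tally, assuming the sample is a plain list of URL strings. The URLs are invented, and grouping by full hostname (minus a leading “www.”) is my own assumption; a real analysis has to decide how to treat subdomains.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_shares(urls, top_n=100):
    """Tally image URLs by hostname.

    Returns the top_n hostnames with their share of the sample, plus the
    combined share of those top_n hostnames.
    """
    if not urls:
        return [], 0.0

    hosts = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        # Assumption: treat "www.example.org" and "example.org" as one source.
        if host.startswith("www."):
            host = host[len("www."):]
        hosts.append(host)

    counts = Counter(hosts)
    total = len(hosts)
    top = counts.most_common(top_n)
    shares = [(host, n / total) for host, n in top]
    combined = sum(n for _, n in top) / total
    return shares, combined


# Invented sample; the real analysis covered 12 million sampled images.
sample = [
    "https://img.example.com/1.jpg",
    "https://img.example.com/2.jpg",
    "https://www.photos.example.org/3.png",
    "http://blog.example.net/4.gif",
]
shares, combined = domain_shares(sample, top_n=1)
print(shares)    # [('img.example.com', 0.5)]
print(combined)  # 0.5
```

The same tally with top_n=100 over a real sample is what produces a “combined share of the top 100 domains” number like the one Vincent cites.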

More data for AI interpretation of patents

Google has released the Patent Phrase Similarity dataset, intended to help AI models better understand the somewhat odd world of patent language:

The process of using traditional patent search methods (e.g., keyword searching) to search through the corpus of over one hundred million patent documents can be tedious and result in many missed results due to the broad and non-standard language used. For example, a “soccer ball” may be described as a “spherical recreation device”, “inflatable sportsball” or “ball for ball game”.

Announcing the Patent Phrase Similarity Dataset
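To make the quoted problem concrete, here is a toy illustration. The four documents are invented, each describing the same object with one of the phrasings from Google’s example; a literal phrase search for “soccer ball” finds only one of them.

```python
def keyword_search(corpus, query):
    """Return IDs of documents whose text contains the query as a
    case-insensitive substring (a stand-in for naive keyword search)."""
    q = query.lower()
    return [doc_id for doc_id, text in corpus.items() if q in text.lower()]


# Hypothetical mini-corpus built from the phrasings quoted above.
corpus = {
    "doc-1": "A soccer ball with a reinforced outer panel.",
    "doc-2": "A spherical recreation device having a textured surface.",
    "doc-3": "An inflatable sportsball with an internal bladder.",
    "doc-4": "A ball for ball game comprising stitched segments.",
}

hits = keyword_search(corpus, "soccer ball")
missed = [doc_id for doc_id in corpus if doc_id not in hits]
print(hits)    # ['doc-1']
print(missed)  # ['doc-2', 'doc-3', 'doc-4']
```

Broadening the query to “ball” picks up doc-3 and doc-4 (substring matching catches “sportsball”) but still misses doc-2, which never uses the word. That gap is what a phrase-similarity model is meant to close.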

The dataset was used in the U.S. Patent Phrase to Phrase Matching Kaggle competition, where some entries achieved close-to-human results.
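As far as I know, that competition scored entries by the Pearson correlation between a model’s predicted similarity scores and the human-assigned scores (which run from 0 to 1 in steps of 0.25). A minimal sketch of that metric, using invented scores:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical human-assigned scores and model predictions for six phrase pairs.
gold = [0.0, 0.25, 0.5, 0.75, 1.0, 0.5]
pred = [0.1, 0.30, 0.45, 0.70, 0.90, 0.60]
print(round(pearson(gold, pred), 3))
```

A score of 1.0 would mean the predictions rise and fall in perfect lockstep with the human ratings; “close-to-human” means a model’s correlation approached the agreement human raters show with one another.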

Commercial (legal) limitations of DALL-E 2

Louise Matsakis, reporting for The Information:

At least one major brand has already tried incorporating Dall-e 2 into an advertising campaign, inadvertently demonstrating how legal snafus could arise. When Heinz’s marketing team fed Dall-e 2 “generic ketchup-related prompts,” the program almost exclusively produced images closely resembling the company’s trademarked condiment bottle. “We ultimately found that no matter how we were asking, we were still seeing results that looked like Heinz,” a company representative told AdWeek.

Can Creatives Survive the Future War Against Dall-e 2?

The image generation AIs are remarkable, but they still have significant technical limitations, particularly an inability to generate unusual images (“a cup on a spoon”).

Discovery sanctions for GDPR redactions

An order by Judge Payne of the Eastern District of Texas rejects the argument that redactions allegedly required by the GDPR were proper:

To further demonstrate the alleged bad faith application of the GDPR, Arigna showed where Continental blacked out the faces of its Executive Board in a picture even though that picture was available on Continental’s public website without the redactions. Based on these redactions and failure to timely produce the ESI, Arigna seeks an adverse inference instruction; an order precluding Continental from using any document that it did not timely produce, and Arigna’s costs and fees.

In response, Continental argued (but did not show) that it received an opinion letter from a law firm based in Europe stating the redactions were required by the GDPR, and that it had worked diligently to produce the ESI while also complying with the GDPR.

July 29, 2022 Memorandum Order, Case No. 22-cv-00126 (EDTX)

Wikipedia influences judicial decisions

Bob Ambrogi:

To assess whether Wikipedia impacts judicial decisions, the researchers set out to test for two types of influence: (1) whether the creation of a Wikipedia article on a case leads to that case being cited more often in judicial decisions; and (2) whether the text of judicial decisions is influenced by the text of the corresponding Wikipedia article.

Scientists Conclude that Wikipedia Influences Judges’ Legal Reasoning

They found that the addition of a case to Wikipedia increased the case’s citations by 20%.

They also purport to demonstrate with natural language analysis that “a textual similarity exists between the judicial decisions and the Wikipedia articles.”

I’m skeptical that this method proves actual influence by a Wikipedia article. But it’s easy to believe that case salience would have an impact.
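For readers wondering what “textual similarity” looks like in practice, one common baseline is cosine similarity over word counts. This is a generic technique, not necessarily the one the researchers used, and the sentences below are invented.

```python
import math
import re
from collections import Counter


def word_counts(text):
    # Deliberately crude tokenizer: lowercase, keep runs of letters/digits.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between the word-count vectors of two texts."""
    ca, cb = word_counts(a), word_counts(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


article = "The court held that the search violated the Fourth Amendment."
opinion = "We hold that the search violated the Fourth Amendment."
unrelated = "The contract was signed in Delaware."

print(round(cosine_similarity(article, opinion), 2))    # 0.83
print(round(cosine_similarity(article, unrelated), 2))  # 0.31
```

Note that the unrelated sentence still scores above zero purely because both contain “the.” Real analyses typically drop common words or weight terms (e.g., TF-IDF), and even then, high similarity shows overlap, not that one text caused the other.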

Is ShotSpotter AI?

A federal lawsuit filed Thursday alleges Chicago police misused “unreliable” gunshot detection technology and failed to pursue other leads in investigating a grandfather from the city’s South Side who was charged with killing a neighbor.

. . . . .

ShotSpotter’s website says the company is “a leader in precision policing technology solutions” that help stop gun violence by using sensors, algorithms and artificial intelligence to classify 14 million sounds in its proprietary database as gunshots or something else.

Lawsuit: Chicago police misused ShotSpotter in murder case

Some commentators (e.g., link) have jumped on this story as an example of someone (allegedly) being wrongly imprisoned due to AI.

But maybe ShotSpotter is just bad software that is used improperly? Does it matter?

AI is so difficult to define that we may soon find ourselves regulating all software.

UK IPO suggests copyright exception for text and data mining

The United Kingdom’s Intellectual Property Office has concluded a study on “how AI should be dealt with in the patent and copyright systems.”

For text and data mining, we plan to introduce a new copyright and database exception which allows TDM for any purpose. Rights holders will still have safeguards to protect their content, including a requirement for lawful access.

Consultation outcome / Artificial Intelligence and IP: copyright and patents

They also considered copyright protection for computer-generated works without a human author, and patent protection for AI-devised inventions. But they suggest no changes in the law for these latter two areas.