AI image synthesis models may struggle with copyright

James Vincent, writing for The Verge:

Like most modern AI systems, Stable Diffusion is trained on a vast dataset that it mines for patterns and learns to replicate. In this case, that core of the training data is a huge package of 5 billion-plus pairs of images and text tags known as LAION-5B, all of which have been scraped from the public web. . . .

We know for certain that LAION-5B contains a lot of copyrighted content. An independent analysis of a 12 million-strong sample of the dataset found that nearly half the pictures contained were taken from just 100 domains. The most popular was Pinterest, constituting around 8.5 percent of the pictures sampled, while the next-biggest sources were sites known for hosting user-generated content (like Flickr, DeviantArt, and Tumblr) and stock photo sites like Getty Images and Shutterstock. In other words: sources that contain copyrighted content, whether from independent artists or professional photographers.

Anyone can use this AI art generator — that’s the risk

Vincent points out that Stable Diffusion even sometimes inserts the “Getty Images” watermark in its generated imagery. Not a good look.

AI that learns from the internet

Ben Thompson at Stratechery points out that the new deep learning models do not require access to carefully curated data, which undercuts the presumed advantage of large companies:

If not just data but clean data was presumed to be a prerequisite, then it seemed obvious that massively centralized platforms with the resources to both harvest and clean data — Google, Facebook, etc. — would have a big advantage.

. . .

To the extent that large language models (and I should note that while I’m focusing on image generation, there are a whole host of companies working on text output as well) are dependent not on carefully curated data, but rather on the Internet itself, is the extent to which AI will be democratized, for better or worse.

The AI Unbundling

This means the new AI models are relatively cheap to build, and that their output is a reflection of internet content itself, “for better or worse.”

Facebook does not know what data it has

Bruce Schneier, linking to an article in The Intercept about a court hearing in the Cambridge Analytica suit:

Facebook’s inability to comprehend its own functioning took the hearing up to the edge of the metaphysical. At one point, the court-appointed special master noted that the “Download Your Information” file provided to the suit’s plaintiffs must not have included everything the company had stored on those individuals because it appears to have no idea what it truly stores on anyone. Can it be that Facebook’s designated tool for comprehensively downloading your information might not actually download all your information? This, again, is outside the boundaries of knowledge.

“The solution to this is unfortunately exactly the work that was done to create the DYI file itself,” noted Zarashaw. “And the thing I struggle with here is in order to find gaps in what may not be in DYI file, you would by definition need to do even more work than was done to generate the DYI files in the first place.”

FACEBOOK ENGINEERS: WE HAVE NO IDEA WHERE WE KEEP ALL YOUR PERSONAL DATA

Schneier has repeatedly made this fundamental but counter-intuitive point: “Today, it’s easier to build complex systems than it is to build simple ones.”

None of this is surprising to people familiar with modern data center services at scale. Twitter allegedly doesn’t know how to restart its services if they really go down:

The company also lacks sufficient redundancies and procedures to restart or recover from data center crashes, Zatko’s disclosure says, meaning that even minor outages of several data centers at the same time could knock the entire Twitter service offline, perhaps for good.

Ex-Twitter exec blows the whistle, alleging reckless and negligent cybersecurity policies

Most of this is overblown rhetoric, but the underlying point is that no single person understands how any of these complex systems work. And they are not easy to fix or change.

4.2 gigabytes of pure knowledge

Andy Salerno created a digital painting with Stable Diffusion, a new open source image synthesis model that allows anyone with a PC and a decent GPU to create almost any image they can describe:

Andy Salerno’s masterpiece created “with literally dozens of minutes of experience”

Salerno’s step-by-step guide is straightforward and worth a read.

Most remarkable perhaps is that Stable Diffusion is small enough to be used by almost anyone:

4.2 gigabytes.

That’s the size of the model that has made this recent explosion possible.

4.2 gigabytes of floating points that somehow encode so much of what we know.

4.2 Gigabytes, or: How to Draw Anything

Stable Diffusion was trained on about 5 billion images and cost about $600k on GPUs leased from AWS.
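The arithmetic behind that file size is easy to sketch. Stable Diffusion v1 combines a U-Net denoiser, a CLIP text encoder, and a VAE, roughly a billion parameters in total; stored as 32-bit floats, that lands right around the size of the released checkpoint. The parameter counts below are approximate figures from public descriptions of the model, so treat this as a back-of-the-envelope estimate, not an exact accounting:

```python
# Rough parameter counts for Stable Diffusion v1 components (approximate,
# from public descriptions of the model; ballpark figures only).
PARAMS = {
    "unet_denoiser": 860_000_000,
    "clip_text_encoder": 123_000_000,
    "vae": 84_000_000,
}

BYTES_PER_FLOAT32 = 4  # each parameter stored as a 32-bit float

total_params = sum(PARAMS.values())
size_gb = total_params * BYTES_PER_FLOAT32 / 1e9

print(f"{total_params / 1e9:.2f}B parameters ≈ {size_gb:.1f} GB at float32")
```

About a billion floats, about four gigabytes: the whole “explosion” fits on a USB stick.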

Learning from synthetic data

Microsoft trained an excellent 3D face reconstruction model using synthetic data.

Synthetic (i.e., computer-generated) data is helpful because hand-labeling the features of many real faces is slow and expensive, while synthetic data arrives already labeled. That allows for fast, accurate training:

Can we keep things simple by just using more landmarks?

In answer, we present the first method that accurately predicts ten times as many landmarks as usual, covering the whole head, including the eyes and teeth. This is accomplished using synthetic training data, which guarantees perfect landmark annotations.

3D Face Reconstruction with Dense Landmarks
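The appeal of synthetic data is easy to demonstrate in miniature. The toy sketch below is my own illustration, not Microsoft’s pipeline: each “face” is generated from known parameters, so the landmark labels are exact by construction, with no human annotator and no annotation noise.

```python
import random

# A tiny parametric "face": five canonical 2D landmarks (two eyes, nose,
# two mouth corners) in a normalized template. Purely illustrative.
TEMPLATE = [(-1.0, 1.0), (1.0, 1.0), (0.0, 0.0), (-0.7, -1.0), (0.7, -1.0)]

def synthesize_sample(rng):
    """Generate one synthetic sample with perfect landmark labels.

    Because we place the landmarks ourselves, the "annotation" is exact
    by construction; there is no human labeling step to get wrong.
    """
    scale = rng.uniform(0.5, 2.0)
    dx, dy = rng.uniform(-5, 5), rng.uniform(-5, 5)
    landmarks = [(x * scale + dx, y * scale + dy) for x, y in TEMPLATE]
    return {"params": (scale, dx, dy), "landmarks": landmarks}

rng = random.Random(0)
dataset = [synthesize_sample(rng) for _ in range(1000)]  # cheap and instant
print(len(dataset), "perfectly labeled samples")
```

Scaling from 5 landmarks to the hundreds Microsoft predicts changes nothing about the economics: the labels still come for free.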

Maybe the police should be able to use facial recognition…

Scott Ikeda for CPO Magazine:

Some cities and states that were early to ban law enforcement from using facial recognition software appear to be having second thoughts, which privacy advocates with the Electronic Frontier Foundation (EFF) and other organizations largely attribute to an uptick in certain types of urban crime.

Facial Recognition Bans Begin To Fall Around the US as Re-Funding of Law Enforcement Becomes Politically Popular

New Orleans and Virginia have both backtracked a bit: facial recognition technology is now allowed with supervision and for more serious types of crime.

Virginia in particular has imposed a requirement that facial recognition technology have an accuracy rating of at least 98% across all demographics.
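A requirement like that is straightforward to audit: compute accuracy separately for each demographic group and check that the minimum, not the average, clears the bar. A minimal sketch, with made-up audit data for illustration:

```python
def passes_accuracy_requirement(results, threshold=0.98):
    """Check that accuracy meets the threshold for every demographic group.

    `results` maps group name -> list of (predicted, actual) booleans.
    The requirement is on the worst group, not the overall average.
    """
    per_group = {
        group: sum(p == t for p, t in outcomes) / len(outcomes)
        for group, outcomes in results.items()
    }
    return all(acc >= threshold for acc in per_group.values()), per_group

# Hypothetical audit data: group B fails even though the average is high.
results = {
    "group_a": [(True, True)] * 99 + [(False, True)],      # 99% accurate
    "group_b": [(True, True)] * 95 + [(False, True)] * 5,  # 95% accurate
}
ok, per_group = passes_accuracy_requirement(results)
print(ok, per_group)
```

The “across all demographics” wording does real work here: a system averaging 97% overall can still hide a group it routinely misidentifies.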

More data for AI interpretation of patents

Google has released the Patent Phrase Similarity dataset, intended to help AI models better understand the somewhat odd world of patent language:

The process of using traditional patent search methods (e.g., keyword searching) to search through the corpus of over one hundred million patent documents can be tedious and result in many missed results due to the broad and non-standard language used. For example, a “soccer ball” may be described as a “spherical recreation device”, “inflatable sportsball” or “ball for ball game”.

Announcing the Patent Phrase Similarity Dataset

The dataset was used in the U.S. Patent Phrase to Phrase Matching Kaggle competition with some close-to-human results.
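The “soccer ball” example shows exactly why keyword matching struggles: synonymous patent phrases can share no words at all. A quick sketch of the problem using word-overlap (Jaccard) similarity, the kind of signal plain keyword search relies on:

```python
def jaccard(phrase_a, phrase_b):
    """Word-overlap similarity: the signal plain keyword search relies on."""
    a = set(phrase_a.lower().split())
    b = set(phrase_b.lower().split())
    return len(a & b) / len(a | b)

# Phrases from Google's example: same concept, little or no shared wording.
print(jaccard("soccer ball", "spherical recreation device"))  # 0.0
print(jaccard("soccer ball", "ball for ball game"))           # 0.25
```

A keyword engine scores “spherical recreation device” as completely unrelated to “soccer ball.” The human-rated phrase pairs in the new dataset give models the supervision to learn the semantic similarity that word overlap misses.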

Commercial (legal) limitations of DALL-E 2

Louise Matsakis reporting for The Information:

At least one major brand has already tried incorporating Dall-e 2 into an advertising campaign, inadvertently demonstrating how legal snafus could arise. When Heinz’s marketing team fed Dall-e 2 “generic ketchup-related prompts,” the program almost exclusively produced images closely resembling the company’s trademarked condiment bottle. “We ultimately found that no matter how we were asking, we were still seeing results that looked like Heinz,” a company representative told AdWeek.

Can Creatives Survive the Future War Against Dall-e 2?

The image generation AIs are remarkable, but they still have significant technical limitations as well, particularly an inability to compose unusual scenes (“a cup on a spoon”).