The Private AI Series, Course 1, Part 2
Limitations of information flows: My summary of lesson 3 of the first course in The Private AI Series by OpenMined
- Limitations of Information Flows
- The Copy Problem
- The Bundling Problem
- The Recursive Enforcement Problem
- Conclusion
Limitations of Information Flows
In part 1 of my summary of the Private AI Series, we covered lesson 2. That lesson was all about information flows: how fundamental they are to our society and to human collaboration, and how they are often broken today because of the privacy-transparency trade-off. This part covers lesson 3, which examines three technical problems behind that trade-off:
- The copy problem
- The bundling problem
- The recursive enforcement problem
The Copy Problem
There are laws attempting to prevent people from misusing information, like HIPAA, GDPR, or CCPA. But they are really difficult to enforce.
That's why the copy problem is so important as a technical issue: no matter what the law says, it determines what people can actually do with a piece of information.
You might be tempted to say: uncontrolled copying of all information sounds terrible, let's stop this! But be careful. While the copy problem might hurt you sometimes, it is also protecting some of your most treasured freedoms. While anyone who stores your information can make copies of it, you can also copy anyone's data that you store. Any attempt to limit this ability could have a big impact on your life.
Example: Digital piracy - the sharing of copyrighted songs, movies, software - is a classic example of the copy problem. As soon as a digital copy of a file is sold to the first customer, this customer could share it with all other potential customers. There is no way for the copyright holder to control this.
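The point that digital copies are perfect and essentially free can be shown in a few lines of Python (an illustrative sketch, not from the course; the filename and contents are made up):

```python
import hashlib
import os
import shutil
import tempfile

# Create a "purchased" file (hypothetical name and contents, for illustration only).
src = os.path.join(tempfile.mkdtemp(), "song.mp3")
with open(src, "wb") as f:
    f.write(b"pretend these bytes are a copyrighted song")

# Copying is a single call -- and the copy is bit-for-bit identical.
dst = src + ".copy"
shutil.copy(src, dst)

def sha256(path):
    """Fingerprint a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

print(sha256(src) == sha256(dst))  # True: no quality loss, no trace left behind
```

Unlike copying a physical book, nothing about the original changes and nothing distinguishes the copy: this is what makes the copy problem so hard to police.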
In reaction to this, the entertainment industry developed DRM (Digital Rights Management) software.
DRM restricts what legitimate users can do with their own devices. But without any enforcement, the funding for the creation of art is at risk, and artists deserve compensation for the value they create!
An ideal solution would be very selective enforcement of a copy limitation. Unfortunately, this is impossible: computers are machines that operate by making copies. Even a stream is just a download without a save button, and you can still capture the content. To truly prevent data from being copied, you need incredibly invasive software.
Example: Dropbox prevents you from sharing copyrighted material. It scans every file you upload to a shared folder to check whether it contains copyrighted material.
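One common way such scanning works is to compare a fingerprint of each uploaded file against a list of known copyrighted works. This is a simplified sketch of that general technique, not Dropbox's actual system; the blocklist contents are hypothetical:

```python
import hashlib

# Hypothetical blocklist of fingerprints of known copyrighted files.
BLOCKED_HASHES = {
    hashlib.sha256(b"contents of a well-known copyrighted movie").hexdigest(),
}

def is_blocked(file_bytes: bytes) -> bool:
    """Return True if the file's fingerprint matches a known copyrighted work."""
    return hashlib.sha256(file_bytes).hexdigest() in BLOCKED_HASHES

print(is_blocked(b"contents of a well-known copyrighted movie"))  # True
print(is_blocked(b"my holiday photos"))                           # False
```

Note the weakness: changing a single byte of the file changes the hash completely, so real systems rely on fuzzier perceptual fingerprints. Either way, the service has to inspect everything you upload, which illustrates how invasive copy enforcement must be.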
The copy problem causes the privacy-transparency trade-off: sometimes you might want to share data, but you have to weigh the benefits of sharing against the risks of misuse. A solution to the copy problem would radically change many industries, offering the best of both worlds.
The Bundling Problem
The bundling problem: you often cannot share exactly the piece of information you want to share without also revealing other information that comes bundled with it. This problem is everywhere. Some examples:
- You share an image to prove something, but there are other things in that image, too
- A news organization reports about protests. It shows videos of individual protesters, which could later be used against them
- Researchers share sensitive medical data, when all they needed were the patterns within this data
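The last example hints at a way to unbundle: often the recipient only needs an aggregate pattern, not the raw records. A toy sketch (illustrative only, with made-up records; real systems add safeguards such as differential privacy):

```python
# Hypothetical raw patient records -- the sensitive, "bundled" data.
patients = [
    {"name": "Alice", "age": 34, "blood_pressure": 128},
    {"name": "Bob",   "age": 51, "blood_pressure": 141},
    {"name": "Carol", "age": 47, "blood_pressure": 135},
]

def average_blood_pressure(records):
    """Compute only the pattern the researcher needs, keeping raw rows private."""
    return sum(r["blood_pressure"] for r in records) / len(records)

# The data holder runs this locally and releases a single number,
# instead of handing over names, ages, and full medical histories.
result = average_blood_pressure(patients)
print(round(result, 2))  # 134.67
```

The researcher gets the pattern; the names and individual measurements never leave the data holder's machine.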
The Problem of Surveillance
Another example is home security systems. If you set up a video camera outside your front door, does it only record information about intruders? Of course not! It records every person that walks by, every car, every dog. Absolutely everything, 24/7, 365 days a year. Your ability to watch the 0.01 percent of the footage that actually matters comes bundled with the need to also record the other 99.99 percent. And we can only hope that those 99.99 percent are not misused.
Almost all forms of surveillance suffer from this bundling problem: rare events justify the collection of massive amounts of information that is never supposed to be used for anything else. Most people don't know how to build a surveillance system that only records the rare events it is intended to identify. But by the end of this course, you will learn how to do this.
AI Governance
The bundling problem is also a topic in AI governance. Auditing an algorithm seems to require access to its inner workings, but that access comes bundled with unwanted consequences:
- How exactly the algorithm works might be valuable intellectual property
- If the details of the algorithm were public, it might be easy to fool
Artificial Bundling Problems
Sometimes information that could be unbundled isn't unbundled, because someone in a powerful position does not want it to be unbundled.
Examples:
- You have to provide your email address to read an article
- You want to use a free trial, but you have to enter your full details and credit card
- You want to text with your friends, but you have to agree that a service scans all images and links you send
So, there are different forms of the bundling problem:
- Artificial bundling problems that are forced upon you
- Natural bundling problems
The boundary between them is increasingly blurry. But in this course, you will learn how to tell the difference and, in many cases, how to avoid both.
The Recursive Enforcement Problem
Couldn't third-party oversight institutions solve many of the issues caused by the copy and bundling problems? Why not simply make undesirable uses of data illegal? While this sounds good in theory, enforcing such rules is much harder in practice.
Consider a student who wants to analyze data owned by a supervisor. The solution seems to be: the data must stay on the supervisor's machine, not on the student's computer. This might be a bit of an inconvenience, but now the supervisor can watch everything the student does with the data. But what about the supervisor? They now have the ability to misuse the data! Who controls the supervisor? The university? And so on. We call this the recursive enforcement problem (also known as the recursive oversight problem).
It's one of the most important problems we face, and the core technical problem of data governance: if you have to put data onto someone's computer, who makes sure that they don't misuse it? Shared control is well established for physical assets, but much harder to achieve with data. How can multiple people share ownership over a data point that still has to live on a single machine?
There is a new class of technologies that allows this, and we will learn about it in the next part.
Conclusion
This lesson explored the three major technical problems that underlie the privacy-transparency trade-off: the copy problem, the bundling problem, and the recursive enforcement problem.
In the last two lessons, we learned a lot about the problems of today's information flows. In Part 3, we will begin to learn about solutions!
If you found a paragraph that needs improvement, please let me know in the comment section or on Twitter, I'm @daflowjoe. I'm also happy to hear from you if you found this summary helpful! :)