The last type of data, the data you don’t share, also has a problem, which is that maybe you should share it. Maybe it hasn’t occurred to you to do so, maybe there’s no easy way to, or maybe you just don’t want to. In the last case, you should consider whether you have an ethical responsibility to share. One example we’ve seen is cancer patients, who can contribute to curing cancer by sharing their tumors’ genomes and treatment histories. But it goes well beyond that. All sorts of questions about society and policy can potentially be answered by learning from the data we generate in our daily lives. Social science is entering a golden age, where it finally has data commensurate with the complexity of the phenomena it studies, and the benefits to all of us could be enormous, provided the data is accessible to researchers, policy makers, and citizens. This does not mean letting others peek into your private life; it means letting them see the learned models, which should contain only statistical information. So between you and them there needs to be an honest data broker that guarantees your data won’t be misused, but also that no free riders share the benefits without sharing the data.
In sum, all four kinds of data sharing have problems. These problems all have a common solution: a new type of company that is to your data what your bank is to your money. Banks don’t steal your money (with rare exceptions). They’re supposed to invest it wisely, and your deposits are FDIC-insured. Many companies today offer to consolidate your data somewhere in the cloud, but they’re still a far cry from your personal data bank. If they’re cloud providers, they try to lock you in, which is a big no-no. (Imagine depositing your money with Bank of America and not knowing if you’ll be able to transfer it to Wells Fargo somewhere down the line.) Some startups offer to hoard your data and then mete it out to advertisers in return for discounts, but to me that misses the point. Sometimes you want to give information to advertisers for free because it’s in your interests, sometimes you don’t want to give it at all, and deciding what to share when is a problem that only a good model of you can solve.
The kind of company I’m envisaging would do several things in return for a subscription fee. It would anonymize your online interactions, routing them through its servers and aggregating them with its other users’. It would store all the data from your entire life in one place, down to your 24/7 Google Glass video stream, if you ever get one. It would learn a complete model of you and your world and continually update it. And it would use the model on your behalf, always doing exactly what you would, to the best of the model’s ability. The company’s basic commitment to you is that your data and your model will never be used against your interests. Such a guarantee can never be foolproof; after all, even you are not guaranteed never to act against your own interests. But the company’s life would depend on it as much as a bank’s depends on the guarantee that it won’t lose your money, so you should be able to trust it as much as you trust your bank.
A company like this could quickly become one of the most valuable in the world. As Alexis Madrigal of The Atlantic points out, today your profile can be bought for half a cent or less, but the value of a user to the Internet advertising industry is more like $1,200 per year. Google’s sliver of your data is worth about $20, Facebook’s about $5, and so on. Add to that all the slivers that no one has yet, and the fact that the whole is more than the sum of its parts (a model of you based on all your data is much better than a thousand models based on a thousand slivers), and we’re looking at easily over a trillion dollars per year for an economy the size of the United States. It doesn’t take a large cut of that to make a Fortune 500 company. If you decide to take up the challenge and wind up becoming a billionaire, remember where you first got the idea.
Of course, some existing companies would love to host the digital you. Google, for example. Sergey Brin says that “we want Google to be the third half of your brain,” and some of Google’s acquisitions are probably not unrelated to how well their streams of user data complement its own. But, despite their head start, companies like Google and Facebook are not well suited to being your digital home because they have a conflict of interest. They earn a living by targeting ads, and so they have to balance your interests against the advertisers’. You wouldn’t let the first or second half of your brain have divided loyalties, so why would you let the third?
One possible showstopper is that the government may subpoena your data, or even preventively jail you, Minority Report-style, if your model looks like a criminal’s. To forestall that, your data company can keep everything encrypted, with the key in your possession. (These days you can even compute over encrypted data without ever decrypting it.) Or you can keep it all on your hard disk at home, and the company just rents you the software.
If you don’t like the idea of a profit-making entity holding the keys to your kingdom, you can join a data union instead. (If there isn’t one in your neck of the cyberwoods yet, consider starting one.) The twentieth century needed labor unions to balance the power of workers and bosses. The twenty-first needs data unions for a similar reason. Corporations have a vastly greater ability to gather and use data than individuals. This leads to an asymmetry in power, and the more valuable the data (the better and more useful the models that can be learned from it), the greater the asymmetry. A data union lets its members bargain on equal terms with companies about the use of their data. Perhaps labor unions can get the ball rolling, and shore up their membership, by starting data unions for their members. But labor unions are organized by occupation and location; data unions can be more flexible. Join up with people you have a lot in common with; the models learned will be more useful to you that way. Notice that being in a data union does not mean letting other members see your data; it just means letting everyone use the models learned from the pooled data. Data unions can also be your vehicle for telling politicians what you want. Your data can influence the world as much as your vote, or more, because you only go to the polls on election day. On all other days, your data is your vote. Stand up and be counted!
So far I haven’t uttered the word privacy. That’s not by accident. Privacy is only one aspect of the larger issue of data sharing, and if we focus on it to the detriment of the whole, as much of the debate to date has, we risk reaching the wrong conclusions. For example, laws that forbid using data for any purpose other than the originally intended one are extremely myopic. (Not a single chapter of Freakonomics could have been written under such a law.) When people have to trade off privacy against other benefits, as when filling out a profile on a website, the implied value of privacy that comes out is much lower than if you ask them abstract questions like “Do you care about your privacy?” But privacy debates are more often framed in terms of the latter. The European Union’s Court of Justice has decreed that people have the right to be forgotten, but they also have the right to remember, whether it’s with their neurons or a hard disk. So do companies, and up to a point, the interests of users, data gatherers, and advertisers are aligned. Wasted attention benefits no one, and better data makes better products. Privacy is not a zero-sum game, even though it’s often treated like one.