The problem I have with the "duplicate the Internet" theory is that it favors the hard solution vs the easy solution.
The hard solution is to secretly duplicate traffic from every data center operated by each of these companies, reverse engineer every HTTP request that goes back and forth so that the data can be parsed, maintain it for every product change that happens at these companies, circumvent HTTPS by compromising the certificate authorities, store it all, and still maintain a massive analytics tool that can make sense of the astounding amount of data coming through.
The easy solution is to avoid all of the technical ugliness of acquiring the data, and just legally make the companies give you the relevant information, neatly structured and packaged. NSLs are the ultimate hack.
It honestly wouldn't surprise me if the gov't has issued a secret subpoena for every PRISM provider's SSL key (e.g. Google/Facebook/Yahoo/etc). That way they get to claim "hey, we're not giving them full access" and the government gets what they want anyway.
As I understand it, they don't have to tap those companies' data centers to do the duplication.
Take email as an example: messages travel unencrypted between the relay hops, so they could store them all and analyze them continuously. When something suspicious comes up, they go to the email provider for more data. If a Gmail address appears, they go to Google and use its PRISM interface to pull everything associated with that address; if it's a Yahoo address, they go to Yahoo, and so on.
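The dispatch step described above can be sketched in a few lines: parse the addresses out of a captured plaintext message and map each domain to the provider that would receive the follow-up request. The domain-to-provider table is a hypothetical illustration, not an actual PRISM participant list.

```python
# Sketch: route a captured message's parties to the providers that hold
# more data on them. The PROVIDERS mapping is a hypothetical example.
from email import message_from_string
from email.utils import getaddresses

PROVIDERS = {
    "gmail.com": "Google",
    "yahoo.com": "Yahoo",
    "hotmail.com": "Microsoft",
}

def providers_to_query(raw_message: str) -> set[str]:
    """Providers to ask for more data on this message's senders/recipients."""
    msg = message_from_string(raw_message)
    fields = msg.get_all("From", []) + msg.get_all("To", []) + msg.get_all("Cc", [])
    addresses = getaddresses(fields)
    domains = {addr.rsplit("@", 1)[-1].lower() for _, addr in addresses if "@" in addr}
    return {PROVIDERS[d] for d in domains if d in PROVIDERS}

raw = """From: alice@gmail.com
To: bob@yahoo.com, carol@example.org
Subject: hi

hello
"""
print(sorted(providers_to_query(raw)))  # ['Google', 'Yahoo']
```

Addresses at domains outside the table (here carol@example.org) simply fall through, which matches the point: you only get this shortcut for providers you can compel.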
Gmail users sending to each other will only relay inside Google's own private network. If all of my co-conspirators are using Gmail, there are no external relays to be tapped. Someone would have to read all of our SSL/TLS traffic to see what we're writing about.
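One rough way to test the claim above is to check a message's Received headers: if every hop was stamped by the provider's own servers, the message never crossed a link an outside tap could read in plaintext. The ".google.com" hostname test is a simplified assumption for illustration, not how Google actually labels its relays.

```python
# Sketch: did this message ever leave the provider's own network?
# Checks every Received hop for an internal hostname marker (an assumption).
from email import message_from_string

def stayed_internal(raw_message: str, internal_marker: str = ".google.com") -> bool:
    """True if every relay hop in the headers looks like the provider's own."""
    hops = message_from_string(raw_message).get_all("Received", [])
    return bool(hops) and all(internal_marker in hop for hop in hops)

internal = (
    "Received: by mail-a.google.com\n"
    "Received: from mail-b.google.com by mail-a.google.com\n"
    "From: alice@gmail.com\nTo: bob@gmail.com\n\nhi"
)
external = (
    "Received: by mail-a.google.com\n"
    "Received: from smtp.example.org by relay.example.net\n"
    "From: alice@gmail.com\nTo: bob@example.org\n\nhi"
)
print(stayed_internal(internal), stayed_internal(external))  # True False
```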
This is even more complicated when the data centers are in other countries, and none of the data actually enters the US. So if two EU users were accessing Gmail from the EU, the data may never enter the US at all. This means any network tapping would have to be done in the EU as well, requiring cooperation from many international telecom companies.
It's still easiest to just force Google to hand it over via NSL. Google's still legally bound to deliver the data even if it isn't physically stored in the US.
I wouldn't be so sure that's the easy solution, since it depends on the cooperation of those companies.
They at least have the choice to resist in some way or another.
They could also be using both solutions simultaneously.
From their perspective, why not?
A lot of the communication won't be encrypted anyway, and what is encrypted may become decryptable at some point in the future.
The hard solution isn't just a little bit harder ... it's several orders of magnitude harder and more expensive. It's also highly vulnerable to simply using encryption. The easy solution works because the US companies are bound by law to cooperate. There's no reason to believe that legal pressure on these companies has failed to get the government what it wants.