InterPlanetary File System

The InterPlanetary File System (IPFS) is a project to make the web more resilient and decentralized. The shortest description is a distributed CDN based on a combination of git and BitTorrent. Basically, all data is content-addressed (stored at the hash of its content, like git objects). This means everything is immutable, so there's also IPNS, a distributed analog to DNS that lets you have a symlink-like name that can be repointed at a new hash.
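
To make the content-addressing idea concrete, here's a minimal TypeScript sketch. It's a simplification of my own: real IPFS chunks files and wraps the digest in a multihash / CID rather than using a bare SHA-256 hex string.

    // Simplified content addressing: the address is derived purely from the
    // bytes, so identical content always maps to the same address and any
    // edit produces a new one. (Real IPFS uses multihash-encoded CIDs.)
    import { createHash } from "crypto";

    function contentAddress(data: Buffer): string {
      return createHash("sha256").update(data).digest("hex");
    }

    const v1 = contentAddress(Buffer.from("<html>hello</html>"));
    const v2 = contentAddress(Buffer.from("<html>hello, edited</html>"));
    console.log(v1 === v2); // false: changing the content changes the address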

IPFS came to my attention when Neocities announced they would publish all their sites on it. The system is pull-only, so Neocities will still be running servers where IPFS agents can fetch the sites (in the same way they run HTTP servers where clients can get the sites). Making the system pull-only solves some issues, though, since the first thing I thought of was "What happens if I try to back up all my photos onto the cloud? Does it waste everyone's hard drive space?"

As I mentioned in my last post, I'm a sucker for distributed web projects. Rather than jump in and start implementing features, or implement a parallel client in another language to test the spec and interoperability, I've tried to think through the strategic issues with this project and lay them out here.

Technical issues

Using the system as a backup

As I mentioned, IPFS is pull-only, so the "I want to back up my movies to your free cloud" problem won't happen. They've already discussed the backup issue themselves and come to the same conclusion. While IPFS could be used as a backup transmission system, it won't do the storage. In this example, I would run an IPFS server on my desktop, serving up encrypted tarballs, and my backup server at some other location would subscribe to any files published by my desktop. This would not use anyone else's storage or bandwidth, so IPFS is pretty resilient in this case.
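
As a rough sketch of that flow: the publishIpfs, updateIpnsPointer, resolveIpns, and fetchIpfs helpers below are hypothetical stand-ins for whatever client library you'd use, not real IPFS APIs.

    // Hypothetical backup flow: only my desktop and my backup server spend
    // any storage or bandwidth; IPFS is just the transmission layer.
    import { randomBytes, createCipheriv } from "crypto";

    // Stand-ins for a real IPFS client library (names are made up).
    declare function publishIpfs(data: Buffer): Promise<string>; // returns content hash
    declare function updateIpnsPointer(name: string, hash: string): Promise<void>;
    declare function resolveIpns(name: string): Promise<string>;
    declare function fetchIpfs(hash: string): Promise<Buffer>;

    // Desktop: encrypt the tarball locally, publish it, and repoint the
    // IPNS name so the backup server can find the latest snapshot.
    async function publishBackup(tarball: Buffer, key: Buffer, ipnsName: string) {
      const iv = randomBytes(16);
      const cipher = createCipheriv("aes-256-cbc", key, iv);
      const encrypted = Buffer.concat([iv, cipher.update(tarball), cipher.final()]);
      const hash = await publishIpfs(encrypted);
      await updateIpnsPointer(ipnsName, hash);
    }

    // Backup server: resolve the pointer and fetch whatever the desktop
    // last published.
    async function pullBackup(ipnsName: string): Promise<Buffer> {
      return fetchIpfs(await resolveIpns(ipnsName));
    }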

URLs and Rollout

Like many of these projects, there is a chicken-and-egg issue. Publishing content to IPFS is pointless right now since no one has the client, and no one will install or use the client if there is no content to consume. We need a transitional technology that lets consumers with regular web browsers read content from the IPFS system without doing anything extra.

The IPFS project is running just such a system on their publicly accessible IPFS node. This is a nice shim, since only the publisher needs to set up an IPFS daemon, allowing them to send out ipfs.io links for people to read. This lowers the barrier to entry in that people can browse around the IPFS space without installing a client, but there are several issues with this being a long-term transitional technology.

I have some concerns about this shim regarding single point of failure and bandwidth concentration. A single domain (controlled by a single entity) is injected between all clients and the IPFS distributed hosting system. ipfs.io would be tasked with maintaining high enough reliability that a critical mass of content producers is willing to distribute ipfs.io links for their content. All of the traffic would also be routed through this single point, so the distributed nature of IPFS is blocked from helping with both resiliency and the bandwidth / economic impacts.

Obviously, industry-standard practices like DNS load balancing could be used to route the traffic to multiple global locations. They could even follow LinkedIn's idea and switch to anycast server IPs. While a single organization owns the ipfs.io domain, the technical assets that provide service for it would be easy to distribute redundantly around the globe. IPFS's immutability design would allow each server site to operate as a stateless node, so this may lead to various independent groups all running servers (simply submitting their IPs to the IPFS project's database of live sites). It seems feasible that the economic / bandwidth issues could be solved by making it as easy as possible for a charitable party to set up a server and register it as an ipfs.io point of presence.

Single origin

The transitional system serves all its content from https://ipfs.io/ipfs/ while the examples for people running the IPFS daemon all serve from http://localhost:4001/ipfs/. In both cases, everyone's content is served from a single domain. This means that if I host a javascript app on IPFS, then when I set cookies, use local storage, or get permission for some HTML5 features (location, microphone, etc.), they are all granted to that single host. When someone later deploys another javascript app to IPFS, it can read / write my data and use my permissions.
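
A small illustration of the problem (the hashes are placeholders): two unrelated apps served through the same gateway share one origin, and therefore one localStorage.

    // App A (mine), loaded from https://ipfs.io/ipfs/QmAppA.../index.html
    localStorage.setItem("session-token", "secret-token-for-my-app");

    // App B (someone else's), loaded from https://ipfs.io/ipfs/QmAppB.../index.html
    // The browser sees the same origin ("https://ipfs.io"), so App B can read
    // or overwrite my data and inherits any permissions I was granted.
    const stolen = localStorage.getItem("session-token");
    console.log(stolen); // "secret-token-for-my-app"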

The IPFS group has already discussed the single-origin problem. One solution is to use a proposed browser feature to define per-page sub-origins. This would solve the issue, but only in newer browsers.

People have also suggested using your own domain, but this means the host page would be served from my own server. When I go out of business and my domain goes silent, IPFS cannot serve it from the same URL, so IPFS's resiliency is nullified. Even if my domain stays up, this does not solve the problem for people using IPFS from localhost.

The idea of custom sub-domains such as [hash].gateway.ipfs.io was also put forward. If the hash is something that can stay consistent even while the content changes (the conversation seemed to link it to IPNS), then this would work, but it would need to be made to function properly with the rollout issues (anycast IPs or geo-distributed DNS routing) in order to be resilient.

Legal Issues

As with any idea to distribute content where the content could be illegal, there are legal and liability concerns to deal with. The scenario brought up in an IPFS ticket about illegal content describes exactly my concern: someone publishes something illegal through IPFS, and every exit node (such as ipfs.io's public node) could become a liable party for enabling distribution of that content. The proposed solution is shared blacklists. While they are not implemented yet, a shared blacklist would let people re-publish IPFS data with minimal concern about liability. This system obviously depends on properly maintained and updated blacklists, but I believe it can work.
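
Here's a hedged sketch of how a re-publishing node might consult such a blacklist before serving content; the blacklist URL and format are assumptions of mine, since nothing like this is implemented yet.

    // Hypothetical shared-blacklist check for a gateway / exit node.
    const BLACKLIST_URL = "https://example.com/shared-blacklist.json"; // placeholder

    let blacklist = new Set<string>();

    // Periodically pull the latest list of banned content hashes,
    // e.g. ["Qm...", "Qm..."].
    async function refreshBlacklist(): Promise<void> {
      const res = await fetch(BLACKLIST_URL);
      blacklist = new Set<string>(await res.json());
    }

    // Refuse to re-publish anything on the list, limiting the operator's
    // liability for content that other people published.
    function canServe(contentHash: string): boolean {
      return !blacklist.has(contentHash);
    }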

An interesting situation would be:

  1. Netflix decides to publish encrypted movies through IPFS to save on bandwidth, and then only needs to distribute the decryption keys using their traditional centralized servers.
  2. IPFS nodes freely distribute the movie data for months.
  3. The decryption keys are compromised, and everyone can now watch the movie freely.
  4. Netflix now declares the encrypted movies to be contraband and gets them added to the DMCA shared blacklist.
  5. Everyone stops distributing the movie data.

I guess there is no inherent legal problem with Netflix publishing something for free distribution and then later declaring that anyone freely distributing it is liable, so this is likely how it would play out. If they couldn't later declare it contraband, they would probably just never publish it on IPFS in the first place.

Economic Issues

Much of the web consists of for-profit content sites that need to make money to provide more content. Some use paywalls for a direct two-party transaction: "Give me money and I'll give you content", but most use advertising: "Come for my free content, and I'll sell your eyeballs to advertisers". In both cases, the content is given to the visitors, so there's a risk these visitors could republish it elsewhere, cutting off the producer from their chance to make more money. IPFS presents some extra challenges and can make this republishing dramatically easier, so the big question is: can these models still work?

Advertisement

Advertisement could work similarly to today. While the content would still be served from IPFS (including the javascript adware), the actual ads would be served directly from a server (which would also do the tracking). This might make ad-blocking even easier since the ajax calls could simply be blocked, but that's not much of a difference from today. A rough sketch of this split follows the list below; the upsides would be:

  • Server costs for content would be lowered, since much of the content would be distributed through IPFS
  • Most of the server traffic would hopefully be ajax callbacks for ads and tracking (so only the profitable traffic)
  • If the company ever went out of business and shut down the servers, the content would still appear (just the ads and tracking would fail)
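
As promised, a rough sketch of the split (the gateway path and ad endpoint are placeholders): the article comes from IPFS, and only the ad / tracking call needs the publisher's own server.

    // Content from IPFS, ads from the publisher's own server.
    async function renderArticle(articleHash: string) {
      // Cheap for the publisher, and still reachable if their servers vanish.
      const article = await fetch(`https://ipfs.io/ipfs/${articleHash}`);
      document.querySelector("#content")!.innerHTML = await article.text();

      // The only traffic that must hit the publisher's server: ads + tracking.
      // If this fails (server gone, ad blocker), the article still renders.
      try {
        const ad = await fetch("https://ads.example.com/serve?slot=sidebar");
        document.querySelector("#ad-slot")!.innerHTML = await ad.text();
      } catch {
        // Ads fail silently; the content remains readable.
      }
    }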

Paywalls

A paywall on a centralized site is easy. Use session-based authentication to know when someone is logged in. If they are, show them the content. If not, tell them to screw off.

In the case where the content is on IPFS and thus always available, you need a system where the keys to viewing the content are controlled. Signing in could trigger a CORS call to the auth server, which would then return the hash needed to locate the content. An alternative is to have the content encrypted and have the CORS call return the decryption keys. Both ways would work, and both would keep the bandwidth-saving advantages for the site. The downside is that if the company disappeared, the content would stay hidden or encrypted, so the immutability and persistence benefits of IPFS wouldn't shine through.
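
A sketch of both variants, assuming a hypothetical auth server at auth.example.com answering CORS requests from the IPFS-hosted page:

    // Variant 1: the auth server only reveals the content hash to subscribers.
    async function loadHiddenContent(sessionToken: string): Promise<string> {
      const res = await fetch("https://auth.example.com/content-hash", {
        headers: { Authorization: `Bearer ${sessionToken}` },
      });
      const { hash } = await res.json(); // e.g. { hash: "Qm..." }
      const content = await fetch(`https://ipfs.io/ipfs/${hash}`);
      return content.text();
    }

    // Variant 2: the encrypted content is public on IPFS; the auth server
    // only hands out the decryption key (actual decryption, e.g. via
    // WebCrypto, is elided here).
    async function loadEncryptedContent(sessionToken: string, hash: string) {
      const keyRes = await fetch("https://auth.example.com/decryption-key", {
        headers: { Authorization: `Bearer ${sessionToken}` },
      });
      const { key } = await keyRes.json();
      const encrypted = await fetch(`https://ipfs.io/ipfs/${hash}`);
      // If the company disappears, so does the auth server, and the data
      // stays locked even though IPFS still serves the ciphertext.
      return { ciphertext: await encrypted.arrayBuffer(), key };
    }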

Is this the new web?

IPFS is an interesting technical project with some pretty compelling real-world uses. I still have some questions about what final URLs people will be giving out, how the single-origin problem will be solved, and whether video media companies will be able to build paywalls good enough that they're willing to use IPFS.

The next step might be to flesh out these solutions by making a proof of concept for an ad site, a paywall site that uses hidden content, and a paywall site that uses encrypted content. I'll put that in the backlog after I look into EveryBit.