Mountains of data, right at your fingertips

Last week, two announcements caught my eye. The first was from Amazon.com, which announced that there is now more than 1 TB of public data available to developers through its Public Data Sets on AWS project. The second was from the New York Times, which announced its Newswire API, providing access to all NYTimes articles as they are published.

This is a big deal. Never before has so much data been so readily available to anyone. The AWS data is particularly interesting. All of a sudden, any developer in the world has cost-effective access to all publicly available DNA sequences (including the entire human genome), a complete dump of Wikipedia, US Census data, and much more. Perhaps most importantly, the data is in machine-readable formats, so it’s relatively easy for developers to tap into the data sources for cross-referencing, statistical analysis, and who knows what else.

The Newswire API is also really intriguing. It’s part of a growing set of APIs that the New York Times has made available. With the Newswire API, developers can get links and metadata for new articles the minute they are published. What will developers do with this data? Again, who knows. Imagination is the only limitation now that everyone can have immediate access.

Both of these projects remove barriers and will help foster invention, innovation, and discovery. I hope they are part of a larger trend, where simple access to data becomes the norm. Google’s mission might be to organize the world’s information and make it universally accessible and useful, but it’s projects like these that are making that vision a reality. I can’t wait to see what comes next!