For the last five years or so, IDC has released an EMC-sponsored study on “The Digital Universe” that looks at how much data is created and replicated around the world. When I last blogged about it back in 2008, the number stood at 281 exabytes per year. Now the latest report is out, and for the first time the amount of data created has surpassed 1 zettabyte! About 1.2 zettabytes were created and replicated in 2010 (that’s 1.2 trillion gigabytes), and IDC predicts that number will grow to 1.8 zettabytes this year. The amount of data is more than doubling every two years!
Here’s what the growth looks like:
How much data is that? Wikipedia has some good answers: exabyte, zettabyte. EMC has also provided some examples to help make sense of the number. 1.8 zettabytes is equivalent in sheer volume to:
- Every person in Canada tweeting three tweets per minute for 242,976 years nonstop
- Every person in the world having over 215 million high-resolution MRI scans per day
- Over 200 billion HD movies (each two hours in length); it would take one person 47 million years to watch them all 24/7
- The amount of information needed to fill 57.5 billion 32 GB Apple iPads. With that many iPads we could:
  - Create a wall of iPads, 4,005 miles long and 61 feet high, extending from Anchorage, Alaska to Miami, Florida
  - Build the Great iPad Wall of China – at twice the average height of the original
  - Build a 20-foot high wall around South America
  - Cover 86 percent of Mexico City
  - Build a mountain 25 times higher than Mt. Fuji
That’s a lot of data!
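The arithmetic behind those prefixes is straightforward. Here's a quick sketch, assuming decimal SI units (1 ZB = 10^21 bytes), which is what IDC's figures appear to use:

```python
# Quick sketch of the prefix arithmetic behind the figures above,
# assuming decimal SI units (1 ZB = 10^21 bytes), as IDC appears to use.
GB = 10**9
ZB = 10**21

created_2010 = 1.2 * ZB   # created and replicated in 2010
forecast_2011 = 1.8 * ZB  # IDC's forecast for 2011

# 1.2 ZB expressed in gigabytes: about 1.2e12, i.e. "1.2 trillion gigabytes"
print(created_2010 / GB)

# 1.2 -> 1.8 in one year is a 1.5x factor; over two years that compounds to
# 1.5^2 = 2.25x, consistent with "more than doubling every two years".
print((forecast_2011 / created_2010) ** 2)
```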
EMC/IDC has produced a great infographic that explains more about the explosion of data – see it here in PDF. One of the things that has always been fuzzy for me is the difference between data we’ve created intentionally (like a document) and data we’ve created unintentionally (sharing that document with others). According to IDC, one gigabyte of stored data can generate one petabyte (1 million gigabytes) of transient data!
Cost is one of the biggest factors behind this growth, of course. The cost of creating, capturing, managing, and storing information is now just one-sixth of what it was in 2005. Another big factor is that most of us now carry the tools of creation (digital cameras, mobile phones, and so on) with us at all times, everywhere we go.
You can learn more about all of this and see a live information growth ticker at EMC’s website.
This seems as good a time as any to remind you to back up your important data! It may be easy to create photos and documents, but it’s even easier to lose them. I use a variety of tools to back up data, including Amazon S3, Dropbox, and Windows Live Mesh. The easiest by far, though, is Backblaze – unlimited storage for $5 per month per computer, and it all happens automagically in the background.
At around 1:30am on August 6th, a hard drive in one of our database servers died. It took down our mail server and WordPress blogs, but everything else (such as Podcast Spot) was unaffected. It sucks, but it happens. We’ve had many drives die over the last few years, unfortunately. All you can do is learn from each experience.
In this case, we had a full image of the server backed up. All we had to do was stick in a new hard drive, and deploy the image. That worked fairly well, though it did take some time to complete. The only problem was that the image was about 24 hours old – fine for system files, but not good for the data files we needed. For the most up-to-date data files, we relied on MozyPro.
(I should point out that we generally configure things so that data files are on separate drives from the system. In this case, we had about 250 MB of data files on the system drive. I have since reconfigured that.)
For the most part, Mozy has worked well for us. We’ve had a few bumps along the way, but no major complaints or problems. Until I tried to restore the data files yesterday, that is. The first problem was that I couldn’t use the Windows interface. The Mozy client would not “see” the last backup, presumably because the restored image was older than the last backup. You’d think it could connect to Mozy and figure that out, but apparently not. So I tried the Web Restore instead. It eventually worked, but it took about four hours to get the files – not four hours to download them, but four hours for Mozy to make them available for download. Thank goodness it was only about 1,000 files and 250 MB, or it could have taken days!
So I learned that Mozy is reliable, but certainly not quick. If you need to restore something quickly, make sure you have a local backup somewhere. If you’re just looking for reliable, inexpensive, offsite storage then Mozy will probably work fine for you.
My next task is to upgrade this particular server to a RAID configuration, something we had been planning to do anyway. Should have done it sooner!
I typed the title for this post into Windows Live Writer, and a red squiggly appeared under the word “exabytes”. I just added it to the dictionary, but I can’t help but think that it’ll be in there by default before long.
Either it takes three months to crunch the data or March is just the unofficial “how much did we create last year” month, because researchers at IDC have once again figured out how many bits and bytes of data were created in 2007. You’ll recall that in March of last year, they estimated the figure for 2006 to be 161 exabytes. For 2007, that number nearly doubled, to 281 exabytes (which is 281 billion gigabytes):
IDC attributes accelerated growth to the increasing popularity of digital television and cameras that rely on digital storage. Major drivers of digital content growth include surveillance, social networking, and cloud computing. Visual content like images and video account for the largest portion of the digital universe. According to IDC, there are now over a billion digital cameras and camera phones in the world and only ten percent of photos are captured on regular film.
This is obviously a very inexact science, but I suspect their estimates become more accurate with experience.
Interestingly, this is the first time that we’ve created more data than we have room to store (though one wonders if that’s due more to a lack of historical data than anything else).
Read: ars technica
The computer industry changes so rapidly that it’s easy to forget about the hardware and devices we had just a few years ago. I’ve been cleaning up the office, getting rid of some junk that we’ve had lying around for years, and I’m amazed at some of the hardware I’ve found. Hard drives best demonstrate the difference between then and now – they’ve had the same form factor for years, but the capacities are vastly different.
For instance, the hard drive from an old Toshiba T4900CT laptop is only 810 MB – technically closer to 770 MB of usable space, I believe – yes, megabytes. I don’t know why I’ve kept this laptop for so long; it hasn’t worked for years. I guess I’m a bit of a digital pack rat. It was the first laptop I ever used. My family used it at the pet store back in Inuvik when I was a kid, and it worked great. I even took it on a field trip back in high school (Dickson reminded me that we played Grand Theft Auto on the bus).
I found this description on the Toshiba Europe site:
The T4900CT and its 75 MHz Pentium processor will give you such speed and power when you’re out on the road that you’ll really move along the data super-highway. Back in the office, there’s hardly a desktop that can keep up with it.
How times have changed! Not only does it weigh about 15 pounds, but it’s a good four inches thick! The last thing that processor makes me think of is speed and power.
Here are a few photos I took tonight: the 810 MB hard drive, a 9.1 GB SCSI hard drive, and a 20.5 GB IDE hard drive.
I wouldn’t consider buying anything smaller than a 300 GB SATA II hard drive now, and I wouldn’t be surprised if that seems tiny in a couple years. Hard to imagine that a hard drive with only 770 MB was ever actually usable!
There’s a new report out from research firm IDC that attempts to count up all the zeroes and ones that fly around our digital world. I remember reading about the last such report, from the University of California, Berkeley. That report found that 5 exabytes of data were created in 2003. The new IDC report says the number for 2006 is 161 exabytes! Why the difference?
[The Berkeley researchers] also counted non-electronic information, such as analog radio broadcasts or printed office memos, and tallied how much space that would consume if digitized. And they examined original data only, not all the times things got copied.
In comparison, the IDC numbers ballooned with the inclusion of content as it was created and as it was reproduced – for example, as a digital TV file was made and every time it landed on a screen. If IDC tracked original data only, its result would have been 40 exabytes.
Even still, that’s an incredible increase in just three years. Apparently we don’t even have enough space to store all that data:
IDC estimates that the world had 185 exabytes of storage available last year and will have 601 exabytes in 2010. But the amount of stuff generated is expected to jump from 161 exabytes last year to 988 exabytes (closing in on 1 zettabyte) in 2010.
Pretty hardcore, huh? You can read about zettabytes at Wikipedia. I’m not too worried about running out of space though, even if we were attempting to store all that data (which we aren’t). Hard drives are already approaching the terabyte mark, so who knows how big they’ll be in 2010. Then of course there’s also the ever-falling cost of DVD-like media.
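As a rough sanity check on that projection, the implied annual growth rates work out like this (the compound-growth framing here is my own back-of-the-envelope arithmetic, not IDC's):

```python
# Rough check of the IDC projection quoted above (2006 -> 2010, 4 years).
# The compound-growth framing is mine, not IDC's.
data_2006, data_2010 = 161, 988        # exabytes created per year
storage_2006, storage_2010 = 185, 601  # exabytes of storage available

years = 4
data_growth = (data_2010 / data_2006) ** (1 / years)
storage_growth = (storage_2010 / storage_2006) ** (1 / years)

print(round(data_growth, 2))     # ~1.57, i.e. ~57% growth per year
print(round(storage_growth, 2))  # ~1.34, i.e. ~34% growth per year
print(data_2010 - storage_2010)  # 387 EB shortfall by 2010
```

Data creation outpacing storage by roughly 57% to 34% a year is exactly why the gap widens rather than closes.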
More importantly, I bet a lot of the storage we “have available” right now is totally underutilized. You’d be hard pressed to find a computer that comes with less than 80 GB of storage these days, and I can assure you there are plenty of users who never even come close to filling it up. Heck, even I am only using about 75% of the storage I have available on my computer (420 GB out of 570 GB) and I bet a lot of it could be deleted (I’m a digital pack rat).
Read: Yahoo! News
It’s hard to imagine that in just three years a single hard drive could store 300 TB, but we’ve been here before. Five years ago, who would have thought we’d have the 750 GB drives that we do today! Seagate claims the larger drives are on the way:
To pull the 300 TB rabbit out of the hat, technology comes to the rescue once again. This time, Seagate will use a technology called heat-assisted magnetic recording (HAMR). There isn’t much detail on exactly how this works, but a single square inch of hard disk space will be able to store 50 TB of data.
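Taking that areal-density claim at face value, the implied recording surface is easy to work out. This is my back-of-the-envelope arithmetic, not Seagate's:

```python
# Back-of-the-envelope check on the HAMR figures above -- my arithmetic,
# not Seagate's.
drive_capacity_tb = 300
areal_density_tb_per_sq_in = 50

surface_needed = drive_capacity_tb / areal_density_tb_per_sq_in
print(surface_needed)  # 6.0 square inches of recording surface

# A 3.5-inch drive stacks several platters, each with two usable sides,
# so ~6 square inches fits comfortably in the standard form factor.
```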
It would totally suck to lose 300 TB of data, though like the article says, if they are the norm then buy two and back it all up!
You might wonder how you’d ever fill a 300 TB drive. I used to wonder that about my 200 GB drive, and now I have two of them plus a larger 300 GB drive. We’ll find a way to use the space. Always have, always will.