Confusing Clouds

Amazon announced Glacier, its new archival storage cloud, yesterday. The service is aimed at long-term archive storage for large amounts of data. While some of the use cases are still a little unclear, others fit pretty well. If you're a videographer or photographer who needs to store terabytes of images or video from client shoots, it might be a great fit. At one penny per gigabyte per month and unlimited upload, you can store 10 terabytes for about what you'd spend in a year buying hard drives with that much capacity. Plus, with Glacier you know where your data is. No more trying to figure out which drive the data you need is on; Glacier knows.
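To put rough numbers on that claim, here's a quick back-of-the-envelope sketch in Python. The one-cent rate is Glacier's published storage price; the rest is simple arithmetic:

```python
# Rough annual cost of parking 10 TB in Glacier at the published
# $0.01 per GB per month storage rate (uploading the data is free).
GLACIER_RATE = 0.01        # USD per GB per month
STORED_GB = 10 * 1024      # 10 TB expressed in GB

monthly_cost = STORED_GB * GLACIER_RATE
annual_cost = monthly_cost * 12

print(f"Monthly storage: ${monthly_cost:,.2f}")  # ~$102.40
print(f"Annual storage:  ${annual_cost:,.2f}")   # ~$1,228.80
```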

Once you have your data out there, it's really not much use to you if you can't get it back. Here's where a lot of people make mistakes. Business stakeholders look at online archival, backup, and other types of storage and think, "Wow, that's inexpensive. I'll just put my data out there." Then the need arises to get at that data. Retrieving a single file can be fairly painless; it's the large retrievals that should give us pause. The pricing model for Glacier is, at best, a little confusing. There's a good thread going at Hacker News regarding this. I had a conversation with a colleague and contributor to the Hacker News discussion, and we walked through some different scenarios. In the end it was clear that, to use the image Glacier's name invokes of your data frozen in ice, a retrieval request is like a couple of people with ice picks chipping your data free. To avoid additional costs, it could take hours or days, depending on the size of the archive. If you decide to speed things up with a hair dryer instead, that will cost you. Exactly what those additional costs are is still being debated, and Wired even weighed in with an article yesterday afternoon.
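To make the ice-pick-versus-hair-dryer tradeoff concrete, here's a sketch of the retrieval fee as the Hacker News thread interpreted it. Amazon hadn't fully spelled the formula out at launch, so everything here, especially how the daily free allowance converts to an hourly rate, is an assumption under debate rather than an official calculation:

```python
HOURS_IN_MONTH = 720      # the peak rate is billed across the whole month
RETRIEVAL_RATE = 0.01     # USD per GB of peak hourly retrieval rate

def retrieval_fee(stored_gb, retrieved_gb, retrieval_hours):
    """Estimate the fee for pulling retrieved_gb back over retrieval_hours.

    This follows one debated reading of Glacier's launch pricing; it is
    an illustration of the fee's shape, not Amazon's official formula.
    """
    # Free tier: 5% of average monthly storage, pro-rated daily. Dividing
    # the daily allowance by four (the typical job duration) to get an
    # hourly rate was one common, but contested, assumption.
    free_hourly_gb = (stored_gb * 0.05 / 30) / 4
    peak_hourly_gb = retrieved_gb / retrieval_hours
    billable_rate = max(0.0, peak_hourly_gb - free_hourly_gb)
    return billable_rate * RETRIEVAL_RATE * HOURS_IN_MONTH
```

The thing to notice is that you're billed on your peak hourly rate for every hour in the month, which is why spreading a retrieval out is so much cheaper than yanking everything back at once.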

The real problem happens when business decision makers don't actually walk through what the end results might look like with a service like Glacier. Think about the IT Manager in a medium-sized business who is constantly being asked to do more with less money, fewer people, and less equipment. If that manager decides a service like Glacier is ideal based on storage cost alone, he or she has now put the business in a situation where it could cost tens of thousands of dollars to retrieve what is now a fairly average amount of data (3 TB). The decision was based solely on the cost to upload and store the data and the ability to tell the boss that the data is off-site at Amazon. Whew, what a relief. The data is now safe from harm. Then something happens. Somebody corrupts a database that's been rolled into a large archive on Glacier. Now you have to request the whole archive and wait for it to be delivered. This is where you incur the real costs of a service like Glacier. That IT Manager has made a mistake, and now it's painfully apparent to all the business users waiting on the data to come back. The business owner is exposed to both the cost of a service outage and potentially large retrieval fees.
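Reusing the hypothetical retrieval_fee sketch from above, here's roughly how that 3 TB emergency plays out under the debated interpretations (illustrative numbers, not a quote from Amazon):

```python
stored = 3 * 1024  # the whole 3 TB archive lives in Glacier

# Panic mode: request everything back as one typical four-hour job.
print(f"Rush job:   ${retrieval_fee(stored, stored, 4):,.2f}")       # ~$5,520

# Patient mode: spread the same retrieval across a full week.
print(f"Slow drip:  ${retrieval_fee(stored, stored, 7 * 24):,.2f}")  # ~$122

# Harshest reading: peak rate measured over a single hour of download,
# which is where the "tens of thousands" figures come from.
print(f"Worst case: ${retrieval_fee(stored, stored, 1):,.2f}")       # ~$22,109
```

The spread between those three numbers for the exact same 3 TB is the whole problem: the bill depends far more on how fast you want the data than on how much data you stored.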

The bottom line is to make sure you understand the complete scenario when implementing a cloud solution. Whether it's archival storage or server and infrastructure hosting, it's important to walk all the way through how you'll implement and use the cloud service. Many cloud providers have pricing models that are a little difficult to get your head around. In my experience with storage retrieval, users always end up wanting the data back sooner than it can actually be delivered. Glacier will probably find its niche, but I think Amazon has some work to do on the pricing model before it's ready for mainstream adoption.