I dip my fingers into a hot storage solution’s pricing to make sure it doesn’t burn me in the long term.
(Disclaimer - I’m not sorry for that punny introduction.)
TL;DR - if you’re here for the destination, not the journey.
For the past year, backup has become one of the leading topics on my mind. So far I’ve settled on using duplicity with a backend provider dedicated to Linux backups called rsync.net. This scheme works well for the couple of critical servers I run, and I’m automating its setup via Ansible among other projects, which is worth a blog post on its own.
However, this scheme doesn’t scale for me. The feature set of rsync.net makes it ideal for incremental storage, while also providing additional redundancy via ZFS snapshots, but the price per GB is an order of magnitude higher than comparable ‘cold storage’ solutions from leaders like AWS and Google Cloud.
Researching modern cloud storage
Looking for the best service to move my backups to in the realm of object storage, I found Backblaze’s B2 to be the cheapest, while also being fairly well tested and having a good reputation (Backblaze has been around since 2007, and B2 launched in 2015). While researching my specific use case (incremental daily backups), I discovered this may not be so simple.
See, today’s cloud storage is often marketed on its price per GB. That, and the usual, less measurable metrics like a provider’s reputation, locations and additional features, form the basis of the decision. In the case of the more archival or ‘cold’ storage services there’s also time to retrieve and per-GB upload/download pricing (in the case of Google’s Coldline, a one-time full download of your data can easily overtake yearly storage costs).
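To make that Coldline remark concrete, here’s a back-of-the-envelope sketch. The prices below are illustrative assumptions (in the ballpark of published rates at the time I looked), not current quotes:

```python
# Back-of-the-envelope check of the Coldline claim above.
# All three prices are illustrative assumptions, not current quotes.
STORAGE_PER_GB_MONTH = 0.007   # assumed at-rest storage price
RETRIEVAL_PER_GB = 0.05        # assumed retrieval fee
EGRESS_PER_GB = 0.12           # assumed network egress price

data_gb = 500  # hypothetical backup size

yearly_storage = data_gb * STORAGE_PER_GB_MONTH * 12
one_full_download = data_gb * (RETRIEVAL_PER_GB + EGRESS_PER_GB)

print(f"yearly storage:   ${yearly_storage:.2f}")
print(f"one full restore: ${one_full_download:.2f}")
```

With these numbers a single full restore of 500 GB costs about twice what storing it for a year does, which is exactly the trap for a backup you may one day need back in full.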
What’s with this pricing
Well, that’s the easy-to-measure part. Most cloud storage providers charge for almost any user action. These services come with vendor-specific APIs and (often web) interfaces rather than open protocols like FTP/SFTP/rsync, and those API calls are trivial to quantify and put a price on. On top of that, the ‘pay as you go’ model, while flexible (you don’t have to overprovision and risk underusing), can lead to heavy spikes in costs, sometimes unrelated to your legitimate usage. Here’s a post on Medium providing more information on this topic.
While incremental backups have little chance of intentional or unintentional abuse (they are private, limited-access, single-purpose repositories), that doesn’t mean the costs are trivial to predict. Modern backup software is not a zip-encrypt-and-upload script - the repositories require maintenance, comparisons and verification, and all of those are IO operations, i.e. billable API calls. Here’s another blog post that directed my attention to those potentially risky costs, and where I first learned of Wasabi.
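As a toy illustration of how per-call billing adds up, here’s the arithmetic with made-up numbers - both the call count and the price are assumptions, so check your provider’s actual transaction classes:

```python
# Toy model of per-call billing for a daily incremental backup.
# Both constants below are assumptions for illustration only.
CALLS_PER_RUN = 5_000        # list/read/write operations one backup run issues
PRICE_PER_10K = 0.004        # assumed price per 10,000 API calls

monthly_calls = CALLS_PER_RUN * 30
monthly_api_cost = monthly_calls / 10_000 * PRICE_PER_10K

# Verification passes that re-read the repository multiply the call
# count, so this grows with repository size, not just with new data.
print(f"{monthly_calls:,} calls -> ${monthly_api_cost:.2f}/month")
```

The base cost looks negligible, but that’s the point of the unpredictability: a verification or repair pass that touches every chunk scales the call count with the whole repository, not with the daily increment.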
Not just a condiment
Alright, after this lengthy introduction I’m getting to the meat.
Wasabi is priced differently than AWS S3, GCS or B2.
The main differences are:
- Even cheaper storage than B2 ($0.0049 vs $0.005 per GB per month)
- A minimum monthly charge of $4.99 (the equivalent of ~1TB of data stored)
- No ingress/egress charges (download and upload is free)
- No API call charges
- A minimum 90-day storage charge
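The first three points can be sketched side by side. The B2 download price here ($0.02/GB) is an assumption for illustration, as is the share of data downloaded:

```python
# Side-by-side monthly cost sketch for the two pricing models above.
# The B2 egress price and the download share are assumed figures.
WASABI_PER_GB = 0.0049
WASABI_MINIMUM = 4.99
B2_PER_GB = 0.005
B2_EGRESS_PER_GB = 0.02  # assumed B2 download price

def wasabi_monthly(stored_gb, downloaded_gb=0):
    # Egress is free, so downloaded_gb is ignored; the bill is
    # storage only, floored at the monthly minimum.
    return max(stored_gb * WASABI_PER_GB, WASABI_MINIMUM)

def b2_monthly(stored_gb, downloaded_gb=0):
    return stored_gb * B2_PER_GB + downloaded_gb * B2_EGRESS_PER_GB

for gb in (100, 1000, 5000):
    # Assume you download 10% of the stored data each month.
    print(f"{gb:>5} GB: wasabi ${wasabi_monthly(gb):.2f}"
          f"  b2 ${b2_monthly(gb, gb * 0.1):.2f}")
```

Below roughly a terabyte the $4.99 minimum makes Wasabi the more expensive option; above it, free egress tips the comparison the other way whenever you actually touch your data.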
Wasabi’s offering looked very attractive to me. Giving up the very low end of the price range, while not having to worry about any interaction with the data adding to my monthly bill? Great - when it comes to money, I enjoy simplicity. In fact, that’s what Wasabi’s marketing is built on - “We’re cheaper than AWS S3 if you download any of your data and keep it for longer than 15 days”.
Wait, what? Yes, there is a catch: the 90-day minimum storage charge. Everything you upload counts towards the overall billed storage size for 90 days, regardless of whether it’s deleted. Since storage objects are immutable (modifying a file creates a new object with the changes and destroys the old one), this also means no git-like delta changes or deduplication helps here either. At least that made my backup retention policy much more robust by default.
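A minimal sketch of how that charge behaves - deletion never shortens an object’s billable life below 90 days from upload (the function and its parameters are my own illustration, not Wasabi’s billing code):

```python
# Minimal model of the 90-day minimum storage charge: an object
# deleted on day `deleted` still bills until `uploaded + 90`.
def billable_days(uploaded, deleted, horizon):
    """Days an object counts toward billed storage within `horizon` days."""
    end = max(deleted, uploaded + 90)  # deletion can't undercut the 90-day floor
    return max(0, min(end, horizon) - uploaded)

# A backup object uploaded on day 0 and pruned on day 30
# is billed as if it had been stored for the full 90 days.
print(billable_days(uploaded=0, deleted=30, horizon=365))
```

For an incremental backup this matters on every retention prune: each pruned increment keeps billing until it turns 90 days old.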
wasabicalc - to the point
I really wanted Wasabi to work for me. To estimate the cost for my use case, I created a Python script that simulates incremental backups over a period of time, calculating the price based on data kept and deleted according to the 90-day policy.
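The core of such a simulation fits in a few lines. This is a condensed sketch of the idea, not the actual wasabicalc code - the sizes and retention policy are placeholder parameters:

```python
# Condensed sketch of a Wasabi cost simulation (not the actual
# wasabicalc script): daily incremental uploads, retention-based
# pruning, and billing that honours the 90-day minimum.
WASABI_PER_GB = 0.0049
MIN_BILLED_DAYS = 90

def simulate(days, initial_gb, daily_increment_gb, retention_days):
    """Total storage cost over `days`, billing every upload for at
    least 90 days even if the retention policy deletes it sooner."""
    uploads = [(0, initial_gb)]  # (upload_day, size_gb)
    total = 0.0
    for day in range(days):
        if day > 0:
            uploads.append((day, daily_increment_gb))
        # An upload is billable until max(retention, 90) days after upload.
        billable = sum(size for up_day, size in uploads
                       if day < up_day + max(retention_days, MIN_BILLED_DAYS))
        total += billable * WASABI_PER_GB / 30  # pro-rate monthly price per day
    return total

print(round(simulate(365, 200, 1, 30), 2))
```

Running it with different `retention_days` values shows the catch directly: any retention shorter than 90 days produces exactly the same bill as 90 days.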
Then I decided I wanted to graph it (everything is better with graphs), and now it’s hosted over at:
The source is on GitHub, if you’re interested. I used a framework called Dash, which I didn’t know existed until two weeks ago. Nifty little tool - bar some documentation issues, it looks like a great BI data presentation tool. This tutorial (text version in the description) helped me break into it.