Global vs. Job Based Deduplication

by Robert Payne on May 13, 2014

By Adam Grare

There is a clear winner and it is… GLOBAL. We are not surprised. 

Recently we (Unitrends / PHD Virtual) have heard a lot of noise from various vendors about different deduplication methods, specifically comparisons of per-job deduplication with our global deduplication/virtual-full backup model.

It has been claimed that if you put everything in a single job, you effectively have global deduplication. This may seem to make sense at first, but a closer look reveals several big problems with that statement.

The First Week 

We performed internal comparison testing between the upcoming UVB v8 (Unitrends Virtual Backup, formerly PHD Virtual Backup and Replication) and competitors’ current offerings. We deployed a number of standard Windows servers totaling 1TB of source data, and wrote 5% changed data before every subsequent backup. Both products backed up the exact same VMs using all default settings. All of the VMs were in a single job, so the competitor was able to perform at its best (or so they say).

The competitor’s first full backup wrote 104GB, for a respectable 9.8:1 dedupe ratio. UVB’s first full backup wrote 71GB, for a 14.4:1 dedupe ratio. After 5% changed data was written to the VMs, they were backed up again: the competitor wrote 42GB and UVB wrote 28GB for the subsequent backups. In our testing the competitor needed roughly 50% more storage to back up the same source data. These are only our internal results, but they match what we hear from customers who have trialed both products, and I encourage you to run the same comparison if you haven’t already.
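A quick way to check the figures above: a dedupe ratio is the source data protected divided by the data actually written. The sketch below assumes "1TB" means 1024GB, which is what reproduces the 9.8:1 and 14.4:1 ratios; the helper names are hypothetical. It also shows that "roughly 50% more" works out to about 46% on the first full and exactly 50% on the incremental.

```python
SOURCE_GB = 1024  # assumption: "1TB of source data" taken as 1024GB


def dedupe_ratio(source_gb: float, written_gb: float) -> float:
    """Source data protected divided by storage actually written."""
    return source_gb / written_gb


def extra_storage_pct(competitor_gb: float, uvb_gb: float) -> float:
    """How much more storage, in percent, the competitor wrote than UVB."""
    return (competitor_gb / uvb_gb - 1) * 100


# Figures reported in the test above (GB written).
competitor_full, uvb_full = 104, 71
competitor_incr, uvb_incr = 42, 28

print(f"Competitor full ratio: {dedupe_ratio(SOURCE_GB, competitor_full):.1f}:1")  # 9.8:1
print(f"UVB full ratio:        {dedupe_ratio(SOURCE_GB, uvb_full):.1f}:1")  # 14.4:1
print(f"Extra storage, full:        {extra_storage_pct(competitor_full, uvb_full):.0f}%")  # 46%
print(f"Extra storage, incremental: {extra_storage_pct(competitor_incr, uvb_incr):.0f}%")  # 50%
```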

How does global dedupe compare over time? (It gets even better)

Something important to note about standard Full-Incremental models (including “Synthetic” modes) is that in order to keep a restore/retention point they must maintain a full backup. That full is not deduped against anything else, so it consumes its entire storage footprint. With virtual-full, every backup of every VM is deduped against everything else, so you never need to store the same block more than once.
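To illustrate the idea, here is a minimal sketch of a global, content-addressed block store. It is not UVB’s implementation: the fixed 4 KiB block size, SHA-256 hashing, and the `GlobalBlockStore` class are all assumptions chosen for illustration. Every backup of every VM writes into one shared pool, and each backup is only a manifest of block hashes, so a second VM built from the same template costs nothing new.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, chosen for illustration


class GlobalBlockStore:
    """Sketch of a global, content-addressed dedupe store (hypothetical).

    Every backup of every VM writes into the same block pool, so a block is
    stored once no matter how many backups or VMs contain it. Each backup is
    just a manifest: the ordered list of block hashes needed to rebuild it.
    """

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}         # hash -> block data
        self.manifests: dict[str, list[str]] = {}  # backup id -> block hashes

    def backup(self, backup_id: str, data: bytes) -> int:
        """Store a backup and return the number of new bytes written."""
        written = 0
        hashes = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:  # only unseen blocks cost storage
                self.blocks[digest] = block
                written += len(block)
            hashes.append(digest)
        self.manifests[backup_id] = hashes
        return written

    def restore(self, backup_id: str) -> bytes:
        """Rebuild the complete image for one backup from its manifest."""
        return b"".join(self.blocks[h] for h in self.manifests[backup_id])


def make_image(fill_values) -> bytes:
    """Build a fake disk image: one BLOCK_SIZE block per fill value."""
    return b"".join(bytes([v]) * BLOCK_SIZE for v in fill_values)


store = GlobalBlockStore()
vm1_day1 = make_image(range(10))
vm1_day2 = make_image([200, *range(1, 10)])  # one block changed
vm2_day1 = make_image(range(10))             # same template as vm1

print(store.backup("vm1-day1", vm1_day1))  # 40960: first full, all new
print(store.backup("vm1-day2", vm1_day2))  # 4096: only the changed block
print(store.backup("vm2-day1", vm2_day1))  # 0: every block already stored
assert store.restore("vm1-day2") == vm1_day2
```

Every backup still restores as a complete image, which is the "virtual full" property, even though only changed blocks were ever written.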

What this means is that after the first week of running incrementals, when you have to store a new full, UVB will (using the prior example) store 28GB while the competitor will store the whole 104GB! Every point that you keep for retention will use the full 104GB. With virtual-full, every backup of every VM is completely detached from every other backup. This means you can delete any VM from the backup store without affecting any other VM, which is what allows us to offer such granular retention policies without using a ridiculous amount of storage.
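One common way to keep backups independent while still sharing blocks is reference counting. The post does not say how UVB does this internally, so treat the sketch below, including the `RefCountedStore` class, as hypothetical. Each block tracks how many backups reference it, and deleting one VM’s backup frees only the blocks that no other backup uses.

```python
import hashlib
from collections import Counter


class RefCountedStore:
    """Hypothetical sketch: shared blocks are reference-counted, so deleting
    one backup frees only the blocks that no other backup still uses."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}         # hash -> block data
        self.refs: Counter = Counter()             # hash -> manifests using it
        self.manifests: dict[str, list[str]] = {}  # backup id -> block hashes

    def add_backup(self, backup_id: str, blocks: list[bytes]) -> None:
        """Record a backup, storing each distinct block only once."""
        hashes = []
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)
            self.refs[digest] += 1
            hashes.append(digest)
        self.manifests[backup_id] = hashes

    def delete_backup(self, backup_id: str) -> int:
        """Remove one backup and return how many blocks were freed."""
        freed = 0
        for digest in self.manifests.pop(backup_id):
            self.refs[digest] -= 1
            if self.refs[digest] == 0:  # no other backup needs this block
                del self.refs[digest]
                del self.blocks[digest]
                freed += 1
        return freed

    def restore(self, backup_id: str) -> bytes:
        """Rebuild one backup from its manifest."""
        return b"".join(self.blocks[h] for h in self.manifests[backup_id])


store = RefCountedStore()
shared = [b"os-block-%d" % i for i in range(8)]  # blocks common to both VMs
store.add_backup("vm1", shared + [b"vm1-app-data"])
store.add_backup("vm2", shared + [b"vm2-app-data"])

print(store.delete_backup("vm1"))  # 1: only vm1's unique block is freed
print(len(store.blocks))           # 9: 8 shared blocks + vm2's unique block
assert store.restore("vm2") == b"".join(shared + [b"vm2-app-data"])
```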

We like to talk about the Total Storage Cost of Ownership: when comparing UVB to other backup products, look at how much storage you will need in the long run. This effect is of course compounded if you want to create multiple jobs in order to back up different systems at different times, send email reports to different sets of people, and so on. Why limit yourself?
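To put rough numbers on the long-run picture, the sketch below projects cumulative storage from the test figures reported earlier (104GB/42GB for the competitor, 71GB/28GB for UVB). It assumes a hypothetical schedule of seven backups per week with a new full every week on the full-incremental side, treats every changed-data backup as fully unique, and prunes nothing; the function names are illustrative, and real retention policies will differ.

```python
def full_incremental_gb(weeks: int, full_gb: float, incr_gb: float,
                        backups_per_week: int = 7) -> float:
    """Storage for a chain that writes one full plus incrementals each week.

    Each weekly full is stored in its entirety (not deduped against earlier
    fulls), so every retained week pays for a new full.
    """
    per_week = full_gb + (backups_per_week - 1) * incr_gb
    return weeks * per_week


def virtual_full_gb(weeks: int, first_full_gb: float, incr_gb: float,
                    backups_per_week: int = 7) -> float:
    """Storage for a global-dedupe store: one initial full, then only
    changed blocks for every later backup."""
    total_backups = weeks * backups_per_week
    return first_full_gb + (total_backups - 1) * incr_gb


for weeks in (1, 4, 12):
    comp = full_incremental_gb(weeks, full_gb=104, incr_gb=42)
    uvb = virtual_full_gb(weeks, first_full_gb=71, incr_gb=28)
    print(f"{weeks:>2} weeks: competitor {comp:.0f}GB, UVB {uvb:.0f}GB")
```

Under these assumptions the gap widens every week, because each retained week adds another 104GB full on the full-incremental side while the global store adds only changed blocks.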
