
Wednesday, March 6, 2013

Data Domain vs. ProtecTIER

Where I am currently employed, we are looking to replace our 3592-based library with a deduplication solution. The higher-ups are leaning towards IBM ProtecTIER without having thoroughly investigated any other options. Having used Data Domain at a previous employer, I was somewhat concerned that ProtecTIER would be a bad fit for our environment. I have also had run-ins with people who have used IBM's ProtecTIER, and when you compare them with those who have used Data Domain (myself included) you immediately see the difference in how they talk about the two products. I had hoped to find a good write-up with in-depth details comparing the two solutions, and it took a fellow blogger to provide a great one. If you would like a good overview of how Data Domain and ProtecTIER stack up against one another in technology and performance, check out the following link. It's very informative, and it solidifies why I would prefer Data Domain.

Deduplication: Data Domain Vs. ProtecTIER Performance

One item that was not covered is the NFS capability of each product. While I used the VTL functionality with Data Domain, I was a HUGE NFS proponent. You can save a lot of money over a TSM TDP + LAN-Free solution by running your DB backups over NFS on 10Gb Ethernet (especially since IBM's licensing costs are still questionable). When I first explored ProtecTIER it did not yet have NFS support, so I'd like to see an NFS performance comparison between the two products.
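If you are weighing the NFS route, one quick sanity check is to time a large sequential write to the NFS-mounted share before you point real DB backups at it. Below is a minimal Python sketch of that kind of test; the /mnt/dd_backup mount point is a made-up example, and real results will depend heavily on NFS tuning and the appliance's ingest behavior.

    # Quick-and-dirty sequential write test against an NFS-mounted dedupe share.
    # The mount point below is a hypothetical example; adjust for your setup.
    import os
    import time

    MOUNT_POINT = "/mnt/dd_backup"                  # hypothetical NFS mount
    TEST_FILE = os.path.join(MOUNT_POINT, "nfs_write_test.dat")
    CHUNK_SIZE = 4 * 1024 * 1024                    # 4 MiB per write
    TOTAL_BYTES = 2 * 1024**3                       # 2 GiB test file

    start = time.time()
    written = 0
    with open(TEST_FILE, "wb") as f:
        while written < TOTAL_BYTES:
            # Fresh random data each write so the appliance cannot dedupe
            # the stream and flatter the numbers (urandom itself can become
            # the bottleneck on older hosts -- watch CPU while this runs).
            f.write(os.urandom(CHUNK_SIZE))
            written += CHUNK_SIZE
        f.flush()
        os.fsync(f.fileno())                        # force it over the wire
    elapsed = time.time() - start

    print(f"{written / 1024**2:.0f} MiB in {elapsed:.1f}s "
          f"= {written / elapsed / 1024**2:.0f} MiB/s")
    os.remove(TEST_FILE)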

Tuesday, May 31, 2011

Where, how and why are tapes still used for backup?

Not strictly a TSM-only topic, but a fascinating thread has been raging over at LinkedIn for the last few days, and I thought it would be worth sharing here. The provocative question asked was:
"Is tape still used for backup? Why is tape still being used these days when disk and cloud are available?"

Cue an avalanche of fascinating pro- and anti-tape pitches and opinions. Wading through the comments may take a while (I pitched in with a couple of comments too), but it's a useful read: frankly, these are the questions that clients, customers, and decision-makers, and indeed we ourselves, should be asking to make sure we keep using the right technology for the right purpose.

Of course, TSM itself is well positioned here in many ways, given its long-running support of disk, both as random-access storage pools and, more recently, as sequential FILE volumes (virtual tapes!).

David Mc
London, UK

Friday, October 19, 2007

FilesX Xpress Restore & TSM

If you click the title of this post you can read the news article about this new tool, which has been validated to work with TSM. It provides block-level disk backup of application data, letting TSM work with the Xpress Restore repository to speed up backups. This is one tool I would like more information on, so I will be looking into how it works with TSM.

Thursday, October 18, 2007

Good De-Duplication Questions

An anonymous reader posted the following comments on the De-Duplication post from a few days ago, and I thought they were good enough to promote to the main page for everyone to read. In the future I hope more of you will post under your name so we can give credit where credit is due. If you are currently using a de-duplication product, we would love to hear from you.

I have several issues with backup and DeDupe (most of which are TSM related).

First off, why are people retaining so much data within TSM (i.e. why are retention periods increasing)? TSM is meant to be used in response to a data-loss event: data is lost through hardware failure, logical failure, or human error, and we turn to TSM to recover it. But an increasing number of people are using it as a filing cabinet, placing infinite retention on data, and I don't think TSM was truly designed for that. It's more of a data-management function akin to HSM and archiving. Yes, TSM has archiving, but I think it's pretty weak in terms of functionality; it really needs to be married to an application that can do better indexing and classification in order to be powerful.

So if the data you are storing within TSM cannot truly be used to support a data-recovery function, why keep it? Are you really going to restore a file from 180 days ago because someone suddenly discovered they deleted something four months back that they now need? I haven't seen much of that; such occurrences are typically rare. Yet the outlay to stay consistent with such a policy can be expensive, and it's not just the cost of the media; there's much more to it than that.

DeDupe becomes more efficient the more backup data you retain, but more versions = a bigger TSM DB, which often means you have to spawn another TSM server to keep things well maintained.

In TSM land we're very conscious of the TSM DB. It's the heart of the system, and we go to great lengths to improve its performance and protect it. If it becomes corrupt we can roll it back using a TSM DB backup and reuse delay. The DeDupe engine must also have an index/DB of its own: what do we do if that becomes corrupt? And if it does, how do we ensure it gets synched up with TSM again?

How well will DeDupe work when data is reclaimed? TSM rebuilds aggregates during reclamation, so how much work is that for the DeDupe engine, and what will the I/O pattern look like on the back-end storage?

How does this work in terms of recovery, both operationally and in a disaster? Single-file restores: probably great. Recovery of lots of files: probably not too bad; when recovering lots of small files the client is typically the bottleneck, and I'm not sure the DeDupe engine would change that much. What about recovery of a large DB? That one I am more skeptical of. We can get great performance from both tape and disk, potentially the best from tape, provided we can get enough mount points and the client isn't bottlenecked in some way. But what if the data is deduped on disk? Will it stream from disk, or will we get more random I/O patterns? If a 10TB database needs to be recovered, that still equates to 10TB that must be pushed through TSM, even if it has been deduped down to 2TB on the physical disk behind the DeDupe engine.

What about DR, where you want to recover multiple clients at the same time? Good storage-pool design can alleviate some of the issues with tape contention, and disk may offer some advantages because the media supports concurrent access (though bear in mind that TSM may not, depending on how you have configured it). If that disk is deduped, though, you potentially have fewer spindles at your disposal, which could mean more I/O contention and more difficulty streaming data.
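To put that large-DB restore point in perspective, here is some rough back-of-the-envelope math on the commenter's 10TB example. Every number below is an assumption I picked for illustration, not a measurement from any product:

    # Back-of-the-envelope restore math for a deduplicated large-DB recovery.
    # All figures below are illustrative assumptions, not vendor numbers.

    logical_tb = 10.0    # what TSM has to push to the client
    physical_tb = 2.0    # what actually sits on disk after dedupe (assumed 5:1)

    # Assumed sustained read rates (MB/s) -- purely hypothetical:
    tape_streaming = 120.0     # one streaming tape drive
    disk_sequential = 400.0    # deduped pool reading mostly sequentially
    disk_random = 80.0         # deduped pool degraded by fragmented chunks

    def hours(tb, mb_per_sec):
        """Time to move `tb` terabytes at `mb_per_sec` megabytes per second."""
        return tb * 1024 * 1024 / mb_per_sec / 3600

    # The client still receives the full logical 10TB in every case;
    # dedupe only shrinks what is stored, not what must be restored.
    print(f"tape, streaming:      {hours(logical_tb, tape_streaming):5.1f} h")
    print(f"dedupe disk, seq:     {hours(logical_tb, disk_sequential):5.1f} h")
    print(f"dedupe disk, random:  {hours(logical_tb, disk_random):5.1f} h")

With these assumed rates, a streaming tape restore comes out around a day and a badly fragmented deduped pool considerably worse. The exact figures don't matter; the point is that dedupe shrinks what is stored, not what must be restored.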

Tuesday, October 9, 2007

Data DeDuplication - Been There Done That!

I just got off a pretty good NetApp webcast covering their VTL and FAS solutions. One of the items they discussed was the data-deduplication feature of their NAS product, and when the IBM rep spoke up about TSM's progressive incremental backup, it made for an interesting contrast between TSM's approach and the growing market for deduplicating disk storage. Deduplication saves TONS of space with competing backup tools because they usually follow the FULL+INC model, backing up files even when they haven't changed; deduplication reclaims that room by removing the duplicate, unchanged copies. But this also shows where TSM is superior: it never generates that wasted processing in the first place. What would be interesting is to see how much space is saved on redundant OS files across clients, but that is still minor compared to the weekly full that wastes so much space.
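For anyone who hasn't seen deduplication up close, the core idea is content-addressed storage: carve the incoming stream into chunks, hash each chunk, and store a chunk only the first time its hash is seen. The sketch below is a minimal fixed-size-chunk illustration in Python (real products use variable-size chunking and far more robust, persistent indexes); it shows why a second full backup of unchanged data costs almost nothing:

    # Minimal fixed-size-chunk deduplication sketch (illustration only;
    # real dedupe engines use variable-size chunking and persistent indexes).
    import hashlib
    import os

    CHUNK_SIZE = 4096
    store = {}          # hash -> chunk bytes (the "deduped pool")

    def ingest(data: bytes) -> list:
        """Split data into chunks, store each unique chunk once,
        and return the recipe (list of hashes) needed to rebuild it."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            h = hashlib.sha256(chunk).hexdigest()
            store.setdefault(h, chunk)   # new chunks cost space; repeats are free
            recipe.append(h)
        return recipe

    def restore(recipe: list) -> bytes:
        """Rehydrate the original stream from its chunk recipe."""
        return b"".join(store[h] for h in recipe)

    # Two "full backups" of the same 40 MiB of unchanged data:
    full_backup = os.urandom(40 * 1024 * 1024)
    r1 = ingest(full_backup)
    r2 = ingest(full_backup)            # second full adds no new chunks
    print("chunks referenced:", len(r1) + len(r2))
    print("chunks actually stored:", len(store))
    assert restore(r2) == full_backup

TSM's progressive incremental gets much of the same benefit a different way: it simply never sends the unchanged file again, which is exactly the contrast the webcast highlighted.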

This brings us to the next item: disk-based backup. Disk is definitely going to grow over time, but costs will have to come down before it can fully replace tape. The two issues I see with disk-only backup are DRM/portability and capacity/cost.

If you cannot afford duplicate sites with the data mirrored, you are left needing a tape solution for offsite storage. Portability can also be an issue with disk. For example, we are migrating some servers from one data center to another using the export/import feature, and we have also physically moved TSM tapes between sites and rebuilt the TSM environment. Doing this with disk is more time consuming: you would need the same disk solution at both ends and the network capacity to mirror the data (slow on a thin connection), or you would have to move the whole hardware solution. Tape in this scenario is a lot easier to deal with.

As for capacity vs. cost, there is a definite difference that will keep many shops on tape for years to come. Many customers want long-term retention of their data, say 30+ days for inactive files and TDP backups (sometimes much longer for e-mail and SARBOX data). What is the cost of that kind of retention on disk (into the petabytes) compared to tape? Currently it's no contest: tape wins on cost per capacity, though hopefully that will change someday. So if any of you have disk-based or VTL solutions, chime in; I'd like to hear how they've worked for you.
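As a closing illustration of that capacity-vs.-cost point, here is a trivially simple media-cost model. The per-TB prices below are placeholders I made up for the sketch; plug in your own quotes, and remember this ignores drives, libraries, power, and floor space:

    # Naive media-cost comparison for long-term retention (illustrative only;
    # the per-TB figures are made-up placeholders, not real pricing).
    retained_pb = 1.0                  # total retained backup data, in PB

    cost_per_tb = {
        "enterprise disk": 5000.0,     # assumed $/TB of raw disk in an array
        "deduped disk (assume 10:1)": 5000.0 / 10,
        "tape media": 300.0,           # assumed $/TB of cartridges on a shelf
    }

    for medium, dollars_per_tb in cost_per_tb.items():
        total = retained_pb * 1024 * dollars_per_tb
        print(f"{medium:28s} ${total:>12,.0f}")

Even if dedupe closes most of the gap on paper, shelved cartridges draw no power, which is a big part of why the long-retention math still favors tape today.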