Friday, May 31, 2013

TSM 7

I recently attended an IBM technical briefing covering various storage-related topics, including TSM. While it was under NDA, I can say that some of the items we discussed show promise. I'll be able to discuss more after IBM Pulse this month, but what I can say is that the new Admin Center is pretty slick. It has some nice features and will finally make up for the folly that was the ISC. IBM stressed that they are listening to users and taking their requests and suggestions to try to develop a tool everyone will find useful. That was surprising news, seeing as the majority of people complained about the ISC and it took 7+ years to finally get a replacement. I will say this in defense of the TSM developers: a lot of the ISC push came from above, and they were somewhat forced into that fiasco. The TSM 7 DB will scale larger and handle more objects, and they are really ramping up the capabilities of the client deployment module. More info to come in the next couple of weeks.

One item that did come up was the issue of export and backup set tapes being written unencrypted by TSM because of the key-management problem. What I suggested was that they allow TSM servers to back up each other's keys and also use them, so exports and backup sets could be encrypted but still shared between TSM servers. I hope they find some way to add that capability.

We also had a ProtecTIER review, and it has a lot of promise. I know I have been a Data Domain fanboy for some time. While I didn't see anything that integrates ProtecTIER dedupe with TSM directly, it did show some nice growth capabilities. I'm excited to see how well it works, but I'm up against a study showing that tape is still the more cost-effective backup solution.

I'll post more once PULSE is complete (mid-June), so stay tuned!

Friday, May 22, 2009

Calculating Active Data

I was recently asked to calculate the amount of active data in TSM storage for file system backups (not TDPs) and had some interesting results. If you search "TSM active data" in Google, the first result is an IBM support doc that explains how to calculate active data to help size an active-data storage pool. IBM recommends using the EXPORT NODE command with PREVIEW=YES to determine the amount of active data. In theory this should work well, but to process the request TSM has to analyze the backups table and who knows what else. I had 10 instances to gather the information from, varying in TSM DB size and in the amount of managed data stored. My smallest DB is a new instance at 25GB and my largest is 155GB, and DB size did not matter when it came to how fast the information was calculated. The instance with the largest DB completed the task in over two days (YES, TWO DAYS!). Two TSM instances were still running the EXPORT NODE query after THREE DAYS, and they have moderate to large sized DBs.
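For reference, the preview IBM describes boils down to something like this from an administrative client (the wildcard grabs every node; adjust the filtering for your environment):

EXPORT NODE * FILEDATA=ALLACTIVE PREVIEW=YES
QUERY PROCESS

When the preview process finally finishes, the summary in the activity log shows the number of files and bytes that would have been exported, which is the active-data figure you're after.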

So what caused this problem? It all comes down to the number of files TSM has to inspect. The two instances that never completed the query have large numbers of Windows nodes and the most registered nodes overall. They seemed to be crawling through the process: where they should have calculated into the ten-to-twenty TB range, as the next largest instance did after just over two days, the problem two were still in the 6 to 7 TB range and increasing slowly. My only explanation (and this is a guess) is that Windows servers tend to have hundreds of thousands, if not millions, of files, and TSM gets bogged down trying to inspect them all. I didn't notice a performance impact, but IBM claims it is a resource-intensive task and should be run during non-peak hours. How can you do that when it runs 24 hours or more?
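If you want a rough feel for which instances are going to grind before you kick off the preview, a select against the occupancy table gives the backup object counts per node (this is just my guess that object count is the driver, and the counts include inactive versions too):

select node_name, sum(num_files) as backup_objects from occupancy where type='Bkup' group by node_name order by 2 desc

The nodes (and instances) at the top of that list are the ones I'd expect to drag the preview out the longest.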

Finally, after three days and no end in sight, I canceled the processes and now have to figure out some other way to calculate the amount of active data they have stored. I could calculate (i.e. guesstimate) it by summing the amount of space used per filespace.

Example:

select cast(sum(capacity*(pct_util/100)) AS decimal (18,2)) As Total_Used_Space from filespaces where node_name in (select node_name from nodes where domain_name like '%STD%') 
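As far as I can tell, CAPACITY in the filespaces table is reported in megabytes, and PCT_UTIL reflects how full the file system is on the client rather than what is actually stored in TSM, which is part of why this is only a guesstimate. A variation that returns the total in gigabytes:

select cast(sum(capacity*(pct_util/100))/1024 AS decimal(18,2)) As Total_Used_GB from filespaces where node_name in (select node_name from nodes where domain_name like '%STD%')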

The problem is that this will not be accurate and will probably cause me to oversize any active-data pool I create. Now that's not a horrible thing (more space is always better than too little), but the whole process seems too time consuming for something that, on the surface, should be fairly easy to calculate. This is where I hope the new DB2 database can help, but until someone has it up and running and can try this process, we can't say whether there is a reasonable way to find the total active data in larger TSM instances.


Addendum

Are you wondering why I used the '%STD%' filter in the select statement above? The team here at Infocrossing separates file system backups and application (TDP) backups into different domains using a standardized naming process. This is great because it also allowed us to run the EXPORT NODE command using wildcards for the specific domains to include in the export query.

EXPORT NODE * DOMAINS=*STD* FILEDATA=ALLACTIVE PREVIEW=YES

I highly recommend you follow a similar process when creating domains, and even schedules, to make it easier to process groups of nodes. For example, you could create a WIN-STD-DOM domain for file system backups or, as we do for our TDPs, a WIN-APP-DOM. These are just examples, but they can make life easier; a rough sketch of the setup follows.
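If you are building this from scratch, the commands are nothing fancy; roughly something like the following (node names, passwords, and retention values are made up, and you still need a policy set, management class, and copy group defined and activated in each domain, which I've left out):

DEFINE DOMAIN WIN-STD-DOM DESCRIPTION="Windows file system backups" BACKRETENTION=30 ARCHRETENTION=365
DEFINE DOMAIN WIN-APP-DOM DESCRIPTION="Windows TDP backups" BACKRETENTION=30 ARCHRETENTION=365
REGISTER NODE FILESRV01 somepassword DOMAIN=WIN-STD-DOM
REGISTER NODE SQLSRV01_TDP somepassword DOMAIN=WIN-APP-DOM

With the naming convention in place, the DOMAINS=*STD* wildcard on EXPORT NODE (or a schedule defined per domain) picks up whole groups of nodes at once.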

Tuesday, October 9, 2007

Data DeDuplication - Been There Done That!

I just got off a pretty good NetApp webcast covering their VTL and FAS solutions. One of the items they discussed was the data deduplication feature in their NAS product. When the IBM rep spoke up, they brought up TSM's progressive incremental backup methodology, and I find it interesting to contrast TSM's process with the growing segment of disk-based storage built around deduplication. The feature really does save TONS of space with the competing backup tools, since they usually follow the FULL+INC model and back up files even when they haven't changed. Deduplication saves them room by removing the duplicate, unchanged files, but that just shows how superior TSM is: it doesn't require that kind of wasted processing in the first place. What would be interesting is to see how much space is saved in redundant OS files, but that is still minor compared to the weekly full process that wastes so much space.

This brings us to the next item: disk-based backup. It is definitely going to grow over time, but costs are going to have to come down for it to fully replace tape. The two issues I see with disk-only backups are DRM/portability and capacity/cost. If you cannot afford duplicate sites with the data mirrored, you are left having to use tape for offsite storage. Portability can also be an issue with disk. For example, we are migrating some servers from one data center to another, and we used the export/import feature. We have also moved TSM tapes from one site to another and rebuilt the TSM environment. Doing that with disk is more time consuming: you would need the same disk solution at the other end plus the network capacity to mirror the data (slow over a thin connection), or you would have to move the whole hardware solution. Tape in this scenario is a lot easier to deal with.

When it comes to capacity vs. cost, there is a definite difference that will keep many on tape for years to come. Many customers want long-term retention of their data, say 30+ days for inactive files and TDP backups (sometimes longer for e-mail and SARBOX data). So what does that kind of disk retention (into the PB range) cost compared to tape? Currently it's no contest: tape wins in the cost vs. capacity realm, but hopefully that can someday change. So if any of you have disk-based or VTL solutions, chime in; I'd like to hear what you have to say and how it has worked for you.

Wednesday, July 5, 2006

Import/Export Question!

Can someone explain to me why moving data (exporting/importing) from one TSM server to another is such a pain in the neck? Here is my scenario: I have some servers that moved, network-wise, from one location to another, and they now need to back up to a different TSM server. Both servers use the same media type (LTO-3). So why do I have to either copy all the data across the network to the new server while it creates new tapes, or dump it all to tape(s) and then rewrite it to new tape(s) when imported? My question is this: why can't I just export the DB info and pointers for the already existing tapes from the old TSM server to the new one? Why can't the old server "hand over" the tape to the new TSM server? It seems like a lot of wasted work to constantly have to copy the data (server-to-server export/import) or dump it to tape and then write the data to new tapes again. I think the developers ought to work on a way of doing this. I would also think this could be done at the DB level, so you could in a sense reorg the DB without the long DB dump/load and audit process, and the tapes would simply be handed over to the new TSM server. No rewrites necessary.
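For anyone following along, the two options I'm complaining about look roughly like this (server, node, volume, and device class names are made up for illustration):

Server-to-server over the network (assumes server-to-server communication is already set up on both sides):
DEFINE SERVER NEWTSM HLADDRESS=newtsm.example.com LLADDRESS=1500 SERVERPASSWORD=secret
EXPORT NODE MYNODE FILEDATA=ALL TOSERVER=NEWTSM

Export to tape, ship it, and import on the new server after checking the volume in:
EXPORT NODE MYNODE FILEDATA=ALL DEVCLASS=LTO3CLASS SCRATCH=YES
IMPORT NODE MYNODE FILEDATA=ALL DEVCLASS=LTO3CLASS VOLUMENAMES=A00001

Either way the data gets rewritten onto new media, which is exactly the waste I'd like to see eliminated.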