Friday, May 25, 2012

TSM 6.2.3 and Lower DB Reorg Issue

One of our TSM servers started to log large numbers of ANR0530W "internal server error detected" messages. Further investigation showed these were tied to ANR0162W messages, which indicate DB deadlock or timeout problems. The errors were causing our TDP database backups to fail, so I eventually called support. I provided our db2diag.log file and a dump of our actlog for the last 24 hours, and they found that the DB2 reorg process was locking records and tables, creating the deadlock situation. The problem is compounded on TSM 6.2.3 and lower because the DB2 reorg process cannot be scheduled, so it can kick off during backups or other processes and create these deadlocks. To resolve the issue I was told to upgrade our TSM server to 6.2.4 or higher, where you can use the REORGBEGINTIME and REORGDURATION parameters to confine the reorg to a window. You can see the details of the APAR here.
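
For anyone already on 6.2.4 or higher, the relevant server options go in dsmserv.opt. A minimal sketch (the begin time and duration here are examples - pick a window clear of your backup cycle; the options are read at server start):

* dsmserv.opt - confine DB2 table reorgs to a daily window
ALLOWREORGTABLE YES
REORGBEGINTIME 06:00
REORGDURATION 8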

Tuesday, October 18, 2011

Simple TSM 6.2 Server Restore

I just completed a DR test in which we had to restore one of our TSM servers from a Data Domain replicated copy. This was our first time restoring a TSM server from a replicated DD copy, and after importing the replicated volumes and defining our initiators, we set about restoring the TSM database. Our AIX server had been restored from an image (SysBack) and we had a current volhist and devconfig file, so we began our restore. If you think a restore from a Data Domain is not relevant to your environment because you use tape, think again: the Data Domain mimics an STK library with IBM drives, so we had to follow the same procedure as anyone using tape backup.

To restore the TSM 6.x DB from tape you must have your volhist and devconfig files. You will need to edit the devconfig so that the only remaining lines are those defining the devclass, the server name, and the server password; delete everything else. Then add lines defining a manual library, a tape drive, and a path to the drive (for us, an LTO3 drive):


DEFINE LIBRARY MANLIB LIBTYPE=MANUAL
DEFINE DRIVE MANLIB DRIVE1 ONLINE=YES 
DEFINE PATH TSMSERV1 DRIVE1 SRCT=SERVER DESTT=DRIVE LIBR=MANLIB DEVICE=/dev/rmt1 ONLINE=YES

Note: Do not define an element address or serial number with the drive; TSM will detect these when you run the DSMSERV RESTORE DB command.
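
With the devconfig trimmed down, the restore invocation itself is short. A sketch assuming a typical 6.x AIX layout, where the instance user is tsminst1 and the instance directory holds dsmserv.opt, the volhist, and the edited devconfig (names and paths are examples):

su - tsminst1
cd /home/tsminst1/tsminst1
/opt/tivoli/tsm/server/bin/dsmserv restore db

Leave off the todate/totime parameters to restore to the most current backup, or add them for a point-in-time restore.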

When you run the DSMSERV RESTORE DB command, TSM starts up and reads the devconfig file to retrieve the devclass, drive, library type, server name, and password. Once TSM has successfully queried the tape drive, it checks the volhist file for the appropriate DB backup volume, depending on whether you are restoring to the most current date or to a specific point in time. When TSM has identified the volume to use, it will prompt you to mount the tape.

When I saw the mount request I went into the Data Domain's web-based GUI and moved the DB backup volume from its "virtual slot" to the drive mapped to /dev/rmt1. Once the tape was mounted, TSM recognized that it had been loaded and began restoring the DB. If more than one tape is required to complete the restore, TSM prompts you for each one; with the library's web GUI available you can move the tapes as needed and finish the restore. Once the restore completes you can bring TSM back up and audit/fix anything that might be out of sync. With the switch to DB2 I was expecting a lot more work to get TSM back up and running, but it was surprisingly simple.

Now if you don't have a SysBack image of your TSM server, the rebuild can take a lot longer and requires you to recreate some of the DB2-dependent files. I might have to do a bare-metal restore without an image in the near future, and if I do I'll post a step-by-step process for everyone. If anyone has already done this and would like to post the process on TSMAdmin, let me know.



Thursday, September 15, 2011

Double The Trouble

So the TSMDBMGR stanza was accidentally removed from a TSM 6.x server, and the log space filled because DB backups could not run. Upon bringing TSM back up (which is a nightmare with DB2), TSM support told my coworker she had to take two DB backups to clear the logs. OK, I have heard this ever since 6.x came out, but it turns out the reason is a bug in TSM that they still have not fixed. So IBM's workaround for now is that two consecutive DB backups clear the logs. You'd think that would be fixed by now... go figure!
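
For reference, the stanza in question is the API server stanza in the server's dsm.sys, which tsmdbmgr.opt points at so DB backups can run. A typical entry looks like this (instance name, port, and paths are illustrative - match them to your own instance):

SERVERNAME TSMDBMGR_TSMINST1
COMMMETHOD TCPIP
TCPSERVERADDRESS localhost
TCPPORT 1500
PASSWORDACCESS generate
PASSWORDDIR /home/tsminst1/tsminst1
ERRORLOGNAME /home/tsminst1/tsminst1/tsmdbmgr.log
NODENAME $$_TSMDBMGR_$$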

Wednesday, November 3, 2010

TSM 6.1...Finally!!!

So I have landed on my feet as a contractor, and where I am currently working they are migrating from 5.x to TSM 6.1.4. In their testing, a 36GB DB took 14 hours for the complete migration; that time includes completing storage pool migration, stopping all processes, and bringing the original TSM server down clean. So I am not looking forward to the large 100+GB DBs. I also find it interesting how many companies have not migrated to version 6. The only reason they are going to 6.1.x is that it has been tested and 6.2 has not (their testing process is long and tedious). So what has everyone else seen as their average migration time from 5.x to 6.x?

Thursday, May 14, 2009

TSM 6.1 Upgrade - Need To Know!

So in researching the TSM 5.5 to 6.1 upgrade, I have come across a number of issues that should have been compiled into a single list to keep admins informed. So here it goes.

Things to know about TSM 6.1:
  • Although IBM states the 6.1 DB should be about the same size as the 5.5 DB, the TSM community is reporting that as much as 4x the space can be required
  • It does not support RAW volumes for the TSM DB and log
  • It adds archive logging to the existing log process (i.e. more disk space is required to run)
  • It cannot generate or define backupsets at this time
  • It does not support NAS or backupset TOCs at this time
  • It will not allow a DB upgrade if TOCs or backupsets exist on the current TSM server
  • NQR (No Query Restore) is now slower due to how 6.1 processes the data before sending it to the client
I have been hearing of upgrades taking an extremely long time, so be aware that the source TSM instance has to be down for the entire upgrade when doing it across the LAN or on the same server. Even with the media method, your source instance has to be down to perform the DB extract, since 6.1 cannot use a standard TSM DB backup.
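
For those planning the outage, the media-method flow is roughly the following sketch (the device class and manifest path are hypothetical; the first two utilities run against the halted V5 source, the last two on the new V6.1 instance):

dsmupgrd preparedb
dsmupgrd extractdb devclass=UPGRADE_DEV manifest=/upgrade/manifest.txt
dsmserv loadformat
dsmserv insertdb manifest=/upgrade/manifest.txt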


Tuesday, May 5, 2009

TSM 6.1 Upgrade - FYI

For those of you looking to upgrade your current TSM instances to 6.1, take note of this issue with upgrading the DB.

At this time a database containing backup sets or tables of contents (TOCs) cannot be upgraded to V6.1. The database upgrade utilities check for defined backup sets and existing TOCs. If either exists, the upgrade stops and a message is issued saying that the upgrade is not possible at the time. In addition, any operation on a V6.1 server that tries to create or load a TOC fails.

When support is restored by a future V6.1 fix pack, the database upgrade and all backup set and TOC operations will be fully enabled.


I haven't heard whether this will be fixed in the first patch of 6.1, but keep it in mind when deciding between upgrading your system and starting from scratch.

Tuesday, April 14, 2009

Don't Miss - TSM V6.1 DB Upgrade Webcast

If you are looking to move to TSM V6.1 in the near future, don't miss this webcast on April 28th at 11:00am EST. This is part one of what I believe is a two-part webcast (it could be more than two). You can either click on the title of this post or find the sign-up here.

Friday, December 14, 2007

60 seconds of tips

Good morning comrades

After running a database restore you need to run an audit on your storage pools so they are consistent. Failure to do so may result in lost space and errors in your log.
Read more
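
As a sketch of the kind of audit in question, assuming a primary pool named BACKUPPOOL:

audit volume stgpool=BACKUPPOOL fix=yes

On a large pool this can run a long time; a FROMDATE= can limit it to volumes written since the DB backup you restored from.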

How do you know, from day to day, the throughput of your client backups? Here we take a glimpse at a method that can help you review performance with next to no effort.
Read more
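
One low-effort approach, sketched here, is to pull the last day's backup statistics from the summary table and divide bytes by elapsed time per node:

select entity, bytes, start_time, end_time from summary where activity='BACKUP' and start_time>current_timestamp - 24 hours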

Enjoy the weekend

Thursday, October 18, 2007

Good De-Duplication Questions

An anonymous reader posted the following comments on the De-Duplication post a few days ago, and I thought they were good enough to move to the main page for everyone to read. In the future I hope more of you will post with your names so we can give credit where credit is due. If you are currently using a de-duplication product, we would love to hear from you.

I have several issues with backup and de-dupe (most of which are TSM related).

First off, why are people retaining so much data within TSM (i.e. the retention period is increasing)? TSM is something that is supposed to be used in response to a data loss event. In other words, data is lost through hardware failure, logical failure, or human error, and we turn to TSM to recover it... but an increasing number of people are using it as a filing cabinet, placing infinite retention on data. I don't think TSM was truly designed to do this. I see it as more of a data management function akin to HSM and archiving. Yes, TSM has archiving, but I think it's pretty weak in terms of functionality; it really needs to be married to an application that can do better indexing and classification in order to make it powerful.

So... if the data you are storing within TSM cannot truly be used to support a data recovery function, why keep it? Are you really going to restore a file from 180 days ago because someone suddenly discovered they deleted a file four months ago that they now need? I haven't seen much of that; occurrences are typically rare... yet the outlay to stay consistent on such a policy can be expensive. Forget about just the cost of the media - there's much more to it than that.

De-dupe becomes more efficient when you retain more data in the backups, but more versions mean a bigger TSM DB, which often means you have to spawn another TSM server to keep things well maintained.

In TSM land we're very conscious of the TSM DB. It's the heart of the system, and we go to great lengths to improve its performance and protect it. In the event that it does become corrupt, we can roll it back using a TSM DB backup and REUSEDELAY. The de-dupe engine must also have an index/DB... what do we do if that becomes corrupt? If it does, how do we ensure that we can get it synced up with TSM again?

How well will de-dupe work when data is reclaimed? TSM rebuilds aggregates when data is reclaimed, so how much work is that for the de-dupe engine, and what is the I/O pattern going to look like on the back-end storage?

How does this work in terms of recovery, both operationally and in a disaster? Single-file restores: probably great. Recovery of lots of files: probably not too bad; when recovering lots of small files the client is typically the bottleneck, so I'm not sure the de-dupe engine would impact it much. What about recovery of a large DB? This one I am more skeptical of. We can get great performance from both tape and disk - potentially the best performance from tape, provided we can get enough mount points and the client isn't bottlenecked in some way. But what if the data is de-duped on disk? Will it stream from disk, or will we get more random I/O patterns? If it's a 10TB DB that needs to be recovered, that still equates to 10TB that must be pushed through TSM, even if it's been de-duped down to 2TB on the physical disk behind the de-dupe engine.

What about DR, where you want to recover multiple clients at the same time? Good storage pool design can alleviate some of the issues with tape contention, and disk may offer some advantages because the media supports concurrent access (though bear in mind that TSM may not, depending on how you configured it). If that disk is de-duped, though, you potentially have fewer spindles at your disposal. That could mean more I/O contention and perhaps more difficulty streaming data.

Monday, June 18, 2007

Oracle RMAN Catalogue Cleanup

This is an update to a story posted back in August 2005.

Why do people love Oracle? When I hear Oracle mentioned I think of Luke Skywalker when he first saw the Millennium Falcon: "What a piece of junk!" Like the Falcon it looks clunky, breaks down easily, and has the most temperamental behavior; when it's running, however, it screams. The problem is that the DBAs and/or the RMAN catalogue sometimes don't do appropriate cleanup. If the DBAs are doing their job, they should be using the tdposync syncdb command to reconcile the catalogue with TSM. If you cannot get them to do this, or it doesn't seem to be working correctly, you can use a manual process on the TSM side. To check whether a particular node is performing cleanup within TSM, run the following select command (substitute your TDP node name and a date just beyond your retention window):

select object_id from backups where node_name='[TDP NODENAME]' and
backup_date < '2007-06-01 00:00:00'


This output can be redirected to a file and used later with an undocumented delete command. If there is data going further back than your retention requirements allow, then you have a problem with the DB cleanup.

I am posting the undocumented TSM individual backup object delete command here, but remember: any deletion from the TSM DB is done AT YOUR OWN RISK!

delete object 0 [Object ID Number]


It's unsupported because Tivoli doesn't trust you not to screw things up, and although I don't think you will, I understand their concern. Put this into a shell script and you can process thousands of objects and do a large amount of cleanup in a short amount of time.
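
A sketch of such a script, assuming the object IDs are first dumped to a file with dsmadmc (the admin ID, password, node name, and file name are placeholders - and again, at your own risk):

#!/bin/ksh
# dump the object IDs, one per line, with no headers
dsmadmc -id=admin -password=secret -dataonly=yes "select object_id from backups where node_name='ORA_TDP' and backup_date<'2007-06-01 00:00:00'" > objectids.txt

# feed each ID to the undocumented delete command
while read objid; do
    dsmadmc -id=admin -password=secret "delete object 0 $objid"
done < objectids.txt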

Wednesday, July 5, 2006

Import/Export Question!

Can someone explain to me why moving data (exporting/importing) from one TSM server to another is such a pain in the neck? Here is my scenario: I have some servers that moved, network-wise, from one location to another, and they now need to back up to a different TSM server. Both servers use the same media type (LTO-3). So why do I have to either copy all the data across the network to the new server while it creates new tapes, or dump it all to tape and then rewrite it to new tapes on import? My question is this: why can't I just export the DB info and pointers for the already existing tapes from the old TSM server to the new one? Why can't the old server "hand over" the tape to the new TSM server? It seems like a lot of wasted work to constantly copy the data (server-to-server export/import) or dump it to tape and then write it to new tapes. I think the developers ought to work on a way of doing this. I would also think this could be done at a DB level, so you could in a sense reorg the DB without the long DB dump/load and audit process, and the tapes would simply be handed over to the new TSM server. No rewrites necessary.
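
For anyone who hasn't used it, the network option I'm complaining about is the server-to-server form of EXPORT NODE. A sketch with a hypothetical node and target server (the target must already be defined to the source with server-to-server communication configured):

export node CLIENT01 filedata=all toserver=TSMSERV2 mergefilespaces=yes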

Wednesday, December 14, 2005

Unload/Load of DB pitfall

I am writing this article to save others from the situation we had at one of our customers. It began with a request to replace the RAID array hosting the primary storage pools. I took that as an excellent opportunity to reorganize the DB, which was 66GB at roughly 50% utilization (after a massive deletion over the last month or so). There was planned downtime for copying the data from one RAID to another, so I thought I had plenty of time for an unload/load.
The "dsmserv unloaddb" utility went smoothly, taking about 4 hours and creating a roughly 20GB dump (which was expected, as ESTIMATE DBREORGSTATS revealed we could save about 7GB). I then proceeded with "dsmserv loadformat" and "dsmserv loaddb" (taking approximately 5 hours). So far everything was seemingly OK: I was able to start the server, reduce the DB, and run some tests. The problem appeared when I tried to define a new storage pool:


12/13/05 12:42:21 ANR2017I Administrator HARRY issued command: DEFINE STGPOOL archivedisk disk (SESSION: 3884)
12/13/05 12:42:21 ANR0102E sspool.c(1648): Error 1 inserting row in table "SS.Pool.Ids". (SESSION: 3884)
12/13/05 12:42:21 ANR2032E DEFINE STGPOOL: Command failed - internal server error detected. (SESSION: 3884)



Google revealed that this error is a known one, fixed in 5.3.2.1 (we were on 5.3.2.0 - we had upgraded to that level just a few days before the fix appeared... bad luck). Basically, there is an error in LOADFORMAT/LOADDB that corrupts some DB values:


http://www-1.ibm.com/support/docview.wss?uid=swg1IC47516


I did not want to run loaddb again (changes had already been made to the DB, and some migrations had run - luckily for me they were from a storage pool with caching set on - etc.).
So I tried running "dsmserv auditdb inventory fix=yes" - IBM says it can help if you run it after a loaddb. Long story short: after 8 hours of auditing (with messages saying some values were corrected), the problem was still there...
So the only option was to apply the patch and run loaddb again - another 4 hours of waiting - and now it seems to work (still running tests). So watch for this problem and check your TSM level before reorganizing your DB.

Sunday, October 30, 2005

Managing RAW Volumes

Well, I was recently called by another IBMer and asked how to use RAW volumes. The person wanted to know why DSMFMT sometimes formats quite fast on one machine and takes forever on another. One thing you have to understand about DSMFMT is that it creates a file and essentially makes the space within it raw. If you've ever looked inside an unused TSM volume you'll see it is a text file filled with "ADSM" over and over (they might have changed the fill, but when I was teaching TSM that's what we saw). So why use raw volumes instead of actual files (other than files being a redundant process)? FAST! EASY! And when speed in DR is key, it's the only way to go. It's actually easier than one would think, and with a little script you can manage your raw volumes and hdisks easily. I'll post the script along with a script to query the serial numbers and WWNs of your tape drives. These two scripts come courtesy of Hari Patel, my co-worker, who is a PERL madman. (Download tar/zip)
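
In the same spirit, here is a rough sketch of the serial/WWN lookup on AIX (this assumes Atape-managed FC drives, where the ww_name ODM attribute exists; it is not the actual script from the download):

for d in $(lsdev -Cc tape -F name); do
  echo "$d serial: $(lscfg -vl $d | grep 'Serial Number')"
  echo "$d wwn: $(lsattr -El $d -a ww_name -F value 2>/dev/null)"
done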

Saturday, June 11, 2005

The Case For Raw Volumes!

If you are serious about TSM server rebuild times and want the quickest way to get up and running, then I suggest you look into raw logical volumes for all your TSM DB, log, and storage needs. Of course, if you are running on NT I don't know of any way TSM can use raw volumes, but in our AIX shop we live by them. Creation is quick, and with a little script I can have my volumes created and ready for the DB restore in no time. I have been down the road of DSMFMT and know how long large volumes can take to create, and since TSM does not like more than 16 volumes, some older Unix servers can take a long time to format them. The other nice thing about raw volumes is that if the server crashes, it's rare (except for disk failure) for volume corruption to occur. I have had too many dirty superblocks to deal with in my time, and I don't miss them. Remember, all TSM is really doing with DSMFMT is creating a file and, in a way, converting it back into raw. So why do the extra steps? Save yourself some time if you are ever in a true DR situation.
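
As a sketch of how little work this is on AIX (the volume group, LV name, and partition count are examples):

# create a raw logical volume of 64 physical partitions in volume group datavg
mklv -y tsmdblv01 datavg 64

# then, from a dsmadmc session, add it as a 5.x DB volume via the raw device node
define dbvolume /dev/rtsmdblv01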