I was having performance issues with a couple of TSM 6.2 servers and could not find anything that pointed to the cause. I'm not one to call support unless I'm totally stumped and cannot find help on the web, but this time I finally relented and made the call. Backups were failing repeatedly, and when we researched it we found internal server errors along with DB table errors. IBM support asked for some DB2 log files and within 30 or so minutes had identified the problem.
TSM has a server option I had never used or even heard of, and it had somehow been set in a way that adversely affected all backups: DBMEMPERCENT, in the dsmserv.opt file. This option tells TSM what percentage of the server's total memory it can allocate for the database manager. The default is AUTO, which would have been fine, but DBMEMPERCENT was set to 10. That means that out of 16GB of RAM I was only using 1.6GB?!?! How did that happen? I didn't set it, and none of my coworkers remember setting it, so where did it come from? IBM support stated that the default is AUTO, so the option must have been set manually. Since I had never used this option and it's 6.x-specific, I never would have looked for it. Good thing I called support.
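For anyone who wants to check their own servers, here is a minimal Python sketch of the arithmetic. The sample options-file text, the function names, and the parsing rules (one option per line, `*` starting a comment line) are assumptions for illustration, not something taken from IBM's documentation, so adjust them to match your actual dsmserv.opt.

```python
def find_dbmempercent(opt_text):
    """Return the DBMEMPERCENT value found in options-file text, or None if unset."""
    for raw in opt_text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and lines starting with '*' carry no options.
        if not line or line.startswith("*"):
            continue
        parts = line.split()
        if parts[0].upper() == "DBMEMPERCENT" and len(parts) > 1:
            return parts[1].upper()
    return None


def db_memory_gb(total_ram_gb, setting):
    """RAM in GB allowed by a numeric DBMEMPERCENT; None when AUTO or unset."""
    if setting is None or setting == "AUTO":
        return None
    return total_ram_gb * int(setting) / 100


# Hypothetical file contents, mirroring the situation described above.
sample_opt = """* sample dsmserv.opt (illustrative only)
COMMMETHOD TCPIP
DBMEMPERCENT 10
"""

setting = find_dbmempercent(sample_opt)
print("DBMEMPERCENT:", setting)
print("Memory cap on a 16GB server (GB):", db_memory_gb(16, setting))
```

If the option is missing or set to AUTO, the second function returns None, which is the case where the server sizes the memory itself.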
Monday, April 30, 2012
Friday, April 27, 2012
db2adutl Error
I recently had an issue with a client and storage agent upgrade that left the db2adutl utility unable to return any data. Here are the errors:
Retrieving
Error: Begin query image failed with TSM return code 136
Error: Get next image failed with TSM return code 2041
I pretty much knew what caused the error; the problem was how to fix it. The cause was an upgrade of the TSM client on the DB2 server that, as further investigation showed, could not support the more current TSM Storage Agent. An OS patch would have to be applied, but that could not be done without an outage. Our only option was to roll the client back to a supported TSM client / storage agent level. The problem was that while we were trying to figure out a better solution than rolling back the client, the DB2 database had run a backup. Once the client API was rolled back, it could not "interpret" the backup taken with the new API, which caused the db2adutl errors.
Support suggested renaming the node or the file space (the file space is better, since you don't have to stop and start DB2 to reset the password as you would with a new node name). I didn't want to do either. The backups taken since the rollback were good, but db2adutl couldn't return the list of backups as long as the objects created with the newer API were still present. Luckily, I have been dealing with Oracle admins long enough to have a solid grasp on manually deleting objects on the TSM server. Just as I do when Oracle DBAs neglect their RMAN duties, I pulled out my trusty delete object command and removed the backup objects from the period during which the new API had been in use. Once that was done, db2adutl was immediately able to see its backups and return a list of what was available.
Thursday, April 26, 2012
TSM Power Admin
I was just made aware of TSM Power Admin by a fellow adsm.org contributor and must say I like some of the features available. I hope to be able to test it soon, but just the ability to run commands against all the servers from the command line (without setting up a server group) is a nice touch. If I do test Power Admin I'll post a review like I did for TSMManager years ago. (Wow it's been that long?!)
Monday, April 23, 2012
TSM 6.1 & 6.2 DB2 Issue
I had a TSM server crash multiple times over the course of a week, and after working with Tivoli support and sending them the core files, it was determined that the following error was the cause. Interesting, in that I had never thought about the connections from TSM to the DB2 DB. To summarize: the current connection from TSM to DB2 is not TCP-based but IPC, and AIX has a limit of 1024 IPC connections to DB2; exceeding it can crash the application in question (TSM in this case). The following link has directions on how to convert the TSM to DB2 connections to TCP to eliminate this issue.
Friday, March 30, 2012
TSM Server 5.5, 6.1, & 6.2 Bug
I figured I'd best share with you all a bug I have experienced with TSM Server version 6.2.2 that has resulted in the TSM instance core dumping. The problem stems from TSM being unable to mount a scratch tape with, in our case, a TSM storage agent. When the mount fails, TSM seems to log an internal server error and then core dumps a few seconds later. The fix is in 6.2.3 and higher. The bug also affects the 5.5 and 6.1 versions; check this technote for further details.
Warning: This update also has a bug, this one concerning Data Domain, that causes errors if the defined VTL has more than 4400 slots. So when we upgraded the server to fix the LAN-Free bug, we inadvertently ran into this one.
Update: Here is the APAR listing for the Data Domain VTL slot bug in TSM 6.2.x
Wednesday, February 1, 2012
TSM Server 6.x - Move OS's
An interesting question was asked on ADSM.org about whether TSM 6.x could be moved from Windows to AIX now that the DB is DB2. I know DB2 has the db2move utility, but would TSM support this? Could you run that and copy all your needed config files? My thinking is that TSM still has too many distinct nuances setting it apart from real DB2 for this to work, but I haven't heard whether it is supported. Has anyone out there tried it?
Monday, January 23, 2012
TSM Backup Issue
Has anyone had an issue where their backups were extremely slow and their interrupt counts were huge? I've got 400GB DBs taking 40 hours to back up over a 4-port EtherChannel connection. There are no errors in my AIX errpt, and the network guys are telling me they don't think it's them. Any suggestions on what to look at are appreciated. Below is an example of what I get when I run entstat.
ETHERNET STATISTICS (en8) :
Device Type: IEEE 802.3ad Link Aggregation
Hardware Address: 00:14:5e:e7:26:41
Elapsed Time: 9 days 19 hours 20 minutes 35 seconds
Transmit Statistics:            Receive Statistics:
--------------------            -------------------
Packets: 5470416553             Packets: 24510516113
Bytes: 440661650021             Bytes: 32245892708954
Interrupts: 0                   Interrupts: 6027433898
Transmit Errors: 0              Receive Errors: 691
Packets Dropped: 0              Packets Dropped: 0
                                Bad Packets: 0
Max Packets on S/W Transmit Queue: 298
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 355
Broadcast Packets: 8786         Broadcast Packets: -1346793420
Multicast Packets: 225928       Multicast Packets: 136913
No Carrier Sense: 0             CRC Errors: 0
DMA Underrun: 0                 DMA Overrun: 691
Lost CTS Errors: 0              Alignment Errors: 0
Max Collision Errors: 0         No Resource Errors: 0
Late Collision Errors: 0        Receive Collision Errors: 0
Deferred: 141004                Packet Too Short Errors: 0
SQE Test: 0                     Packet Too Long Errors: 0
Timeout Errors: 0               Packets Discarded by Adapter: 0
Single Collision Count: 0       Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 355
General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 1701737521
Driver Flags: Up Broadcast Running
Simplex 64BitSupport ChecksumOffload
PrivateSegment LargeSend DataRateSet
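To put the counters above in perspective, here is a rough back-of-the-envelope sketch in Python that uses only the numbers shown in the entstat output and in the question. The variable names are made up for illustration, 1 GB is assumed to be 10^9 bytes, and reading the negative Broadcast Packets value as a wrapped signed 32-bit number is an assumption, not something confirmed.

```python
# Elapsed time from the output: 9 days 19 hours 20 minutes 35 seconds.
elapsed_s = ((9 * 24 + 19) * 60 + 20) * 60 + 35

# Receive-side counters copied from the output.
rx_bytes = 32245892708954
rx_packets = 24510516113
rx_interrupts = 6027433898

# Average inbound rate over the whole interval, in MB/s (10^6 bytes).
avg_rx_mb_s = rx_bytes / elapsed_s / 1e6

# How many packets each receive interrupt handled on average.
packets_per_interrupt = rx_packets / rx_interrupts

# Average size of a received packet in bytes.
avg_packet_bytes = rx_bytes / rx_packets

# Assumption: the negative broadcast count is a signed 32-bit display wrap,
# so the unsigned value is recovered modulo 2**32.
raw_broadcast = -1346793420
broadcast_unsigned = raw_broadcast % 2**32

# Effective rate of the slow backup: 400GB in 40 hours, in MB/s.
backup_mb_s = 400e9 / (40 * 3600) / 1e6

print(f"elapsed seconds:        {elapsed_s}")
print(f"average receive MB/s:   {avg_rx_mb_s:.1f}")
print(f"packets per interrupt:  {packets_per_interrupt:.2f}")
print(f"average packet bytes:   {avg_packet_bytes:.0f}")
print(f"broadcast (unsigned):   {broadcast_unsigned}")
print(f"backup MB/s:            {backup_mb_s:.2f}")
```

Under those assumptions the adapter averaged roughly 38 MB/s inbound across the whole interval, while a single 400GB backup in 40 hours works out to under 3 MB/s. That gap is one more data point to compare against per-session throughput before blaming the network.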