Monday, January 23, 2012

TSM Backup Issue

Has anyone had an issue where backups were extremely slow and the interrupt counts were huge? I've got 400GB DBs taking 40 hours to back up over a 4-port EtherChannel connection. There are no errors in my AIX errpt, and the network guys are telling me they don't think it's them. Any suggestions on what to look at are appreciated. Below is an example of what I see when I run entstat.

ETHERNET STATISTICS (en8) :
Device Type: IEEE 802.3ad Link Aggregation
Hardware Address: 00:14:5e:e7:26:41
Elapsed Time: 9 days 19 hours 20 minutes 35 seconds

Transmit Statistics:                          Receive Statistics:
--------------------                          -------------------
Packets: 5470416553                           Packets: 24510516113
Bytes: 440661650021                           Bytes: 32245892708954
Interrupts: 0                                 Interrupts: 6027433898
Transmit Errors: 0                            Receive Errors: 691
Packets Dropped: 0                            Packets Dropped: 0
                                              Bad Packets: 0
Max Packets on S/W Transmit Queue: 298
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 355

Broadcast Packets: 8786                       Broadcast Packets: -1346793420
Multicast Packets: 225928                     Multicast Packets: 136913
No Carrier Sense: 0                           CRC Errors: 0
DMA Underrun: 0                               DMA Overrun: 691
Lost CTS Errors: 0                            Alignment Errors: 0
Max Collision Errors: 0                       No Resource Errors: 0
Late Collision Errors: 0                      Receive Collision Errors: 0
Deferred: 141004                              Packet Too Short Errors: 0
SQE Test: 0                                   Packet Too Long Errors: 0
Timeout Errors: 0                             Packets Discarded by Adapter: 0
Single Collision Count: 0                     Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 355

General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 1701737521
Driver Flags: Up Broadcast Running
        Simplex 64BitSupport ChecksumOffload
        PrivateSegment LargeSend DataRateSet
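
To put those counters in perspective, here is a rough Python sketch that turns the figures above into averages. The numbers are copied from the entstat output; the helper names are made up for illustration, and treating the negative Broadcast Packets value as a signed 32-bit counter that wrapped once is only an assumption. Averages over nine days will not show what happens during the backup window itself, so take this as a starting point rather than a diagnosis.

```python
# Figures copied from the entstat output above.
ELAPSED_S = ((9 * 24 + 19) * 60 + 20) * 60 + 35   # 9 days 19 h 20 min 35 s
RX_BYTES = 32245892708954
RX_PACKETS = 24510516113
RX_INTERRUPTS = 6027433898
RX_BROADCAST_RAW = -1346793420   # printed as a negative number


def unwrap_signed32(value: int) -> int:
    """Reinterpret a negative signed 32-bit reading as unsigned.

    Assumption: the counter is displayed as signed 32-bit and wrapped
    exactly once; if it wrapped more often the true value is larger.
    """
    return value + 2**32 if value < 0 else value


def backup_rate_mib_s(size_gib: float, hours: float) -> float:
    """Average rate in MiB/s for size_gib moved in the given number of hours."""
    return size_gib * 1024 / (hours * 3600)


avg_rx_mib_s = RX_BYTES / ELAPSED_S / 2**20
packets_per_interrupt = RX_PACKETS / RX_INTERRUPTS
avg_packet_bytes = RX_BYTES / RX_PACKETS

print(f"average receive rate   : {avg_rx_mib_s:.1f} MiB/s")
print(f"packets per interrupt  : {packets_per_interrupt:.2f}")
print(f"average packet size    : {avg_packet_bytes:.0f} bytes")
print(f"broadcast (unwrapped)  : {unwrap_signed32(RX_BROADCAST_RAW)}")
print(f"400GB in 40 hours      : {backup_rate_mib_s(400, 40):.2f} MiB/s")
```

Comparing the per-backup rate with the long-run receive rate on the aggregate gives a quick sense of how far below the link's everyday traffic level the slow backups are running.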



Thursday, January 19, 2012

TSM Device Handling in Windows

I have to say that TSM on Windows is good for small- to medium-sized solutions, and I'm not ANTI-Windows. I just cringe when dealing with devices in Windows. I hate its driver handling, and most of all I hate how Windows presents libraries and tape drives. So I was working with a TSM server where the tape library would not initialize. It was an older SCSI library, not Fibre Channel. I tried restarting the library, restarting the TSM server, reloading the drivers, and updating the drivers, and nothing worked.

Duh! <Head Slap!!!>

That's because the device IDs changed on the initial reboot, which is what caused the library to stop communicating. The library went from LB1.0.0.2 to LB1.0.0.3. Nobody touched the SCSI card or the library, but the device definition changed! Seriously? All the drives changed to mtX.X.X.3 as well. Now, I don't use Windows all that much, but luckily I remembered the TSMDLST program that is installed with the TSM server. It's under C:\Program Files\tivoli\tsm\console and will pull the information from Windows for you in a readable format. So the next time your library goes offline, make sure you use it to compare the device names and serial numbers with what is defined in TSM. It will save you a lot of time and headache. You can find more information on issues like this here.
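
The comparison itself is simple enough to sketch in Python. This assumes you have already copied the serial numbers and device names out of TSMDLST and out of your TSM path definitions by hand; it does not parse either tool's output. The function name and the serial numbers in the usage note are hypothetical.

```python
from typing import Dict, Optional, Tuple


def find_renamed_devices(
    defined: Dict[str, str], detected: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return serial -> (name defined in TSM, name Windows reports now).

    Both arguments map a device serial number to a device name. Only
    serials whose names differ are returned; the detected name is None
    when Windows no longer reports that serial at all.
    """
    changed = {}
    for serial, defined_name in defined.items():
        detected_name = detected.get(serial)
        # Compare case-insensitively, since LB1.0.0.2 and lb1.0.0.2
        # are treated here as the same device name.
        if detected_name is None or detected_name.lower() != defined_name.lower():
            changed[serial] = (defined_name, detected_name)
    return changed
```

For example, with made-up serials, find_renamed_devices({"LIBSER01": "lb1.0.0.2"}, {"LIBSER01": "lb1.0.0.3"}) reports that the library's device name moved from lb1.0.0.2 to lb1.0.0.3, which is the path you would then update in TSM.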

Wednesday, January 18, 2012

TSM Client Scheduler Issue

I recently had an issue with a handful of TSM clients that would not run their backups. The clients all back up to a TSM 5.5.2 server and were all running Windows 2008 with TSM client version 6.2.3. The five clients had all been missing their backups for days, and what makes the situation more interesting is that there are other Windows 2008 servers with this version of the TSM client installed that are running their schedules without issue.

When I reviewed the TSM schedule log, the scheduler reported that it had received the schedule information and was waiting for the TSM server to initiate the schedule. The TSM server never made an attempt to contact the clients in question and never showed any errors other than ANR2578W, stating that the client missed its schedule. There were no errors in the client error log and not much to go on in the TSM server activity log. Even though the TSM client backs up over the public network, I switched to polling mode to see if client-based initiation of the backup would work. It didn't! The TSM client scheduler would receive the schedule upon polling the TSM server but would never execute it. So now what? I added TCPCLIENTADDRESS and TCPCLIENTPORT and switched back to SCHEDMODE PROMPTED; still the scheduler would not run backups.

Now I was getting frustrated. I removed the scheduler service and redefined it using dsmcutil and voila, the schedule ran...ONCE! After the initial schedule ran, the previous problem returned. Schedules were not running, and the TSM server would not show any errors saying it could not contact the client. It just would not run the schedule. Well, that left me no choice but to call support. IBM support's response was to make sure TCPCLIENTADDRESS and TCPCLIENTPORT were defined in the dsm.opt and also to define the client's HLADDRESS and LLADDRESS on the TSM server. Define the HL and LL address? TSM gets that when the client connects, doesn't it? Yes and no! It appears that without these optional settings the TSM server can have issues contacting some clients. Why? No idea, but adding the HL and LL address did the trick, and the backups have been running without issue since.
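
In prompted mode it is the TSM server that opens the connection to the client scheduler, and the address and port it dials are what HLADDRESS and LLADDRESS record. One check I could have run earlier from the TSM server host is a plain TCP connect to the client's scheduler port. Here is a minimal Python sketch; the function name is my own, and the host and port in the comment are placeholders for whatever TCPCLIENTADDRESS and TCPCLIENTPORT are set to on the client. A successful connect only shows the port is reachable; it does not explain why the server was not prompting these particular clients.

```python
import socket


def scheduler_port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Name resolution failures, refused connections, and timeouts all
    count as "not reachable" and return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (placeholder values, run from the TSM server host):
#   scheduler_port_reachable("client-hostname-or-ip", 12345)
```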

How many of you define the HLADDRESS and LLADDRESS when registering nodes? I never suspected they were needed until now.

Monday, December 19, 2011

DB2 Doesn't Make A Difference

I've been working with some IBM reps/consultants lately, and I find it kind of funny how they talk about TSM. We were discussing how some queries against the TSM DB are so hard to process that many times they don't return any data, when the IBM rep said, "With DB2 that won't happen." I laughed and said, "DB2 didn't help that much." For example, try something like this and see how long it takes to get a response.


select cast(sum(b.file_size/1073741824) as decimal(18,2)) AS GB_SIZE
  from backups a, contents b
  where a.node_name in ('DEV01_ORA','DEV02_ORA','DEV03_ORA','PRD01_ORA','PROD02_ORA')
    and a.backup_date < '2011-11-01 00:00:00'
    and a.object_id=b.object_id

I'm running this query to determine the amount of space I would free up if I deleted old Oracle backup objects that the DBAs never reconciled through RMAN. I ran it over 30 minutes ago...still waiting! The problem is that the schema has not changed enough in the TSM table structure to make some select statements run any better than they did in the pre-DB2 days. Has anyone else seen this?


(Yes! I know that if I used a specific NODE_NAME then TSM would probably return some data, but DB2 handles queries 1000x more complex than this one in the non-TSM world.)

Tuesday, December 6, 2011

TSM 6.x Client Deployment

Has anyone used TSM's client deployment process/function? How well does it work? Is it worth the effort? I have a lot of servers we need to install the TSM client on and would like to use it if it will work.

Wednesday, November 16, 2011

Tivoli Storage Manager Reporting and Monitoring v6.3

This is a query from the TSM v6.3 agent:

select node_name, count(distinct volume_name) from volumeusage a, stgpools b where (a.stgpool_name=b.stgpool_name) and devclass in (select DEVCLASS_NAME from devclasses where devtype in ('3570','3590','3592','4MM','8MM','DLT','DTF','ECARTRIDGE','GENERICTAPE','LTO','QIC')) group by node_name

Could you run it on a TSM v5.x.x.x production system for me?

Do you get any result within 10 minutes?

Tuesday, October 18, 2011

Simple TSM 6.2 Server Restore

I just completed a DR test and we had to restore one of our TSM servers from a Data Domain replicated copy. This was our first time restoring a TSM server from a replicated DD copy and after importing the replicated volumes and defining our initiators we set about restoring the TSM database. Our AIX server had been restored from an image (SysBack) and we had a current volhist and devconfig file so we began our restore. If you think that the restore from a Data Domain is not relevant to your environment because you use tape, think again. The Data Domain mimics an STK library with IBM drives and so we had to follow the same directions as anyone using tape backup.

To restore the TSM 6.x DB from tape you must have your volhist and devconfig files. You will need to modify the devconfig so that the only lines are those defining the devclass, server name, and server password; all other lines should be deleted. Then you need lines defining a manual library, a tape drive, and a line defining a path to the drive (which for us was an LTO3 drive).


DEFINE LIBRARY MANLIB LIBTYPE=MANUAL
DEFINE DRIVE MANLIB DRIVE1 ONLINE=YES 
DEFINE PATH TSMSERV1 DRIVE1 SRCT=SERVER DESTT=DRIVE LIBR=MANLIB DEVICE=/dev/rmt1 ONLINE=YES

Note: Do not define an element address or serial number for the drive; TSM will detect these when you run the DSMSERV RESTORE DB command.
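
If you would rather script the devconfig edit than do it by hand, the trimming described above can be sketched in a few lines of Python. This is only an illustration under assumptions: it assumes the lines to keep begin with DEFINE DEVCLASS, SET SERVERNAME, or SET SERVERPASSWORD, and the function name is made up. It appends the three manual-library lines shown above unchanged. It does not touch the DEVCLASS line itself, so if that line names a library you would still need to check by hand that it matches the manual library.

```python
# Line prefixes to keep from the original devconfig (assumed spelling).
KEEP_PREFIXES = ("DEFINE DEVCLASS", "SET SERVERNAME", "SET SERVERPASSWORD")

# The manual library, drive, and path lines from this post.
MANUAL_LIBRARY_LINES = [
    "DEFINE LIBRARY MANLIB LIBTYPE=MANUAL",
    "DEFINE DRIVE MANLIB DRIVE1 ONLINE=YES",
    "DEFINE PATH TSMSERV1 DRIVE1 SRCT=SERVER DESTT=DRIVE "
    "LIBR=MANLIB DEVICE=/dev/rmt1 ONLINE=YES",
]


def trim_devconfig(text: str) -> str:
    """Keep only the devclass/server lines and append the manual library."""
    kept = [
        line.strip()
        for line in text.splitlines()
        if line.strip().upper().startswith(KEEP_PREFIXES)
    ]
    return "\n".join(kept + MANUAL_LIBRARY_LINES) + "\n"
```

Work on a copy of the devconfig file and read the result over before pointing DSMSERV RESTORE DB at it.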

When you run the DSMSERV RESTORE DB command, TSM will start up and query the devconfig file to retrieve the information on the devclass, drive, library type, server name, and password. Once TSM has successfully queried the tape drive, it will query the volhist file for the appropriate DB backup volume, depending on whether you are restoring to the most current date or to a specific point in time. When TSM has identified the volume to use, it will prompt you to mount the tape. When I saw the mount request, I went into the Data Domain web-based GUI and moved the DB backup volume from its "virtual slot" to the drive that is /dev/rmt1. Once the tape was mounted, TSM recognized that the tape had been loaded and began restoring the DB. If more than one tape is required to complete the restore, TSM will prompt you for each tape. With the library web GUI available you can move the tapes as needed and accomplish the restore. Once the restore completes you can bring TSM back up and audit/fix anything that could be out of sync. With the switch to DB2 I was expecting a little more work to get TSM back up and running, but surprisingly it was quite simple.

Now, if you don't have a SysBack image of your TSM server, the rebuild can take a lot longer and requires you to recreate some of the DB2-dependent files. I might have to do a BRM restore without an image in the near future, and if I do I'll post a step-by-step process for everyone. If anyone has already done this and would like to post the process on TSMAdmin, let me know.