Wednesday, December 5, 2012

TSM Explorer v6.3

Siberia Software has released TSMExplorer (ver 6.3). TSMExplorer and TSMExplorer Collector are easy-to-use, powerful products for managing and monitoring IBM Tivoli Storage Manager. They let you manage many TSM servers from a single sign-on, and they support Linux, Windows, and Mac OS X.

Check out the demos:

or go to the main site:
http://www.s-iberia.com

Thursday, November 15, 2012

STORServer Console 3.0 Released

If you are looking for a TSM management console, STORServer has released v3.0 of their console. I have not had the opportunity to use it, but if any reader out there has experience with the product and would like to post a review, I would be more than willing to publish it. You can read more about the STORServer Console here.

Tuesday, October 30, 2012

Time For New Phone

My phone contract is up and I want to replace my iPhone 4 with something new. I was going to just replace it with an iPhone 5, but with TSMManager having an Android app for mobile administration I am tempted to move to a Galaxy S III. I have even considered a Galaxy Note 2. I have 3 Android tablets and I am very familiar with ClockworkMod Recovery and playing with different devs. Any suggestions are welcome.

Monday, October 8, 2012

TSM 6.2.4 Binding Issue

We've identified that our TSM 6.2.4 server is not binding files to the default management class. Instead it assigns the node data to the management class with the highest retention setting, as if the default management class were not set. Updating the policy's default management class, reactivating the policy, and rerunning a backup have not produced any rebinding on the TSM side. What does work is placing an INCLUDE * FSMGMT at the top of the include/exclude list. It's pretty weird behavior, and it would explain why we seem to be eating through more storage than we should be. Anyone else seeing this? It might be an isolated incident, but we are checking all our TSM 6.2.x servers to verify. The best way to check is to identify the management class with the largest retention setting for backups, then query a node's backups in that domain to see whether that class is being used to bind the data.

SELECT * FROM BACKUPS WHERE NODE_NAME='<NAME HERE>' and CLASS_NAME='<MGMT NAME WITH HIGHEST RET IN DOMAIN>'
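To run that check against more than one node, the query can be generated per node and pasted into an administrative session. This is only a minimal sketch: the node names and the LONGRET class name are hypothetical placeholders, and it assumes node and class names are stored in upper case on the server.

```python
def binding_check_query(node_name, class_name):
    """Build the BACKUPS query from the post for one node and management class."""
    # Assumption: names are stored in upper case; single quotes are doubled
    # so a stray quote in a name cannot break the SQL string literal.
    node = node_name.upper().replace("'", "''")
    mgmt = class_name.upper().replace("'", "''")
    return (
        "SELECT * FROM BACKUPS "
        f"WHERE NODE_NAME='{node}' AND CLASS_NAME='{mgmt}'"
    )


def queries_for(nodes, class_name):
    """One query per node, in the order given."""
    return [binding_check_query(node, class_name) for node in nodes]


# Hypothetical node list and class name, for illustration only.
for query in queries_for(["nodea", "nodeb"], "longret"):
    print(query)
```

Any rows returned for a node that should be bound to the default management class point to the behavior described above.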

I have a PMR open with IBM and will post what I find.

Tuesday, September 4, 2012

Solaris LAN-Free



I recently had to configure a Solaris box for LAN-Free and had to dig up my old documentation. After I loaded the drivers, the devices were not showing up in the tape list file, so here's what I did to get LAN-Free working. These directions are for IBM LTO drives only.

http://www-01.ibm.com/support/docvie...S7002972&aid=1
Adobe Reader page 137 (actual doc page 117)
We need to make sure the native "st" driver is not loaded. To unload it, run

rem_drv st

and comment out everything in /kernel/drv/st.conf.
Then we need to run the following:

rm /dev/rmt/*
Removes any tape drive definitions in the rmt folder. Do this only if the IBM tape drives are the only drives used on the server.

/opt/IBMtape/tmd -s
Stops the Tape Monitor Daemon
/usr/sbin/rem_drv IBMtape     
Removes the IBMtape driver
The commands to reload the device driver are:

/usr/sbin/add_drv -m '* 0666 bin bin' IBMtape
This reloads the driver but does not set the correct driver type

/usr/sbin/update_drv -av -i '"scsiclass,01.vIBM.pULTRIUM-TD3"' IBMtape
This will add the drive type to the /etc/driver_aliases file.
/opt/IBMtape/tmd
Reloads the IBM Tape Monitor Daemon

Then run /opt/IBMtape/tapelist -Ac to see if the drives are discovered correctly.
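Since the order of these steps matters, it can help to keep them in one place. The sketch below only renders the commands from this post into a shell script for review; it does not execute anything. The helper names are my own, the ULTRIUM-TD3 alias is the one from my drives and would differ for other models, and commenting out /kernel/drv/st.conf is still a manual step.

```python
# Each entry: (command as given in the post, what it does). Nothing is run here.
RELOAD_STEPS = [
    ("rem_drv st", "unload the native st driver"),
    ("rm /dev/rmt/*", "remove old tape device entries (IBM drives only!)"),
    ("/opt/IBMtape/tmd -s", "stop the Tape Monitor Daemon"),
    ("/usr/sbin/rem_drv IBMtape", "remove the IBMtape driver"),
    ("/usr/sbin/add_drv -m '* 0666 bin bin' IBMtape", "reload the driver"),
    ("/usr/sbin/update_drv -av -i '\"scsiclass,01.vIBM.pULTRIUM-TD3\"' IBMtape",
     "add the drive type to /etc/driver_aliases"),
    ("/opt/IBMtape/tmd", "restart the Tape Monitor Daemon"),
    ("/opt/IBMtape/tapelist -Ac", "check that the drives are discovered"),
]


def render_script(steps):
    """Return the steps as shell script text, one comment line per command."""
    lines = ["#!/bin/sh"]
    for command, purpose in steps:
        lines.append(f"# {purpose}")
        lines.append(command)
    return "\n".join(lines) + "\n"


print(render_script(RELOAD_STEPS))
```

Reading the rendered script before running it on a box with non-IBM drives is a good idea, because the rm step wipes every entry under /dev/rmt.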

Wednesday, August 8, 2012

NFS Mount Issue

I recently had a number of AIX server backups miss because the backup hung while doing its initial filesystem listing. At some point within the last day the server that exports the NFS mounts was rebooted and all the mount points went bad. The problem is that TSM sits there trying to query them even though the default action is to not back up NFS mount points. So I had to log into each server, umount -f the file system, and remount it before TSM was able to run successfully. TSM does not allow a DOMAIN -ALL-NFS, so no matter what I do TSM is going to hang on the listing of file systems. Of course doing a df on the server hangs also, so it's not just a TSM issue. Has anyone else run across this issue?
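One way to catch a dead mount before the backup window is to probe each mount point with a timeout instead of letting df hang the session. This is a rough sketch, not something I have deployed: the function names and the 10 second timeout are my own choices, and it assumes a df command that accepts a path argument.

```python
import subprocess
import sys


def responds_within(command, timeout_seconds):
    """Return True if the command exits within the timeout, else False."""
    proc = subprocess.Popen(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        proc.wait(timeout=timeout_seconds)
        return True
    except subprocess.TimeoutExpired:
        # Send a kill but do not wait for it: a process blocked on a dead
        # hard-mounted NFS server may not exit until the mount recovers.
        proc.kill()
        return False


def check_mounts(mount_points, timeout_seconds=10):
    """Map each mount point to True (df answered) or False (df timed out)."""
    return {
        mount: responds_within(["df", mount], timeout_seconds)
        for mount in mount_points
    }


def main(argv=None):
    """Usage: pass mount points as arguments; stale ones are reported."""
    mounts = sys.argv[1:] if argv is None else argv
    for mount, ok in check_mounts(mounts).items():
        print(f"{mount}: {'ok' if ok else 'NOT RESPONDING'}")
```

Calling main() with the NFS mount points from cron ahead of the schedule would at least flag the mounts that need the umount -f and remount treatment.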

Tuesday, July 31, 2012

TSM 5.5 to 6.2.4 Upgrade

I recently did a network-based upgrade of a 270GB DB. Previously TSM could take days to do the upgrade of a DB that large, but my experience performing upgrades for other companies and some suggestions from my Tivoli consultant had me convinced it could be done in under 24 hrs. The network method is kind of a misnomer here since we used the loop-back address, so the upgrade was done in place on the server. After allocating additional disk space (a lot of disk space) and defining the user ID TSM would run under, I started the upgrade process by running dsmupgrd preparedb. That step took less than an hour and completed successfully. I copied our devconfig and volhist files to an alternate location and then started the insert under the ID of the new TSM 6.2.4 instance. I then switched back to root, cd'd to the upgrade directory, set my environment variables accordingly, and started the extract.

The extract took 5 hours and ran without issue. The insert ran for 11 hours and completed without errors. The overall time from start of the upgrade to end was 13 hours (with the network method the extract and insert run concurrently, so their times overlap). My reason for testing was to verify the time frame needed, so the applications that rely on TSM for log offloading could add log space ahead of time. If anyone has suggestions on how I can make the extract/insert performance even better, feel free to post a comment.

Friday, May 25, 2012

TSM 6.2.3 and Lower DB Reorg Issue

One of our TSM servers started to log large numbers of "ANR0530W - internal server error detected" messages. With further investigation we identified that these were related to ANR0162W, which indicates DB deadlock or timeout problems. These errors were causing our DB TDP backups to fail, and I eventually called support. I provided our db2diag.log file and a dump of our actlog for the last 24 hrs, and they found that the DB2 reorg process was locking records and tables and creating the deadlock situation. The problem is compounded because with TSM versions 6.2.3 and lower the DB2 reorg process cannot be scheduled, so it can kick off during backups or other processes and create these deadlocks. To resolve the issue I was told I needed to upgrade our TSM server to 6.2.4 or higher, where you can use the REORGBEGINTIME and REORGDURATION parameters to confine the reorg to a window. You can see the details of the APAR here.
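When picking values for the two options it is worth checking that the window stays clear of the backup schedule. The sketch below is plain time arithmetic, not a TSM API. It assumes REORGBEGINTIME is given as HH:MM and REORGDURATION as a number of hours, and the 20:00 start and 6 hour duration are made-up values.

```python
def minutes_of_day(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def in_reorg_window(begin_hhmm, duration_hours, when_hhmm):
    """True if when_hhmm falls in a window starting at begin_hhmm and
    lasting duration_hours, including windows that wrap past midnight."""
    start = minutes_of_day(begin_hhmm)
    offset = (minutes_of_day(when_hhmm) - start) % (24 * 60)
    return offset < duration_hours * 60


# Hypothetical values: a reorg window of 20:00 for 6 hours,
# checked against a few backup start times.
for backup_start in ["19:00", "23:30", "01:00", "02:00"]:
    clash = in_reorg_window("20:00", 6, backup_start)
    print(backup_start, "overlaps reorg window" if clash else "clear")
```

The modulo on the offset is what lets a window that starts in the evening and ends after midnight be handled without special cases.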

Thursday, May 10, 2012

Client Lockdown

A directive came down for 75+ Windows servers to be "locked down" when it came to accessing their backup data. The TSM client will allow anyone to open the client GUI and restore files to which they have permission. So I considered the various ways to keep anyone from accessing TSM and restoring data: we could set permissions on the GUI and command line so they only execute for members of the admin group, we could delete the command line and GUI executables, or we could simply set the SESSIONINIT option on the server. After weighing the options, SESSIONINIT was the easiest and most direct way to keep anyone but a TSM admin from restoring data. Once SESSIONINIT is set, the TSM client GUI, command line, and web GUI are not allowed to initiate a session. All restores have to be executed through a schedule from the TSM server. Of course you can temporarily turn SESSIONINIT off, but only a TSM admin can do so, making it easier to track who has accessed the data.


(Note: SESSIONINIT does not support the CAD)


The problem was how to update the options file on 75+ Windows servers and then restart the TSM Scheduler. You can change any MANAGEDSERVICES options to WEBCLIENT using a client option set, but SESSIONINIT is another problem. As it turns out, if you set SESSIONINIT on the TSM server you have to put the HLAddress and LLAddress in the node definition on the server, and the client dsm.opt must have TCPCLIENTADDRESS and TCPCLIENTPORT. What we didn't know was that we also had to put SESSIONINIT SERVERONLY in the dsm.opt. If you set SESSIONINIT on the TSM server and not on the client, and the scheduler was defined with /validate:yes, then you will get "password" errors and the scheduler will crash. The reason is that TSM does not allow client-initiated sessions, but the scheduler tries to validate its password when it starts. Since the scheduler can't validate the password it fails, much as it does when the password has not been set while using PASSWORDACCESS GENERATE.


We had the TCP Client settings, but adding SESSIONINIT to all 75+ servers would have been a chore...unless you know how to use the Windows command prompt and dsmcutil. Here's how I added the SESSIONINIT option to all 75 servers.

Example of how to remotely add a line to the dsm.opt


c:\>echo SESSIONINIT SERVERONLY >> \\WINSERVPRD20\C$\progra~1\tivoli\tsm\baclient\dsm.opt

I put the command for all 75 into a batch file and ran it looking for errors (not all our servers had TSM installed in the default location). Then I used the dsmcutil command to stop and start the TSM scheduler remotely.

Example of stopping and starting the TSM Scheduler service

c:\>C:\progra~1\tivoli\tsm\baclient\dsmcutil stop /name:"TSM Client Acceptor" /machine:WINSERVPRD20
c:\>C:\progra~1\tivoli\tsm\baclient\dsmcutil start /name:"TSM Client Acceptor" /machine:WINSERVPRD20

I needed to restart the scheduler on all 75 servers, so once again I created a Windows batch file listing the stop and start commands for each server, and it restarted all 75 quickly and easily from my own desktop.
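For anyone facing a similar list, the batch file lines can be generated rather than typed. This sketch only builds the text of the commands shown in this post. WINSERVPRD20 is the example from above, WINSERVPRD21 is a made-up name, and the default install path and service name are assumptions that will not hold for every server.

```python
# Assumed default install location; not every server will match it.
BACLIENT = r"C:\progra~1\tivoli\tsm\baclient"


def lockdown_lines(server, service="TSM Client Acceptor"):
    """Return the three batch lines for one server: append the option,
    then stop and start the service remotely with dsmcutil."""
    opt_file = rf"\\{server}\C$\progra~1\tivoli\tsm\baclient\dsm.opt"
    dsmcutil = BACLIENT + r"\dsmcutil"
    return [
        f"echo SESSIONINIT SERVERONLY >> {opt_file}",
        f'{dsmcutil} stop /name:"{service}" /machine:{server}',
        f'{dsmcutil} start /name:"{service}" /machine:{server}',
    ]


def batch_file(servers):
    """Join the lines for every server into one batch file body."""
    lines = []
    for server in servers:
        lines.extend(lockdown_lines(server))
    return "\r\n".join(lines) + "\r\n"


# Hypothetical server list, for illustration only.
print(batch_file(["WINSERVPRD20", "WINSERVPRD21"]))
```

Running the echo lines twice appends the option twice, so a generated file like this should be run once and its output checked for errors, as I did with the hand-built version.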


 

Monday, April 30, 2012

DBMEMPERCENT...Where'd That Come From?

I was having performance issues with a couple of TSM 6.2 servers and could not find anything that pointed to the cause. I'm not one to call support unless I'm totally stumped and cannot find help through the web, but this time I finally relented and made the call. Backups were failing repeatedly, and when we researched it we found internal server errors along with DB table errors. IBM support asked for some DB2 log files and within 30 or so minutes had identified the problem.

TSM has a server option I had never used or heard of, and it had somehow been set in a way that adversely affected all backups. The option is DBMEMPERCENT, and it was set in the dsmserv.opt file. This option tells TSM what percentage of the server's overall memory it can allocate for use. The default is AUTO and would have been fine, but somehow DBMEMPERCENT was set to 10 in the dsmserv.opt. Which means out of 16GB of RAM I was only using 1.6GB?!?!? How'd that happen? I didn't set it and none of my coworkers remember setting it, so where did it come from? IBM support stated the default was AUTO, so the option was manually set. Since I had never used this option and it's 6.x-specific, I never would have looked for it. Good thing I called support.
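A quick way to check other servers for the same surprise is to scan each dsmserv.opt for the option. This is a minimal sketch with names of my own choosing. It assumes the usual option file layout of one "OPTION value" pair per line with comment lines starting with an asterisk, and the sample text is invented.

```python
def find_dbmempercent(opt_text):
    """Return the DBMEMPERCENT value found in dsmserv.opt text, or None."""
    for raw in opt_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):  # blank line or comment
            continue
        parts = line.split()
        if parts[0].upper() == "DBMEMPERCENT" and len(parts) > 1:
            return parts[1].upper()
    return None


def db_memory_gb(total_gb, percent):
    """Memory available at the given percentage of total RAM."""
    return total_gb * percent / 100


# Invented sample contents, not a real server's option file.
sample = """* server options
COMMMETHOD TCPIP
DBMEMPERCENT 10
"""

value = find_dbmempercent(sample)
if value is None or value == "AUTO":
    print("DBMEMPERCENT is at its default")
else:
    print(f"DBMEMPERCENT {value}: {db_memory_gb(16, int(value))} GB of 16 GB")
```

With the values from this post it reports 1.6 GB of 16 GB, which is the limit we had been running under without knowing it.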

Friday, April 27, 2012

db2adutl Error

I recently had an issue with a client and storage agent upgrade that left the db2adutl utility unable to return any data. Here are the errors:


Retrieving

Error: Begin query image failed with TSM return code 136


Error: Get next image failed with TSM return code 2041



I pretty much knew what caused the error; the problem was how to fix it. The cause was an upgrade of the TSM client on the DB2 server that (after further investigation) could not support the more current TSM Storage Agent. An OS patch would have to be applied, but that could not be done without an outage. Our only option was to roll the client back to a supported TSM client / storage agent level. The problem was that while we were trying to find a better solution than rolling back the client, the DB2 database had run a backup. Once the client API was rolled back it could not "interpret" the backup made by the newer API, which caused the db2adutl errors.

Support suggested renaming the node or the file space (file space is better since you don't have to stop and start DB2 to reset the password as you would with a new node name). I didn't want to have to do either. The backups taken since the rollback were good, but db2adutl couldn't return the list of backups as long as the objects created with the newer API were still present. Luckily I have been dealing with Oracle admins long enough to have a solid grasp of manually deleting objects on the TSM server, which is what I do when Oracle DBAs neglect their RMAN duties. I pulled out my trusty delete object command and removed the backup objects from the period when the new API had been in use. Once that completed, db2adutl was immediately able to see its backups and return a list of what was available.

Thursday, April 26, 2012

TSM Power Admin

I was just made aware of TSM Power Admin by a fellow adsm.org contributor and must say I like some of the features available. I hope to be able to test it soon, but just the ability to run commands against all the servers from the command line (without setting up a server group) is a nice touch. If I do test Power Admin I'll post a review like I did for TSMManager years ago. (Wow it's been that long?!)

Monday, April 23, 2012

TSM 6.1 & 6.2 DB2 Issue

I had a TSM server crash multiple times over the course of a week, and after working with Tivoli support and sending them the core files, it was determined that the following error was the cause. It was interesting, in that I had never thought about the connections from TSM to the DB2 DB. To summarize, the current connection from TSM to DB2 is not TCP-based but IPC, and AIX has a limit of 1024 IPC connections to DB2; beyond that the application in question (TSM in this case) can crash. The following link has directions on how to convert the TSM to DB2 connections to TCP to eliminate this issue.

Friday, March 30, 2012

TSM Server 5.5, 6.1, & 6.2 Bug

I figured I had best share with you all a bug I have experienced with TSM Server version 6.2.2 that has resulted in the TSM instance core dumping. The problem stems from TSM being unable to mount a scratch tape with, in our case, a TSM storage agent. When the mount fails, TSM seems to log an internal server error and then core dumps a few seconds later. The fix is in 6.2.3 and higher. It also affects the 5.5 and 6.1 versions; check this technote for further details.

Warning: This update has a bug also concerning Data Domain that causes errors if the defined VTL has more than 4400 slots. So when we upgraded the server to fix the LAN-Free bug we inadvertently encountered this bug.

Update: Here is the APAR listing for the Data Domain VTL slot bug in TSM 6.2.x

Wednesday, February 1, 2012

TSM Server 6.x - Move OS's

An interesting question was asked on ADSM.org about whether TSM 6.x could be moved from Windows to AIX now that the DB is DB2. I know DB2 has the db2move utility, but would TSM support this? Could you run that and copy all your needed config files? My thinking is that TSM still has too many distinct nuances that set it apart from real DB2 for it to work, but I haven't heard whether it is supported. Anyone out there tried it?

Monday, January 23, 2012

TSM Backup Issue

Has anyone had an issue where their backups were extremely slow and their interrupts were huge? I've got 400GB DBs taking 40 hrs to back up over a 4-port EtherChannel connection. There are no errors in my AIX errpt and the network guys are telling me they don't think it's them. Any suggestions on what to look at are appreciated. Below is an example of what I get when I run entstat.

ETHERNET STATISTICS (en8) :
Device Type: IEEE 802.3ad Link Aggregation
Hardware Address: 00:14:5e:e7:26:41
Elapsed Time: 9 days 19 hours 20 minutes 35 seconds

Transmit Statistics:                          Receive Statistics:
--------------------                          -------------------
Packets: 5470416553                           Packets: 24510516113
Bytes: 440661650021                           Bytes: 32245892708954
Interrupts: 0                                 Interrupts: 6027433898
Transmit Errors: 0                            Receive Errors: 691
Packets Dropped: 0                            Packets Dropped: 0
                                              Bad Packets: 0
Max Packets on S/W Transmit Queue: 298
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 355

Broadcast Packets: 8786                       Broadcast Packets: -1346793420
Multicast Packets: 225928                     Multicast Packets: 136913
No Carrier Sense: 0                           CRC Errors: 0
DMA Underrun: 0                               DMA Overrun: 691
Lost CTS Errors: 0                            Alignment Errors: 0
Max Collision Errors: 0                       No Resource Errors: 0
Late Collision Errors: 0                      Receive Collision Errors: 0
Deferred: 141004                              Packet Too Short Errors: 0
SQE Test: 0                                   Packet Too Long Errors: 0
Timeout Errors: 0                             Packets Discarded by Adapter: 0
Single Collision Count: 0                     Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 355

General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 1701737521
Driver Flags: Up Broadcast Running
        Simplex 64BitSupport ChecksumOffload
        PrivateSegment LargeSend DataRateSet
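A few of those counters are easier to judge after some arithmetic. The sketch below uses only the numbers in the output above. Treating the negative receive broadcast count as a wrapped 32-bit signed counter is my assumption, and since the counter may have wrapped more than once the converted value is a lower bound at best.

```python
def as_unsigned32(value):
    """Reinterpret a signed 32-bit counter value as unsigned."""
    return value & 0xFFFFFFFF


# Values copied from the entstat output in this post.
rx_packets = 24510516113
rx_bytes = 32245892708954
rx_interrupts = 6027433898
rx_broadcast_raw = -1346793420

# Elapsed Time: 9 days 19 hours 20 minutes 35 seconds
elapsed_seconds = ((9 * 24 + 19) * 60 + 20) * 60 + 35

avg_rx_mb_per_sec = rx_bytes / elapsed_seconds / 1_000_000
interrupts_per_packet = rx_interrupts / rx_packets
avg_packet_bytes = rx_bytes / rx_packets

print(f"broadcast packets (unsigned): {as_unsigned32(rx_broadcast_raw)}")
print(f"average receive rate: {avg_rx_mb_per_sec:.1f} MB/s")
print(f"interrupts per received packet: {interrupts_per_packet:.2f}")
print(f"average received packet size: {avg_packet_bytes:.0f} bytes")
```

The average receive rate is taken over the whole nine days, idle time included, so it understates what the link does during a backup. The ratios are more useful for comparing against a server that is behaving normally.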



Thursday, January 19, 2012

TSM Device Handling in Windows

I have to say that TSM on Windows is good for small to medium size solutions and I'm not ANTI-Windows. I just cringe when dealing with devices in Windows. I hate its driver handling and most of all I hate how Windows presents library and tape drives. So I was working with a TSM server where the tape library would not initialize. It was an older SCSI library, not Fiber. I tried restarting the library, the TSM server, reloading drivers, and updating the drivers and nothing worked.

Duh! <Head Slap!!!>

That's because on the initial reboot that caused the library to stop communicating, the device IDs changed. The library went from LB1.0.0.2 to LB1.0.0.3. Nobody touched the SCSI card or library, but the device definition changed! Seriously? All the drives changed to mtX.X.X.3 also. Now I don't use Windows all that much, but luckily I remembered the TSMDLST program that is installed with the TSM server. It's under C:\Program Files\tivoli\tsm\console and will pull the information from Windows for you in a readable format. So next time your library goes offline, make sure you use it to compare the device definitions and serials with what is defined in TSM. It will save you a lot of time and headache. You can find more information on issues like this here.

Wednesday, January 18, 2012

TSM Client Scheduler Issue

I just recently had an issue with a handful of TSM clients that would not run their backups. The clients all back up to a TSM 5.5.2 server and were all running Windows 2008 with TSM client version 6.2.3. The five clients had all been missing their backups for days, and what makes the situation more interesting is that there are other Windows 2008 servers with this version of TSM installed that are all running their schedules without issue.

When reviewing the TSM Schedule log the scheduler listed that it had received the schedule info and was waiting for the TSM server to initiate the schedule. The TSM server never made an attempt to contact the clients in question and never showed any errors other than ANR2578W stating the client missed its schedule. There were no errors in the error log and not much to go by from the TSM server activity log. Even though the TSM client backs up over the public network I switched to polling mode to see if client based initiation of the backup would work. It didn't! The TSM client scheduler would receive the schedule upon polling the TSM server but would never execute it. So now what? I added the TCPCLIENTADDRESS and TCPCLIENTPORT and switched back to SCHEDMODE PROMPTED, still the scheduler would not run backups.

Now I was getting frustrated. I removed the scheduler service and redefined it using dsmcutil and voila, the schedule ran...ONCE! After the initial schedule ran, the previous problem returned. Schedules were not running and the TSM server would not show any errors saying it could not contact the client. It just would not run the schedule. Well, that left me no choice but to call support. IBM support's response was to make sure TCPCLIENTADDRESS and TCPCLIENTPORT were defined in the dsm.opt and also to define the client HLADDRESS and LLADDRESS on the TSM server. Define the HL and LL address? TSM gets that when the client connects, doesn't it? Yes and no! It appears that without the optional settings the TSM server can have issues contacting some clients. Why? No idea, but adding the HL and LL address did the trick and the backups have been running without issue since.
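If you need to set the addresses on a batch of existing nodes, the UPDATE NODE commands can be generated from a list and run as a macro. This is a minimal sketch: the node names and addresses are invented (the 192.0.2.x range is reserved for documentation), and 1501 is used as the port only because it is the usual client default, so check each client's TCPCLIENTPORT before relying on it.

```python
def update_node_command(node, hladdress, lladdress):
    """Build an UPDATE NODE command that sets the client's address and port."""
    return (
        f"UPDATE NODE {node.upper()} "
        f"HLADDRESS={hladdress} LLADDRESS={lladdress}"
    )


# Hypothetical nodes: (node name, client IP address, client scheduler port).
NODES = [
    ("winprd01", "192.0.2.10", 1501),
    ("winprd02", "192.0.2.11", 1501),
]

for name, address, port in NODES:
    print(update_node_command(name, address, port))
```

The address and port given here should match the TCPCLIENTADDRESS and TCPCLIENTPORT values in each client's dsm.opt.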

How many of you define the HL and LLADDRESS when registering nodes? I've never suspected it was needed until now.