Thursday, August 10, 2006
Leaving IBM
Friday will be my last day with IBM. It’s been a great 5 years, but I realized that the only way my career would grow was to make a change. I have learned a lot and am thankful for the opportunity IBM provided. My new employer will be Infocrossing, and I look forward to the challenges that I will encounter. For one it won’t be a majority IBM hardware user so I will be able to get my hands dirty with ACSLS and STK libraries. I love IBM equipment so it will be interesting to see what it’s like in the non-IBM world. I will still be doing TSM so this blog is not going anywhere.
Friday, August 4, 2006
TSM V5.4 Enhancements
TSM V5.4 Enhancements Selected for Beta Account Validation
Availability:
Cluster Failover for TSM HSM for Windows (available October, 2006)
Capability:
TSM Express to TSM Enterprise upgrade
Backupset Enhancements
Compatibility:
Support for Windows Vista (available November, 2006)
Support for Intel based Macintosh systems
Interoperability:
Using TSM Server as an NDMP Tape Server (Filer to Server to backup)
Create Media for offsite vaulting of NDMP Generate Data
Performance:
Reconciliation for TSM HSM for Windows (available October, 2006)
Scalability:
Improved BA client memory usage
Administrator Center Enhancements Part 3
Availability:
Cluster Failover for TSM HSM for Windows (available October, 2006)
Capability:
TSM Express to TSM Enterprise upgrade
Backupset Enhancements
- Generate to a specified point-in-time
- Improve tracking,
- Stack multiple nodes' data on a single set volumes,
- Enable the client to display contents and allow selection of individual files for restore,
- Support image data,
- And more
Compatibility:
Support for Windows Vista (available November, 2006)
Support for Intel based Macintosh systems
Interoperability:
Using TSM Server as an NDMP Tape Server (Filer to Server to backup)
Create Media for offsite vaulting of NDMP Generate Data
Performance:
Reconciliation for TSM HSM for Windows (available October, 2006)
Scalability:
Improved BA client memory usage
- Note: Using the disk cache setting will require additional disk space. The disk cache file can exceed 4 gigabytes for a very large filesystem incremental backup. Therefore, if the filesystem does not support large files, the user will need to enable large file support for the filesystem or use the diskcachelocation setting to point to a filesystem that can support large files.
- Enhancement provides a new type of storage pool for storing only active versions of backup client data.
Administrator Center Enhancements Part 3
- Update administrator password, cmd line applet, AUDIT LIBRARY/VOLUME, QUERY NODEDATA/MEDIA, MOVE NODEDATA/MEDIA
Tuesday, July 18, 2006
Have You Checked Your Drive Firmware Lately?
Well the weekend is over and I thought I would update you on the problem SAP system. In the last update we had discussed how this one TSM server was consistently having its mounts go into a RESERVED and hanging. The only thing we could do was cycle the TSM server. So after talking to support they stated it was a known problem listed in APAR IC49066 and fixed in the TSM server 5.3.3.2 release. So we upgraded the TSM server, updated ATAPE on the problem server to the same level as on the controller, and turned on SANDISCOVERY. The system came back up and started to mount tapes, albeit slowly. So we let it run and all seemed well for about the first 8 hours then all hell broke loose. TSM started taking longer and longer to mount tapes until it couldn’t mount anything. We began receiving the following errors also:
7/13/2006 9:38:45 AM ANR8779E Unable to open drive /dev/rmt25, error number=16.
7/13/2006 9:38:45 AM ANR8311E An I/O error occurred while accessing drive DR25 (/dev/rmt25) for SETMODE operation, errno = 9.
We reopened the problem ticket and talked to a number of Tivoli support reps and almost had to force them to have us run a trace. The original problem fix began Friday and here we were still trying to fix it well into early Sunday morning. So after getting the trace to Tivoli they looked it over and seemed perplexed at the errors. The errors seemed as if Tivoli was trying to mount tapes for drives that the library client was not pathed for. Also it seemed TSM was polling the library for drives but was unable to get a response so it went into a polling loop until it found a drive. This caused our mount queue to get as high as 100+ tape mounts waiting and the mounts that did complete sometimes took 30-45 minutes to do so. It was NUTS! So Sunday morning a new Tivoli Rep was assigned (John Wang). As we discussed the problem and Tivoli was trying to get a developer to analyze the trace John mentioned how he had seen this type of error before in a call about an LTO-1 library. He stated that it took 3 weeks to determine the problem but that the end result was that the firmware on the drives was down-level and causing the mount issues. So I checked my 3584’s web interface (I feel bad for all those people out there without web interfaces on their libraries) and found the drive firmware level at 57F7. This seemed down-level from what little information we had so I had my oncall person call 1-800-IBM-SERV and place a SEV-1 service call. The CE called me and we discussed the firmware level. When he saw how down-level it was he was surprised and lectured me on making sure we always check with CE’s before we do any changes to the environment. The CE then gathered his needed software and came to the account to update the drives. I brought all the TSM servers down and after 30 minutes the library had dismounted all tapes from the drives. The CE then proceeded to update the firmware, which actually only took 15-20 minutes. I expected longer. So we went from firmware level 57F7 to 64D0. Huge jump! So after the firmware upgrade I audited the library and brought the controller back up. Viola! It started mounting tapes as soon as the library initialized and the response was back to what it should have been. It’s now Tuesday morning and there have been no problems. So before you upgrade TSM be sure to have checked your libraries firmware (both library and drives). It could mean the difference between sink and swim!
7/13/2006 9:38:45 AM ANR8779E Unable to open drive /dev/rmt25, error number=16.
7/13/2006 9:38:45 AM ANR8311E An I/O error occurred while accessing drive DR25 (/dev/rmt25) for SETMODE operation, errno = 9.
We reopened the problem ticket and talked to a number of Tivoli support reps and almost had to force them to have us run a trace. The original problem fix began Friday and here we were still trying to fix it well into early Sunday morning. So after getting the trace to Tivoli they looked it over and seemed perplexed at the errors. The errors seemed as if Tivoli was trying to mount tapes for drives that the library client was not pathed for. Also it seemed TSM was polling the library for drives but was unable to get a response so it went into a polling loop until it found a drive. This caused our mount queue to get as high as 100+ tape mounts waiting and the mounts that did complete sometimes took 30-45 minutes to do so. It was NUTS! So Sunday morning a new Tivoli Rep was assigned (John Wang). As we discussed the problem and Tivoli was trying to get a developer to analyze the trace John mentioned how he had seen this type of error before in a call about an LTO-1 library. He stated that it took 3 weeks to determine the problem but that the end result was that the firmware on the drives was down-level and causing the mount issues. So I checked my 3584’s web interface (I feel bad for all those people out there without web interfaces on their libraries) and found the drive firmware level at 57F7. This seemed down-level from what little information we had so I had my oncall person call 1-800-IBM-SERV and place a SEV-1 service call. The CE called me and we discussed the firmware level. When he saw how down-level it was he was surprised and lectured me on making sure we always check with CE’s before we do any changes to the environment. The CE then gathered his needed software and came to the account to update the drives. I brought all the TSM servers down and after 30 minutes the library had dismounted all tapes from the drives. The CE then proceeded to update the firmware, which actually only took 15-20 minutes. I expected longer. So we went from firmware level 57F7 to 64D0. Huge jump! So after the firmware upgrade I audited the library and brought the controller back up. Viola! It started mounting tapes as soon as the library initialized and the response was back to what it should have been. It’s now Tuesday morning and there have been no problems. So before you upgrade TSM be sure to have checked your libraries firmware (both library and drives). It could mean the difference between sink and swim!
Saturday, July 15, 2006
Shared Library Problem Solved?
Well after trying numerous things to get the problematic shared library working I finally got support on the line to fix this problem. The other day the problem reoccurred after about a week. I thought I had fixed the problem by changing the IP/VLAN the servers communicated over, and I thought it was until it all came crashing down. So I upgraded the problem instance to the same release level as the controller (controller has to be at or higher then the client servers). That didn’t fix it! I turned RESETDRIVES to NO and that didn’t do it either (Thanks for the advice Scott). So we got Tivoli on the line and Andy Ruhl with Tivoli support worked with us to identify the problem. First thing we did was check the ATAPE level and it turned out the controller was at a higher version. So we updated it to the same level as the controller. Then Andy found this APAR IC49066 which interestingly enough describes my problem. Turns out there was a known issue with this type of behavior and although it was due to be fixed in 5.3.4 it was added to the 5.3.3.2 update. Supposedly the explanation states it occurs when the client is accessing two different controllers, but Andy stated it was a misprint and it applies to single controller instances as well. So we updated all 5 TSM servers to the fixed level and hope we don’t experience any more issues. So far so good!
Sunday, July 9, 2006
Help! Problem With Shared Library Client!
Ok, here is my problem. I have a large shared library that supports 5 TSM servers. All of the TSM servers, save one, run without any problems. The one with problems is pathed to 32 of the 5 drives (they are designated just for its use) and periodically its mount requests go into a RESERVED state but never mount. I have tried FORCESYNC'ing both sides to see if it was communication but that didn't work. I have tried setting paths offline then back online and that didn't work either. The only thing that works is to cycle the TSM server then it comes back up and starts mounting tapes like nothing was wrong. This happens on average every 24-48 hours. This is a large DB client TSM instance so the reboots are hurting critical business apps backups. I don't see any glaring errors in the activity log or in the AIX errpt. Any suggestions would be gladly received.
Wednesday, July 5, 2006
Import/Export Question!
Can someone explain to me why moving data (exporting/importing) from one TSM server to another TSM server is such a pain in the neck? Here is my scenario, I have some servers that moved from one location to another, network wise, and they now need to backup to a different TSM server. Both servers use the same media type (LTO-3). So why is it I have to either copy all the data across the network to the new server as it creates new tapes, or dump it all to tape(s) and then rewrite it to new tape(s) when imported? My question is this - why can’t I just export the DB info and pointers for the already existing tape to the new TSM server from the old? Why can’t the old server “hand-over” the tape to the new TSM server? It seems like a lot of wasted work to constantly have to copy the data (server to server export/import) or dump it to tape and then write the data to new tapes. I think the developers ought to work on a way of doing this. I would also think that this process could be done on a DB level so you could in a sense reorg the DB without the long DB Dump/Load and audit process, and the tapes would be handed over to the new TSM server. No rewrites necessary.
Thursday, June 22, 2006
The Linux Big But!
“I love Linux, I support Linux, I would love to use Linux but….” You wont hear this just from me. The problem is a frustrating one. I for one have a number of Linux TSM servers and they have become somewhat of a problem. For example we had a situation where we decided to deploy a Linux TSM server and proceeded to setup the site. Problems arose when we attempted to connect the Library. As it turned out the version of SUSE was down level and the driver would not work. So we upgraded Linux and the library worked fine, but (there it is again) the RAID controller for the server would not longer work and there was no updated one. This has been and continues to be the problem with Linux. Sure companies beat their chests and yell, “We support Linux!” The problem is that it’s limited support and we the users who implement get left with what turns out to be a patchwork solution. I’m sure many will say, “But I have Linux working and it works fine.” Great! The problem is in the choices available to you hardware wise when architecting the solution. I can be pretty sure that I wont run into to many problems when solutioning for AIX, HP/UX, Solaris, or Windows. With Linux you have to do twice the homework and hope the hardware company keeps its drivers up to date. Just to risky for the data. Don’t get me wrong it has gotten better but there is a long way to go before I would commit to using Linux again.
Subscribe to:
Posts (Atom)