I am writing this article to save others from the situation we had at one of our customers. In the beginning a request came for replacing the RAID array which hosts the primary storage pools. I taked that as an excellent opportunity for reorganizing the DB which was 66GB with cca 50% utilization (after a massive deletion in last month or so). There was a planned downtime needed for copying the data from one RAID to another so I thought I had a plenty of time for unload/load.
The "dsmserv unloaddb" utility went smoothly - taking about 4hrs, creating cca 20GB dump (which was expected as the ESTIMATE DBREORGSTAT revealed we can save cca 7GB). Then I proceed with "dsmserv loadformat" and "dsmserv loaddb" (taking approx. 5hrs). So far everything was seemingly OK. I was able to start the server, reduce the DB and run some tests. The problem appeared when I tried to create a new stgpool
12/13/05 12:42:21 ANR2017I Administrator HARRY issued command: DEFINE STGPOOL archivedisk disk (SESSION: 3884)
12/13/05 12:42:21 ANR0102E sspool.c(1648): Error 1 inserting row in table "SS.Pool.Ids". (SESSION: 3884)
12/13/05 12:42:21 ANR2032E DEFINE STGPOOL: Command failed - internal server error detected. (SESSION: 3884)
Google revealed that this error is a known one and is fixed in 5.3.2.1 (we was on 5.3.2.0 - upgraded to this level just a few days before the fix appeared .. bad luck). Basically - there is an error in LOADFORMAT/LOADDB corrupting some DB values
http://www-1.ibm.com/support/docview.wss?uid=swg1IC47516
I did not want to go for loaddb again (there were changes already made to the DB, some migrations were run (luckily for me they were from stgpool with caching set on) .. etc.
So I tried to run the "dsmserv auditdb inventory fix=yes" - IBM says it can help if you do it after a loaddb - long story short - after 8hrs of audit (with message that some vaules were corrected) the problem was still there ...
So the only option was to apply the patch and do use loaddb again - so another 4hrs of waiting - and now it seems to work (still running tests). So watch for this problem and check your TSM level before reorganizing your DB.
Any TSM Server patch levels ending with a ~.0 always has some exposures/bugs. Always go for a version that does not end with ~.0 as this can be a beta release ie. not properly regression tested. AC 2006.
ReplyDelete