Thursday, August 7, 2014

Poor Performance Followup

As a follow-up to the previous poor performance post, I thought I'd share the outcome. We checked the performance tuning settings in TSM and AIX and saw no increase. We asked the DB2 admins to review their settings, and they could not find any tunables that had not already been implemented. We sent in servermon.pl output, and although they saw that performance was sub-par, they couldn't pinpoint what was causing it. There were no server/adapter/switch/disk/tape errors, so nothing emerged as the culprit for our poor throughput.

So we reviewed the backup time of each TSM storage agent server used to back up this 101 TB SAP database. At the time, the storage agents performing the backup consisted of 5 LPARs: 4 of them in a single frame, each with its own assigned I/O drawer, and the 5th in a separate 740 frame with its own I/O drawer. The 5th storage agent was completing its backup in a fraction of the time of the other 4, so we concluded we must be overloading the CEC on the frame hosting the other four. We moved one of those four storage agents out to a secondary frame, and the results were awesome. See below:

[Chart: backup times across the E06-to-E07 drive upgrade and after the frame move]

You'll notice that the backup time didn't change with the upgrade of the tape drives from E06 to E07. Hardware layout matters more than the raw speed of the tape drives, so when a vendor tells you that simply updating to newer hardware will increase performance, take it with a grain of salt. In our case we tested the new tape drives and saw no performance gains, but the go-ahead was given to upgrade to the newer hardware anyway, and as you can see we didn't gain anything until we reworked the environment. Our task now is to figure out how to increase TSM internal job performance (i.e. migration and storage pool backup), which has not seen significant gains from the tape upgrades.

Wednesday, April 30, 2014

Sony Develops 185TB Tape

Sony announced they have developed a tape medium and write process that can support 185TB per tape. Whoa, that's huge! Now if only we can see it hit the market before some other storage strategy catchphrase becomes the "it" thing. Check out the link below...."To the cloud!"

http://www.extremetech.com/computing/181560-sony-develops-tech-for-185tb-tapes-3700-times-more-storage-than-a-blu-ray-disc

Friday, March 28, 2014

Poor Performance

I currently work in an environment where we have a dedicated TSM instance for a large SAP DB (99TB at the moment). We just upgraded the drives in the tape library (yes, we use tape! I know...I know....) from Magstar 3592 TS1130 (E06) drives to TS1140 (E07) drives. The upgrade was pushed in hopes of a jump in write/backup performance, but I was skeptical: TSM adds so much overhead that you cannot use the raw tape read/write numbers from any manufacturer. Typically IBM is somewhat reasonable with their numbers, but in this case I have seen NO performance increase whatsoever. Here is a query of the processes for storage pool backup.

UPDATE (04/04/2014): Let me give you some more specs. We have the 99TB DB split between 4 TSM storage agents, each with 4 8Gb HBAs. Each storage agent runs 4 sessions (allocating 4 drives) for its backup process, so the 4 storage agents account for 16 simultaneous sessions, and it still takes over 24 hours to perform the 99TB backup. The backups are averaging around 70-78MB/sec. Is this a TSM overhead issue or do I have a tuning issue with the TDP and TSM? I'm getting less than 50% of the throughput I should see.
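
A rough sanity check on those numbers (assuming the 70-78MB/sec is per session): 99TB is roughly 103,800,000 MB, and 16 sessions at ~75MB/sec is about 1,200MB/sec aggregate, which works out to roughly 86,500 seconds, or just over 24 hours. That lines up with the backup window we're seeing, so the per-session rate really is stuck in the mid-70s; if each session were merely doubling that, the same backup would finish in about 12 hours.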

Here's the command that is run to execute the DB backup:

ksh -c 'export DB2NODE=7 ; db2 "backup db DB8 LOAD /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a OPEN 4 SESSIONS OPTIONS /db2/DB8/dbs/tsm_config/vendor.env.7 WITH 14 BUFFERS BUFFER 1024 PARALLELISM 8 WITHOUT PROMPTING" ; echo BACKUP_RC=$?'

PROCESS_NUM: 2667
    PROCESS: Backup Storage Pool
 START_TIME: 03-27 23:21:54
   DURATION: 00 23:20:13
      BYTES: 6.0TB
 AVG_THRPUT: 75.87 MB/s

PROCESS_NUM: 2668
    PROCESS: Backup Storage Pool
 START_TIME: 03-27 23:21:55
   DURATION: 00 23:20:12
      BYTES: 6.2TB
 AVG_THRPUT: 78.48 MB/s

PROCESS_NUM: 2669
    PROCESS: Backup Storage Pool
 START_TIME: 03-27 23:21:55
   DURATION: 00 23:20:12
      BYTES: 6.2TB
 AVG_THRPUT: 77.99 MB/s

PROCESS_NUM: 2670
    PROCESS: Backup Storage Pool
 START_TIME: 03-27 23:21:55
   DURATION: 00 23:20:12
      BYTES: 6.4TB
 AVG_THRPUT: 80.13 MB/s

I average anywhere from 75 to 80 MB/sec. Here is the Magstar performance chart. I am using JB media, not JC, so I do take a small performance hit for that.

[Chart: Magstar 3592 drive data rates by media type]

So with JB media I could get as high as 200MB/sec, but I'm not even at 50% of that number. Is there any specific tuning parameter I should look at that could be hindering performance?

FYI - The backup of the 99TB DB runs LAN-Free using 16 tape drives over 26 hrs.

Friday, January 10, 2014

New TSM Admin In The House!

Just thought I should let everyone know that my wife and I had a son on December 3rd. The holidays and the lead-up to his birth have kept me busy. My son makes 8 kids total, and I'm a very busy man. So don't worry, I shall return, but the last 9 months have been a blur.

Sunday, December 8, 2013

Full TSMExplorer for TSM version 5 is free now

Got this info from Dmitry Dukhov, the creator of TSMExplorer:

The procedure for the required registration to get a free license, along with TSMExplorer for TSM version 5, is available at http://www.s-iberia.com/download.html

Tuesday, October 22, 2013

Archive Report

Where I work we have a process that generates a mksysb bi-monthly and then archives it to TSM. Recently an attempt to use an archived mksysb revealed that the mksysb process sometimes does not create a valid file, yet the file is still archived to TSM. So the other AIX admins asked me to generate a report showing how much data was archived and on what date. Now, I would have told them it was impossible if they had asked for data from the backups table, but our archives table is not nearly as large, so I gave it a go.

The first problem was determining the best table(s) to use. I could use the summary table, but it doesn't tell me which schedule ran, and some of these UNIX servers have archive schedules other than the mksysb process. The idea I came up with was to query the contents table and join it with the archives table on the object_id field. Here's an example of the command:

select a.node_name, a.filespace_name, a.object_id,
       cast(b.file_size/1048576 as decimal(9,2)) as SIZE_MB,
       cast(a.archive_date as date) as ARCHIVE
from archives a, contents b
where a.node_name=b.node_name
  and a.filespace_name='/mksysb_apitsm'
  and a.filespace_name=b.filespace_name
  and a.object_id=b.object_id
  and a.node_name like 'USA%'

This select takes at least 20 hours to run across our 6 TSM servers. I guess I should be happy it returns at all, but TSM is DB2! It should be a lot faster, so I am wondering if I could clean up the query or add something that would help it use the indexes better??? I am considering dropping the "like" and just matching node_name between the two tables (roughly the version sketched below). Would putting the node_name match first and then matching object_id be faster? Would I be better off running it straight out of DB2? Suggestions appreciated.
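
For reference, here's roughly the stripped-down version I have in mind (untested); I've also repeated the filespace literal on the contents side on the theory that it might help the optimizer narrow that table down sooner:

select a.node_name, a.filespace_name, a.object_id,
       cast(b.file_size/1048576 as decimal(9,2)) as SIZE_MB,
       cast(a.archive_date as date) as ARCHIVE
from archives a, contents b
where a.node_name=b.node_name
  and a.object_id=b.object_id
  and a.filespace_name=b.filespace_name
  and a.filespace_name='/mksysb_apitsm'
  and b.filespace_name='/mksysb_apitsm'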