ES47 i/o way too slow

From: Jens Kieffer-Olsen (JKO@dst.dk)
Date: Tue May 16 2006 - 07:50:47 EDT


 A couple of years ago we bought a darned expensive ES47
 with 4 processors and 8 GB (now 16 GB) of RAM. To our
 surprise it arrived as a set of two boxes equipped with
 a weird, thick 'umbilical cord' between them for
 communication purposes.

 We had previously had very good experience with 4-way ES40 and
 ES45 servers. However, from day one the ES47 underperformed.

 It was installed with Tru64 Unix 5.1B-3 by the same HP guy who
 had installed our dozen other Alpha servers. It was installed
 with Oracle 9i, the same as our other ES40 and ES45 servers.
 It was equipped with an FCA 2384 HBA using Hitachi HDS 9980V
 SAN disks initially - just like the other servers.

 The users immediately complained about poor response, but
 managed to compensate by parallelizing their application,
 using all available RAM on the server in the process.

 We replaced the HDS disks with HP EVA-5000 disks, but to no
 avail. We upgraded to Oracle 10g using EXPORT/IMPORT, but the
 performance problem persisted. We contacted an HP Denmark expert
 who analyzed runtime logs and suggested changes to cpus_in_rad
 and sched_distance. We installed a second HBA and patched up to
 pk5.

 The poor performance, however, persisted regardless. From inside
 Oracle it is detectable through the file statistics view v$filestat.
 This view is defined as follows:

 FILE#           NUMBER  Number of the file
 PHYRDS          NUMBER  Number of physical reads done
 PHYWRTS         NUMBER  Number of times DBWR is required to write
 PHYBLKRD        NUMBER  Number of physical blocks read
 PHYBLKWRT       NUMBER  Number of blocks written to disk, which may be
                         the same as PHYWRTS if all writes are single blocks
 SINGLEBLKRDS    NUMBER  Number of single block reads
 READTIM         NUMBER  Time (in hundredths of a second) spent doing reads
 WRITETIM        NUMBER  Time (in hundredths of a second) spent doing writes
 SINGLEBLKRDTIM  NUMBER  Cumulative single block read time (in hundredths of a second)
 AVGIOTIM        NUMBER  Average time (in hundredths of a second) spent on I/O
 LSTIOTIM        NUMBER  Time (in hundredths of a second) spent doing the last I/O
 MINIOTIM        NUMBER  Minimum time (in hundredths of a second) spent on a single I/O
 MAXIORTM        NUMBER  Maximum time (in hundredths of a second) spent doing a single read
 MAXIOWTM        NUMBER  Maximum time (in hundredths of a second) spent doing a single write

 It is typical of the ES47 that maxiortm and maxiowtm amount to over 30 seconds,
 whereas the ES40 and ES45 seldom record numbers greater than 3 seconds.

 One day I extracted some typical average file i/o information from our ES47,
 Itanium rx5670 (HP-UX) and ES40 servers:

 select sum(phyblkrd),round(readtim/phyblkrd,0) from v$filestat
 group by round(readtim/phyblkrd,0) order by 2;

 ES47:

 SUM(PHYBLKRD) ROUND(READTIM/PHYBLKRD,0)
 ------------- -------------------------
      54867038 0
        273877 1
          2007 2
          7978 3
          4271 4
                                                                               
 Itanium rx5670:

 SUM(PHYBLKRD) ROUND(READTIM/PHYBLKRD,0)
 ------------- -------------------------
      23540498 0
         55355 1

 ES40:

 SUM(PHYBLKRD) ROUND(READTIM/PHYBLKRD,0)
 ------------- -------------------------
      73153356 0
         66515 1
           738 2
           378 3
           283 4

 select sum(phyblkwrt),round(writetim/phyblkwrt,0) from v$filestat
 group by round(writetim/phyblkwrt,0) order by 2;

 ES47:

 SUM(PHYBLKWRT) ROUND(WRITETIM/PHYBLKWRT,0)
 -------------- ---------------------------
          88495 0
         466159 1
        1006540 2
         935804 3
         468485 4
         144463 5
           2097 7
           2524 9
                                                                         
 Itanium rx5670:

 SUM(PHYBLKWRT) ROUND(WRITETIM/PHYBLKWRT,0)
 -------------- ---------------------------
        6296297 0

 ES40:

 SUM(PHYBLKWRT) ROUND(WRITETIM/PHYBLKWRT,0)
 -------------- ---------------------------
          12795 0
         364374 1
           2491 2


 From the above it is evident that our ES47 i/o performance is
 poorer than that of the old ES40, and much, much, much poorer
 than that of the Itanium server. In fact, the typical block
 write time is 2 centiseconds for the ES47, but just 1 centisecond
 for the ES40.
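
 As a rough cross-check of the write figures above, the three
 histograms can be summarized with a short script. This is only a
 sketch: the numbers are copied from the query output in this post,
 the function names are made up for illustration, and because each
 bucket is a per-file average rounded to whole centiseconds, the
 block-weighted mean is an approximation rather than a measured
 latency.

```python
# Write histograms copied from the v$filestat query output above.
# Key:   rounded average write time per block, in centiseconds
# Value: number of blocks written to files with that rounded average
WRITE_HIST = {
    "ES47": {0: 88495, 1: 466159, 2: 1006540, 3: 935804,
             4: 468485, 5: 144463, 7: 2097, 9: 2524},
    "Itanium rx5670": {0: 6296297},
    "ES40": {0: 12795, 1: 364374, 2: 2491},
}


def modal_bucket(hist):
    """Return the bucket (centiseconds) that holds the most blocks."""
    return max(hist, key=hist.get)


def weighted_mean(hist):
    """Block-weighted mean of the bucket values (approximate: buckets are rounded)."""
    total_blocks = sum(hist.values())
    return sum(bucket * blocks for bucket, blocks in hist.items()) / total_blocks


def summarize(hists):
    """Map each host to (modal bucket, block-weighted mean)."""
    return {host: (modal_bucket(h), weighted_mean(h)) for host, h in hists.items()}


if __name__ == "__main__":
    for host, (mode, mean) in summarize(WRITE_HIST).items():
        print(f"{host:15s} mode={mode} cs  weighted mean={mean:.2f} cs")
```

 The modal bucket is the sense in which "typical" is used above:
 2 centiseconds for the ES47 against 1 centisecond for the ES40.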

 I have no clue as to what is 'killing' the ES47, but since our
 many other servers with an otherwise identical configuration
 perform well, I suspect that some ES47-only default buffer size
 for the 'umbilical cord' is inadequate for our use.

 By the way, the other day I moved a database project across from
 the ES47 to the Itanium. A job that had taken 5 hours on the
 ES47 now ran in 1 hour and 15 minutes.

 Yours sincerely
 Jens Kieffer-Olsen
 Statistics Denmark
 e-mail: jko@dst.dk


---------------------------------------------------------------------------
Danmarks Statistik (Statistics Denmark)
Sejrøgade 11, DK-2100 København Ø
Tel. +45 39173917, Fax +45 39173999
dst@dst.dk, www.dst.dk
---------------------------------------------------------------------------