Relative CPU speed

On my job I often have to move a database to a newer hardware.
Sometimes company decides to do a tech refresh, sometimes it is a new company spin-off or a data center consolidation. When you get to work on a new hardware, it is always interesting to see what it is capable of and how does it compare to the old system.

Back when I worked for Oracle as an instructor, we would frequently go to a customer location and install database software for the class on customers hardware. We would use a “quick and dirty” method to get a feeling on just how fast the systems CPU was. The method was to run same simple sql and measure execution time.
The sql was

select count(*) from emp,emp,emp,emp,emp

The beauty of this sql was that it required almost no IO.
There were only 14 records in EMP table and it took only 1 IO to read the block into buffer cache.
After that the 5-way cartesian join execution time would only depend on the CPU speed.
Every database we installed had Scott schema with EMP table, which made this simple sql a universally available CPU speed test tool.

These days I work for a company which does not install sample SCOTT schema.
So I decided to do something similar without EMP table.

The sql I came up with was this:

with t as ( SELECT rownum FROM dual CONNECT BY LEVEL <= 100 )
select /*+ ALL_ROWS */ count(*) from t,t,t,t
/
COUNT(*)
----------
100000000

This sql does not need any table and therefore will run on any Oracle database.
It generates 100 records by selecting from dual and then joins the resultset to itself 4 times, producing count of 100 millions. Being select from dual, this sql needs miniscule IO and its elapsed time mostly depends on CPU speed. Since the sql does not use parallel queries, the execution time can be a measure of single-process CPU speed.


The tricky part is to make sure that the sql uses same execution plan and execution stats when comparing different systems.
This is easy to check by setting autotrace in sqlplus.
The sample output is here.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options

set linesize 120
set autotrace on
set timing on

with t as ( SELECT rownum FROM dual CONNECT BY LEVEL <= 100 )
select /*+ ALL_ROWS */ count(*) from t,t,t,t
/
/

COUNT(*)
----------
100000000

Elapsed: 00:00:23.77

Execution Plan
----------------------------------------------------------
Plan hash value: 4060156826

--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 10 (0)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | | | | | |
| 3 | COUNT | | | | | |
|* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 6 | SORT AGGREGATE | | 1 | | | |
| 7 | MERGE JOIN CARTESIAN | | 1 | | 8 (0)| 00:00:01 |
| 8 | MERGE JOIN CARTESIAN | | 1 | | 6 (0)| 00:00:01 |
| 9 | MERGE JOIN CARTESIAN | | 1 | | 4 (0)| 00:00:01 |
| 10 | VIEW | | 1 | | 2 (0)| 00:00:01 |
| 11 | TABLE ACCESS FULL | SYS_TEMP_0FD9D75B4_B6CDB5E | 1 | 13 | 2 (0)| 00:00:01 |
| 12 | BUFFER SORT | | 1 | | 4 (0)| 00:00:01 |
| 13 | VIEW | | 1 | | 2 (0)| 00:00:01 |
| 14 | TABLE ACCESS FULL | SYS_TEMP_0FD9D75B4_B6CDB5E | 1 | 13 | 2 (0)| 00:00:01 |
| 15 | BUFFER SORT | | 1 | | 4 (0)| 00:00:01 |
| 16 | VIEW | | 1 | | 2 (0)| 00:00:01 |
| 17 | TABLE ACCESS FULL | SYS_TEMP_0FD9D75B4_B6CDB5E | 1 | 13 | 2 (0)| 00:00:01 |
| 18 | BUFFER SORT | | 1 | | 6 (0)| 00:00:01 |
| 19 | VIEW | | 1 | | 2 (0)| 00:00:01 |
| 20 | TABLE ACCESS FULL | SYS_TEMP_0FD9D75B4_B6CDB5E | 1 | 13 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - filter(LEVEL<=100)

Note
-----
- 'PLAN_TABLE' is old version

Statistics
----------------------------------------------------------
2 recursive calls
8 db block gets
14 consistent gets
1 physical reads
604 redo size
515 bytes sent via SQL*Net to client
488 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
1 rows processed

It turns out that the execution plan is exactly the same on 10g and 11g on every system that I ran this test on.

Dividing 100 million record count by elapsed time gives a “Records/Sec” which I used to compare different systems.

And here are the results …

The interesting part of the above graph is how slow UltraSPARC T2 processors are for running one process comparing to other CPU architectures. T2 processors use Chip Multi Threading (CMT) to create massive amount of virtual processors out of the few physical processor cores. The end result is that single process gets only portion of available CPU cycles and thus runs pathetically slow. Sun came up with the idea of CMT on the assumption that CPU are waiting most of the time for memory access and that instead of this wait the CPU cycles could be more productively utilized by running more virtual CPUs. The main audience for CMT was Web servers with thousands of simultaneous processes. This may work in these environments, but as we can see in databases CMT results in pathetic execution times all while there remains lots of unused CPU cycles.

The T2 based systems were manufactured in 2007-2008. I am told that since then Oracle(Sun) came up with newer T4 processors which provide much better single-process performance. I can not wait till I get a chance to test such a system. Until that time – if anybody can run the test on T4 and contact me with the results – this will be really appreciated.

1 Comment

Leave a Comment