
System Throughput Comparison

CPU Throughput Comparison

Recently I compared single-process speeds of systems with different CPUs by running a simple test SQL on an Oracle database. The SQL was simple and universal in that it produced the same execution plans and execution statistics on both 10g and 11g and on all architectures I was able to get my hands on – Intel Xeon, IBM Power7, HP Itanium2, Sun UltraSPARC. This makes it possible to quickly compare relative CPU speed for a single database process.

The result of the comparison was surprisingly slow UltraSPARC T2 performance. These processors implement Chip Multithreading, where a few physical cores run a massive number of threads, with each thread presented to the OS as a virtual CPU. Sun's assumption was that with CPU frequencies in the gigahertz range the CPU spends most of its time waiting on RAM access, and that this wait could be better utilized by running other threads. This sounds good in theory, but it seems like UltraSPARC T2 went overboard with virtualization.

For example, one of my T5240 boxes has 2 physical CPUs, 12 cores and 96 virtual CPUs (threads). When I make a connection to a database on this server, run something continuously and observe CPU utilization, the utilization never goes above 1-2%. It takes extraordinary effort to get all the virtual CPUs working. For example, you have to run an RMAN backup with a parallel degree in the hundreds to get into the 30-50% CPU utilization range. The whole system acts as one with a huge number of slow CPUs.
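The 1-2% figure is roughly what the thread count alone predicts: one fully busy session can occupy at most one of the 96 virtual CPUs. A minimal sketch of that arithmetic follows; the function name `cpu_share` is made up for this illustration, and `awk` is assumed to be available for the division.

```shell
# cpu_share BUSY TOTAL -> percentage of total CPU capacity in use
# when BUSY virtual CPUs out of TOTAL are fully busy
cpu_share() {
    awk -v busy="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * busy / total }'
}

# one busy database session on a box with 96 virtual CPUs
cpu_share 1 96
```

This prints 1.04, so a single session cannot push overall utilization much past 1% no matter how hard it works.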

When I brought these observations up to our hardware planners, their counter-argument was – OK, a single process may be somewhat slower, but with hundreds of database connections the total system throughput should be great.

This got me thinking about how I could run the same SQL test with multiple parallel processes.

And here is the script I came up with:

for PARALLEL in 1 2 3 4 8 16 32 64; do
    echo Running Parallel Processes: $PARALLEL
    i=0
    while [ $i -lt $PARALLEL ] ; do
        # start one sqlplus session in the background
        sqlplus -silent / as sysdba <<-EOF 1>/dev/null &
        column cnt noprint
        with t as ( SELECT rownum FROM dual CONNECT BY LEVEL <= 100 )
        select /*+ ALL_ROWS */ count(*) as cnt from t,t,t,t ;
EOF
        i=$((i+1))
    done
    # wait for all background sessions to finish and print the timing
    time wait
done

At the heart of the script is the same simple select from dual, generating 100 million records (100 rows joined to themselves four times, 100^4). The script starts 1, 2, 3, etc. sqlplus sessions in the background, each executing this test SQL, and then waits for all of them to finish before printing the timing.

Here is sample output:

    Running Parallel Processes: 1
    real 0m4.61s
    user 0m0.01s
    sys 0m0.00s
    Running Parallel Processes: 2
    real 0m4.65s
    user 0m0.02s
    sys 0m0.03s
    Running Parallel Processes: 3
    real 0m4.94s
    user 0m0.04s
    sys 0m0.03s
    Running Parallel Processes: 4
    real 0m4.64s
    user 0m0.05s
    sys 0m0.05s
    Running Parallel Processes: 8
    real 0m5.44s
    user 0m0.09s
    sys 0m0.08s
    Running Parallel Processes: 16
    real 0m10.04s
    user 0m0.19s
    sys 0m0.20s
    Running Parallel Processes: 32
    real 0m19.33s
    user 0m0.45s
    sys 0m0.39s
    Running Parallel Processes: 64
    real 0m38.05s
    user 0m0.85s
    sys 0m0.76s

The “real” time in the output is the wall clock time, and this is what I am interested in.
To get a measure of throughput I simply multiply the 100 million records counted by one SQL by the number of spawned parallel processes and then divide by the wall clock time. This gives me “Records/Sec” as a measure of throughput. By increasing the number of parallel processes I can see where the throughput starts to level off.
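As a rough illustration of this arithmetic, the sketch below computes Records/Sec for three rows of the sample output. The function name `throughput` and the variable `RECORDS_PER_PROCESS` are made up for this example and are not part of the test script; `awk` is assumed to be available for the floating-point division.

```shell
# Records counted by one run of the test SQL: 100^4 = 100,000,000
RECORDS_PER_PROCESS=100000000

# throughput PROCESSES SECONDS -> records per second, rounded to an integer
throughput() {
    awk -v n="$1" -v t="$2" -v r="$RECORDS_PER_PROCESS" \
        'BEGIN { printf "%.0f\n", n * r / t }'
}

# wall clock ("real") times taken from the sample output
throughput 1 4.61
throughput 8 5.44
throughput 64 38.05
```

For the sample output this works out to roughly 21.7 million records/sec with one process, about 147 million with 8, and about 168 million with 64, so on that particular system the throughput has mostly leveled off somewhere past 8 parallel processes.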

Having access to many different systems at work gives me a unique opportunity to bring the results together in charts.

Below are the results.

The graphs are pretty fascinating.
One takeaway is that UltraSPARC T2 and T2+ processors (from the 2007-2008 era) are simply not in the same league as modern Xeon, Power7 or Itanium processors.
Another surprising result is that even systems with the same CPUs show distinctly different performance. Just look at the Dell R710 and the HP BL460c. Both have the same Xeon CPU, and yet the Dell is noticeably faster. Apparently, the test measures more than just CPU speed. Contributing factors may be memory speed or system board architecture.

I am waiting to get access to a new Oracle SPARC T4 system, which is said to provide 5x higher performance than its predecessors. If anybody can run this benchmark test on a T4 and share the results, it would be really appreciated.

  1. D.E.Veloper
    July 7, 2012 at 12:15 pm

    But have you tested out the new freshly released Resys Enterprise Edition for Oracle RAC on any of this hardware. Cause that’s the only thing that really counts in the end of a long hard day.

