Extracting data from OEM 12c repository for database Capacity Planning

January 8, 2013

Recently I was asked to provide a recommendation for next year's business case on buying additional storage to accommodate database growth.

With several dozen databases to support, there is simply no time to instrument each one with custom-built capacity planning scripts.

Since I had already invested a considerable amount of time and energy in setting up an EM 12c system to monitor my databases, I turned to the OEM metrics to see whether something useful was already being collected there.

I encountered two issues when looking at EM.

First, to my surprise, I could not find a database size metric anywhere in the web interface. There were tablespace-level used-space metrics, but no full database size. This is such an obvious and important metric that I hope it is just me missing something.

The next issue was that metric pages never display data beyond the last 31 days. It turned out that OEM 12c has a bug: when you go to a metric page, it does not display historical data older than 31 days. This is confirmed as a bug on support.oracle.com (I still need to find the bug #). The repository has the data; the web page rendering simply does not display it. Needless to say, 31 days is nowhere near the time frame needed for long-term capacity planning.

I decided to look directly into the repository and bypass the web interface.

Below are 3 simple repository queries.
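To give the flavor, here is a sketch of the kind of repository query involved: summing daily tablespace-level space per database from the MGMT$METRIC_DAILY view. The internal metric name 'tbspAllocation' and column 'spaceUsed' are my assumption here; verify them (and the units) in your own repository before relying on the numbers.

-- Approximate database size over time from tablespace-level metrics.
-- Metric/column names and units are assumptions -- verify in your repository.
select target_name,
       rollup_timestamp,
       sum(average) as space_used
from   sysman.mgmt$metric_daily
where  target_type   = 'oracle_database'
and    metric_name   = 'tbspAllocation'
and    metric_column = 'spaceUsed'
group  by target_name, rollup_timestamp
order  by target_name, rollup_timestamp;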

Read more…

System Throughput Comparison

April 1, 2012

CPU Throughput Comparison

Recently I compared the single-process speeds of systems with different CPUs by running a simple test SQL on an Oracle database. The SQL was simple and universal in that it produced the same execution plans and execution statistics on both 10g and 11g, and on all the architectures I was able to get my hands on: Intel Xeon, IBM POWER7, HP Itanium 2, Sun UltraSPARC. This makes it possible to quickly compare relative CPU speed for a single database process.

The result of the comparison was surprisingly slow UltraSPARC T2 performance. These processors implement Chip Multi-Threading, where a few physical cores run a massive number of threads, each thread presented to the OS as a virtual CPU. Sun's assumption was that with CPU frequencies in the gigahertz range, a CPU spends most of its time waiting on RAM access, and that this wait is better utilized by running virtual threads. This sounds good in theory, but it seems the UltraSPARC T2 went overboard with virtualization. For example, one of my T5240 boxes has 2 physical CPUs, 12 cores and 96 virtual CPUs (threads). When I connect to a database on this server, run something continuously and observe CPU utilization, the utilization never goes above 1-2%. It takes extraordinary effort to get all the virtual CPUs working; for example, you have to run an RMAN backup with a degree of parallelism in the hundreds to get into the 30-50% CPU utilization range. The whole system acts as one with a huge number of slow CPUs.

When I brought these observations up to our hardware planners, their counter-argument was: OK, a single process may be somewhat slower, but with hundreds of database connections the total system throughput should be great.

This got me thinking about how I could run the same SQL test with multiple parallel processes.
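One way to sketch the idea, before building a proper harness, is to spawn N concurrent jobs that each run the same CPU-bound test SQL. The DBMS_SCHEDULER snippet below is my illustration, not the harness from the post; the degree of 8 is arbitrary.

-- Spawn 8 concurrent jobs, each running the same CPU-bound test SQL,
-- then watch system-wide CPU utilization while the jobs run.
begin
  for i in 1 .. 8 loop
    dbms_scheduler.create_job(
      job_name   => 'CPU_TEST_' || i,
      job_type   => 'PLSQL_BLOCK',
      job_action => q'[declare n number;
                       begin
                         with t as (select rownum from dual connect by level <= 100)
                         select count(*) into n from t,t,t,t;
                       end;]',
      enabled    => true);
  end loop;
end;
/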

Read more…

Relative CPU speed

April 1, 2012

In my job I often have to move a database to newer hardware.
Sometimes the company decides to do a tech refresh; sometimes it is a new company spin-off or a data center consolidation. When you get to work on new hardware, it is always interesting to see what it is capable of and how it compares to the old system.

Back when I worked for Oracle as an instructor, we would frequently go to a customer location and install the database software for the class on the customer's hardware. We would use a “quick and dirty” method to get a feel for just how fast the system's CPU was: run the same simple SQL everywhere and measure its execution time.
The SQL was

select count(*) from emp,emp,emp,emp,emp

The beauty of this SQL was that it required almost no I/O.
There were only 14 records in the EMP table, and it took only 1 I/O to read the block into the buffer cache.
After that, the execution time of the 5-way Cartesian join depended only on CPU speed.
Every database we installed had the SCOTT schema with the EMP table, which made this simple SQL a universally available CPU speed test tool.

These days I work for a company which does not install the sample SCOTT schema.
So I decided to do something similar without the EMP table.

The sql I came up with was this:

with t as ( SELECT rownum FROM dual CONNECT BY LEVEL <= 100 )
select /*+ ALL_ROWS */ count(*) from t,t,t,t
/
COUNT(*)
----------
100000000

This SQL does not need any table and therefore will run on any Oracle database.
It generates 100 records by selecting from dual and then Cartesian-joins the result set with itself four ways, producing a count of 100 million (100^4). Being a select from dual, this SQL needs minuscule I/O, so its elapsed time depends mostly on CPU speed. And since the SQL does not use parallel query, the execution time can serve as a measure of single-process CPU speed.
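To use it as a quick relative benchmark, enable client-side timing and compare the elapsed times across machines; a minimal usage sketch in SQL*Plus:

-- Enable timing, run the test, note the Elapsed line
set timing on
with t as ( SELECT rownum FROM dual CONNECT BY LEVEL <= 100 )
select /*+ ALL_ROWS */ count(*) from t,t,t,t
/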

Read more…

Empirical Oracle 11g Process RAM Usage

August 15, 2011

As we were starting to get ready for a 10g to 11g database upgrade, I got curious: how much more RAM might 11g require? I have heard a story from one of my colleagues about a busy system which ran into severe paging after a 9i to 10g upgrade. It turned out that, on average, every 10g database server process was taking more memory than its 9i counterpart. This extra memory was private process memory, distinct from the SGA and PGA. With several thousand database connections, the upgraded system ran into a RAM shortage of several GB. They ended up having to buy more RAM in a hurry.

How can I estimate the required RAM ahead of time?

Oracle docs mostly deal with SGA and PGA sizing. The usual prescription is this: leave 20% of RAM to the OS kernel and other system processes, and give 80% to the SGA (and PGA). Apparently, this is not the whole picture, and we also have to account for the Oracle processes' private memory (PM).

So the formula becomes:

Oracle RAM = SGA + PGA + PM

How do we estimate PM?

I will be using an empirical approach: I will start a database session and run OS utilities against its server process.
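The first step is identifying the OS process behind the session. A small sketch of how that lookup might go (the query is my illustration, not necessarily the one used in the post):

-- Find the OS PID of the dedicated server process serving the current session.
-- OS tools such as pmap <pid> or ps can then report its private memory.
select p.spid
from   v$process p, v$session s
where  p.addr = s.paddr
and    s.sid  = sys_context('userenv', 'sid');

Read more…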

Categories: Oracle RAM usage

High-Performance Batch LOB Insert. Part 2. Subpartitions

In my recent post High-Performance Batch LOB Insert. Part 1 I described attempts to transfer a high number of records from one table with a LOB column into another. To my surprise, my preferred LOB storage option of “disable storage in row nocache nologging” made the insert too slow to be practical. I arrived at the following results from a 100,000-record insert test:

-- Test Summary for 100,000 records:

                     Option 1              Option 2              Option 3
                     disable storage       disable storage       enable storage
                     in row                in row                in row
                     nocache nologging     cache                 (cache) nologging

Elapsed Time, min    >45 (interrupted)     10:59                 2:28
Table, MB            N/A                   75                    315
LOB Segment, MB      N/A                   313                   39
LOB Index, MB        N/A                   7                     1
Redo, MB             N/A                   337                   0.1

Extrapolating these timings, we arrive at the following runtime estimates for the 12,000,000 records we expected to transfer: about 22 hours for Option 2 and about 5 hours for Option 3. These estimates seemed to leave only Option 3, “enable storage in row”, as the one able to fit into a one-night maintenance window.
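For reference, a sketch of what the Option 3 storage clause looks like in DDL; the table and column names are illustrative, not the project's actual objects:

-- Option 3: LOBs small enough (under ~4000 bytes) are stored inline in the row
create table big_lob_copy (
  id      number,
  payload clob
)
nologging
lob (payload) store as (
  enable storage in row
);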

Read more…

High-Performance Batch LOB Insert. Part 1

March 2, 2011

One would think that when you have a multi-CPU server, it should be pretty straightforward in Oracle 10g to copy records from a table with a LOB column to another table, in parallel and nologging, using all available CPU power.

As it turns out, it is not that easy.

One of my projects had, after a number of years, accumulated a big table which we wanted to trim. The table structure was rather simple: a few regular columns, one LOB, several indexes. The table had 23 million records; the table, indexes, LOB segment and LOB index all together occupied 82 GB.

We needed to purge about half of the records.

At 82 GB, regular SQL deletes were immediately ruled out due to the undo and redo overhead. Instead of deleting, we decided to copy the needed records to another table in parallel nologging, using Create Table As Select (CTAS) or INSERT with the PARALLEL/APPEND hints, and then rename the table.
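Conceptually, the two approaches look like the sketch below; the table names, predicate and degree of 16 are placeholders of mine, not the project's actual objects:

-- CTAS variant (names and degree are illustrative)
create table big_table_new
  parallel 16 nologging
as
select /*+ parallel(s 16) */ *
from   big_table s
where  keep_flag = 'Y';

-- Direct-path INSERT variant (parallel DML must be enabled in the session)
alter session enable parallel dml;

insert /*+ append parallel(t 16) */ into big_table_new t
select /*+ parallel(s 16) */ *
from   big_table s
where  keep_flag = 'Y';

commit;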

The database server had 24 CPUs, 56 GB of RAM and a high-performance EMC storage array. I expected the copy process to be vigorous and quick.

I was in for quite a surprise.

Read more…

Using Grid Control 10g to Build a Cumulative Counter Based Graph

November 17, 2010

Our redo log disk devices were experiencing periodic sudden slowdowns, but diagnosing the issue proved to be tricky. Yes, it is possible to use Unix tools such as iostat -x or sar -d to collect device service times, but in our case the storage is a NAS array with 190 devices mounted by a volume manager. This makes it difficult to map exactly which devices are used for the redo logs.

We wanted to get an idea of what redo device speed looks like from the database perspective and how it changes over time.

Unfortunately EM 10g does not have a built-in metric to monitor redo write speed.

The system view v$sysstat maintains the cumulative counters ‘redo write time’ and ‘redo writes’. Since ‘redo write time’ is kept in centiseconds, the average LGWR write speed in milliseconds per write since instance startup can be calculated as


SQL> select /*+ all_rows */ t.value*10/w.value as ms_per_write
from
(select value from v$sysstat
where statistic# = (select statistic# from v$statname where name = 'redo writes')
) w,
(select value from v$sysstat
where statistic# = (select statistic# from v$statname where name = 'redo write time')
) t;
MS_PER_WRITE
------------
9.4988199536

The average redo speed since startup, while helpful, is still far from what we need to verify the suspected temporary slowdowns.

For this purpose we need a time graph, and a time graph is something that can be derived from cumulative counters using deltas.
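To show the delta idea outside of EM, here is a sketch that snapshots the two counters periodically and derives a per-interval ms-per-write; the snapshot table and the scheduling of samples are mine for illustration:

-- Snapshot table: one row per sample
create table redo_speed_snap (
  snap_time  date,
  write_time number,  -- 'redo write time', in centiseconds
  writes     number   -- 'redo writes'
);

-- Take a sample (run this periodically, e.g. every few minutes)
insert into redo_speed_snap
select sysdate,
       max(decode(n.name, 'redo write time', s.value)),
       max(decode(n.name, 'redo writes',     s.value))
from   v$sysstat s, v$statname n
where  s.statistic# = n.statistic#
and    n.name in ('redo write time', 'redo writes');
commit;

-- Delta between consecutive snapshots gives ms per write over each interval
select snap_time,
       (write_time - lag(write_time) over (order by snap_time)) * 10 /
       nullif(writes - lag(writes) over (order by snap_time), 0) as ms_per_write
from   redo_speed_snap
order  by snap_time;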

Read more…