AWS EC2 CPU (in)consistency – Part 3. Simple Shell test.


In this blog post I will describe a simple shell script for measuring CPU consistency.

In my prior tests I used sqlplus sessions running a CPU-intensive SQL in parallel background processes. That approach revealed something strange in the way vCPUs behave in the AWS EC2 environment. I noticed that whenever the number of parallel sqlplus sessions became one more than vCPU/2, there were always two unlucky sessions that ran substantially slower than the rest – up to 50% slower. Obviously this was not good, because SQL elapsed time consistency is very important. I also noticed that while these parallel processes were running, their CPU assignment in “top” never changed. Since we know that AWS EC2 presents a hyperthread as a vCPU, and that an Intel Xeon has two hyperthreads per physical core, it is clear that if two sessions share one core they will run slower than a session running on a dedicated core. The question is: why do the processes not move between cores? Is it not the job of the OS scheduler to give all processes a fair share of CPU time? This observation meant there was something fundamental in OS scheduling that led to inconsistent SQL elapsed times.
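On Linux, which logical CPUs are hyperthread siblings can be checked directly – a quick sketch that only assumes the kernel exposes its CPU topology in sysfs, as mainline Linux does:

```shell
# print each logical CPU together with its hyperthread sibling list;
# logical CPUs that report the same sibling list share one physical core
for c in /sys/devices/system/cpu/cpu[0-9]*; do
    printf '%s: %s\n' "${c##*/}" "$(cat "$c/topology/thread_siblings_list")"
done
```

On a c4.4xlarge this should print 16 lines, with the sibling lists pairing up the 16 vCPUs into 8 physical cores.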

If this were true, then this behavior should reveal itself in other situations, outside of SQL and the RDBMS.

I decided to try to generate the CPU load using only a shell script.

I found this suggestion : http://superuser.com/questions/443406/how-can-i-produce-high-cpu-load-on-a-linux-server

and modified it to print counts. The test starts N background subshells, each incrementing an integer in a tight loop, then interrupts them after X seconds and prints the accumulated counts.

# based on
# http://superuser.com/questions/443406/how-can-i-produce-high-cpu-load-on-a-linux-server
# improved version with count printout
# below should run in ksh or bash
# in bash, if you are getting
#     bash: !": event not found
# then turn off history expansion with: set +H

lc() {
    cpus=$1
    seconds=$2
    pids=""
    echo loading $cpus CPUs for $seconds seconds
    trap 'for p in $pids; do kill $p 2>/dev/null; done' 0
    for ((i=0;i<cpus;i++)); do
        ( trap 'echo $cnt; exit' TERM
          cnt=1 ; while : ; do ((cnt++)) ; done
        ) &
        pids="$pids $!"
    done
    sleep $seconds
    # send TERM to the subshells so each one prints its accumulated count
    for p in $pids; do kill $p; done
    wait
    pids=""
}

lc 9 30
     loading 9 CPUs for 30 seconds
     5245844    <-- 50% reduced count
     5247578    <-- 50% reduced count

The above output was on an EC2 c4.4xlarge instance. This instance class has 16 vCPUs but only 8 actual cores.
As we can see, this shell function exhibits exactly the same behavior as 9 parallel sqlplus sessions – two unlucky subshells counted much less than the others. When the “lc” function is called with 8 subshells, all 8 numbers are very close – the same behavior as in the tests with 8 sqlplus sessions.

What was also interesting: while the above 9 parallel subshells were running, the CPU assignment of the processes did not change, as seen in the ps output – again, the same was observed in the sqlplus test.

# PSR assignment did not change :
watch "ps -e -o pid,psr,comm,s | grep bash | grep R"
Every 2.0s: ps -e -o pid,psr,comm,s |...  Fri Jan 13 11:18:05 2017

 49711  12 bash            R
 49712  13 bash            R
 49713  14 bash            R
 49714  15 bash            R
 49715   8 bash            R
 49716   9 bash            R
 49717  10 bash            R
 49718   4 bash            R
 49719   3 bash            R
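To confirm that two of the busy subshells really landed on the same physical core, the PSR values from ps can be cross-checked against sysfs. A sketch – `core_of_psr` is a helper name I made up, and it assumes the kernel exposes `core_id` per logical CPU:

```shell
# core_of_psr PSR - print the physical core id behind a logical CPU number
core_of_psr() {
    cat "/sys/devices/system/cpu/cpu$1/topology/core_id"
}

# annotate each running bash subshell with its core id;
# two rows showing the same core_id are hyperthread siblings
ps -e -o pid,psr,comm,s | awk '$3 == "bash" && $4 == "R" {print $1, $2}' |
while read -r pid psr; do
    echo "pid=$pid psr=$psr core_id=$(core_of_psr "$psr")"
done
```

With 9 subshells on a c4.4xlarge, two of the rows should report the same core_id – the two "unlucky" processes.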

So this showed that the shell test can be used to study CPU consistency just like the sqlplus test, while being much simpler and applicable to a broader class of environments – not necessarily ones with an RDBMS installed.

Having established this, I was curious to see how the system behaved with an increasing number of parallel subshells. So I ran the following loop

# to collect data for the chart :
for i in {1..32} ; do
    lc $i 10
    sleep 3
done | tee parallel_shell.log

and then plotted the numbers in an Excel chart.
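To get the log into chart-friendly form, the header and count lines can be flattened to CSV first. A sketch – `log2csv` is a name I made up, and it assumes the `loading N CPUs ...` lines separate the runs exactly as in the output above:

```shell
# flatten "loading N CPUs for S seconds" headers plus the count
# lines that follow them into "N,count" rows for a spreadsheet
log2csv() {
    awk '/^loading/ {n = $2; next}
         /^[[:space:]]*[0-9]+[[:space:]]*$/ {print n "," $1}' "$1"
}
# usage: log2csv parallel_shell.log > parallel_shell.csv
```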

The results were so surprising that I decided to run a similar test on an on-premise, physical, non-virtualized system with a similar OS version. The difference was striking. See for yourself:

AWS Comments:

Range 1: N <= vCPU/2        - all parallel subshells perform the same amount of work
Range 2: vCPU/2 < N < vCPU  - some subshells perform half as much work
Range 3: N > vCPU           - the amount of work fluctuates

Boundary 1: N=vCPU          - all parallel subshells perform the same amount of work
Boundary 2: N=2*vCPU        - all parallel subshells perform the same amount of work


All of the above AWS behavior can be explained by the observed fact that the OS scheduler does not rebalance processes once they start running, until N > vCPU – as can be seen in the PSR field with watch “ps -e -o pid,psr,comm,s | grep bash | grep R”.

In contrast, the same test on premise shows that the OS scheduler starts rebalancing processes once N > CPU/2, i.e. when N becomes greater than the number of cores. On the non-virtualized system with hyperthreading, the OS scheduler was smart enough to understand that not all logical CPUs are equal: when two logical CPUs run on the same core, they are slower than a logical CPU running alone on another core. So the OS scheduler rightly rebalances processes between cores when the number of active processes exceeds the number of cores. On the contrary, in the virtualized AWS environment the same OS apparently does not see the difference between vCPUs: it treats them as all having equal processing speed, ignoring their hyperthreading nature, and decides not to rebalance processes all the way up to the number of vCPUs. As a result we see a huge disparity in the amount of work different processes are able to perform in the vCPU/2 to vCPU range.

What this discovery means for AWS EC2 instance sizing is that, in order to maintain response time consistency, we need to over-allocate instances so that the expected number of active processes never goes above vCPU/2.
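As a back-of-the-envelope check, the sizing rule can be expressed as a tiny helper – a sketch, where `vcpus_needed` is a made-up name and the factor of 2 is simply the vCPU/2 threshold observed above:

```shell
# minimum vCPU count so that the expected peak number of
# CPU-bound processes stays at or below vCPU/2
vcpus_needed() {
    echo $(( $1 * 2 ))
}
# e.g. an expected peak of 8 concurrently CPU-bound sessions
# calls for at least a 16-vCPU instance
```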

The above chart was built using a custom AMI.

One remaining hope was that an Amazon native AMI might be better optimized to work with the Xen hypervisor, so that its OS scheduler would rebalance better in the vCPU/2 to vCPU range.

I asked my good friend Oleg Moiseyenko to run the test on these two Amazon-provided AMIs:

Amazon Linux AMI 2016.09.1 (HVM), SSD Volume Type – ami-9be6f38c
Red Hat Enterprise Linux 7.3 (HVM), SSD Volume Type – ami-b63769a1

Here are the results :


As we can see, the Amazon Red Hat AMI displays a similar pattern, except that here there is a mysterious intermediate level at vCPU*3/4 <= N < vCPU.

Here is the Amazon Linux AMI:


As we can see, the situation is only marginally better – there is still a huge disparity in the vCPU/2 < N < vCPU range.

One can argue that this test is unrealistic, and that in real-life workloads CPU demand is intermixed with IO events – so once a process running on a CPU starts waiting on IO, it becomes a candidate for preemption and the scheduler gets a chance to rebalance. This may be true. However, it means that the system now depends on processes’ “good manners”. If processes start spinning on the CPU due to a software bug, the system will quickly become jumpy. Another scenario: in a database environment, the optimizer may suddenly change a SQL execution plan in a way that makes it spend most of its time on the CPU – for example doing hash joins – and then multiple sessions start running this SQL. In both scenarios the system response time becomes jumpy very early, when the number of “bad” processes is only half of the available vCPUs plus one. This may catch an unsuspecting customer by surprise, because a customer who sized his instance based on vCPUs does not expect any CPU-related problems until all the vCPUs are used up.
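The IO-mixed argument can be probed with a variant of the counting loop that yields periodically. A sketch – the short `sleep` merely stands in for an IO wait, and whether the scheduler actually migrates the process still has to be checked in the PSR column as before:

```shell
# counting loop that goes to sleep periodically, imitating a process
# that mixes CPU work with IO waits and so becomes preemptible
cnt=1
end=$(( SECONDS + 3 ))          # run for about 3 seconds (bash SECONDS counter)
while [ "$SECONDS" -lt "$end" ]; do
    ((cnt++))
    if (( cnt % 100000 == 0 )); then
        sleep 0.01              # simulated IO wait
    fi
done
echo "counted $cnt with simulated IO waits"
```

Running N of these in the background and watching the PSR column would show whether the periodic waits are enough to make the scheduler rebalance in the vCPU/2 to vCPU range.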

In my next post I will describe a cure I have found for this issue.


