
High CPU consumption #192

Closed
guillemlc opened this issue Apr 28, 2016 · 9 comments

@guillemlc

Hello.

We have set up eYAML as a second backend for hiera in an environment with about 6000 hosts and 8 puppet masters.

The hosts run CentOS 5/6, the Puppet version is 3.7 and hiera is 1.3.1.

eyaml version is: 2.0.8

With yaml as the only backend, the puppetmasters rarely go over 60% CPU utilization, and most of the time stay around 35-40%.

When enabling the eYAML backend (which works perfectly), the puppetmasters' CPU utilization fluctuates a lot, spending much of the time at 85-90% user and 5-10% system, so the CPU is almost never idle.

That alone would not necessarily be a problem, but we see that puppet runs take much longer to complete, and they occasionally time out, which is not good.

:hierarchy:
  - "%{sh_envloc}/%{sh_envnum}/%{sh_system_role}/%{sh_rolenum}"
  - "%{sh_envloc}/%{sh_envnum}/%{sh_system_role}/defaults"
  - "%{sh_envloc}/%{sh_envnum}/defaults"
  - "%{sh_envloc}/defaults"
  - "%{sh_system_role}"
  - defaults
:backends:
  - yaml
  - eyaml
:yaml:
  :datadir: '/nas/pup/etc/puppet/environments/%{environment}/hieradata'
:eyaml:
  #:datadir: '/nas/pup/etc/puppet/environments/%{environment}/hieradata'
  :datadir: '/nas/pup/etc/puppet/environments/%{environment}/ehieradata'
  :pkcs7_private_key: /etc/puppet/secure/keys/private_key.pkcs7.pem
  :pkcs7_public_key:  /etc/puppet/secure/keys/public_key.pkcs7.pem

As you can see, we tried using a different location for the eYAML data, as we thought it could be spending time going through the same files twice. I also tried putting eYAML second in the :backends list. The results are the same. The only way to keep the masters working as they were is to disable the eYAML backend :-(

An obvious way to deal with this would be to throw more puppet masters into the mix, but I suspect there is something else going on here, for two reasons:

  • The surge in CPU goes up to 100% very quickly, and while the PMs are busy it does not go down
  • Even when testing on an unused PM, I get this difference:

eYAML disabled:

# time hiera repos::epel::year environment=gliartecano_test -d
DEBUG: Thu Apr 28 08:30:59 +0000 2016: Hiera YAML backend starting
DEBUG: Thu Apr 28 08:30:59 +0000 2016: Looking up repos::epel::year in YAML backend
DEBUG: Thu Apr 28 08:30:59 +0000 2016: Looking for data source defaults
DEBUG: Thu Apr 28 08:30:59 +0000 2016: Found repos::epel::year in defaults
2016

real    0m0.070s
user    0m0.054s
sys 0m0.014s

With eYAML enabled:

# time hiera repos::epel::year environment=gliartecano_test -d
DEBUG: Thu Apr 28 08:33:06 +0000 2016: Hiera YAML backend starting
DEBUG: Thu Apr 28 08:33:06 +0000 2016: Looking up repos::epel::year in YAML backend
DEBUG: Thu Apr 28 08:33:06 +0000 2016: Looking for data source defaults
DEBUG: Thu Apr 28 08:33:06 +0000 2016: Found repos::epel::year in defaults
2016

real    0m0.110s
user    0m0.082s
sys 0m0.024s

In neither case did it actually need to look into eYAML, but just having eYAML enabled is enough to add that extra time. As this happens on every query, those milliseconds are added to each one, dramatically increasing user and system CPU time.

It seems that every time there is a query, it needs to load the keys, which I imagine is the most CPU-intensive part.
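
If that is the case, caching the parsed keys (as suggested in #191) would avoid that per-query cost. A minimal Ruby sketch of the idea, using a hypothetical class rather than hiera-eyaml's actual code:

require 'openssl'

# Hypothetical sketch: parse each PKCS7 key file once and memoize the
# OpenSSL objects, so repeated lookups skip the file read and PEM parse.
class CachedPkcs7Keys
  def initialize(private_key_path, public_key_path)
    @private_key_path = private_key_path
    @public_key_path  = public_key_path
  end

  def private_key
    @private_key ||= OpenSSL::PKey::RSA.new(File.read(@private_key_path))
  end

  # The eyaml public key is an X.509 certificate, so parse it as one.
  def public_cert
    @public_cert ||= OpenSSL::X509::Certificate.new(File.read(@public_key_path))
  end

  # ciphertext is assumed to be a raw PEM/DER PKCS7 blob.
  def decrypt(ciphertext)
    OpenSSL::PKCS7.new(ciphertext).decrypt(private_key, public_cert)
  end
end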

We have hundreds of variables in external data in our manifests, and this is expected to increase with eYAML.

Have we made any obvious mistake in our assumptions, and is there any advice you can give us?

I am generating strace output and will attach it later.

Thank you!

@guillemlc
Author

Added strace files, with and without eyaml enabled.

eyaml_192_strace.zip

@r4v5

r4v5 commented Apr 29, 2016

Wondering if this is fixed or at least ameliorated by not double-decrypting blocks, as mentioned in #182? If not, caching as per #191 might help.

@guillemlc
Author

@r4v5 I will check that out.

I will package a new version and test it. Thanks.

@guillemlc
Author

It seems that with version 2.1.0 there is some improvement, at least on an idle host:

eyaml off:


time hiera repos::epel::year environment=gliartecano_test -d
DEBUG: Wed May 04 09:57:11 +0000 2016: Hiera YAML backend starting
DEBUG: Wed May 04 09:57:11 +0000 2016: Looking up repos::epel::year in YAML backend
DEBUG: Wed May 04 09:57:11 +0000 2016: Looking for data source defaults
DEBUG: Wed May 04 09:57:11 +0000 2016: Found repos::epel::year in defaults
2016

**real  0m0.067s**
user    0m0.055s
sys 0m0.012s

eyaml on:

DEBUG: Wed May 04 09:58:12 +0000 2016: Hiera YAML backend starting
DEBUG: Wed May 04 09:58:12 +0000 2016: Looking up repos::epel::year in YAML backend
DEBUG: Wed May 04 09:58:12 +0000 2016: Looking for data source defaults
DEBUG: Wed May 04 09:58:12 +0000 2016: Found repos::epel::year in defaults
2016

**real  0m0.087s**
user    0m0.068s
sys 0m0.015s

@sihil
Collaborator

sihil commented May 4, 2016

@guillemlc can you clarify what you're testing there?

@guillemlc
Author

@sihil in previous versions the CPU time for hiera to retrieve a value was quite high when eyaml was enabled. The latest version, 2.1.0, has reduced that time considerably.

When using the eyaml backend with tens of computers, I believe this is fairly irrelevant: with a few hiera lookups across 20 computers, the aggregate extra time may be negligible.

We use hiera in a network of more than 5000 computers, and with previous versions, when the eyaml backend was enabled, the puppetmasters would get overwhelmed, staying at 100% almost all the time, dramatically increasing the time for a puppet run to complete and making some of them time out.

Version 2.1.0 is the first one that we are likely to leave enabled: while it still uses a lot more CPU than expected, it is already usable.

To give you some idea, with version 2.0.8, after 5 minutes on an 8-core server, you would have a run queue of more than 400 processes. With 2.1.0 the run queue stays at a more tolerable 112.

So to answer your question: I am testing hiera with eyaml on and off, checking the time it takes for hiera to look up a key's value. As you can see in my first post, the 'real' time with eyaml on was 0.110 s with version 2.0.8, which went down to 0.087 s with version 2.1.0. Given that this needs to be performed hundreds of times per second, on several puppetmasters serving thousands of hosts, a 15-20 millisecond improvement is quite important.
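
As a rough back-of-envelope (the 6000 hosts, 8 masters and per-lookup timings are from this thread; the lookups-per-compile figure and the 30-minute run interval are assumptions for illustration):

# Back-of-envelope only; figures marked "assumed" are not measurements.
saved_per_lookup = 0.110 - 0.087        # seconds saved per lookup, 2.0.8 -> 2.1.0
lookups_per_run  = 300                  # assumed hiera lookups per catalog compile
nodes            = 6000
runs_per_hour    = nodes * 2            # assumed 30-minute agent run interval

saved_cpu_seconds = saved_per_lookup * lookups_per_run * runs_per_hour
puts format('~%.0f CPU-seconds saved per hour fleet-wide (~%.0f per master)',
            saved_cpu_seconds, saved_cpu_seconds / 8)

With those assumed figures that works out to roughly three cores' worth of work saved on each 8-core master, which is consistent with the run-queue difference above.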

The bottom line of all this: the eyaml backend is likely to be used in large networks in order to keep sensitive data protected, and these networks will have thousands of nodes. Adding a backend to hiera comes with a performance penalty, so a backend should be as performant as possible.

I hope it makes sense.

@jonnaladheeraj

@guillemlc Have you tried testing it by enabling just eyaml as your backend and, inside the :eyaml block, setting the extension to yaml? We are dealing with the same issue and would like more insight on this.

The default eyaml file extension is .eyaml; however, this can be configured by setting :extension in the :eyaml block:

:eyaml:
  :extension: 'yaml'
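
For example, something like this (a sketch only, reusing the datadir and key paths from the original report; it assumes the eyaml backend on its own can serve the plain values too, since it only decrypts ENC[...] blocks, which is worth verifying):

:backends:
  - eyaml
:eyaml:
  :datadir: '/nas/pup/etc/puppet/environments/%{environment}/hieradata'
  :extension: 'yaml'
  :pkcs7_private_key: /etc/puppet/secure/keys/private_key.pkcs7.pem
  :pkcs7_public_key:  /etc/puppet/secure/keys/public_key.pkcs7.pem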

@sammcj

sammcj commented May 16, 2018

How did everyone get on with this?

We're trying to nail down some hiera performance issues at the moment...

@Dan33l
Member

Dan33l commented Sep 26, 2018

It can be reopened if it's reproducible on the latest Puppet 4 or newer.

@Dan33l Dan33l closed this as completed Sep 26, 2018