
High CPU consumption #192

Closed
guillemlc opened this issue Apr 28, 2016 · 9 comments

@guillemlc

Hello.

We have set up eYAML as a second backend for hiera in an environment with about 6000 hosts and 8 puppet masters.

The hosts run CentOS 5/6, the Puppet version is 3.7 and hiera is 1.3.1.

eyaml version is: 2.0.8

With yaml as the only backend, the puppetmasters rarely go over 60% CPU utilization, and most of the time stay around 35-40%.

When enabling the eYAML backend (which works perfectly), the puppetmasters' CPU utilization fluctuates a lot, spending much of the time at 85-90% user and 5-10% system, so the CPU is almost never idle.

That alone would not necessarily be a problem, but we see that puppet runs take much longer to complete, and they occasionally time out, which is not good.

:hierarchy:
  - "%{sh_envloc}/%{sh_envnum}/%{sh_system_role}/%{sh_rolenum}"
  - "%{sh_envloc}/%{sh_envnum}/%{sh_system_role}/defaults"
  - "%{sh_envloc}/%{sh_envnum}/defaults"
  - "%{sh_envloc}/defaults"
  - "%{sh_system_role}"
  - defaults
:backends:
  - yaml
  - eyaml
:yaml:
  :datadir: '/nas/pup/etc/puppet/environments/%{environment}/hieradata'
:eyaml:
  #:datadir: '/nas/pup/etc/puppet/environments/%{environment}/hieradata'
  :datadir: '/nas/pup/etc/puppet/environments/%{environment}/ehieradata'
  :pkcs7_private_key: /etc/puppet/secure/keys/private_key.pkcs7.pem
  :pkcs7_public_key:  /etc/puppet/secure/keys/public_key.pkcs7.pem

As you can see, we tried using a different location for the eYAML data, as we thought it could be spending time going through the same files twice. I also tried putting eYAML second in the :backends list. The results are the same. The only way to keep the masters working as they were is to disable the eYAML backend :-(

An obvious way to deal with this would be to throw more puppet masters into the mix, but I suspect there is something else going on here, for two reasons:

  • The surge in CPU goes up to 100% very quickly, and while the PMs are busy it does not go down
  • Even when testing on an unused PM, I get this difference:

eYAML disabled:

# time hiera repos::epel::year environment=gliartecano_test -d
DEBUG: Thu Apr 28 08:30:59 +0000 2016: Hiera YAML backend starting
DEBUG: Thu Apr 28 08:30:59 +0000 2016: Looking up repos::epel::year in YAML backend
DEBUG: Thu Apr 28 08:30:59 +0000 2016: Looking for data source defaults
DEBUG: Thu Apr 28 08:30:59 +0000 2016: Found repos::epel::year in defaults
2016

real    0m0.070s
user    0m0.054s
sys 0m0.014s

With eYAML enabled:

# time hiera repos::epel::year environment=gliartecano_test -d
DEBUG: Thu Apr 28 08:33:06 +0000 2016: Hiera YAML backend starting
DEBUG: Thu Apr 28 08:33:06 +0000 2016: Looking up repos::epel::year in YAML backend
DEBUG: Thu Apr 28 08:33:06 +0000 2016: Looking for data source defaults
DEBUG: Thu Apr 28 08:33:06 +0000 2016: Found repos::epel::year in defaults
2016

real    0m0.110s
user    0m0.082s
sys 0m0.024s

In neither case did it actually need to look into eYAML, but just having eYAML enabled is enough to add that extra time. As this happens on every query, those milliseconds are added to each one, dramatically increasing user and system CPU time.

It seems that every time there is a query, it needs to load the keys, which I imagine is the most CPU-intensive part.
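
If that is the case, caching the parsed keys (as suggested in #191) would avoid that per-query cost. A minimal Ruby sketch of the idea, using a hypothetical class rather than hiera-eyaml's actual code:

require 'openssl'

# Hypothetical sketch: parse each PKCS7 key file once and memoize the
# OpenSSL objects, so repeated lookups skip the file read and PEM parse.
class CachedPkcs7Keys
  def initialize(private_key_path, public_key_path)
    @private_key_path = private_key_path
    @public_key_path  = public_key_path
  end

  def private_key
    @private_key ||= OpenSSL::PKey::RSA.new(File.read(@private_key_path))
  end

  # The eyaml public key is an X.509 certificate, so parse it as one.
  def public_cert
    @public_cert ||= OpenSSL::X509::Certificate.new(File.read(@public_key_path))
  end

  # ciphertext is assumed to be a raw PEM/DER PKCS7 blob.
  def decrypt(ciphertext)
    OpenSSL::PKCS7.new(ciphertext).decrypt(private_key, public_cert)
  end
end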

We have hundreds of variables in external data in our manifests, and this is expected to increase with eYAML.

Have we made any obvious mistake in our assumptions, and is there any advice you can give us?

I am generating strace output and will attach it later.

Thank you!

@guillemlc
Author

Added strace files, with and without eyaml enabled.

eyaml_192_strace.zip

@r4v5

r4v5 commented Apr 29, 2016

Wondering if this is fixed or at least ameliorated by not double-decrypting blocks, as mentioned in #182? If not, caching as per #191 might help.

@guillemlc
Author

@r4v5 I will check that out.

I will package a new version and test it. Thanks.

@guillemlc
Author

It seems that with version 2.1.0 there is some improvement, at least on an idle host:

eyaml off:


time hiera repos::epel::year environment=gliartecano_test -d
DEBUG: Wed May 04 09:57:11 +0000 2016: Hiera YAML backend starting
DEBUG: Wed May 04 09:57:11 +0000 2016: Looking up repos::epel::year in YAML backend
DEBUG: Wed May 04 09:57:11 +0000 2016: Looking for data source defaults
DEBUG: Wed May 04 09:57:11 +0000 2016: Found repos::epel::year in defaults
2016

**real  0m0.067s**
user    0m0.055s
sys 0m0.012s

eyaml on:

DEBUG: Wed May 04 09:58:12 +0000 2016: Hiera YAML backend starting
DEBUG: Wed May 04 09:58:12 +0000 2016: Looking up repos::epel::year in YAML backend
DEBUG: Wed May 04 09:58:12 +0000 2016: Looking for data source defaults
DEBUG: Wed May 04 09:58:12 +0000 2016: Found repos::epel::year in defaults
2016

**real  0m0.087s**
user    0m0.068s
sys 0m0.015s

@sihil
Collaborator

sihil commented May 4, 2016

@guillemlc can you clarify what you're testing there?

@guillemlc
Author

@sihil in previous versions the CPU time for hiera to retrieve a value was quite high when eyaml was enabled. The latest version, 2.1.0, has reduced that time considerably.

When using the eyaml backend with tens of computers, I believe this is fairly irrelevant: with a few hiera lookups across 20 computers, the aggregate extra time may be negligible.

We use hiera in a network of more than 5000 computers, and with previous versions, when the eyaml backend was enabled, the puppetmasters would get overwhelmed, staying at 100% almost all the time, dramatically increasing the time for a puppet run to complete and making some of them time out.

Version 2.1.0 is the first one that we are likely to leave enabled: while it still uses a lot more CPU than expected, it is already usable.

To give you some idea, with version 2.0.8, after 5 minutes on an 8-core server, you would have a run queue of more than 400 processes. With 2.1.0 the run queue stays at a more tolerable 112.

So to answer your question: I am testing hiera with eyaml on and off, checking the time it takes for hiera to look up a key's value. As you can see in my first post, the 'real' time with eyaml on was 0.110 s with version 2.0.8, which went down to 0.087 s with version 2.1.0. Given that this needs to be performed hundreds of times per second, on several puppetmasters serving thousands of hosts, a 15-20 millisecond improvement is quite important.
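
As a rough back-of-envelope (the 6000 hosts, 8 masters and per-lookup timings are from this thread; the lookups-per-compile figure and the 30-minute run interval are assumptions for illustration):

# Back-of-envelope only; figures marked "assumed" are not measurements.
saved_per_lookup = 0.110 - 0.087        # seconds saved per lookup, 2.0.8 -> 2.1.0
lookups_per_run  = 300                  # assumed hiera lookups per catalog compile
nodes            = 6000
runs_per_hour    = nodes * 2            # assumed 30-minute agent run interval

saved_cpu_seconds = saved_per_lookup * lookups_per_run * runs_per_hour
puts format('~%.0f CPU-seconds saved per hour fleet-wide (~%.0f per master)',
            saved_cpu_seconds, saved_cpu_seconds / 8)

With those assumed figures that works out to roughly three cores' worth of work saved on each 8-core master, which is consistent with the run-queue difference above.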

The bottom line of all this: the eyaml backend is likely to be used in large networks in order to keep sensitive data protected, and these networks will have thousands of nodes. Adding a backend to hiera comes with a performance penalty, so a backend should be as performant as possible.

I hope it makes sense.

@jonnaladheeraj

@guillemlc Have you tried testing it by enabling just eyaml as your backend and, inside the :eyaml block, setting the extension to yaml? We are dealing with the same issue and would like more insight on this.

The default eyaml file extension is .eyaml; however, this can be configured by setting :extension in the :eyaml block:

:eyaml:
  :extension: 'yaml'
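
For example, something like this (a sketch only, reusing the datadir and key paths from the original report; it assumes the eyaml backend on its own can serve the plain values too, since it only decrypts ENC[...] blocks, which is worth verifying):

:backends:
  - eyaml
:eyaml:
  :datadir: '/nas/pup/etc/puppet/environments/%{environment}/hieradata'
  :extension: 'yaml'
  :pkcs7_private_key: /etc/puppet/secure/keys/private_key.pkcs7.pem
  :pkcs7_public_key:  /etc/puppet/secure/keys/public_key.pkcs7.pem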

@sammcj

sammcj commented May 16, 2018

How did everyone get on with this?

We're trying to nail down some hiera performance issues at the moment...

@Dan33l
Member

Dan33l commented Sep 26, 2018

It can be reopened if it's reproducible on the latest Puppet 4 or newer.

@Dan33l Dan33l closed this as completed Sep 26, 2018