Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance degradation for extra fields #1117

Closed
navtej opened this issue Nov 12, 2017 · 8 comments
Closed

performance degradation for extra fields #1117

navtej opened this issue Nov 12, 2017 · 8 comments
Labels
bug Indicates an unexpected problem or unintended behavior component: core performance
Milestone

Comments

@navtej
Copy link
Contributor

navtej commented Nov 12, 2017

I have noticed severe performance degradation in bots caused by commit 50eb548 to file intelmq/lib/message.py

Here are two runs with commit 50eb548 and prior commit 6ccaf3b

Logging is set as follows

    "log_processed_messages_count": 10000,
    "log_processed_messages_seconds": 300,

First run is with commit 6ccaf3b and the bot is run for 5 minutes

ubuntu@ubuntu:~/src/intelmq$ git checkout 6ccaf3b82b051749164b0884607b5c161b142056 -- intelmq/lib/message.py
ubuntu@ubuntu:~/src/intelmq$ sudo -H pip3 install -e .
--pip output--
ubuntu@ubuntu:~/src/intelmq$ intelmqctl start taxonomy-expert
ubuntu@ubuntu:~/src/intelmq$ tail -f /opt/intelmq/var/log/taxonomy-expert.log
2017-11-12 13:29:32,022 - taxonomy-expert - INFO - TaxonomyExpertBot initialized with id taxonomy-expert and intelmq 1.1.0.alpha1 and python 3.4.2 (default, Oct  8 2014, 10:45:20) as process 19841.
2017-11-12 13:29:32,022 - taxonomy-expert - INFO - Bot is starting.
2017-11-12 13:29:32,023 - taxonomy-expert - INFO - Pipeline ready.
2017-11-12 13:30:01,141 - taxonomy-expert - INFO - Processed 10000 messages since last logging.
2017-11-12 13:30:30,353 - taxonomy-expert - INFO - Processed 10000 messages since last logging.
2017-11-12 13:30:59,151 - taxonomy-expert - INFO - Processed 10000 messages since last logging.
2017-11-12 13:31:28,417 - taxonomy-expert - INFO - Processed 10000 messages since last logging.
2017-11-12 13:31:56,735 - taxonomy-expert - INFO - Processed 10000 messages since last logging.
2017-11-12 13:32:25,360 - taxonomy-expert - INFO - Processed 10000 messages since last logging.
2017-11-12 13:32:54,706 - taxonomy-expert - INFO - Processed 10000 messages since last logging.
2017-11-12 13:33:23,586 - taxonomy-expert - INFO - Processed 10000 messages since last logging.
2017-11-12 13:33:52,999 - taxonomy-expert - INFO - Processed 10000 messages since last logging.
2017-11-12 13:34:21,959 - taxonomy-expert - INFO - Processed 10000 messages since last logging.
2017-11-12 13:34:53,043 - taxonomy-expert - INFO - Processed 10000 messages since last logging.
2017-11-12 13:35:22,710 - taxonomy-expert - INFO - Processed 10000 messages since last logging.```

The bot processed approx 1,20,000 messages in 5 minutes.

Next run is with commit 50eb548 and the bt is run for 30 minutes

ubuntu@ubuntu:~/src/intelmq$ git checkout 50eb548f57444e87063b456903b38eb9bbb23ed8 -- intelmq/lib/message.py
ubuntu@ubuntu:~/src/intelmq$ sudo -H pip3 install -e .
--pip output--
ubuntu@ubuntu:~/src/intelmq$ intelmqctl start taxonomy-expert
ubuntu@ubuntu:~/src/intelmq$ tail -f /opt/intelmq/var/log/taxonomy-expert.log
2017-11-12 13:40:15,904 - taxonomy-expert - INFO - TaxonomyExpertBot initialized with id taxonomy-expert and intelmq 1.1.0.alpha1 and python 3.4.2 (default, Oct  8 2014, 10:45:20) as process 20422.
2017-11-12 13:40:15,904 - taxonomy-expert - INFO - Bot is starting.
2017-11-12 13:40:15,905 - taxonomy-expert - INFO - Pipeline ready.
2017-11-12 13:45:15,938 - taxonomy-expert - INFO - Processed 9027 messages since last logging.
2017-11-12 13:50:15,980 - taxonomy-expert - INFO - Processed 3406 messages since last logging.
2017-11-12 13:55:16,005 - taxonomy-expert - INFO - Processed 2345 messages since last logging.
2017-11-12 14:00:16,150 - taxonomy-expert - INFO - Processed 1851 messages since last logging.
2017-11-12 14:05:16,216 - taxonomy-expert - INFO - Processed 1778 messages since last logging.
2017-11-12 14:10:16,335 - taxonomy-expert - INFO - Processed 1647 messages since last logging.

This time the bot only processed 20,000 messages in 30 minutes.

I haven't yet dissected deeper, I hope you can reproduce the results and pinpoint the cause of this issue.

@ghost ghost added this to the 1.1.0 milestone Nov 13, 2017
@ghost ghost added bug Indicates an unexpected problem or unintended behavior component: core performance labels Nov 13, 2017
@ghost
Copy link

ghost commented Nov 13, 2017

Uh. I expected that the speed would be s bit lower but that is really slow. Thanks for identifying the commit.

@navtej
Copy link
Contributor Author

navtej commented Nov 13, 2017

Any possibility you can help me with a quick resolution on this?

@ghost
Copy link

ghost commented Nov 15, 2017

Unfortunately not yet.

It only affects extra fields and I suspect that the slow part is somewhere in the add-method (actually: some methods used by it):

>>> timeit.timeit('msg.MessageFactory.from_dict({"__type": "Event", "extra.foo": "bar"})', setup='import intelmq.lib.message as msg', number=1000)
7.618276967999918
>>> timeit.timeit('msg.MessageFactory.from_dict({"__type": "Event", "source.abuse_contact": "bar"})', setup='import intelmq.lib.message as msg', number=1000)
0.1765257789993484
>>> timeit.timeit("ev = msg.Event(); ev.add('extra.foo', 'bar')", setup='import intelmq.lib.message as msg', number=100)
0.7326651559997117
>>> timeit.timeit("ev = msg.Event(); ev.add('source.abuse_contact', 'bar')", setup='import intelmq.lib.message as msg', number=100)
0.019149830000969814
>>> timeit.timeit("ev._Message__is_valid_key('extra.foo')", setup='import intelmq.lib.message as msg; ev = msg.Event()', number=100)
0.1854049479989044
>>> timeit.timeit("ev._Message__is_valid_key('source.abuse_contact')", setup='import intelmq.lib.message as msg; ev = msg.Event()', number=100)
0.0003554070008249255
>>> timeit.timeit("ev._Message__get_type_config('extra.foo')", setup='import intelmq.lib.message as msg; ev = msg.Event()', number=100)
0.07898436200048309
>>> timeit.timeit("ev._Message__get_type_config('source.abuse_contact')", setup='import intelmq.lib.message as msg; ev = msg.Event()', number=100)
0.0006972609990043566

@ghost ghost changed the title Severe performance degradation of bots Severe performance degradation for extra fields Nov 15, 2017
@navtej
Copy link
Contributor Author

navtej commented Nov 15, 2017

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   251201 1202.798    0.005 2502.155    0.010 message.py:163(add)
   251115  876.961    0.003  901.906    0.004 message.py:291(__is_valid_key)
   251200  309.809    0.001  346.010    0.001 message.py:301(__is_valid_value)
    67165   73.630    0.001   73.630    0.001 {method 'recv' of '_socket.socket' objects}
  1021561   34.987    0.000   95.798    0.000 message.py:350(hash)
 13540536   17.743    0.000   29.494    0.000 utils.py:90(encode)
 27013688   12.840    0.000   12.840    0.000 {method 'update' of '_hashlib.HASH' objects}
13954100/13940189    7.924    0.000    8.112    0.000 {method 'encode' of 'str' objects}
  6753471    6.933    0.000    6.933    0.000 {built-in method builtins.repr}
  1021585    6.751    0.000    6.782    0.000 {built-in method builtins.sorted}
15604264/15604263    5.255    0.000    5.255    0.000 {built-in method builtins.isinstance}
  1021561    3.926    0.000   99.724    0.000 message.py:347(__hash__)
  1021561    2.517    0.000    2.517    0.000 {method 'hexdigest' of '_hashlib.HASH' objects}
    67165    2.129    0.000    4.485    0.000 connection.py:632(pack_command)
  1021562    2.052    0.000    2.052    0.000 {built-in method _hashlib.openssl_sha256}
    67165    1.972    0.000    1.972    0.000 {method 'sendall' of '_socket.socket' objects}
    30768    1.835    0.000    3.610    0.000 _strptime.py:302(_strptime)
    67165    1.200    0.000   75.427    0.001 connection.py:166(_read_from_socket)
    16847    1.194    0.000 2481.536    0.147 message.py:92(__init__)

Here is cProfile run, add, __is_valid_key and __is_valid_value are the culprits

@navtej
Copy link
Contributor Author

navtej commented Nov 16, 2017

I did the timeit test with repeat and it led to interesting result

timeit.repeat("ev = msg.Event(); ev.add('extra.foo', 'bar')", setup='import intelmq.lib.message as msg', number=1000, repeat=10)
[1.447678582975641, 2.464164612116292, 3.438820904120803, 4.1598789119161665, 5.545920182950795, 6.403191218851134, 7.351596172899008, 8.178117901086807, 8.996471178950742, 10.242410328937694]

Notice the linear increase in time. This forced me to look away from the methods listed in above comment. I started to look at functool decorator lru_cache, here are the results with lru_cache decorator commented out

timeit.repeat("ev = msg.Event(); ev.add('extra.foo', 'bar')", setup='import intelmq.lib.message as msg', number=1000, repeat=10)
[0.23544840887188911, 0.2455875410232693, 0.22563187615014613, 0.22631947207264602, 0.2341673190239817, 0.23926178296096623, 0.22105620708316565, 0.22270849510096014, 0.21890562190674245, 0.2204773100093007]

The only explanation I can think of is that lru_cache relies on hash and somehow this doesnt works well in current setup.

I have submitted the PR #1119 for fix, in which I reordered the call to _get_type_config in add, and passed the result to __is_valid_key and __is_valid_value in params, so that these functions dont have to call _get_type_config

@ghost
Copy link

ghost commented Dec 12, 2017

I reworked JSONDict a bit (without any changes at in these validation methods in the message class yet) and for the timeit-benchmark I am now as fast as on 1.0.x (actually ~5% faster, but that's nitpicking).

@ghost ghost self-assigned this Mar 9, 2018
@ghost ghost changed the title Severe performance degradation for extra fields performance degradation for extra fields Jun 14, 2018
@ghost
Copy link

ghost commented Jun 28, 2018

see also #1252 for a possible fix

@ghost
Copy link

ghost commented Jul 19, 2018

Closing here, further improvements are on the roadmap

@ghost ghost closed this as completed Jul 19, 2018
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior component: core performance
Projects
None yet
Development

No branches or pull requests

1 participant