AlienVault OTX API #298

robcza · 2015-08-25T15:01:17Z

Creating this pull request to discuss several points:

I've added third party script directly to this branch https:/AlienVault-Labs/OTX-Python-SDK/blob/master/OTXv2.py not sure if this is smart as:
- it will not get updated as the source changes
- there is no license file
Created new collector - seems weird, however didn't see any other simple way. Should I create new specialized collectors for any other API sources?
The parser itself raises many questions
- There is no straightforward method to determine classification.type of the event (AlienVault uses tags, however they are user defined and are close to random)
- Included some metadata regarding the IoCs (author, name, description) and dumped them into the "comment" attribute. What format is preferable to store multiple pieces of information in an attribute like this?

Thank for your comments

sebix · 2015-08-25T15:04:07Z

It's not compatible to PEP8.

robcza · 2015-08-25T15:29:02Z

Ok, will study and check that .... should be ok by now

…to alienvault-otx

sebix · 2015-08-25T15:42:24Z

intelmq/bots/parsers/alienvault/parser-otx.py

+ raw_report = utils.base64_decode(report.value("raw"))
+
+ for pulse in json.loads(raw_report):
+ comment = "author: " + pulse['author_name'] + "; name: " + \


using string formatting us usually more clear. Example:

comment = ("author: {}; name: {}; description: {}" "".format(pulse['author_name'], pulse['name'], pulse['description']))

The resulting string can be seen immediately.

sebix · 2015-08-25T15:44:13Z

Several lines are longer than 80 chars. You can use a tuple with one element for wrapping. This will be reduced to the element itself, e.g.:

myvar = ("long"
         "string")

In function calls, this is not necessary as the ending scope is clearly defined.

sebix · 2015-08-26T09:06:29Z

For the license of OTX-Python-SDK, I opened an issue there: AlienVault-OTX/OTX-Python-SDK#5
I am not the one who decides this case, but in my opinion we can't include that code as such. Additionally the license has to be compatible with Gnu Affero Public License.

I think the collector is fine.
For the classification I'm afraid we need a mapping.

sebix · 2015-08-26T18:23:57Z

OTX-Python-SDK is now licensed under Apache License 2.0: https:/AlienVault-Labs/OTX-Python-SDK/blob/master/LICENSE

SYNchroACK · 2015-09-01T23:57:42Z

@robcza is it finished?

sebix · 2015-09-02T07:41:43Z

It should be at least mentioned in the COPYRIGHT(.md) files, that the included file is Apache licensed.

As the Apache license is deemed compatible with the (L)GPL v3, you can use Apache licensed code in your LGPL project.

If you use the Apache licensed code only in the form of a library, the conventional way is

keep the Apache licensed library in a separate directory/sub-tree of the project

mention in your documentation (readme) that you use such-and-such library and under which license it is being distributed.

In your source code, you don't need to mention that a function/class comes from a separate library. The fact that it has a different license should be obvious from the copyright statements in the respective source files.

programmers@SE: How to use apache license in my project which will be LGPL

sebix · 2015-09-02T09:19:05Z

intelmq/bots/parsers/alienvault/parser-otx.py

+ def process(self):
+ report = self.receive_message()
+ if (report is None or not report.contains("raw") or
+ len(report.value("raw").strip()) == 0):


The last check should probably be done in Bot.receive_message() and results in None. Was there any special message which caused problems and you fixed it that way?

Not really, will get rid of that check

An empty raw (after strip) will be rejected anyway: https:/certtools/intelmq/blob/master/intelmq/lib/harmonization.py#L59

…ienvault-otx Conflicts: intelmq/bots/BOTS

sebix · 2015-09-02T12:13:59Z

intelmq/bots/parsers/alienvault/parser-otx.py

+
+ def process(self):
+ report = self.receive_message()
+ if (report is None or not report.contains("raw"):


This is a SyntaxError

robcza · 2015-09-02T12:21:05Z

Uraaa! I just merged something and it worked :) Anyway, sorry for "premature push", I will check the code, prepare tests and will get back to you to discuss a few issues. So don't waste your time looking at the code right now.

sebix · 2015-09-02T12:23:48Z

Forgot that we have a surveillance state? I will watch at every commit you do!!!!!1!111

sebix · 2015-09-02T12:25:24Z

Here is a good summary with all steps needed in a development process in git (branches, rebasing etc): https://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html

… with test itself

robcza · 2015-09-02T15:50:06Z

Well, ok, now it seems a lot better. However I'm aware of some issues:

there is no definition for malware.hash_type in harmonization.conf - I would like to add that, however I could be missing something
classification is still other/blacklist - I am not really happy with this, but the source is close to random regarding the tags they use for their IoCs. I believe, I could change the classification for some of the tags like "malware" and keep other/blacklist for the rest of them.
I tried to provide test data, however the test itself is still failing, too cryptic for me to solve

ERROR: test_event (__main__.TestAlienVaultOTXParserBot)
Test if correct Event has been produced.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "intelmq/tests/bots/parsers/alienvault/test_parser_otx.py", line 48, in test_event
    self.assertMessageEqual(0, EXAMPLE_EVENT)
  File "/usr/local/lib/python2.7/dist-packages/intelmq-1.0.0-py2.7.egg/intelmq/lib/test.py", line 333, in assertMessageEqual
    del expected_message['time.observation']
KeyError: u'time.observation'

----------------------------------------------------------------------
Ran 12 tests in 0.096s

FAILED (errors=1)

sebix · 2015-09-03T07:14:43Z

You are using the wrong format to add grouped fields:

"time": {
    "source": "2015-09-02T09:21:53+00:00",
    "observation": "2015-09-02T14:17:58+00:00"}}

should be:

"time.source": "2015-09-02T09:21:53+00:00",
"time.observation": "2015-09-02T14:17:58+00:00"

same applies to source, classification, feed and all others.

SYNchroACK · 2015-09-03T09:59:39Z

@sebix can you explain a little bit more?

sebix · 2015-09-03T10:45:22Z

Have a look at the tests code: https:/robcza/intelmq/blob/alienvault-otx/intelmq/tests/bots/parsers/alienvault/test_parser_otx.py#L32

SYNchroACK · 2015-09-03T13:56:41Z

yap, i got it! ;)

robcza · 2015-09-09T14:07:03Z

I've fixed the issue regarding the grouped attributes and added "additional" attribute to the event generated by the parser.
Also, I defined malware.hash_type in the harmonizotaion.conf and used it in the parser.

I see no easy way to make the classification more precise at the moment, however I will keep eye on changes introduced by AlienVault, hopefully they will come up with something to narrow it.

Feels like complete right now, however feel free to comment further. And btw. @sebix those tests are pretty useful, thanks for them

sebix · 2015-09-09T14:13:29Z

intelmq/bots/collectors/alienvault_otx/collector.py

+from OTXv2 import OTXv2
+import json
+
+from intelmq.bots.collectors.http.lib import fetch_url


fetch_url is not used.

sebix · 2015-09-09T14:27:32Z

The hash type is usually stored along with the hash value as prefix. E.g $1$ for MD5, $$2a$ etc for Blowfish, $5$ for SHA-256.
See also WP: Crypt (C)

SYNchroACK · 2015-09-09T14:46:30Z

@sebix so, malware.hash will also include the hash_type?!?!?!

sebix · 2015-09-09T14:51:53Z

On 09/09/2015 04:46 PM, Tomás Lima wrote:

@sebix https:/sebix so, malware.hash will also include
the hash_type?!?!?!

All comments made by me do represent my opinion and I don't want to
patronize neither you or any other "authoritative" project lead. I
indicate that the hash can also include the hash_type and a separate
filed is not necessary. Nothing else. Just that.

aaronkaplan · 2015-09-09T14:53:48Z

On Sep 9, 2015, at 4:51 PM, Sebastian [email protected] wrote:

On 09/09/2015 04:46 PM, Tomás Lima wrote:

@sebix https:/sebix so, malware.hash will also include
the hash_type?!?!?!

All comments made by me do represent my opinion and I don't want to
patronize neither you or any other "authoritative" project lead. I
indicate that the hash can also include the hash_type and a separate
filed is not necessary. Nothing else. Just that.

Agreed, it makes more sense to rely on that.

SYNchroACK · 2015-09-09T15:03:58Z

Agreed. It will require to add a util to convert a hash into to hash type along with the hash value as prefix.

sebix · 2015-09-09T16:27:45Z

I propose to replace this code:

HASHES = {
    'FileHash-SHA256': 'SHA-256',
    'FileHash-SHA1': 'SHA-1',
    'FileHash-MD5': 'MD5'
}

if indicator["type"] in [
        'FileHash-SHA256',
        'FileHash-SHA1',
        'FileHash-MD5']:
    event.add(
        'malware.hash',
        indicator["indicator"],
        sanitize=True)
    event.add(
        'malware.hash_type', HASHES[
            indicator["type"]], sanitize=True)

with this:

HASHES = {
    'FileHash-SHA256': '$5$',
    'FileHash-SHA1': '$sha1$',
    'FileHash-MD5': '$1$'
}
if indicator["type"] in HASHES:
    event.add('malware.hash',
              HASHES[indicator["type"]] + indicator["indicator"])

robcza · 2015-09-09T18:09:00Z

Included proposed changes, adjusted the description of malware.hash and made the parser code a bit more readable.

sebix · 2015-09-11T08:01:17Z

I just checked out your code locally. Had to rebase on master and fix several merge conflicts (please do not use merge to get changes from upstream ...) and adapted the files to current changes. Also fixed some style issues and made sure, the JSON dumps are always equal (the order of items in a dictionary is not always the same, thus we sort by keys now, also fixes Python 3 tests). Have a look here: sebix@8e2e225

sebix · 2015-09-11T08:31:30Z

OTXv2.py is ugly code and not compatible to py3. There is a PR on some issues, I commented there: AlienVault-OTX/OTX-Python-SDK#2
Hopefully they fix the issues upstream, otherwise I will fork the abandoned project and make the fixes myself. That's not blocking this PR, just wanted to rant...

robcza · 2015-09-11T08:51:05Z

Thank you @sebix, not using merge to get changes from upstream ever since you told me, so it shouldn't happen again. Key sorting makes sense, I'm totally ok with the changes you introduced.

SYNchroACK · 2015-09-11T09:49:37Z

intelmq/bots/parsers/alienvault/parser_otx.py

+ event.add('comment', pulse['description'])
+ event.add('additional', additional, sanitize=True)
+ event.add('classification.type', 'blacklist', sanitize=True)
+ event.add('time.observation', report.value(


Use the following example to fill the event with time.observation, feed.url e feed.name

https:/certtools/intelmq/blob/master/intelmq/bots/parsers/autoshun/parser.py#L38

@robcza You can cherry-pick my commit after adding my repo as remote to your local branch:

git remote add sebix [email protected]:sebix/intelmq.git git remote update git cherry-pick 8e2e225beda1b0ff5b417649f44396aca1e962d8

@SYNchroACK there seem to be a typo in that examplereports instead of report report

@sebix The proposal of yours also takes the "raw" field from the report making it impossible to add the "raw" record for the event itself later, failing later like this:
ERROR - Bot has found a problem.\nTraceback (most recent call last):\n File "/usr/local/lib/python2.7/dist-packages/intelmq-1.0.0-py2.7.egg/intelmq/lib/bot.py", line 117, in start\n self.process()\n File "/usr/local/lib/python2.7/dist-packages/intelmq-1.0.0-py2.7.egg/intelmq/bots/parsers/alienvault/parser_otx.py", line 75, in process\n event.add("raw", json.dumps(indicator), sanitize=True)\n File "/usr/local/lib/python2.7/dist-packages/intelmq-1.0.0-py2.7.egg/intelmq/lib/message.py", line 77, in add\n raise exceptions.KeyExists(key)\nKeyExists: key u\'raw\' already exists

Should we replace the raw field instead of adding (does not feel like a proper way copying the whole raw report again and again) it or change the "constructor" to ignore the "raw" attribute from report (sorry for the cpp terminology).

See the code in lib/message.py: https:/certtools/intelmq/blob/master/intelmq/lib/message.py#L181
The Event-constructor differentiates between a Report and other parameter types like dict. Report is derived from Message which is in turn derived from dict. For a report instance, it only takes feed.(name|url) and time.observation, not raw. For everything else, like dicts and messages, the parameter is given to the constructor of the parent class, which is in the end dict, and all fields are copied. If this is really the case, it does mean that receive_message / the pipeline does not correctly unserialize the message!
So first please check the type of report e.g. self.logger.debug('report type {!r}'.format(type(report))), and/or some issubclass calls.

Signed-off-by: Sebastian Wagner <[email protected]>

…ienvault-otx

robcza · 2015-09-11T12:41:09Z

well, well, problem was actually in my outdated repo, tests are passing now with the cherry-picked changes from @sebix

SYNchroACK · 2015-09-11T12:53:21Z

@sebix and @robcza , is it ok to merge it now?

sebix · 2015-09-11T12:55:27Z

Yes. I hope the merge commit from today 95afec8 does not break anything....

robcza · 2015-09-11T13:13:32Z

I'm not aware of any problems right now. I should focus on the classification later, but merging this PR is ok.

SYNchroACK · 2015-09-11T13:16:11Z

ok. Thank you

AlienVault OTX API

sebix · 2015-09-16T07:48:22Z

intelmq/bots/collectors/alienvault_otx/OTXv2.py

+import urlparse
+import urllib
+import urllib2
+import simplejson as json


One last question: why did you chose simplejson over json? To cite the release notes of Python 2.7:

Updated module: The json module was upgraded to version 2.0.9 of the simplejson package, which includes a C extension that makes encoding and decoding faster. (Contributed by Bob Ippolito; issue 4136.)

Also see SO:
What are the differences between json and simplejson Python modules?
, links a unicode bug in simplejson and states:

json is simplejson, added to the stdlib.

robcza added 3 commits August 25, 2015 16:38

alienvault-otx api collector and parser

1162c36

OTX SDK added

e8ae727

Create README.md

0f7d12f

robcza added 2 commits August 25, 2015 17:33

pep8 compliance fixed

e961c00

Merge branch 'alienvault-otx' of https:/robcza/intelmq in…

c55d10d

…to alienvault-otx

sebix reviewed Aug 25, 2015
View reviewed changes

sebix mentioned this pull request Aug 26, 2015

DataHarmonization Document improvements #185

Closed

sebix reviewed Sep 2, 2015
View reviewed changes

Merge branch 'master' of https:/certtools/intelmq into al…

d60e244

…ienvault-otx Conflicts: intelmq/bots/BOTS

sebix reviewed Sep 2, 2015
View reviewed changes

time observation in the collector

edf97f3

syntax error fixes

2322d00

robcza added 2 commits September 2, 2015 17:34

working collector + bot, test data provided however there is an issue…

844653a

… with test itself

license for the AlienVault OTX SDK

b9c2e16

fixed alienvault otx parser + tests

cadc76a

SYNchroACK added the component: bots label Sep 9, 2015

SYNchroACK added this to the Release 1 milestone Sep 9, 2015

SYNchroACK self-assigned this Sep 9, 2015

sebix reviewed Sep 9, 2015
View reviewed changes

malware.hash harmonization and several fixes proposed by @sebix

34b9628

SYNchroACK reviewed Sep 11, 2015
View reviewed changes

sebix and others added 2 commits September 11, 2015 13:38

BUG: Adapt alienvault otx to current master

d2f8e3d

Signed-off-by: Sebastian Wagner <[email protected]>

Merge branch 'master' of https:/certtools/intelmq into al…

95afec8

…ienvault-otx

SYNchroACK added a commit that referenced this pull request Sep 11, 2015

Merge pull request #298 from robcza/alienvault-otx

efba3a8

AlienVault OTX API

SYNchroACK merged commit efba3a8 into certtools:master Sep 11, 2015

robcza deleted the alienvault-otx branch September 11, 2015 13:33

sebix reviewed Sep 16, 2015
View reviewed changes

sebix mentioned this pull request Aug 11, 2016

Harmonization fixes #634

Merged

AlienVault OTX API #298

AlienVault OTX API #298

Conversation

robcza commented Aug 25, 2015

sebix commented Aug 25, 2015

robcza commented Aug 25, 2015

Choose a reason for hiding this comment

sebix commented Aug 25, 2015

sebix commented Aug 26, 2015

sebix commented Aug 26, 2015

SYNchroACK commented Sep 1, 2015

sebix commented Sep 2, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robcza commented Sep 2, 2015

sebix commented Sep 2, 2015

sebix commented Sep 2, 2015

robcza commented Sep 2, 2015

sebix commented Sep 3, 2015

SYNchroACK commented Sep 3, 2015

sebix commented Sep 3, 2015

SYNchroACK commented Sep 3, 2015

robcza commented Sep 9, 2015

Choose a reason for hiding this comment

sebix commented Sep 9, 2015

SYNchroACK commented Sep 9, 2015

sebix commented Sep 9, 2015

aaronkaplan commented Sep 9, 2015

SYNchroACK commented Sep 9, 2015

sebix commented Sep 9, 2015

robcza commented Sep 9, 2015

sebix commented Sep 11, 2015

sebix commented Sep 11, 2015

robcza commented Sep 11, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robcza commented Sep 11, 2015

SYNchroACK commented Sep 11, 2015

sebix commented Sep 11, 2015

robcza commented Sep 11, 2015

SYNchroACK commented Sep 11, 2015

Choose a reason for hiding this comment