Merge branch 'master' of https:/certtools/intelmq into al…

…ienvault-otx
certtools · Sep 11, 2015 · 95afec8
2 parents d2f8e3d + c715b52
commit 95afec8
Showing 101 changed files with 1,139 additions and 636 deletions.
diff --git a/.gitignore b/.gitignore
@@ -12,3 +12,6 @@ dist
 *.old
 .vagrant/
 *~
+.coverage
+.idea/
+htmlcov/
diff --git a/.travis.yml b/.travis.yml
@@ -0,0 +1,13 @@
+language: python
+python:
+ - "2.7"
+ - "3.4"
+# command to install dependencies
+install:
+ - if [[ $TRAVIS_PYTHON_VERSION == '2.7' ]]; then pip install -r REQUIREMENTS; fi
+ - if [[ ${TRAVIS_PYTHON_VERSION%.?} == 3 ]]; then pip install -r REQUIREMENTS3; fi
+ - "python setup_travis.py install"
+# command to run tests
+script: nosetests
+services:
+ - redis-server
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,17 +1,51 @@
 CHANGELOG
 ==========
 
-v1.0 (in developement)
-- renamed bots.parsers.spamhaus.parser to bots.parsers.spamhaus.parser_drop
-- added bots.parsers.spamhaus.parser_cert
-- added bots.parsers.fraunhofer.parser_dga
-- added bots.experts.certat_contact.expert
+v1.0 (in development, master branch)
+----
+
+### Bot changes
+- ENH: added bots.parsers.spamhaus.parser_cert
+- ENH: added bots.parsers.fraunhofer.parser_dga
+- ENH: added bots.experts.certat_contact.expert
+- MAINT: renamed bots.parsers.spamhaus.parser to bots.parsers.spamhaus.parser_drop
+
+### Bug fixes
+- FIX: all bots handle messages which are None
+- FIX: various encoding issues resolved in core and bots
+- FIX: time.observation is generated in collectors, not in parsers
+
+### Other enhancements and changes
+- TST: testing framework for core and tests. Newly introduced components should always come with proper unit tests.
+- ENH: intelmqctl has shortcut parameters and can clear queues
+- STY: code obeys PEP8, new code should always be properly formatted
+- ENH: More code is Python 3 compatible
+- DOC: Updated user and dev guide
+
+### Harmonization
+- ENH: Additional data types: integer, float and Boolean
+- ENH: Added descriptions and matching types to all fields
+- DOC: harmonization documentation has same fields as configuration, docs are generated from configuration
+
+#### Most important changes:
+- `(source|destination).bgp_prefix` is now `(source|destination).network`
+- `(source|destination).cc` is now `(source|destination).geolocation.cc`
+- `(source|destination).reverse_domain_name` is now `(source|destination).reverse_dns`
+- `misp_id` changed to `misp_uuid`
+- `protocol.transport` added
+- `webshot_url` removed
+- `additional_information` renamed to `extra`, must be JSON
+- `os.name`, `os.version`, `user_agent` removed in favor of `extra`
+
+-----
+
+
 
 ## 2015/06/03 (aaron)
 
  * fixed the license to AGPL in setup.py
  * moved back the docs/* files from the wiki repo to docs/. See #205.
- * added python-zmq as a setup requirment in UserGuide . See #206
+ * added python-zmq as a setup requirement in UserGuide . See #206
 
 
 
@@ -38,7 +72,7 @@ v1.0 (in developement)
  FILE: conf/harmonization.conf
 
  - in harmonization.conf is possible to define the fields of a specific message in json format.
- - the harmonization.py has datatypes witch contains sanitize and validation methods that will make sure that the values are correct to be part of an event.
+ - the harmonization.py has data types which contain sanitization and validation methods that make sure the values are correct before they become part of an event.
 
 
 
@@ -60,8 +94,8 @@ v1.0 (in developement)
 
 
 
-* Defaults configrations
- - new configuration file to specify the default parameters which will be apllied to all bots. Bots can overwrite the configurations.
+* Defaults configurations
+ - new configuration file to specify the default parameters which will be applied to all bots. Bots can overwrite the configurations.
 
 
 

diff --git a/REQUIREMENTS b/REQUIREMENTS
@@ -1,15 +1,19 @@
-python-dateutil==1.5
-geoip2==0.5.1
-dnspython==1.11.1
-redis==2.10.3
-pymongo==2.7.1
-xmpppy==0.5.0rc1
-imbox==0.5.5
+dnspython>=1.11.1
+geoip2>=0.5.1
+imbox>=0.5.5
 ipaddress
-unicodecsv==0.9.4
-pytz==2012d
-psutil==2.1.1
-pyzmq==14.6.0
-pydns==2.3.6
-pycurl==7.19.0
-mock
+mock>=1.1.1
+psutil>=2.1.1
+pyasn
+pycurl>=7.19.0
+pydns>=2.3.6
+pymongo>=2.7.1
+python-dateutil>=1.5
+pytz>=2012d
+pyzmq>=14.6.0
+redis>=2.10.3
+requests>=2.4.2
+six>=1.7
+unicodecsv>=0.9.4
+xmpppy>=0.5.0rc1
+
diff --git a/REQUIREMENTS3 b/REQUIREMENTS3
@@ -0,0 +1,17 @@
+dnspython3>=1.12.0
+geoip2>=0.5.1
+imbox>=0.5.5
+ipaddress
+mock>=1.1.1
+psutil>=2.1.1
+pyasn
+pycurl>=7.19.0
+pymongo>=2.7.1
+python-dateutil>=1.5
+pytz>=2012d
+pyzmq>=14.6.0
+redis>=2.10.3
+requests>=2.4.2
+six>=1.7
+unicodecsv>=0.9.4
+xmpppy>=0.5.0rc1
diff --git a/docs/Bots.md b/docs/Bots.md
@@ -0,0 +1,57 @@
+Bots documentation
+==================
+
+Experts
+-------
+
+| name | IPv6 | lookup | public | cache: redis db | information | comment |
+|:-----|:-----|:-------|:-------|:----------------|:------------|:--------|
+| abusix | n | ? | y | 5 | ip to abuse contact | ipv6 implementation missing |
+| asn-lookup | n | local db | y | - | ip to asn | [IPv6 bugreport](https://github.com/hadiasghari/pyasn/issues/14) |
+| certat-contact | n | https | y | - | asn to cert abuse contact, cc | |
+| cymru-whois | y | cymru dns | y | 6 | ip to geolocation, asn, network | |
+| deduplicator | y | redis cache | y | 7 | - | not tested |
+| filter | y | - | y | - | drops event | not tested |
+| maxmind-geoip | ? | local db | n | - | ip to geolocation ? | not stable |
+| modify | - | config | y | - | arbitrary | |
+| reverse-dns | n | dns | y | 8 | ip to domain | ipv6 implementation missing |
+| ripencc-abuse-contact | y | ? | y | 9 | ip to abuse contact | |
+| taxonomy | - | - | y | - | classification type to taxonomy | |
+| tor-nodes | n | local db | y | - | if ip is tor node | |
+
+
+### Modify
+
+The modify expert bot allows you to change arbitrary field values of events using only a configuration file. Thus it is possible to adapt certain values or add new ones just by changing JSON files, without touching the code of many other bots.
+
+The configuration is called `modify.conf` and looks like this:
+
+```json
+{
+"Spamhaus Cert": {
+ "__default": [{
+ "feed.name": "^Spamhaus Cert$"
+ }, {
+ "classification.identifier": "{msg[malware.name]}"
+ }],
+ "conficker": [{
+ "malware.name": "^conficker(ab)?$"
+ }, {
+ "classification.identifier": "conficker"
+ }],
+ "urlzone": [{
+ "malware.name": "^urlzone2?$"
+ }, {
+ "classification.identifier": "urlzone"
+ }]
+ }
+}
+```
+
+The dictionary in the first level holds sections, here called `Spamhaus Cert`, which group the rule sets and ease navigation. Each section holds another dictionary of rules, consisting of *conditions* and *actions*. The first matching rule is used. Conditions and actions are again dictionaries keyed by harmonization field names; their values are regular expressions matched against existing values (condition) or new values to set (action). The rule conditions are merged with the default condition, and the default action is applied if no rule matches.
+
+#### Examples
+
+We have an event with `feed.name = Spamhaus Cert` and `malware.name = confickerab`. The expert loops over all sections in the file and enters the section `Spamhaus Cert`. First, the default condition is checked, and it matches; otherwise the expert would have continued with the next section. Next, the expert iterates through the rules, starting with the rule `conficker`. The conditions of this rule are combined with the default conditions, and both match. So the action is applied: `classification.identifier` is set to `conficker`, the trivial name.
+
+Assume we have an event with `feed.name = Spamhaus Cert` and `malware.name = feodo`. The default condition matches, but no others. So the default action is applied. The value for `classification.identifier` is `{msg[malware.name]}`, this is [standard Python string format syntax](https://docs.python.org/3/library/string.html#formatspec). Thus you can use any value from the processed event, which is available as `msg`.
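
The rule evaluation described in the `docs/Bots.md` hunk above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the bot's actual code: the names `CONFIG`, `matches`, `apply_action` and `modify` are hypothetical, events are modelled as plain dictionaries, and stopping after the first section whose default condition matches is an assumption.

```python
import re

# Hypothetical in-memory copy of the modify.conf example shown above.
CONFIG = {
    "Spamhaus Cert": {
        "__default": [{"feed.name": "^Spamhaus Cert$"},
                      {"classification.identifier": "{msg[malware.name]}"}],
        "conficker": [{"malware.name": "^conficker(ab)?$"},
                      {"classification.identifier": "conficker"}],
        "urlzone": [{"malware.name": "^urlzone2?$"},
                    {"classification.identifier": "urlzone"}],
    }
}


def matches(event, conditions):
    """Return True if every condition regex matches the event's field value."""
    return all(
        field in event and re.search(regex, str(event[field])) is not None
        for field, regex in conditions.items()
    )


def apply_action(event, action):
    """Set each action field; templates may refer to the event as `msg`."""
    for field, template in action.items():
        # Raises KeyError if the template references a field the event lacks.
        event[field] = template.format(msg=event)


def modify(event, config=CONFIG):
    """Apply the first matching rule, else the section's default action."""
    for section in config.values():
        default_cond, default_action = section.get("__default", [{}, {}])
        if not matches(event, default_cond):
            continue  # default condition failed: try the next section
        for name, (cond, action) in section.items():
            if name == "__default":
                continue
            # The default condition already matched, so only the rule's
            # own conditions remain to be checked.
            if matches(event, cond):
                apply_action(event, action)
                return event
        apply_action(event, default_action)
        return event  # assumption: stop after the first matching section
    return event
```

Calling `modify` with the two events from the examples reproduces the documented outcome: `confickerab` is normalized to `conficker`, while `feodo` falls through to the default action and keeps its own name.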
diff --git a/docs/Data-Harmonization.md b/docs/Data-Harmonization.md
@@ -12,6 +12,8 @@
 
 ## Overview
 
+All messages (reports and events) are Python/JSON dictionaries. The key names and corresponding types are defined by the so-called *harmonization*.
+
 The purpose of this document is to list and clearly define known **fields** in Abusehelper as well as Intelmq or similar systems. A field is a ```key=value``` pair. For a clear and unique definition of a field, we must define the **key** (field-name) as well as the possible **values**. A field belongs to an **event**. An event is basically a structured log record in the form ```key=value, key=value, key=value, …```. In the [List of known fields](#fields), each field is grouped by a **section**. We describe these sections briefly below.
 Every event **MUST** contain a timestamp field.
 
@@ -22,53 +24,6 @@ Every event **MUST** contain a timestamp field.
 
 The keys can be grouped together in sub-fields, e.g. `source.ip` or `source.geolocation.latitude`. Thus, keys must match `[a-z_.]`.
 
-## EBNF
-To grasp the concept of fields, events, keys, values, etc. the following [EBNF](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form) description might help. _Do not take this as a literal instruction for implementations_. The formatting of events and fields (and how fields are separated from each other) might vary depending on the encapsulating format (JSON, CSV , etc.) . This EBNF description is here to illustrate how these concepts work together (and are not complete):
-
-
-```
-Events ::= Event
- | Events '\n' Event
- 
-Event ::= Field
- | Event ', ' Field
-
- 
-Field ::= Key '=' Value
-
-Value ::= StringLiteral
- | Number
- 
-Key ::= [a-z0-9_-]+
-Number ::= [0-9]+
-StringLiteral
- ::= '"' [^"]* '"'
- | "'" [^']* "'"
- 
-```
-
-### Events
-![Events EBNF](https://raw.githubusercontent.com/certtools/intelmq/master/docs/images/Events.png)
-
-### Event
-![Event EBNF](https://raw.githubusercontent.com/certtools/intelmq/master/docs/images/Event.png)
-
-### Field
-![Field EBNF](https://raw.githubusercontent.com/certtools/intelmq/master/docs/images/Field.png)
-
-### Key
-![Key EBNF](https://raw.githubusercontent.com/certtools/intelmq/master/docs/images/Key.png)
-
-### Value
-![Value EBNF](https://raw.githubusercontent.com/certtools/intelmq/master/docs/images/Value.png)
-
-### String Literal
-![String Literal EBNF](https://raw.githubusercontent.com/certtools/intelmq/master/docs/images/StringLiteral.png)
-
-### Number
-![Number EBNF](https://raw.githubusercontent.com/certtools/intelmq/master/docs/images/Number.png)
-
-
 
 <a name="sections"></a>
 ## Sections
@@ -111,6 +66,8 @@ Some sources report an internal (NATed) IP address.
 
 ### Reported Identity
 
+Not used currently.
+
 #### Reported Source Identity
 
 As stated above, each abuse handling organization should define a policy stating which IOC to use as the primary element describing an abuse event. Often the sources have done their attribution, but you may choose to correlate their attributive elements with your own. In practice this means that your sanitation should prefix the elements with the **reported** keyword, to denote that you've decided to attribute these yourself. The list below is not comprehensive, but rather a list of common things you may want to attribute yourself. Moreover, if you choose to perform your own attribution, the observation time will become your authoritative point of reference related to these IOC.
@@ -139,7 +96,7 @@ The elements listed below are additional keys used to describe abusive behavior,
 
 #### Classification
 
-Having a functional ontology to work with, especially for the abuse types is important for you to be able to classify, prioritize and report relevant actionable intelligence to the parties who need to be informed. The driving idea for this ontology has been to use a minimal set of values with maximal usability. Below, is a list of harmonized values for the abuse types.
+Having a functional ontology to work with, especially for the abuse types is important for you to be able to classify, prioritize and report relevant actionable intelligence to the parties who need to be informed. The driving idea for this ontology has been to use a minimal set of values with maximal usability. See the classification section below for explanations and examples.
 
 <a name="datatypes"></a>
 ## Data types
@@ -160,15 +117,14 @@ Note that this section does not yet define error handling and failure mechanisms
 
 A list of allowed fields can be found in [Harmonization-fields.md](Harmonization-fields.md)
 
-### Rules
+<a name="mapping"></a>
+## Classification
 
-All keys MUST be written in lowercase. 
+Intelmq classifies events using three labels: taxonomy, type and identifier. This tuple of three values can be used for deduplication of events and describes what happened.
+TODO: examples from chat
 
-<a name="mapping"></a>
-## Type/Taxonomy Mapping
+The taxonomy can be automatically added by the taxonomy expert bot based on the given type. The following taxonomy-type mapping is based on eCSIRT Taxonomy:
 
-The following mapping is based on eCSIRT Taxonomy.
-
 |Type|Taxonomy|Description|
 |----|--------|-----------|
 |spam|Abusive Content|This IOC refers to resources, which make up a SPAM infrastructure, be it a harvester, dictionary attacker, URL etc.|
@@ -192,30 +148,36 @@ The following mapping is based on eCSIRT Taxonomy.
 |unknown|Other|unknown events|
 |test|Test|This is a value for testing purposes.|
 
-Meaning of source, destination and local values for each classification type:
+Meaning of source, destination and local values for each classification type and possible identifiers. The identifier is often a normalized malware name, grouping many variants.
 
-|Type|Source|Destination|Local|
+|Type|Source|Destination|Local|Possible identifiers|
 |----|------|-----------|-----|--------------------|
 |spam|*infected device*|targeted server|internal at source|
-|malware||||
-|botnet drone||||
-|ransomware||||
-|malware configuration||||
-|c&c|*connecting device*|sinkholed server||
+|malware|*infected device*||internal at source|zeus, palevo, feodo|
+|botnet drone|*infected device*|||
+|ransomware|*infected device*|||
+|malware configuration|*infected device*|||
+|c&c|*(sinkholed) c&c server*|||zeus, palevo, feodo|
 |scanner|*scanning device*|scanned device||
-|exploit||||
+|exploit|*hosting server*|||
 |brute-force|*attacker*|target||
-|ids alert||||
-|defacement||||
-|compromised||||
-|backdoor||||
+|ids alert|*triggering device*|||
+|defacement|*defaced website*|||
+|compromised|*server*|||
+|backdoor|*backdoored device*|||
 |ddos|*attacker*|target||
-|dropzone||||
-|phishing||||
-|vulnerable service||||
-|blacklist||||
+|dropzone|*server hosting stolen data*|||
+|phishing|*phishing website*|||
+|vulnerable service|*vulnerable device*||| heartbleed, openresolver, snmp |
+|blacklist|*blacklisted device*|||
 |unknown||||
 
+The field in italics is the one of interest to CERTs.
+
+Example:
+
+If you know of an IP address that connects to a zeus c&c server, the event is about the infected device, thus type malware and identifier zeus. If you want to complain about the c&c server, it's type c&c and identifier zeus. The `malware.name` can hold the full name, e.g. 'zeus_p2p'.
+
 <a name="requirements"></a>
 ## Minimum requirements for events
 

diff --git a/docs/Developers-Guide.md b/docs/Developers-Guide.md
@@ -39,6 +39,8 @@ All changes have to be tested and new contributions must must be accompanied by
 
 It may be necessary to switch the user to `intelmq` if the run-path (`/opt/intelmq/var/run/`) is not writeable by the current user. Some bots need local databases to succeed. If you don't mind about those and only want to test one explicit test file, you can give the filepath as argument.
 
+There is a [Travis-CI](https://travis-ci.org/certtools/intelmq/builds) setup for automatic testing. (-> thx sebix!)
+
 ## Coding-Rules
 
 In general, we follow the [Style Guide for Python Code (PEP8)](https://www.python.org/dev/peps/pep-0008/).
@@ -259,9 +261,10 @@ class ExampleParserBot(Bot):
  self.acknowledge_message()
  return
 
- event = Event()
+ event = Event(report) # copies feed.name, time.observation
  ... # implement the logic here
- event.add('additional_information', 'Nothing here')
+ event.add('source.ip', '127.0.0.1')
+ event.add('extra', '{"os.name": "Linux"}')
 
  self.send_message(event)
  self.acknowledge_message()
@@ -336,7 +339,7 @@ class TestExampleParserBot(test.BotTestCase, unittest.TestCase): # adjust test
  @classmethod
  def set_bot(cls):
  cls.bot_reference = ExampleParserBot # adjust bot class name
- cls.default_input_message = json.dumps(EXAMPLE_EVENT) # adjust source of the example event
+ cls.default_input_message = EXAMPLE_EVENT # adjust source of the example event (dict)
 
  # This is an example how to test the log output
  def test_log_test_line(self):