Sieve Filter expert #1083

Merged Dec 18, 2017 (43 commits)

Commits
24ab6ca
Added initial sieve metamodel
Aug 8, 2017
6e6465d
Refined sieve metamodel
Aug 8, 2017
0e43dd8
Added first draft of sieve/expert.py with unittests (not functional)
Aug 9, 2017
a187b3f
Implemented parsing the sieve file (partially)
Aug 9, 2017
9eb33c6
Sieve bot added to BOTS registration file
helderfernandes1279 Aug 9, 2017
54c05d0
Add comment regex in model
digihash Aug 9, 2017
e3eaa46
Remove debugging prints
digihash Aug 10, 2017
d06b197
Fix antoinet/intelmq#5: added error and debug logging messages
Aug 16, 2017
873a9fb
Implemented antoinet/intelmq#6: added keep action: stops processing a…
Aug 16, 2017
8446988
Add else and elsif clauses in rules, fixes antoinet/intelmq#4
Aug 17, 2017
fda9cad
Add new test to test ADD action
digihash Aug 17, 2017
99b6cc2
Document usage/features in README.md, antoinet/intelmq#9
Aug 18, 2017
6b2759f
Implementation of Numeric match value list and String match value list
helderfernandes1279 Aug 17, 2017
706819f
Add tests for REMOVE action
digihash Aug 17, 2017
f5cc29c
Added section about comments, antoinet/intelmq#9
Aug 18, 2017
3f48d8f
Added command to validate sieve files, antoinet/intelmq#10
Aug 18, 2017
40dda0d
Add 'ip in netblock' match, antoinet/intelmq#11
Aug 21, 2017
76bc0dd
Update README.md
antoinet Aug 18, 2017
b73c48f
Added test missing test cases antoinet/intelmq#8
Aug 22, 2017
8f1881c
Added REQUIREMENTS.txt with depedencies, antoinet/intelmq#13
Aug 25, 2017
17c0d4d
Fixed codestyle errors.
Aug 28, 2017
1176a85
Fixed first PR review outcomes, antoinet/intelmq#14
Aug 28, 2017
cda7c7d
Fix failing tests, antoinet/intelmq#14
Aug 28, 2017
3095d34
DOC: add sieve filter expert to Bots.md
Aug 28, 2017
61b4d0c
ENH: sieve filter expert validator is executable
Aug 28, 2017
6d7bca0
MAINT: sieve filter expert: imports
Aug 28, 2017
ea03ee2
BUG: sieve filter expert error handling
Aug 28, 2017
8276b5e
MAINT: sieve expert bot: shorten some code
Aug 28, 2017
6d50608
TST: sieve filter expert: add missing newlines
Aug 28, 2017
1b148d5
Merge branch 'develop' into antoinet-develop
Aug 28, 2017
3701bda
Merge pull request #15 from wagner-certat/antoinet-develop
antoinet Aug 29, 2017
f590a95
Review quick fixes, antoinet/intelmq#14
Aug 30, 2017
b46b642
antoinet/intelmq#18: rename action `modify` to `update`.
Sep 7, 2017
e385bfa
Re-enable cymru-whois tests (were disabled because service was tempor…
Sep 7, 2017
2d5ac01
ENH+DOC: check function for bots
Sep 11, 2017
764015b
ENH: check method for asn_lookup and file output
Sep 11, 2017
7ddce57
Merge branch 'call-bot-check' into develop
Sep 18, 2017
e878cc9
check method for sieve filter, antoinet/intelmq#20
Sep 29, 2017
e2f6a8c
Use '//' or '#' for line comments, antoinet/intelmq#19
Sep 29, 2017
342b483
implemented :notcontains operator, antoinet/intelmq#17
Sep 29, 2017
7ee859b
Strict checking for numeric match while using eval(), antoinet/intelm…
Oct 5, 2017
1419da5
Codestyle fixes
Oct 5, 2017
c8f6134
Validate IP addresses according to harmonization, antoinet/intelmq#11
Oct 16, 2017
7 changes: 7 additions & 0 deletions intelmq/bots/BOTS
@@ -572,6 +572,13 @@
"module": "intelmq.bots.experts.taxonomy.expert",
"parameters": {}
},
"Sievebot": {
"description": "Sievebot is the bot responsible to filter and modify intelMQ events.",
Review comment:

Please be more verbose, say that the sieve language is used. E.g. 'This bot filters and modifies events based on a sieve-based language.'

"module": "intelmq.bots.experts.sieve.expert",
"parameters": {
"file" :"/opt/intelmq/var/lib/bots/filter.sieve"
}
},
"Tor Nodes": {
"description": "Tor Nodes is the bot responsible to check if an IP is an Tor Exit Node.",
"module": "intelmq.bots.experts.tor_nodes.expert",
2 changes: 2 additions & 0 deletions intelmq/bots/experts/sieve/.gitignore
@@ -0,0 +1,2 @@
*.dot
Review comment:

Why?

Review comment (Member):

because they might have used graphviz :) Anyway, yes. Not needed here.


166 changes: 166 additions & 0 deletions intelmq/bots/experts/sieve/README.md
@@ -0,0 +1,166 @@
# Sieve Bot

The sieve bot is used to filter and/or modify events based on a set of rules. The
rules are specified in an external configuration file and with a syntax similar
to the [Sieve language](http://sieve.info/) used for mail filtering.

Each rule defines a set of matching conditions on received events. Events can be
matched based on keys and values in the event. If the processed event matches a
rule's conditions, the corresponding actions are performed. Actions can specify
whether the event should be kept or dropped in the pipeline (filtering actions)
or if keys and values should be changed (modification actions).

Review comment (Member):

Would it be possible to specify a specific outgoing pipeline as an action?

Example:

  if $condition
  then
    send-to "pipeline-A";
  fi

Review comment:

This is not supported by the core.

Review comment (Contributor):

In our instance, we are already (ab)using a newly introduced key called keep (boolean), which can be set anywhere in the pipeline. Downstream, events with keep==False are dropped.

This can be generalized by introducing a new key, e.g. route, with values that describe a specific destination. But this requires putting filter bots upstream of every destination, so that only the corresponding events are kept.

If a routing model was provided in the core, we would certainly have use-cases for it.

Review comment:

I wonder why no issue existed for this, so I created #1088 for it.


## Examples

The following excerpts illustrate some of the basic features of the sieve file
format:

```
if :exists source.fqdn {
keep // aborts processing of subsequent rules and forwards the event.
}


if :notexists source.abuse_contact || source.abuse_contact =~ '.*@example.com' {
drop // aborts processing of subsequent rules and drops the event.
}

if source.ip << 192.0.0.0/24 {
Review comment:

The validator says for this line:

ERROR: Error in (10, 17).
ERROR: Expected STRING or '[' at position test.sieve:(10, 17) => 'rce.ip << *192.0.0.0/'.

And why is there an additional * in the output?

Review comment (Contributor):

The * is a marker that refers to the position in the expression where the error occurred.

Review comment (Member):

according to the docs below, IP addresses must be quoted.

add! comment = 'bogon'
}

if classification.type == ['phishing', 'malware'] && source.fqdn =~ '.*\.(ch|li)$' {
add! comment = 'domainabuse'
keep
} elsif classification.type == 'scanner' {
add! comment = 'ignore'
drop
} else {
remove comment
}
```


## Parameters

The sieve bot only takes one parameter:
* `file` - filesystem path the the sieve file
Review comment (Member):

grammar: file system path of the sieve file



## Reference

### Sieve File Structure

The sieve file contains an arbitrary number of rules of the form:

```
if EXPRESSION {
ACTIONS
} elsif EXPRESSION {
ACTIONS
} else {
ACTIONS
}
```
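
As a rough illustration of how a rule of this form might be evaluated, the following Python sketch treats a rule as an ordered list of branches and runs the actions of the first branch whose condition holds. The names `Branch` and `apply_rule` are hypothetical and chosen for this example only; this is not the bot's implementation, which parses the sieve file with a grammar-based parser.

```python
from typing import Callable, List, Optional, Tuple

Event = dict
Condition = Callable[[Event], bool]
Action = Callable[[Event], None]

# A branch pairs a condition with its actions; None stands for the `else` branch.
Branch = Tuple[Optional[Condition], List[Action]]


def apply_rule(branches: List[Branch], event: Event) -> bool:
    """Run the actions of the first matching branch; return True if one matched."""
    for condition, actions in branches:
        if condition is None or condition(event):
            for action in actions:
                action(event)
            return True
    return False


# Hypothetical rule mirroring the if / elsif / else example in this README.
rule = [
    (lambda e: e.get("classification.type") in ("phishing", "malware"),
     [lambda e: e.__setitem__("comment", "domainabuse")]),
    (lambda e: e.get("classification.type") == "scanner",
     [lambda e: e.__setitem__("comment", "ignore")]),
    (None, [lambda e: e.pop("comment", None)]),
]

event = {"classification.type": "scanner"}
apply_rule(rule, event)
print(event["comment"])  # ignore
```

Only one branch runs per rule in this sketch: once a condition holds, the remaining `elsif` and `else` branches are skipped.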


### Expressions

Each rule specifies one or more expressions to match an event based on its keys
and values. Event keys are specified as strings without quotes. String values
must be enclosed in single quotes. Numeric values can be specified as integers
or floats and are unquoted. IP addresses and network ranges (IPv4 and IPv6) are
specified with quotes. The following operators may be used to match events:

* `:exists` and `:notexists` match if a given key exists, for example:

```if :exists source.fqdn { ... }```

* `==` and `!=` match for equality of strings and numbers, for example:

```if feed.name != 'acme-security' || feed.accuracy == 100 { ... }```

* `:contains` matches on substrings.
Review comment (Member):

wish list: if we have :contains, may we also have :notcontains, for reasons of symmetry?


* `=~` matches strings based on the given regex. `!~` is the inverse regex
match.

* Numerical comparisons are evaluated with `<`, `<=`, `>`, `>=`.

* `<<` matches if an IP address is contained in the specified network range:

```if source.ip << '10.0.0.0/8' { ... }```

* Values to match against can also be specified as a list, in which case the
expression matches if any one of the values matches:

```if source.ip == ['8.8.8.8', '8.8.4.4'] { ... }```
Review comment:

This could be problematic if we, one day, support lists for data fields. We currently support dictionaries (extra, and in the future possibly some others too) and lists would fit perfectly for e.g abuse_contact. Using :in as keyword would be more explicit.

@aaronkaplan what's your opinion on this?

Review comment (Contributor):

Problem is: every operator, not just ==, supports a list as a right hand side. If == is renamed to :in, what should we do with the other operators?

Review comment (Member):

may we change the grammar here to :in $list ? Ah, same comment as sebix :)


In this case, the event will match if it contains a key `source.ip` with
either value `8.8.8.8` or `8.8.4.4`.
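
The matching semantics described in this section can be approximated with the Python standard library. The sketch below is an illustration under stated assumptions, not the bot's code: the function names `matches` and `_match_one` are hypothetical, `=~` is assumed to behave like `re.search`, and the negated operators are left out because the text does not say how they combine with lists.

```python
import ipaddress
import re


def _match_one(op, actual, expected):
    """Evaluate a single operator against one expected value."""
    if op == "==":
        return actual == expected
    if op == ":contains":
        return expected in actual
    if op == "=~":
        # Assumption: regex matching behaves like re.search.
        return re.search(expected, actual) is not None
    if op == "<<":
        return ipaddress.ip_address(actual) in ipaddress.ip_network(expected)
    raise ValueError("unsupported operator: %r" % op)


def matches(event, key, op, expected):
    """Return True if event[key] matches; a list matches if any value does."""
    if key not in event:
        return False
    candidates = expected if isinstance(expected, list) else [expected]
    return any(_match_one(op, event[key], c) for c in candidates)


event = {"source.ip": "10.1.2.3", "source.fqdn": "mail.example.ch"}
print(matches(event, "source.ip", "<<", "10.0.0.0/8"))            # True
print(matches(event, "source.ip", "==", ["8.8.8.8", "8.8.4.4"]))  # False
print(matches(event, "source.fqdn", "=~", r".*\.(ch|li)$"))       # True
```

Wrapping a single value in a one-element list keeps the operator code identical for both forms, which is one way to give every operator list support on its right-hand side.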


### Actions

If part of a rule matches the given conditions, the actions enclosed in `{` and
`}` are applied. By default, every event is forwarded to the next bot in the
pipeline, whether or not it matched any rule in the sieve file, unless the
`drop` action is applied.
Review comment (Member):

Is there a possibility for a default action ?


* `add` adds a key value pair to the event. This action only applies if the key
is not yet defined in the event. If the key is already defined, the action is
ignored. Example:

```add comment = 'hello, world'```

* `add!` is the same as `add`, but overwrites the key if it is already defined in the event.

* `modify` modifies an existing value for a key. Only applies if the key is
Review comment (Member):

nit-picking here: may we rename modify to update ?

already defined. If the key is not defined in the event, this action is ignored.
Example:

```modify feed.accuracy = 50```

* `remove` removes a key/value from the event. Action is ignored if the key is
not defined in the event. Example:

```remove extra.comments```

* `keep` marks the event to be forwarded to the next bot in the pipeline
(same as the default behaviour), but in addition the sieve file processing is
interrupted upon reaching this action.
Review comment (Member):

So implicitly here you are saying that the default behaviour without keep is to continue with the matching?

Review comment (Contributor):

Yes, the keep and drop actions explicitly interrupt the processing of the rules.

Without a mention of keep or drop, processing of subsequent rules continues. Afaik this is the same behavior as mail sieve.


* `drop` marks the event to be dropped. The event will not be forwarded to the
next bot in the pipeline. The sieve file processing is interrupted upon
reaching this action. No other actions may be specified besides the `drop`
action within `{` and `}`.
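
The action semantics above can be sketched in Python on a plain `dict`. This is a minimal illustration, not the bot's code: the helper names (`add`, `modify`, `remove`, `process`) and the `KEEP`/`DROP`/`CONTINUE` verdicts are hypothetical. `modify` follows the README text shown in this diff; the commit list indicates the action was later renamed to `update`.

```python
KEEP, DROP, CONTINUE = "keep", "drop", "continue"


def add(event, key, value, force=False):
    """`add` sets the key only if it is absent; force=True models `add!`."""
    if force or key not in event:
        event[key] = value


def modify(event, key, value):
    """`modify` changes a value only if the key is already defined."""
    if key in event:
        event[key] = value


def remove(event, key):
    """`remove` deletes the key and does nothing if the key is absent."""
    event.pop(key, None)


def process(rules, event):
    """Apply rules in order; return True if the event should be forwarded."""
    for rule in rules:
        verdict = rule(event)
        if verdict == KEEP:   # forward and stop processing further rules
            return True
        if verdict == DROP:   # discard and stop processing further rules
            return False
    return True               # default: forward the event


def keep_if_fqdn(event):
    return KEEP if "source.fqdn" in event else CONTINUE


def drop_scanners(event):
    return DROP if event.get("classification.type") == "scanner" else CONTINUE


rules = [keep_if_fqdn, drop_scanners]
print(process(rules, {"classification.type": "scanner"}))  # False
print(process(rules, {"source.fqdn": "example.ch",
                      "classification.type": "scanner"}))  # True
```

The second call shows why rule order matters: `keep` interrupts processing before the later `drop` rule is reached.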


### Comments
Review comment (Member):

Could you explain (maybe I missed it and you did) if

  • the order is relevant and
  • does the checking stop or continue after a match?


Comments may be used in the sieve file: all characters after `//` and until the end of the line will be ignored.
Review comment (Member):

wish-list: // or # as a comment character.
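
One way line comments could be stripped is sketched below in Python. It accepts both `//` and `#` (the commit list mentions a later change allowing either) and ignores comment markers inside single-quoted strings. The function name `strip_comment` is hypothetical, and escaped quotes inside strings are not handled, since the README does not describe an escaping rule; the actual bot handles comments in its grammar rather than with code like this.

```python
def strip_comment(line: str) -> str:
    """Remove a trailing `//` or `#` comment that is outside single quotes."""
    in_string = False
    for i, ch in enumerate(line):
        if ch == "'":
            in_string = not in_string
        elif not in_string and (ch == "#" or line.startswith("//", i)):
            return line[:i].rstrip()
    return line


print(strip_comment("keep // forwards the event"))          # keep
print(strip_comment("add comment = 'http://x' # tagged"))   # add comment = 'http://x'
```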



## Validating a sieve file

Use the following command to validate your sieve files:
```
$ python intelmq/bots/experts/sieve/validator.py -h
usage: validator.py [-h] sievefile

Validates the syntax of sievebot files.

positional arguments:
sievefile Sieve file

optional arguments:
-h, --help show this help message and exit
```
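
When validation fails, the error message quoted in the review discussion above includes an excerpt of the offending line with a `*` marking the error position. The Python sketch below reproduces that excerpt format for illustration only; the helper name `error_context` and the window of 10 characters on each side are assumptions inferred from the quoted message, not taken from the validator's source.

```python
def error_context(line: str, column: int, width: int = 10) -> str:
    """Return an excerpt of `line` with '*' inserted at the 1-based `column`."""
    pos = column - 1
    start = max(0, pos - width)
    return line[start:pos] + "*" + line[pos:pos + width]


# Line and column taken from the validator error quoted in the review.
print(error_context("if source.ip << 192.0.0.0/24 {", 17))
# rce.ip << *192.0.0.0/
```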

## Installation

To use this bot, you need to install the required dependencies:
```
$ pip install -r REQUIREMENTS.txt
```
1 change: 1 addition & 0 deletions intelmq/bots/experts/sieve/REQUIREMENTS.txt
@@ -0,0 +1 @@
textX>=1.5.1
Empty file.