Sieve Filter expert #1083

Merged
43 commits merged into from
Dec 18, 2017

digihash
Contributor

This is for the Sievebot which was written in collaboration with @antoinet and @helderfernandes1279.

The sieve bot is used to filter and/or modify events based on a set of rules. The rules are specified in an external configuration file and with a syntax similar to the Sieve language used for mail filtering.

Each rule defines a set of matching conditions on received events. Events can be matched based on keys and values in the event. If the processed event matches a rule's conditions, the corresponding actions are performed. Actions can specify whether the event should be kept or dropped in the pipeline (filtering actions) or if keys and values should be changed (modification actions).

For more information, see README.md under "intelmq/bots/experts/sieve/README.md"
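
As a rough illustration of the rule model described above, here is a minimal sketch in plain Python. It is not the bot's code: the function names `matches` and `apply_rule` are hypothetical, and conditions are reduced to simple key/value equality.

```python
# Hypothetical, simplified model of the rule idea; not the bot's actual code.

def matches(event, conditions):
    """Return True if every (key, expected value) pair is present in the event."""
    return all(event.get(key) == value for key, value in conditions.items())


def apply_rule(event, conditions, updates):
    """Modification action: return a copy with `updates` set if the rule matches."""
    if matches(event, conditions):
        return {**event, **updates}  # copy, then add or overwrite keys
    return event


event = {"feed.name": "acme-security", "comment": "add field"}
result = apply_rule(event, {"comment": "add field"}, {"destination.ip": "150.50.50.10"})
```

Here `result` carries the added `destination.ip` key while the original `event` is left untouched.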

@ghost ghost changed the title Develop Sieve Filter expert Aug 28, 2017
@ghost ghost added this to the 1.1.0 milestone Aug 28, 2017
@ghost ghost added component: bots feature Indicates new feature requests or new features labels Aug 28, 2017
@antoinet
Contributor

I fixed the codestyle checks in https:/antoinet/intelmq/commit/17c0d4d1d70903ff08f00a33ab6cb1a3276dc220
@digihash: can you add this commit to the PR?

@ghost ghost left a comment

Issues pointed out by the tests:

Will look at the code once these issues are resolved.

@ghost

ghost commented Aug 28, 2017

@antoinet Thanks. If you push to antoinet:develop, the commits are part of this PR

Antoine Neuenschwander added 2 commits August 28, 2017 10:55
See: #1083 (review)
* removed tabs in BOTS
* safely import textx package
* added `skip_exotic` decorator to sievebot test class
* removed package `ipaddress` from REQUIREMENTS.txt

Other changes:
* removed `print` statement from expert.py
* Fixed BOTS format
* replaced re.fullsearch with re.search (method unavailable in python < 3.4)
* removed Enum definition (unavailable in python < 3.4)

@ghost ghost left a comment

failing tests for cymru can be ignored

Looks good overall.

I guess there is some room for optimization, as some rules are evaluated every time.

See also https:/antoinet/intelmq/pull/15 fixing some of the issues.

I am not very happy about yet another configuration language in intelmq, but I don't have a better solution currently.

Parameters:
file: string
"""
from __future__ import unicode_literals

We support python3 only, this is obsolete. Also, you may find isort useful.

@@ -572,6 +572,13 @@
"module": "intelmq.bots.experts.taxonomy.expert",
"parameters": {}
},
"Sievebot": {
"description": "Sievebot is the bot responsible to filter and modify intelMQ events.",

Please be more verbose, say that the sieve language is used. E.g. 'This bot filters and modifies events based on a sieve-based language.'

from intelmq.lib.bot import Bot

try:
    import textx

imported but unused. You can shorten it by doing:

try:
    from textx.metamodel import metamodel_from_file
    from textx.exceptions import TextXError, TextXSemanticError
except ImportError:
    metamodel_from_file = None

and then

if metamodel_from_file is None:
    raise ...

except TextXError as e:
    self.logger.error('Could not process sieve grammar file. Error in (%d, %d).', e.line, e.col)
    self.logger.error(str(e))
    self.stop()

Instead of these three lines simply do raise ValueError('...'), everything else is handled by the Bot-class.

@@ -0,0 +1,2 @@
*.dot

Why?

Member

because they might have used graphviz :) Anyway, yes. Not needed here.

return self.process_ip_range_match(match.key, match.range, event)
elif match.__class__.__name__ == 'Expression':
return self.match_expression(match, event)
pass

unnecessary


try:
    addr = ipaddress.ip_address(event[key])
except ValueError:

Did this happen in your tests? It should never, actually

Contributor

Since the event type is not enforced (any key can be specified as the left hand side of the << operator), the value is validated here. We did discuss whether or not to restrict this operator to keys with a type corresponding to an IP address (https:/antoinet/intelmq/issues/11), and I just realized I didn't implement it the suggested way.
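
To illustrate the validation described here, a minimal sketch using only the standard `ipaddress` module. The helper name `ip_in_ranges` is hypothetical and this is not the code from the PR; it only shows that a non-IP value on the left-hand side of `<<` can be treated as a non-match instead of an error.

```python
import ipaddress


def ip_in_ranges(value, ranges):
    """Return True if `value` is an IP address inside any of the CIDR `ranges`.

    Values that are not valid IP addresses simply do not match.
    """
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        # The key may hold any string, so a non-IP value is not an error here.
        return False
    return any(addr in ipaddress.ip_network(cidr, strict=False) for cidr in ranges)


in_range = ip_in_ranges("192.0.0.17", ["192.0.0.0/24", "169.254.0.0/16"])
not_an_ip = ip_in_ranges("example.com", ["192.0.0.0/24"])
```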

@@ -572,6 +572,13 @@
"module": "intelmq.bots.experts.taxonomy.expert",
"parameters": {}
},
"Sievebot": {

The bot is called sieve everywhere else.

@@ -0,0 +1,3 @@
if comment == 'add field' {
add destination.ip="150.50.50.10"
}

missing newlines everywhere

return False

def process_numeric_operator(self, lhs, op, rhs):
    return eval(str(lhs) + op + str(rhs))  # TODO graceful error handling

Is it possible without eval? At least a strict checking of the arguments should be done.
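
One possible way to avoid `eval`, sketched with the standard `operator` module. The set of operators in the table is an assumption (the grammar may define a different set), and the function is written as a plain function rather than a bot method.

```python
import operator

# Whitelist of comparison operators (assumed set); anything else is rejected.
NUMERIC_OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def process_numeric_operator(lhs, op, rhs):
    """Compare two numeric values without eval()."""
    try:
        compare = NUMERIC_OPERATORS[op]
    except KeyError:
        raise ValueError("Unsupported numeric operator: %r" % (op,))
    # float() raises ValueError for non-numeric input, which doubles as
    # the strict argument check.
    return compare(float(lhs), float(rhs))
```

With this approach only whitelisted operators and numeric operands are accepted, so no arbitrary expression from the sieve file or the event is ever executed.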

Member

+1

@ghost ghost self-assigned this Aug 28, 2017
@ghost ghost requested a review from aaronkaplan August 28, 2017 15:20
Member

@aaronkaplan aaronkaplan left a comment

I had quite a few comments after a more careful review, but overall this is a fantastic addition!

The most welcome enhancement would be the ability to specify the output pipeline with this sieve filter.

Then you could write a kind of switch-case statement that decides where the data should flow.

rule's conditions, the corresponding actions are performed. Actions can specify
whether the event should be kept or dropped in the pipeline (filtering actions)
or if keys and values should be changed (modification actions).

Member

Would it be possible to specify a specific outgoing pipeline as an action?

Example:

  if $condition
  then
    send-to "pipeline-A";
  fi

This is not supported by the core.

Contributor

In our instance, we are already (ab)using a newly introduced key called keep (boolean), which can be set anywhere in the pipeline. Downstream, events with keep==False are dropped.

This can be generalized by introducing a new key, e.g. route, with values that describe a specific destination. But this requires putting filter bots upstream of every destination so that only the corresponding events are kept.

If a routing model was provided in the core, we would certainly have use-cases for it.
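
A rough sketch of that workaround, under stated assumptions: the `route` key, the destination names and the helper `make_route_filter` are all hypothetical, and nothing here is core functionality.

```python
# Hypothetical sketch: the "route" key and destination names are assumptions.

def make_route_filter(destination):
    """Build the filter for the bot placed upstream of one destination."""
    def keep(event):
        return event.get("route") == destination
    return keep


events = [
    {"source.ip": "192.0.2.1", "route": "pipeline-A"},
    {"source.ip": "192.0.2.2", "route": "pipeline-B"},
]
keep_for_a = make_route_filter("pipeline-A")
to_a = [event for event in events if keep_for_a(event)]
```

Every destination needs its own filter of this kind, which is the overhead mentioned above.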

I wondered why no issue existed for this and created #1088 for it.

## Parameters

The sieve bot takes only one parameter:
* `file` - filesystem path the the sieve file
Member

grammar: file system path of the sieve file

drop // aborts processing of subsequent rules and drops the event.
}

if source.ip << 192.0.0.0/24 {
Member

according to the docs below, IP addresses must be quoted.


```if feed.name != 'acme-security' || feed.accuracy == 100 { ... }```

* `:contains` matches on substrings.
Member

wish list: since we have :contains, may we also have :notcontains, for reasons of symmetry?


* `keep` marks the event to be forwarded to the next bot in the pipeline
(same as the default behaviour), but in addition the sieve file processing is
interrupted upon reaching this action.
Member

So implicitly you are saying here that the default behaviour without keep is to continue with the matching?

Contributor

Yes, the keep and drop actions explicitly interrupt the processing of the rules.

Without a mention of keep or drop, processing of subsequent rules continues. Afaik this is the same behavior as mail sieve.
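
The control flow described here can be sketched as follows. This is a simplified model, not the bot's implementation: the `process` function and the representation of rules as `(condition, action)` pairs are assumptions made for the example.

```python
def process(event, rules):
    """Return the event to forward, or None if it was dropped.

    `rules` is a list of (condition, action) pairs: `condition` is a callable
    taking the event, `action` is "keep", "drop", or a callable that modifies
    the event in place.
    """
    for condition, action in rules:
        if not condition(event):
            continue
        if action == "keep":
            return event  # interrupt processing, forward the event
        if action == "drop":
            return None  # interrupt processing, discard the event
        action(event)  # modification: continue with the next rule
    return event  # no keep/drop reached: forward by default


rules = [
    (lambda e: e.get("feed.name") == "acme-security", "keep"),
    (lambda e: True, lambda e: e.update({"comment": "seen"})),
    (lambda e: e.get("feed.accuracy", 100) < 50, "drop"),
]
kept = process({"feed.name": "acme-security"}, rules)
dropped = process({"feed.name": "other", "feed.accuracy": 10}, rules)
```

In this sketch `kept` leaves the rule set at the first rule and is never modified, while the second event is modified by the second rule before the third rule drops it.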


### Comments

Comments may be used in the sieve file: all characters after `//` and until the end of the line will be ignored.
Member

wish-list: // or # as a comment character.


Use the following command to validate your sieve files:
```
$ intelmq.bots.experts.sieve.validator
Member

this should be called implicitly when the intelmqctl configtest gets called.

It is intelmqctl check

@@ -0,0 +1,3 @@
if source.ip << ['192.0.0.0/24', '169.254.0.0/16'] {
Member

This does not seem to be documented: the << operator also works on lists.

Contributor

Actually, every operator works on lists. I will point that out.
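
For illustration, one way such list support could be modelled. This is a sketch under the assumption that a list on the right-hand side matches if any element matches; the helper `matches_any` is hypothetical and the bot's actual semantics may differ, particularly for negated operators.

```python
import operator


def matches_any(value, rhs, predicate):
    """Apply `predicate(value, candidate)` to a scalar or to each list element."""
    candidates = rhs if isinstance(rhs, (list, tuple)) else [rhs]
    return any(predicate(value, candidate) for candidate in candidates)


def contains(value, needle):
    """Substring match, in the spirit of the `:contains` operator."""
    return needle in value


equal_scalar = matches_any("127.0.0.1", "127.0.0.1", operator.eq)
equal_list = matches_any("10.0.0.1", ["127.0.0.1", "192.168.1.1", "10.0.0.1"], operator.eq)
contains_list = matches_any("acme-security", ["foo", "acme"], contains)
```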

@aaronkaplan
Member

I move that we merge this soon!

- Remove .gitignore for .dot files
- Typo and erroneous syntax in README.md
- codestyle in validator.py
@codecov-io

codecov-io commented Aug 30, 2017

Codecov Report

Merging #1083 into develop will increase coverage by 2.38%.
The diff coverage is 90.13%.

@@             Coverage Diff             @@
##           develop    #1083      +/-   ##
===========================================
+ Coverage    73.93%   76.31%   +2.38%     
===========================================
  Files          217      220       +3     
  Lines         9417    10338     +921     
  Branches      1283     1363      +80     
===========================================
+ Hits          6962     7889     +927     
+ Misses        2176     2142      -34     
- Partials       279      307      +28
| Impacted Files | Coverage Δ |
|---|---|
| ...elmq/tests/bots/experts/cymru_whois/test_expert.py | 100% <ø> (+40%) ⬆️ |
| intelmq/bots/experts/sieve/validator.py | 0% <0%> (ø) |
| intelmq/bin/intelmqctl.py | 11.05% <0%> (-0.07%) ⬇️ |
| intelmq/tests/bots/experts/sieve/test_expert.py | 100% <100%> (ø) |
| intelmq/bots/experts/asn_lookup/expert.py | 62.5% <25%> (ø) ⬆️ |
| intelmq/bots/outputs/file/output.py | 74.07% <28.57%> (ø) ⬆️ |
| intelmq/lib/bot.py | 62.69% <75%> (ø) ⬆️ |
| intelmq/bots/experts/sieve/expert.py | 84.82% <84.82%> (ø) |
| intelmq/bots/experts/abusix/expert.py | 39.28% <0%> (-46.43%) ⬇️ |
| intelmq/tests/bots/experts/abusix/test_expert.py | 56.66% <0%> (-43.34%) ⬇️ |
| ... and 21 more | |

@ghost ghost mentioned this pull request Sep 11, 2017
@antoinet
Contributor

@wagner-certat and @aaronkaplan: I think I addressed all review points so far up to the :in operator, used to check if an ip address matches a list of values, e.g.:
source.ip :in ['127.0.0.1', '192.168.1.1', '10.0.0.1']
As mentioned above, most of the operators support a list in the right-hand-side by default, e.g.:
source.ip != ['127.0.0.1', '192.168.1.1', ...]
So it is a design question:

  • we can introduce the :in operator (and by symmetry :notin) and drop list-support for all other operators (makes it simpler)
  • :in is just an alias for == (and :notin an alias for !=), and we keep list-support for all other operators.
    What do you all think?

@ghost

ghost commented Nov 30, 2017

Concerning the :in and == and lists...

If we one day support multiple source IP addresses or similar (with the IDEA format or whatever) everything blows up anyway... So I am in favour of just using the currently implemented behaviour with ==.

@aaronkaplan What do you think?
And @dmth maybe you have the leisure to comment too? :D

@SYNchroACK
Contributor

> Concerning the :in and == and lists...
>
> If we one day support multiple source IP addresses or similar (with the IDEA format or whatever) everything blows up anyway... So I am in favour of just using the currently implemented behaviour with ==.

I understand, that's a good point, @wagner-certat. However, I think we are far away from that day, and on the day we switch to multiple source IP addresses we will need to change multiple expert bots anyway. IMHO, we should stick to our principle of one key = one value; therefore, the presented code is fine.
Makes sense?

@ghost ghost merged commit c8f6134 into certtools:develop Dec 18, 2017
This pull request was closed.
6 participants