Parsers for Bambenek Consulting and Netlab 360 OSINT feeds #772
Current coverage is 75.23% (diff: 92.50%)
@@           master   #772   diff @@
==========================================
  Files         206    217    +11
  Lines        7849   8031   +182
  Methods         0      0
  Messages        0      0
  Branches        0      0
==========================================
+ Hits         5557   6042   +485
+ Misses       2292   1989   -303
  Partials        0      0
Parsers have now been added to this pull request for the Netlab 360 Magnitude EK and DGA feeds.
Hi, thanks for your contribution. I have a question concerning your parsers: why are all IPs/domains written as "destination" and not as "source"?
For these I have written them as the destination due to the relation to On Mon, Nov 14, 2016 at 5:45 AM, Dustin Demuth [email protected]
Registered Linux User # 379282 |
While putting these into production, I found that IntelMQ was filtering out some of the events. I will close this request and submit a new one once the issues have been worked out, so that the proper information is recorded.
First, thanks for your contribution. I added some comments inline; most of them apply to all of the bots. Once you think it's ready, I will try it with real data too. You can reuse this PR: just push your fixes here (you could even overwrite the history).
event = Event(report)

event.add('destination.ip', row_split[0])
Should be source.ip, see https:/certtools/intelmq/blob/master/docs/Data-Harmonization.md#classification-1 (the second table and the text below).
@@ -0,0 +1,42 @@
# -*- coding: utf-8 -*-
"""
http://osint.bambenekconsulting.com/feeds/c2-dommasterlist.txt
Please also document it (with more details) at docs/Feeds.md
class Bambenekc2dommasterlistParserBot(Bot):

    def process(self):
Please use the ParserBot, see https:/certtools/intelmq/blob/master/docs/Developers-Guide.md#parsers
This provides much better error handling.
As the iteration over rows is already implemented, we only need lines 25-35 plus a yield in parse_line.
Example:
def parse_line(self, row, report):
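The reviewer's example is cut off at this point. As a rough illustration of the pattern being described (a shared driver iterates over the rows and handles errors, while the parser only yields events from a per-line method), here is a plain-Python stand-in. It does not use IntelMQ at all; the function names `parse_line` and `parse`, the four-column layout, and the dict-based report and events are assumptions made for this sketch only.

```python
import csv
import io


def parse_line(row, report):
    """Yield one event dict for a data row (stand-in for a per-line parser method)."""
    # Assumed column layout: domain, description, timestamp, reference URL.
    # Unpacking raises ValueError if the row does not have exactly four fields.
    domain, description, timestamp, url = row
    event = dict(report)  # start from the feed-level fields of the report
    event['source.fqdn'] = domain
    event['event_description.text'] = description
    event['time.source'] = timestamp + ' UTC'
    event['event_description.url'] = url
    event['classification.type'] = 'c&c'
    yield event


def parse(report_text, report):
    """Iterate over rows, skip blanks and '#' comments, and collect events and errors."""
    events, errors = [], []
    for row in csv.reader(io.StringIO(report_text)):
        if not row or row[0].startswith('#'):
            continue
        try:
            events.extend(parse_line(row, report))
        except ValueError as exc:
            # One bad row is recorded and does not abort the rest of the report.
            errors.append((row, exc))
    return events, errors
```

The point of the split is that a malformed row ends up in `errors` while every other row is still parsed, which is the kind of error handling the comment above asks for.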
Sebix, thanks again for the tips. The requested changes have been made, and the bug that I found this morning has been corrected.
if FQDN.is_valid(lvalue[1]):
    event.add('source.fqdn', lvalue[1])
else:
    event.add('source.ip', lvalue[1])
This line is uncovered by the tests.
Changed, since this feed only has FQDNs and not IP addresses. The branch may have been carried over from the DGA feed.
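For a feed that does mix both kinds of values, the branch needs a test for each side. The sketch below shows one way to decide between the two fields using only the standard library `ipaddress` module; it is not the `FQDN.is_valid` check from the diff, and the helper name `host_field` is hypothetical.

```python
import ipaddress


def host_field(value):
    """Return 'source.ip' if value parses as an IPv4/IPv6 address, else 'source.fqdn'."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        # Not an IP literal, so treat it as a domain name.
        return 'source.fqdn'
    return 'source.ip'
```

Calling it once with an address and once with a domain exercises both branches, which is what the coverage remark above is about.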
event.add('source.ip', lvalue[1])

event.add('raw', line)
event.add('classification.type', 'malware')
Isn't it a c&c?
Yes, yes it is, not sure why I typed malware. Corrected.
event.add('time.source', lvalue[2] + " UTC")
event.add('event_description.url', lvalue[3])
event.add('classification.type', 'c&c')
event.add('status', 'online')
Phew, I'm not sure whether this field is intended for a status reported by the source. @aaronkaplan can you comment on this?
Yes, according to the DHO that's what it is for. However, we have not clearly defined valid values in the DHO for this field ;-) That means we might have to refactor this later. But it's OK for now, IMHO.
How should we proceed with this now?
> How should we proceed with this now?

It's fine.
event.add('event_description.text', lvalue[1])
event.add('time.source', lvalue[2] + " 00:00 UTC")
event.add('event_description.url', lvalue[3])
event.add('classification.type', 'ransomware')
The description of ransomware is "This IOC refers to a specific type of compromized machine, where the computer has been hijacked for ransom by the criminals.", but the description of the feed says "known DGA generated domains used by malware".
Please have a look, @aaronkaplan.
This might even change over time... I have no good solution for this at the moment.
Currently it seems to cover ransomware DGA domains, if I understand it correctly.
But I think we should call it "dga domain", since that is what we are actually addressing with this feed ("source.fqdn").
Should I change the classification.type to "dga domain" and also make the needed changes to harmonization.py to allow dga domain and also map it to Malicious Code in the Taxonomy?
Yes please :)
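To make the agreed change concrete, here is a minimal sketch of a type-to-taxonomy lookup. It is not the code in harmonization.py; the table name `TAXONOMY_BY_TYPE` and the helper `taxonomy_for` are hypothetical, and the only entry shown is the one named in this thread.

```python
# Hypothetical lookup table; only the 'dga domain' entry comes from the discussion above.
TAXONOMY_BY_TYPE = {
    'dga domain': 'Malicious Code',
}


def taxonomy_for(classification_type):
    """Return the taxonomy for a classification type, rejecting unknown types."""
    try:
        return TAXONOMY_BY_TYPE[classification_type]
    except KeyError:
        raise ValueError(
            'unknown classification.type: %r' % classification_type) from None
```

Rejecting unknown types up front is a design choice in this sketch: a new type has to be added to the table deliberately before any parser can emit it.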
event.add('classification.identifier', lvalue[0].lower())
event.add('time.source',
          datetime.utcfromtimestamp(int(lvalue[1])).strftime('%Y-%m-%dT%H:%M:%S+00:00'))
We already have a function in the libs for this: https:/certtools/intelmq/blob/master/intelmq/lib/harmonization.py#L211 :)
I'll look into this more. Before changing it to what it is now, I kept getting an error in the tests about a value being out of range or something similar.
Let me know if the implementation is incomplete or erroneous, I'd like to fix it.
I must have had something wrong when I first tried it. It is working now.
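For reference, the conversion in the diff (Unix timestamp to a UTC timestamp string) can also be written with a timezone-aware datetime instead of a hard-coded `+00:00` suffix. This is a standard-library sketch only, not the library helper linked above; the function name `epoch_to_iso_utc` is hypothetical.

```python
from datetime import datetime, timezone


def epoch_to_iso_utc(value):
    """Convert a Unix timestamp (int or numeric string) to an ISO 8601 UTC string."""
    # A timezone-aware datetime makes isoformat() append the '+00:00' offset itself.
    return datetime.fromtimestamp(int(value), tz=timezone.utc).isoformat()
```

For example, `epoch_to_iso_utc('0')` returns `'1970-01-01T00:00:00+00:00'`, the same shape as the string built by `strftime` in the diff.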
### Magnitude EK Feed

Status: Unknown
AFAIU the upstream description (http://data.netlab.360.com/ek), this feed lists URLs with exploits? Please be more detailed here.
Parsers for Bambenek Consulting and Netlab 360 OSINT feeds Signed-off-by: Sebastian Wagner <[email protected]>
I've created the required parsers for the feeds from Bambenek Consulting: the C2 IP Feed, the C2 Domain Feed, and the DGA Domain Feed. The required test scripts are also included. Please let me know if you have any questions.