[RFC] Introduce `email` field set - stage 2 #1593

ebeahan · 2021-08-25T23:40:41Z

Summary

Continuing to Stage 2 with this proposal to introduce the `email.*` field set into the schema.

Stage 2 (Candidate) Criteria:

- Opened pull request for this draft revising the existing proposal
- Completed field definitions
- Included a real-world example source document
- Identified scope of impact of changes to ingestion mechanisms (e.g., Beats/Logstash), usage mechanisms (e.g., Kibana applications, detections), and the ECS project (e.g., docs, tooling)
- Subject matter experts weighed in on the technical utility of field definitions in the pull request

Preview of markdown proposal

ebeahan · 2021-08-25T23:43:07Z

Opening PR to capture any feedback or suggestions around the proposed set of email.* fields.

wasserman · 2021-09-20T19:19:02Z

Regarding display names, you could consider some of these options:

1. Allow for name/address pairs in all email fields instead of just the email address itself.
2. Permit RFC 5322 email formats like `First Last <first.last@example.com>`. A `keyword` field with a normalizer could then index just the email addresses while still keeping the display names intact in `_source`. Of course, the names wouldn't be searchable.
3. Multi-fields with some variation of this could work and allow for some form of both `keyword` and `text` fields.

I personally ran with option 2. I was hoping to leverage the `uax_url_email` tokenizer, but had to settle for a simple regex.

```json
{
  "settings": {
    "analysis": {
      "normalizer": {
        "email_normalizer": {
          "type": "custom",
          "char_filter": [
            "email_filter"
          ]
        }
      },
      "char_filter": {
        "email_filter": {
          "type": "pattern_replace",
          "pattern": ".*?<([^>]+)>",
          "replacement": "$1"
        }
      }
    }
  }
}
```
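To sanity-check what the `email_filter` pattern does before creating an index, the Python sketch below applies the same regex with `re.sub`. It only approximates the Elasticsearch `pattern_replace` char filter (which uses Java regex and spells the backreference `$1` rather than `\1`); the helper name `normalize_email` is hypothetical.

```python
import re

# Same pattern as the "email_filter" char filter: lazily consume everything
# up to "<", capture the address inside the angle brackets, and drop both
# the brackets and the display name.
EMAIL_PATTERN = re.compile(r".*?<([^>]+)>")


def normalize_email(value: str) -> str:
    """Hypothetical helper approximating the email_normalizer output."""
    # Python writes the backreference as \1 where Elasticsearch uses $1.
    return EMAIL_PATTERN.sub(r"\1", value)


print(normalize_email("First Last <first.last@example.com>"))  # first.last@example.com
print(normalize_email("first.last@example.com"))              # unchanged: no "<...>" to match
```

The pattern assumes one name/address pair per value: a single string holding `A <a@x.com>, B <b@x.com>` would collapse to `a@x.comb@x.com`, so each recipient should be its own array element.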

ebeahan · 2021-11-18T22:01:00Z

Proposed fields now include arrays of objects with both the email address and display name for the `to`, `cc`, and `bcc` recipients. The `display_name` and `address` fields have also been added under `email.from`.

| Field name | Data type |
|---|---|
| `email.from.address` | keyword |
| `email.from.display_name` | keyword |
| `email.to` | nested |
| `email.to.address` | keyword |
| `email.to.display_name` | keyword |
| `email.subject` | keyword |
| `email.cc` | nested |
| `email.cc.address` | keyword |
| `email.cc.display_name` | keyword |
| `email.bcc` | nested |
| `email.bcc.address` | keyword |
| `email.bcc.display_name` | keyword |
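As an illustration of how raw message headers might map onto these fields, the Python sketch below uses the standard library's `email.utils.getaddresses` and `parseaddr` to split headers into `address`/`display_name` objects. It follows the table as it stands at this point in the thread (a single `email.from` object, arrays for `to`/`cc`/`bcc`); the helper names are hypothetical and not part of the proposal.

```python
from email.utils import getaddresses, parseaddr


def to_ecs_recipients(header_value: str) -> list:
    """Hypothetical helper: split a To/Cc/Bcc header into
    {address, display_name} objects matching the proposed fields."""
    recipients = []
    for display_name, address in getaddresses([header_value]):
        entry = {"address": address}
        # Only keep display_name when the header actually carried one.
        if display_name:
            entry["display_name"] = display_name
        recipients.append(entry)
    return recipients


def build_email_fields(headers: dict) -> dict:
    """Hypothetical helper: map a few RFC 5322 headers onto email.*."""
    from_name, from_address = parseaddr(headers["From"])
    doc = {
        "email": {
            "from": {"address": from_address, "display_name": from_name},
            "subject": headers.get("Subject", ""),
        }
    }
    # Header names are "To", "Cc", and "Bcc"; field names are lowercase.
    for field in ("to", "cc", "bcc"):
        value = headers.get(field.capitalize())
        if value:
            doc["email"][field] = to_ecs_recipients(value)
    return doc


headers = {
    "From": "Sender Name <sender@example.com>",
    "To": "Alice Example <alice@example.com>, bob@example.com",
    "Subject": "Quarterly report",
}
print(build_email_fields(headers))
```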

djptek

LGTM!

As an aside, when this is merged I'd like to use it as an example to update the docs around the `nested` type, as the difference between `email.from` and `email.to` illustrates this perfectly.
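The difference djptek points to comes down to how Elasticsearch indexes arrays of objects. The plain-Python sketch below involves no Elasticsearch at all and only approximates the documented behavior; `flatten`, `object_match`, and `nested_match` are hypothetical helpers. With an `object` mapping, each leaf path becomes one multi-valued field, so one recipient's address combined with another recipient's display name still matches, whereas `nested` evaluates both conditions within the same element.

```python
def flatten(obj, prefix=""):
    """Approximate object-mapping indexing: every leaf path becomes one
    multi-valued field, and arrays of objects are merged together."""
    out = {}
    if isinstance(obj, list):
        for item in obj:
            for key, values in flatten(item, prefix).items():
                out.setdefault(key, []).extend(values)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            for sub_key, values in flatten(value, path).items():
                out.setdefault(sub_key, []).extend(values)
    else:
        out[prefix] = [obj]
    return out


to = [
    {"address": "alice@example.com", "display_name": "Alice"},
    {"address": "bob@example.com", "display_name": "Bob"},
]

flat = flatten({"email": {"to": to}})
# {'email.to.address': ['alice@example.com', 'bob@example.com'],
#  'email.to.display_name': ['Alice', 'Bob']}


def object_match(flat, address, name):
    # Object mapping: each condition is checked against the merged values.
    return address in flat["email.to.address"] and name in flat["email.to.display_name"]


def nested_match(recipients, address, name):
    # Nested mapping: both conditions must hold within the same recipient.
    return any(r["address"] == address and r["display_name"] == name for r in recipients)


print(object_match(flat, "alice@example.com", "Bob"))  # True (pairing lost)
print(nested_match(to, "alice@example.com", "Bob"))    # False
```

Keeping that pairing is what `nested` buys; the cost is the visualization limitations raised later in the thread.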

rfcs/text/0010-email.md

ebeahan · 2021-11-22T22:54:55Z

The limitations around building visualizations with `nested` fields make me question whether using `nested` for the various email sender/recipient fields is the best direction.

I'm going to think this over a bit more and iterate on the proposed fields.

djptek · 2021-11-23T09:47:02Z

@ebeahan re the `nested` type: for the legal use case, `nested` is not necessary, as that would center on `*.address`.

Regarding visualisation of emails, we'd probably need a Chord or Sankey diagram. A quick look at Kibana issues suggests those aren't on the radar at this point in time, though there is a Sankey example using Vega on the blog. Again, this would probably center on `*.address`, so not nested.

There might also be value in running address and display name through ML, which would probably be incompatible with the `nested` type.

The only use case off the top of my head where we might want `nested` is spoof detection. However, given the cardinality, this would probably be best done using ML for the heavy lifting followed by manual inspection of anomalies, so you could work around that.

@peasead Do you have a specific use case/query/aggs in mind where we'd need to leverage nested type?

ebeahan · 2021-11-30T19:32:08Z

@jamiehynds as the sponsor, can you take a look at how the email.* fields proposal is shaping up?

peasead · 2021-12-01T16:12:38Z

> @peasead Do you have a specific use case/query/aggs in mind where we'd need to leverage nested type?

Thanks for your patience.

@djptek I don't have anything specific. I wasn't sure if there'd be a use case to query nested objects independently, but thinking more, I'm not sure that'd be needed.

wasserman · 2021-12-01T17:19:29Z

FYI, `nested` was just thought to be a useful way to preserve the relationships between display names and email addresses. Multi-fields or a normalizer could work too. Ultimately, any smart way to avoid losing the display names in the process would do, since it could be valuable just to be able to see them, if nothing else.

djptek

LGTM!

rfcs/text/0010-email.md

djptek · 2021-12-02T09:09:17Z

Thanks @wasserman

> nested was just thought to be a useful way to preserve the relationships between display names and emails

Looking at one use case that leverages this relationship, e.g. checking for spoofing of a known address, we'd need to involve additional data - a table/index defining the a priori relationship between a defined and countable set of address(es) and the legitimate display_name(s) for these address(es). This would require additional logic, including a join against that table/index. Elasticsearch joins are generally best implemented at ingest time rather than query time, so this use case could perhaps be addressed by building this into the ingest pipeline, or by reindexing a subset of data related to specific address(es) of interest.

Conversely, where the relationship between `display_name` and `address` is not explicitly defined a priori, there is no upper limit to the number of address(es) in the related use cases, so it may be preferable to avoid the `nested` type to ensure the most performant solution.
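As a rough illustration of the ingest-time approach to spoof checking described above, the Python sketch below joins each event against a small lookup table before indexing, so no query-time join is needed. The table contents, helper names, and the `display_name_mismatch` output field are all hypothetical, and it assumes a single `email.from` object as in the field table earlier in the thread. In a real pipeline the lookup would more likely come from something like an Elasticsearch enrich policy than from a Python dict.

```python
from typing import Optional

# Hypothetical lookup table: the "defined and countable set" of addresses of
# interest, each mapped to the display names considered legitimate for it.
KNOWN_SENDERS = {
    "ceo@example.com": {"Jane Doe", "Doe, Jane"},
    "payroll@example.com": {"Payroll"},
}


def display_name_mismatch(sender: dict, known: dict) -> Optional[bool]:
    """True/False for addresses in the table; None when there is nothing to compare."""
    allowed = known.get(sender.get("address", "").lower())
    if allowed is None:
        return None  # not an address of interest
    return sender.get("display_name") not in allowed


def enrich(doc: dict, known: dict = KNOWN_SENDERS) -> dict:
    """Ingest-time step: annotate the event once so queries need no join."""
    sender = doc.get("email", {}).get("from", {})
    # "display_name_mismatch" is a hypothetical field name, not an ECS field.
    doc["display_name_mismatch"] = display_name_mismatch(sender, known)
    return doc


event = {"email": {"from": {"address": "CEO@example.com", "display_name": "Payroll Dept"}}}
print(enrich(event)["display_name_mismatch"])  # True
```

Because the check only fires for addresses in the table, its cost stays bounded by that table rather than by the unbounded set of addresses seen in traffic.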

ebeahan · 2021-12-15T21:06:58Z

While implementing the `email.*` field set into the schema, reusing `hash.*` at `email.attachments.file.hash.*` felt like a better approach: it aligns with the existing `file.hash.*` fields and adds a few additional hash fields for any attachment: #1688 (comment).

I'll capture these details fully in the proposal doc during stage 3.
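For reference, populating `email.attachments.file.hash.*` could look like the Python sketch below, which hashes an attachment's raw bytes with `hashlib`. It covers only `md5`, `sha1`, `sha256`, and `sha512` (the `hash.*` field set also defines fuzzy hashes such as `ssdeep`, which `hashlib` does not provide); the helper names and the surrounding `file.name`/`file.size` keys are illustrative assumptions, not taken from the proposal.

```python
import hashlib


def attachment_hashes(data: bytes) -> dict:
    """Compute digests for an attachment's raw bytes, keyed like the
    reused hash.* fields (md5, sha1, sha256, sha512)."""
    return {
        name: hashlib.new(name, data).hexdigest()
        for name in ("md5", "sha1", "sha256", "sha512")
    }


def attachment_entry(filename: str, data: bytes) -> dict:
    # Hypothetical shape of one element under email.attachments.
    return {
        "file": {
            "name": filename,
            "size": len(data),
            "hash": attachment_hashes(data),
        }
    }


print(attachment_entry("report.pdf", b"%PDF-1.7 example bytes"))
```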

ebeahan added the RFC label Aug 25, 2021

ebeahan self-assigned this Aug 25, 2021

ebeahan mentioned this pull request Nov 9, 2021

[meta] Add support for email in ECS #939

Closed

ebeahan mentioned this pull request Nov 17, 2021

[RFC] Email - Stage 1 Proposal #1219

Merged

8 tasks

ebeahan added 4 commits November 18, 2021 13:39

set proposal doc to target stage 2

4a28c5f

update candidate field set to match current proposal

9a935fc

set pr number in link

f715bd8

propose nested objects for tracking email address with display name

aeb8b7e

ebeahan force-pushed the rfc/0010/stage-2 branch from a01b7bc to aeb8b7e Compare November 18, 2021 19:40

ebeahan added 9 commits November 18, 2021 13:48

also add display_name to from

b5df3f2

update examples to use new from, to, cc, and bcc fields

5a2cd59

typo

e260c2d

removing unneeded comments

924b4e7

clean up examples

2a40ceb

clean up grammar

b889936

capturing scope of impact

9d309de

address concerns

de350cf

typo in the field table

5ffd9d9

ebeahan requested review from jamiehynds, a team and devonakerr November 18, 2021 22:01

ebeahan mentioned this pull request Nov 18, 2021

Mimecast_Elastic integration elastic/integrations#2157

Merged

4 tasks

djptek approved these changes Nov 19, 2021

devonakerr requested review from peasead and removed request for devonakerr November 19, 2021 13:50

peasead reviewed Nov 19, 2021

rfcs/text/0010-email.md (outdated)

kares mentioned this pull request Nov 22, 2021

Feat: ECS compatibility (RFC email fields) logstash-plugins/logstash-input-imap#56

Draft

3 tasks

djptek and others added 6 commits November 22, 2021 15:37

Convert from to a Nested Object and rename email.reply_to to `email.reply_to.address` for consistency with the other `*.address` fields

0133a74

include email.reply_to.display_name

0ac04cf

update examples to make from values into arrays

69c4579

add new fields to the proposed field defs

c57cc32

Merge branch 'main' into rfc/0010/stage-2

2e69ced

tweak examples

5374eec

ebeahan added 7 commits November 30, 2021 11:02

revert nested address fields

964b0b4

update display name resolution

ef4bf13

s/Previously/Initially

00a1fc6

fix reply_to

6c106ae

tidy up table

cee119d

fix columns

d8b9ec7

Merge branch 'main' into rfc/0010/stage-2

43b23cf

ebeahan requested a review from a team November 30, 2021 19:30

djptek and others added 2 commits December 2, 2021 09:31

Merge branch 'main' into rfc/0010/stage-2

e242ace

remove spaces lines 85 to 89

e11eb88

djptek approved these changes Dec 2, 2021

rfcs/text/0010-email.md (outdated)

ebeahan added 2 commits December 13, 2021 15:59

Merge branch 'main' into rfc/0010/stage-2

07977a2

set date for stage 2

7249732

ebeahan merged commit 2b55ff8 into elastic:main Dec 13, 2021

ebeahan mentioned this pull request Dec 14, 2021

RFC 0010 email.* field set - stage 2 changes #1688

Merged
