[RFC] Email - Stage 1 Proposal #1219

Merged: 46 commits merged on Aug 16, 2021

@P1llus (Member) commented Jan 12, 2021

Moving the Email RFC to Stage 1

Preview of the RFC proposal

Criteria for Stage 1

  • Opened pull request for this proposal revising the existing strawperson
  • Identified "sponsor" at Elastic who will participate in RFC process and take ownership of the change after the process completes
  • Outlined initial field definitions
  • High-level description of examples of usage
  • High-level description of example sources of data
  • Identified potential concerns and implementation challenges/complexity
  • Subject matter experts identified and weighed in on the high level utility of these changes in the pull request
  • ECS team weighed in on appropriateness of these changes in the pull request

@ebeahan added the RFC label Jan 12, 2021

@ebeahan (Member) commented Jan 27, 2021

We've recently removed Stage 4 from the RFC process.

Updated proposal stages and their requirements: https://elastic.github.io/ecs/stages.html

This PR initially targeted what is now the "legacy" Stage 2, so I suggest we update it to target Stage 1. Stage 1 (draft) fields will still be added as experimental fields, as was done with legacy Stage 2 fields.

@ebeahan (Member) left a comment

Thanks, @P1llus, for opening this. I listed some of the areas we want to go over and capture to move forward.

Field definitions

I suggest we replace any instance of wildcard with keyword since we've paused on introducing wildcard field support into ECS for now.

Can we create a directory named with the RFC's number and add a YAML file with the proposed field definitions? Here's an example: https://github.com/elastic/ecs/tree/master/rfcs/text/0009

Example data

We've added a couple of great examples from M365, and it'd be great to capture one or two more examples from other email data sources.

Concerns

Are there any new concerns that have come up to be captured? Or, do any of the existing concerns have updates or resolutions we can capture?

Feedback

Are there any individuals or teams we should ask for feedback on the approach, fields, etc.?

@epixa changed the title from "Moving Email RFC to Stage 2" to "[RFC] Email - Stage 2 Proposal" Apr 22, 2021
@ebeahan changed the title from "[RFC] Email - Stage 2 Proposal" to "[RFC] Email - Stage 1 Proposal" May 28, 2021

@ebeahan (Member) commented May 28, 2021

The ECS team will review this and determine the next steps to keep it moving forward.

@jamiehynds, are you still willing to act as a sponsor?

@jamiehynds (Contributor) commented

Thanks @ebeahan - yes, I'll continue to sponsor the RFC.

@ebeahan requested review from jamiehynds, devonakerr, and a team July 27, 2021 17:19
@ebeahan self-assigned this Jul 27, 2021

| Field | Type | Description |
| - | - | - |
| `email.bcc` | keyword (array) | The email address(es) of the blind carbon copy (BCC) recipient(s) |
| `email.content_type` | keyword | Information about how the message is to be displayed. Typically a MIME type |
| `email.message_id` | wildcard | Unique identifier for the email message that refers to a particular version of a particular message |
| `email.reply_to` | keyword | Address that replies should be delivered to |


It should be clarified that this refers to RFC5322.ReplyTo, which is not the same as Return-Path or RFC5321.MailFrom.
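To make the distinction concrete, here is a minimal Python sketch (standard library only) that reads the RFC 5322 `Reply-To` header separately from `Return-Path`, which a delivering MTA typically records from the RFC 5321 MAIL FROM. The sample message, addresses, and helper names (`reply_to_addresses`, `return_path`) are hypothetical and only illustrate which header each value comes from.

```python
from email import message_from_string
from email.utils import getaddresses
from typing import List, Optional

# Hypothetical message: Reply-To differs from both the header From and the
# Return-Path that the final MTA added from the SMTP MAIL FROM command.
RAW = (
    "Return-Path: <bounces@mailer.example.net>\n"
    "From: Alice <alice@example.com>\n"
    "Reply-To: Support <support@example.com>\n"
    "To: bob@example.org\n"
    "Subject: Test\n"
    "\n"
    "Hello\n"
)


def reply_to_addresses(raw: str) -> List[str]:
    """Return only the addresses listed in the RFC 5322 Reply-To header."""
    msg = message_from_string(raw)
    return [addr for _name, addr in getaddresses(msg.get_all("Reply-To", []))]


def return_path(raw: str) -> Optional[str]:
    """Return the Return-Path value without angle brackets, if present."""
    value = message_from_string(raw).get("Return-Path")
    return value.strip().strip("<>") if value is not None else None


print(reply_to_addresses(RAW))  # ['support@example.com']
print(return_path(RAW))         # bounces@mailer.example.net
```

Mapping `email.reply_to` only from the `Reply-To` header keeps it from silently absorbing the envelope sender.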

Contributor reply:

Thanks @ericwbentley

So, something like this:
| Field | Type | Description |
| - | - | - |
| `email.reply_to` | keyword | Stores the email address provided in the "Reply-To" originator field |

@ericwbentley commented Aug 3, 2021

I would go as far as explicitly stating the RFC in the description to eliminate any confusion, or at least naming the header field.

Suggested change
| `email.reply_to` | keyword | Address that replies should be delivered to |
| `email.reply_to` | keyword | Address that replies should be delivered to (RFC5322.ReplyTo) |

Or

Suggested change
| `email.reply_to` | keyword | Address that replies should be delivered to |
| `email.reply_to` | keyword | Address that replies should be delivered to (Reply-To) |

Member reply:

Trying to improve and clarify this point in a856c0a. Let me know what you think.

| `tls.*` | TLS-related information for the connection, for example to an SMTP server over TLS |


| `email.from` | keyword | Stores the `from` email address |


It should be clarified that this refers specifically to the RFC5322.From ("Header From") address.

There should be another field for the RFC5321.MailFrom ("Envelope From") and/or Return-Path address, because RFC5321.MailFrom and RFC5322.From often differ.

@peasead (Contributor) commented Aug 2, 2021

Thanks @ericwbentley

So, something like this:
| Field | Type | Description |
| - | - | - |
| `email.from` | keyword | Stores the email address provided in the "From" originator field |

Reference: https://datatracker.ietf.org/doc/html/rfc5322#section-3.6.2

Reply:

I'd reference the RFC in the description and add a second field for the envelope From address, since the two should be kept distinct.

Suggested change
| `email.from` | keyword | Stores the `from` email address |
| `email.from` | keyword | Stores the header `from` email address (RFC5322.From) |
| `email.envelope_from` | keyword | Stores the envelope `from` email address (RFC5321.MailFrom) |
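A minimal sketch of how the two suggested fields could be populated, assuming the envelope sender is supplied separately (for example from an MTA log), since the SMTP envelope is not part of the message itself. The helper name `from_fields` and the sample addresses are hypothetical; the field names follow the suggested change above.

```python
from email import message_from_string
from email.utils import parseaddr
from typing import Dict


def from_fields(raw_message: str, envelope_sender: str) -> Dict[str, str]:
    """Keep RFC5322.From and RFC5321.MailFrom in separate fields.

    `envelope_sender` is assumed to come from outside the message (for
    example an MTA log), because the SMTP envelope is not in the headers.
    """
    msg = message_from_string(raw_message)
    _display_name, header_from = parseaddr(msg.get("From", ""))
    return {
        "email.from": header_from,               # RFC5322.From ("Header From")
        "email.envelope_from": envelope_sender,  # RFC5321.MailFrom ("Envelope From")
    }


# Hypothetical message sent through a bulk mailer that uses its own bounce domain.
RAW = "From: Alice <alice@example.com>\nTo: bob@example.org\nSubject: Hi\n\nBody\n"
fields = from_fields(RAW, "bounce-123@esp.example.net")
print(fields)
# {'email.from': 'alice@example.com', 'email.envelope_from': 'bounce-123@esp.example.net'}
```

Bulk senders often set MAIL FROM to their own bounce domain while the header From shows the brand, so collapsing both into `email.from` would lose information.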

Member reply:

@ericwbentley Makes sense to reference RFC 5322 - thanks for suggesting the distinction.

As I noted in the Concerns section, this proposal currently focuses on the Internet Message Format (IMF) defined in RFC 5322. However, I like the idea of later introducing an smtp field set that focuses on details of the protocol (and other email protocols, if helpful).

I see smtp.* fields pairing well with email.*, but I want to avoid increasing the scope here too much. Open to feedback, though.

| `email.direction` | keyword | Direction of the message based on the sending and receiving domains |
| `email.x_mailer` | keyword | The application used to draft and send the original email |
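The `email.direction` row leaves the classification rule open. One possible reading, sketched below in Python, compares sender and recipient domains against a set of organization-owned domains. The labels (`inbound`, `outbound`, `internal`, `external`), the helper names, and the exact-match domain comparison are assumptions for illustration, not values defined by this RFC.

```python
from typing import Iterable, Set


def _domain(address: str) -> str:
    """Lower-cased domain part of an address (text after the last '@')."""
    return address.rsplit("@", 1)[-1].lower()


def email_direction(sender: str, recipients: Iterable[str], org_domains: Set[str]) -> str:
    """Classify a message by comparing its domains with org-owned domains.

    Labels are hypothetical; domains are matched exactly (no subdomain logic).
    """
    org = {d.lower() for d in org_domains}
    sender_internal = _domain(sender) in org
    recipient_flags = [_domain(r) in org for r in recipients]
    if sender_internal and all(recipient_flags):
        return "internal"   # org sender, only org recipients
    if sender_internal:
        return "outbound"   # org sender, at least one outside recipient
    if any(recipient_flags):
        return "inbound"    # outside sender, at least one org recipient
    return "external"       # neither side belongs to the organization


ORG = {"example.com"}
print(email_direction("alice@example.com", ["bob@example.com"], ORG))  # internal
print(email_direction("alice@example.com", ["carol@other.org"], ORG))  # outbound
print(email_direction("dave@other.org", ["alice@Example.com"], ORG))   # inbound
```

How subdomains and mixed recipient lists are treated is a policy choice the field description would eventually need to pin down.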

### Additional event categorization allowed values


Can we consider adding another ID field, for example email.local-id? Exchange uses its own IDs in addition to message-id, for example internal-message-id or network-message-id. Other email servers have their own session IDs as well.

@peasead (Contributor) commented Aug 2, 2021

What happened to the E-mail fieldset from #999?

I see it was merged, but I don't see it in master.

@ebeahan (Member) commented Aug 2, 2021

@peasead This picks up the same work from #999 that @P1llus and @jamiehynds started. @P1llus has been focused on other commitments, and I'm helping to move the proposal forward.

The field changes proposed in #999 aren't anywhere in the schema yet. Once we have a consensus and merge this PR, the agreed changes will be added to the experimental schema.

@ebeahan (Member) commented Aug 4, 2021

> Will a source ever be an array of values?

Corrected.

> Are cc and bcc fields arrays and not keywords?

The cc and bcc fields are arrays of keyword values.

> How will we account for attachments?

I added a nested attachment object and updated the Proofpoint TAP sample mapping to include an attachment. Let me know what you think.
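A rough sketch of what one nested object per attachment could look like, built with Python's `email.message.EmailMessage.iter_attachments()`. The layout under each object (`file.name`, `file.mime_type`, `file.size`, `file.hash.sha256`) is an assumption modeled on the existing ECS `file.*` and `hash.*` field sets, not the definition in this RFC; the sample message is hypothetical.

```python
import hashlib
from email.message import EmailMessage
from typing import Any, Dict, List


def attachment_objects(msg: EmailMessage) -> List[Dict[str, Any]]:
    """Build one nested object per attachment (field layout is illustrative)."""
    objects = []
    for part in msg.iter_attachments():  # skips the text body part
        payload = part.get_payload(decode=True) or b""
        objects.append({
            "file": {
                "name": part.get_filename(),
                "mime_type": part.get_content_type(),
                "size": len(payload),
                "hash": {"sha256": hashlib.sha256(payload).hexdigest()},
            }
        })
    return objects


# Hypothetical message with a text body and one PDF attachment.
msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.org"
msg["Subject"] = "Invoice"
msg.set_content("See attached.")
msg.add_attachment(b"%PDF-1.4 example", maintype="application",
                   subtype="pdf", filename="invoice.pdf")

attachments = attachment_objects(msg)
print(attachments[0]["file"]["name"], attachments[0]["file"]["size"])  # invoice.pdf 16
```

Keeping each attachment as its own object means a file name stays paired with its own type and hash when a message carries several attachments.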

@ebeahan (Member) commented Aug 13, 2021

Thanks, all, for the comments so far! I think I've addressed all the outstanding feedback.

I'd appreciate additional reviews to confirm this is ready for Stage 1.

@kgeller @devonakerr @peasead @P1llus @jamiehynds

@peasead (Contributor) left a comment

I think this looks good for a Stage 1.

@kgeller (Contributor) left a comment

LGTM 👍

@ebeahan (Member) commented Aug 17, 2021

I overlooked including the field definition files for Stage 1. I will include them in the Stage 2 PR; the proposed definitions are also now included in the experimental schema.

@JamisonWhite commented Oct 8, 2021

Hi @ebeahan,
I just discovered this today (and the ECS RFC process, so I apologize in advance if this is out of line), but I'm concerned that to, from, cc, and bcc are treated as emails. These values usually include the display name as well.
Thanks Jamie
https://datatracker.ietf.org/doc/html/rfc5322#section-3.4 (edit updated to latest RFC)
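To illustrate the concern: an address header is a list of mailboxes that may carry display names, and a quoted display name can itself contain a comma. The Python sketch below contrasts a naive comma split with RFC 5322-aware parsing via the standard library's `email.utils.getaddresses`; the header value is a made-up example.

```python
from email.utils import getaddresses

# Hypothetical To header: two mailboxes, one whose display name contains a comma.
TO_HEADER = '"Doe, Jane" <jane@example.com>, Bob <bob@example.org>'

# Naive: treat the header as a comma-separated list of plain addresses.
naive = [part.strip() for part in TO_HEADER.split(",")]
print(len(naive))  # 3, because the quoted display name was split in two

# RFC 5322-aware: each mailbox becomes a (display name, address) pair.
parsed = getaddresses([TO_HEADER])
print(parsed)  # [('Doe, Jane', 'jane@example.com'), ('Bob', 'bob@example.org')]
```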

@peasead (Contributor) commented Oct 12, 2021

Thanks, Jamie.

How would you like to see them reflected, and where do you think the actual email address would go for "to", "from", "cc", and "bcc"?

Something like this?

email.to.address: [email protected]
email.to.display_name: "User 1"
email.cc.address: [email protected]
email.cc.display_name: "User 2"
...

@JamisonWhite commented

@peasead It's definitely on the right track and works for "from". However, "to", "cc", and "bcc" can each hold a list of addresses and names, so maybe they would look like the email.attachments nested objects. I'm not sure about the naming standards: "to_addresses" seems long, but "tos" or "toes" doesn't seem right either.

| Field | Type | Description |
| - | - | - |
| email.to_addresses | nested | Nested object of addresses |
| email.to_addresses.address | keyword | Email address |
| email.to_addresses.display_name | keyword | Email display name |

Same for "cc" and "bcc".

@wasserman commented Oct 18, 2021

@JamisonWhite @peasead In email, To/Cc/Bcc is always a singular label, even when the header holds multiple addresses. I would vote not to include the _addresses part, since it is implied and a bit redundant. Hopefully aggregations on email addresses will not be impacted by this structure.

@JamisonWhite commented

Updated based on @wasserman feedback.

| Field | Type | Description |
| - | - | - |
| email.to | nested | Nested object of addresses |
| email.to.address | keyword | Email address |
| email.to.display_name | keyword | Email display name |

Same for "cc" and "bcc".
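A minimal sketch, assuming the nested layout in the table above, of how a parser might populate `email.to`, `email.cc`, and `email.bcc` as lists of `{address, display_name}` objects using only the Python standard library. The helper names and the sample message are hypothetical, and `display_name` is set to `None` when the header carries a bare address.

```python
from email import message_from_string
from email.message import Message
from email.utils import getaddresses
from typing import Dict, List, Optional


def address_objects(msg: Message, header: str) -> List[Dict[str, Optional[str]]]:
    """Turn one address header into [{address, display_name}, ...]."""
    pairs = getaddresses(msg.get_all(header, []))
    return [
        {"address": addr, "display_name": name or None}
        for name, addr in pairs
        if addr  # skip empty results from missing or unparseable values
    ]


def recipient_fields(raw: str) -> Dict[str, Dict[str, list]]:
    """Build hypothetical email.to / email.cc / email.bcc nested arrays."""
    msg = message_from_string(raw)
    return {"email": {h.lower(): address_objects(msg, h) for h in ("To", "Cc", "Bcc")}}


RAW = (
    'From: Alice <alice@example.com>\n'
    'To: "User 1" <user1@example.com>, user3@example.com\n'
    'Cc: User 2 <user2@example.com>\n'
    'Subject: Hi\n'
    '\n'
    'Body\n'
)
doc = recipient_fields(RAW)
print(doc["email"]["to"])
# [{'address': 'user1@example.com', 'display_name': 'User 1'}, {'address': 'user3@example.com', 'display_name': None}]
```

In Elasticsearch, mapping these as `nested` (rather than a plain object array) keeps each address paired with its own display name in queries, at the cost of needing nested queries and aggregations, which bears on the aggregation concern raised above. Bcc is usually absent from delivered messages, so `email.bcc` would often be empty unless the data comes from the sending side.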

@ebeahan (Member) commented Nov 17, 2021

Thanks, @wasserman @JamisonWhite, for the feedback!

I will incorporate the thinking from this discussion into the Stage 2 PR #1593.
