Maxmind vs Logstash's GeoIP Filter #1217

Closed
yaauie opened this issue Jan 11, 2021 · 6 comments · Fixed by #1229
Labels
enhancement New feature or request

Comments

@yaauie
Member

yaauie commented Jan 11, 2021

Summary:

The Logstash GeoIP filter uses a maxminddb-formatted database (that may or may not be provided by Maxmind) to populate a number of fields that enrich an event based on an IP address. As presently implemented in the Logstash filter, all fields need to be sub-fields of a single target (e.g., with target => client, we would have client.geo.*, client.as.*, etc.).

Motivation:

Clearly define destinations for all GeoIP Filter fields as a sub-field of a single target, so that users can enable ECS Compatibility Mode without losing metadata that they currently rely on.

Specifically, we have six fields that do not have direct analogues in ECS:

  • timezone: the IANA name of the timezone e.g., America/New_York
  • postal_code: a string postal code, length varies by country
  • continent_code: a two-character continent code such as "NA" (North America) or "OC" (Oceania)
  • organization: the name of the business or ISP associated with an address, reportedly available for ~40% of lookups.
  • isp: the name of the ISP
  • dma_code: US-only code representing a Designated Market Area (~metro area)

Many of the existing ECS fields underneath geo.* are named aligning with Geo*2 like these here. I see some fields, such as timezone, being good candidates for ECS additions, but I’m not so sure about others, such as the US-specific dma_code.

-- @ebeahan

Detailed Design:

  1. propose the addition of specs to ECS for
    • geo.timezone (IANA name, presently up to 30 characters, e.g., America/Argentina/Buenos_Aires),
    • geo.postal_code (freeform but relatively space limited, regulated by each country; see wikipedia),
    • geo.continent_code (docs define AF, AN, AS, EU, NA, OC, and SA)
  2. guidance for namespacing the other fields so that they are usable and unlikely to present future conflict, bearing in mind that they must be sub-fields of the singular target that is the parent of the related geo and as fields.
    • (a) under an mmdb sub-key? e.g., ${target}.mmdb.organization
      • since the mmdb being used may or may not be provided by Maxmind, I'd like to avoid explicitly using "maxmind" in the key name
      • dma_code, while provided in mmdb, is a US-only Nielsen Ratings construct and may not be adequately described with an mmdb prefix.
    • (b) some other way?

Related: logstash-plugins/logstash-filter-geoip#163
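
For illustration, option (a) could be sketched roughly as follows. This is a minimal Python sketch, not the filter's actual implementation: the function name `place_fields`, the flat dotted-path output, and the assumption that the three proposed ECS additions land under `${target}.geo.*` are all hypothetical.

```python
# Illustrative sketch only: names here are hypothetical, not the plugin's API.

# Fields proposed as ECS additions, assumed to live under ${target}.geo.*
GEO_FIELDS = ("timezone", "postal_code", "continent_code")
# Fields without an ECS analogue, namespaced under ${target}.mmdb.* per option (a)
MMDB_FIELDS = ("organization", "isp", "dma_code")


def place_fields(lookup, target):
    """Map the fields present in a lookup result to dotted paths under `target`."""
    placed = {}
    for name in GEO_FIELDS:
        if name in lookup:
            placed[f"{target}.geo.{name}"] = lookup[name]
    for name in MMDB_FIELDS:
        if name in lookup:
            placed[f"{target}.mmdb.{name}"] = lookup[name]
    return placed


example = place_fields(
    {"timezone": "America/New_York", "continent_code": "NA", "isp": "Example ISP"},
    target="client",
)
```

Under these assumptions every emitted field remains a sub-field of the single target, and nothing in the key names refers to a specific database vendor.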

@yaauie yaauie added the enhancement New feature or request label Jan 11, 2021
@nickpeihl
Member

nickpeihl commented Jan 11, 2021

Thanks for opening this issue @yaauie. Choropleth mapping in the Elastic Maps product could also benefit from some of these fields in ECS. We already support US postal codes and adding support for timezones and continents should also be possible.

cc @elastic/kibana-gis

@ebeahan
Member

ebeahan commented Jan 12, 2021

++ to adding geo.timezone, geo.postal_code, and geo.continent_name.

The more I explore it, the more I also can see adding geo.organization and geo.isp. Several other IP geolocation database providers I looked at (ipinfo.io, DBIP, IP2Location, Neustar) also populate the ISP and organization name. WDYT?

The motivation for mmdb over Maxmind makes sense, and I think nesting any additional fields under mmdb subkey, such as client.geo.mmdb.dma_code, works. By being nested underneath the geo namespace, I think the geolocation intended use is clear.

@yaauie
Member Author

yaauie commented Jan 12, 2021

I'm -1 to putting organization and isp under geo, since neither is geo-specific metadata. Yes, we get it through a geoip service at this point, but the source of the data shouldn't matter.

@ebeahan
Member

ebeahan commented Jan 13, 2021

Thanks for the feedback, @yaauie. If we opt to not add organization and isp under geo, I think your proposal to nest under ${target}.mmdb works.

If we have an agreement around adding the three new fields to geo.*, I can work on a PR with the additions.

@yaauie
Member Author

yaauie commented Jan 14, 2021

> I think your proposal to nest under ${target}.mmdb works

Perfect. I appreciate the guidance.

> ++ to adding [...] geo.continent_name.

We already have a geo.continent_name; did you mean geo.continent_code?

++ to:

  • geo.timezone - IANA name e.g., America/Argentina/Buenos_Aires, likely keyword (wildcard?)
  • geo.postal_code - freeform, likely keyword
  • geo.continent_code - one of AF, AN, AS, EU, NA, OC, and SA; likely keyword
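
As a rough illustration of the constraints listed above, a check might look like the following Python sketch. The helper name `check_geo_additions` and the choice to treat "keyword" as "must be a string" are assumptions made for this example, not part of ECS or the filter.

```python
# Illustrative sketch only: the helper name and the rules are assumptions.

# The seven continent codes listed in the proposal
CONTINENT_CODES = {"AF", "AN", "AS", "EU", "NA", "OC", "SA"}


def check_geo_additions(geo):
    """Return a list of problems found in the three proposed geo.* fields."""
    problems = []
    # All three fields are proposed as keyword, so a present value should be a string
    for name in ("timezone", "postal_code", "continent_code"):
        value = geo.get(name)
        if value is not None and not isinstance(value, str):
            problems.append(f"{name} should be a string (keyword)")
    code = geo.get("continent_code")
    if isinstance(code, str) and code not in CONTINENT_CODES:
        problems.append(f"continent_code {code!r} is not a known code")
    return problems


ok = check_geo_additions(
    {"timezone": "America/Argentina/Buenos_Aires", "continent_code": "SA"}
)
bad = check_geo_additions({"continent_code": "XX"})
```

The timezone and postal code values are deliberately not validated beyond their type here, since the proposal describes them as an IANA name and a freeform string respectively.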

@ebeahan
Member

ebeahan commented Jan 14, 2021

> We already have a geo.continent_name; did you mean geo.continent_code?

Yes, geo.continent_code. 🤦
