Add search 'fields' option to support high-level field retrieval. #60100

Merged 14 commits on Jul 27, 2020

Commits on Jul 27, 2020

  1. Add a simple 'fetch fields' phase. (#55639)

    Currently the phase just looks up each field name in the _source and returns its
    values in the 'fields' section of the response. There are several aspects that
    need improvement -- this PR just lays out the initial class structure and tests.
    jtibshirani committed Jul 27, 2020
    745fa62
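
As a rough illustration of the initial phase described in commit 1, the sketch below looks up each requested field name in a document's `_source` and collects its values under a `fields`-style result. The helper names `extract_values` and `fetch_fields` are hypothetical and are not taken from the Elasticsearch code base; the handling of arrays is an assumption.

```python
from __future__ import annotations

from typing import Any


def extract_values(source: dict[str, Any], path: str) -> list[Any]:
    """Collect every value found at a dotted path, traversing arrays of objects."""
    nodes: list[Any] = [source]
    for key in path.split("."):
        next_nodes: list[Any] = []
        for node in nodes:
            # An array of objects is traversed element by element.
            candidates = node if isinstance(node, list) else [node]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    next_nodes.append(candidate[key])
        nodes = next_nodes
    values: list[Any] = []
    for node in nodes:
        # Leaf arrays are flattened one level so every field maps to a flat list.
        values.extend(node if isinstance(node, list) else [node])
    return values


def fetch_fields(source: dict[str, Any], field_names: list[str]) -> dict[str, list[Any]]:
    """Build a 'fields'-style section; fields with no values are omitted (assumption)."""
    result: dict[str, list[Any]] = {}
    for name in field_names:
        values = extract_values(source, name)
        if values:
            result[name] = values
    return result
```

Calling `fetch_fields(doc, ["user.name", "events.id"])` on a parsed source document returns a dictionary from field name to a list of values.
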
  2. Make the fetch fields phase easier to test. (#55756)

    This commit pulls out a FieldValueRetriever object, which retrieves specific
    fields given a document's source. The new object makes it easier to unit test
    the logic, and will help keep FetchFieldsPhase from growing too complex as we
    add more functionality.
    jtibshirani committed Jul 27, 2020
    7145c0b
  3. Resolve field aliases and multi-fields. (#55889)

    This commit adds the capability to `FieldTypeLookup` to retrieve a field's
    paths in the _source. When retrieving a field's values, we consult these
    source paths to make sure we load the relevant values. This allows us to handle
    requests for field aliases and multi-fields.
    
    We also retrieve values that were copied into the field through copy_to. To me
    this is what users would expect out of the API, and it's consistent with what
    comes back from `docvalue_fields` and `stored_fields`. However, it does add
    some complexity, and was not something flagged as important from any of the
    clients I spoke to about this API. I'm looking for feedback on this point.
    jtibshirani committed Jul 27, 2020
    30b5cf1
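
To make the idea of "source paths" in commit 3 concrete, here is a minimal sketch of how a requested field name might be resolved to the `_source` paths that hold its values. `ToyMappings` and its three lookup tables are hypothetical stand-ins for illustration only; they do not mirror the actual `FieldTypeLookup` implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyMappings:
    """Hypothetical, simplified view of an index mapping."""

    aliases: dict[str, str] = field(default_factory=dict)              # alias -> concrete field
    multi_field_parents: dict[str, str] = field(default_factory=dict)  # sub-field -> parent field
    copy_to_sources: dict[str, list[str]] = field(default_factory=dict)  # target -> copied-from fields

    def source_paths(self, name: str) -> set[str]:
        # An alias has no _source entry of its own: follow it to its target.
        resolved = self.aliases.get(name, name)
        # A multi-field is populated from its parent's _source value.
        parent = self.multi_field_parents.get(resolved, resolved)
        paths = {parent}
        # Values copied in through copy_to live under the originating fields.
        paths.update(self.copy_to_sources.get(resolved, []))
        return paths
```

With these paths in hand, the retrieval step reads each path from `_source` and merges the values under the requested field name.
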
  4. Allow field mappers to retrieve fields from source. (#56928)

    This PR adds a new method `FieldMapper#lookupValues(SourceLookup)` that extracts
    and parses the source values. This lets us return values like numbers and dates
    in a consistent format, and also handle special data types like
    `constant_keyword`. The `lookupValues` method calls into `parseSourceValue`,
    which mappers can override to specify how values should be parsed.
    jtibshirani committed Jul 27, 2020
    9e2ee63
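
The relationship between `lookupValues` and `parseSourceValue` in commit 4 resembles a template method: a shared lookup that delegates per-value parsing to each mapper. The Python sketch below illustrates that pattern only. The `Toy*` classes are hypothetical, the flat `source.get` lookup is a simplification, and the way the constant-keyword mapper answers from its configured value is an assumption about one plausible approach.

```python
from typing import Any, Dict, List


class ToyFieldMapper:
    """Hypothetical base mapper: shared lookup, overridable parsing."""

    def __init__(self, name: str) -> None:
        self.name = name

    def parse_source_value(self, value: Any) -> Any:
        # Default: return the source value unchanged.
        return value

    def lookup_values(self, source: Dict[str, Any]) -> List[Any]:
        raw = source.get(self.name)
        if raw is None:
            return []
        raw_values = raw if isinstance(raw, list) else [raw]
        return [self.parse_source_value(v) for v in raw_values]


class ToyLongMapper(ToyFieldMapper):
    def parse_source_value(self, value: Any) -> int:
        # Numbers come back in one consistent form, even if indexed as strings.
        return int(value)


class ToyConstantKeywordMapper(ToyFieldMapper):
    def __init__(self, name: str, constant: str) -> None:
        super().__init__(name)
        self.constant = constant

    def lookup_values(self, source: Dict[str, Any]) -> List[Any]:
        # Assumption: the value comes from the mapping, not from _source.
        return [self.constant]
```
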
  5. Add support for a 'format' option in fields retrieval. (#57855)

    The new `format` option allows for passing a custom date format:
    
    ```
    POST logs-*/_search
    {
      "fields": [
        "file.*",
        {
          "field": "event.timestamp",
          "format": "epoch_millis"
        },
        ...
      ]
    }
    ```
    
    Other API notes:
    * We use the same syntax as `docvalue_fields` for consistency. Under the hood,
    both `fields` and `docvalue_fields` use the same `FieldAndFormat` object to
    share serialization logic.
    * Only `date` and `date_range` fields support formatting currently.
    jtibshirani committed Jul 27, 2020
    51b6a4e
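
The request syntax in commit 5 mixes plain strings with `{"field", "format"}` objects. A minimal sketch of normalizing both shapes into one object, and of applying an `epoch_millis` format to a date, might look like the following. The Python `FieldAndFormat` class here only borrows the name from the commit message. `parse_field_spec` and `format_date` are hypothetical helpers, and returning the epoch value as a string, and an ISO 8601 string when no format is given, are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Union

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class FieldAndFormat:
    field: str
    format: Optional[str] = None


def parse_field_spec(entry: Union[str, dict]) -> FieldAndFormat:
    """Accept either "name" or {"field": "name", "format": "..."}."""
    if isinstance(entry, str):
        return FieldAndFormat(entry)
    return FieldAndFormat(entry["field"], entry.get("format"))


def format_date(iso_value: str, fmt: Optional[str]) -> str:
    """Format an ISO 8601 date; only 'epoch_millis' is handled in this sketch."""
    parsed = datetime.fromisoformat(iso_value)
    if parsed.tzinfo is None:
        # Assumption: dates without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    if fmt == "epoch_millis":
        # Integer division of timedeltas avoids floating-point rounding.
        return str((parsed - EPOCH) // timedelta(milliseconds=1))
    return parsed.isoformat()
```
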
  6. Respect the ignore_above option. (#57307)

    For keyword-style fields, if the source value is longer than `ignore_above`,
    then we don't retrieve the field. In particular, the field is treated as if the
    value didn't exist.
    jtibshirani committed Jul 27, 2020
    72c69da
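
The `ignore_above` rule from commit 6 can be sketched as a simple filter. The helper name `keyword_values` is hypothetical, and measuring length with Python's `len` on the string form of the value is an assumption made for illustration.

```python
from typing import Any, List, Optional


def keyword_values(raw_values: List[Any], ignore_above: Optional[int] = None) -> List[str]:
    """Drop values longer than ignore_above, as if they were never present."""
    kept = []
    for value in raw_values:
        text = str(value)
        if ignore_above is not None and len(text) > ignore_above:
            continue  # treated as if the value didn't exist
        kept.append(text)
    return kept
```

A value whose length equals `ignore_above` is kept; only strictly longer values are skipped.
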
  7. For the fields fetch phase, avoid reloading stored fields. (#58196)

    This PR updates FetchFieldsPhase to override hitExecute instead of hitsExecute
    (plural). This way, we can make sure that the stored fields (including _source)
    are only loaded once per hit as part of FetchPhase.
    jtibshirani committed Jul 27, 2020
    828f514
  8. Skip over metadata fields in the field retrieval API. (#58710)

    This avoids unnecessary lookups, since metadata fields don't have _source
    values.
    jtibshirani committed Jul 27, 2020
    5b31ede
  9. Return null_value when the source contains a 'null' for the field. (#58623)

    This PR adds a version of `XContentMapValues.extractValue` that accepts a
    default value to return in place of 'null'. It then uses this method when
    looking up source values to return the configured `null_value` instead of
    'null' when retrieving fields.
    jtibshirani committed Jul 27, 2020
    608d185
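
The behavior described in commit 9 hinges on telling an explicit `null` in the source apart from a missing field. The sketch below shows one way to express that distinction. `extract_value` is a hypothetical Python analogue, not the actual `XContentMapValues.extractValue` signature, and it handles only nested objects and leaf arrays.

```python
from typing import Any, Dict


def extract_value(path: str, source: Dict[str, Any], null_value: Any = None) -> Any:
    """Return the value at a dotted path, substituting null_value for explicit nulls."""
    node: Any = source
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # the field is absent: no substitution
        node = node[key]
    if isinstance(node, list):
        # Nulls inside an array are replaced individually.
        return [null_value if item is None else item for item in node]
    return null_value if node is None else node
```
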
  10. Add docs for the fields retrieval API. (#58787)

    This PR adds docs for the `fields` parameter. We now present `fields` as the
    preferred way to load specific fields in a search, with `docvalue_fields` and
    `stored_fields` as other options to look into. Source filtering is no longer
    featured prominently, and its section is moved to the end.
    jtibshirani committed Jul 27, 2020
    2fe80b4
  11. Apply keyword normalizers in the field retrieval API. (#59260)

    As we discussed in the meta-issue, when returning `keyword` fields in the fields
    retrieval API, we'll apply their `normalizer`. This decision is not clear-cut,
    and we'll validate it with internal users before merging the feature branch.
    jtibshirani committed Jul 27, 2020
    56ff9bc
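
As an illustration of what applying a normalizer at retrieval time means in commit 11, the sketch below treats a normalizer as a plain function applied to each source value. `lowercase_ascii_fold` only approximates a lowercase plus ASCII-folding normalizer, and both helper names are hypothetical.

```python
import unicodedata
from typing import Callable, List, Optional


def lowercase_ascii_fold(value: str) -> str:
    """Rough stand-in for a lowercase + ASCII-folding normalizer."""
    decomposed = unicodedata.normalize("NFKD", value.lower())
    # Drop combining marks such as accents left over after decomposition.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalized_keyword_values(
    raw_values: List[str], normalizer: Optional[Callable[[str], str]] = None
) -> List[str]:
    """Return keyword values as they would look after normalization."""
    if normalizer is None:
        return list(raw_values)
    return [normalizer(value) for value in raw_values]
```

The effect is that the returned value can differ from the text in `_source`, which is why the commit message calls the decision not clear-cut.
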
  12. Support spatial fields in field retrieval API. (#59821)

    Although we accept a variety of formats during indexing, spatial data is
    returned in a single consistent format. This is GeoJSON by default, but
    well-known text is also supported by passing the option 'format: wkt'.
    
    Note that points (in addition to shapes) are returned in GeoJSON by default. The
    reasoning is that this gives better consistency, and is the most convenient
    format for most REST API users.
    jtibshirani committed Jul 27, 2020
    8b2247c
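
To illustrate the "many input formats, one output format" idea from commit 12, the sketch below accepts a point as an object, a `"lat,lon"` string, or a `[lon, lat]` array and returns either GeoJSON or well-known text. `parse_point` and `format_point` are hypothetical helpers. Other accepted input forms (for example geohashes or WKT strings) and all non-point shapes are deliberately left out.

```python
from typing import Any, Tuple, Union


def parse_point(value: Any) -> Tuple[float, float]:
    """Return (lon, lat) from a few common point representations."""
    if isinstance(value, dict):
        return float(value["lon"]), float(value["lat"])
    if isinstance(value, str):
        # The string form is "lat,lon", the reverse of the array form.
        lat, lon = (float(part) for part in value.split(","))
        return lon, lat
    lon, lat = value  # array form is [lon, lat]
    return float(lon), float(lat)


def format_point(value: Any, fmt: str = "geojson") -> Union[dict, str]:
    """Render a point as GeoJSON (default) or as well-known text."""
    lon, lat = parse_point(value)
    if fmt == "wkt":
        return f"POINT ({lon} {lat})"
    return {"type": "Point", "coordinates": [lon, lat]}
```

Whatever form the point took at index time, the caller sees the same output for the same coordinates.
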
  13. Remove the 'fields' URL param from the REST spec.

    We don't actually support 'fields' as a URL parameter.
    jtibshirani committed Jul 27, 2020
    594d631
  14. 840cd11