[SIEM] Meta issue for saved object needs for large lists #64715

Closed
FrankHassanabad opened this issue Apr 28, 2020 · 7 comments
Labels
Feature:Saved Objects Meta Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:Endpoint Response Endpoint Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:SIEM

Comments

@FrankHassanabad
Contributor

FrankHassanabad commented Apr 28, 2020

This is a meta ticket tracking ad-hoc requirements and feature requests from Elastic Security for saved object support of large lists, such as lists holding a large number of values.

Meta issue for saved objects improvements, which collects many similar requests from other teams:
#61716

The data index implementation we have merged so far does not use Saved Objects. It sits behind a feature flag to keep teams moving and unblocked:
#62552


Support for > 10k (Nk) objects (done in #86301)

Use case:
As a list user, I will be uploading different large sets of list values that contain IPs, host names, etc. These list values can contain > 10k items, and could be even larger, such as > 200k. As a list user, I will be uploading and appending/changing list values as well as exporting > 10k items at a time using the provided REST streaming API.

Possible technical solutions:
Support search after within the find API, or expose "search after" directly as a new complementary API next to the existing find API.
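
To illustrate the kind of paging this asks for, here is a minimal sketch that runs against an in-memory array rather than the real saved objects client. The names `findPage`, `exportAllItems`, and the `PagedListItem` shape are hypothetical and exist only for this example. The idea is that each page returns the sort value of its last hit, and that value becomes the cursor for the next request, so the total number of items is not bounded by a fixed from/size window.

```typescript
// Hypothetical list item shape; field names are illustrative only.
interface PagedListItem {
  id: string;
  list_id: string;
  value: string;
}

interface FindPageResult {
  items: PagedListItem[];
  // Sort value of the last hit, passed back as the cursor for the next page.
  searchAfter: string | undefined;
}

// Stand-in for a find API with search-after support: returns up to `perPage`
// items of one list, sorted by id, strictly after the given cursor.
function findPage(
  store: PagedListItem[],
  listId: string,
  perPage: number,
  searchAfter: string | undefined
): FindPageResult {
  const items = store
    .filter((item) => item.list_id === listId)
    .filter((item) => searchAfter === undefined || item.id > searchAfter)
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
    .slice(0, perPage);
  const last = items.length > 0 ? items[items.length - 1] : undefined;
  return { items, searchAfter: last === undefined ? undefined : last.id };
}

// Walks every page of one list by feeding each page's cursor into the next call.
function exportAllItems(store: PagedListItem[], listId: string, perPage: number): PagedListItem[] {
  const all: PagedListItem[] = [];
  let cursor: string | undefined = undefined;
  while (true) {
    const page: FindPageResult = findPage(store, listId, perPage, cursor);
    all.push(...page.items);
    // A short (or empty) page means there is nothing left to fetch.
    if (page.items.length < perPage) {
      break;
    }
    cursor = page.searchAfter;
  }
  return all;
}
```

A real implementation would sort on a unique, stable field so that no item is skipped or repeated between pages.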

Support for delete by query

Use case:
As a list user, I will be uploading multiple lists, where each list can contain a large number of list values and a list_id disambiguates between the lists. From time to time I will be deleting all of a list's values by their list_id.

Possible technical solutions:
Support delete by query. We will have list items from different lists mixed together within one saved object type, and each item's "list_id" identifies which list it belongs to.

These list items will need to be deleted by their "list_id" key, and we would like to delete them all at once rather than calling back and forth to get each list item id and deleting them in batches. Batching can cause a lot of network traffic and possible bugs/issues if Kibana is rebooted or errors out halfway through the process. It would be preferable to delete by query all at once and let Elasticsearch do its thing.
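
As a rough sketch of what such a call could carry, the function below builds a request for Elasticsearch's delete by query API that matches every document with a given `list_id`. The function name, the `DeleteByListIdRequest` type, and the assumption that `list_id` is a top-level keyword field in a plain data index are all illustrative. How a saved objects API would wrap or namespace this is not something this issue specifies.

```typescript
// Hypothetical request shape for an Elasticsearch delete-by-query call.
interface DeleteByListIdRequest {
  index: string;
  body: {
    query: {
      term: { list_id: string };
    };
  };
}

// Builds a single request that removes every list item belonging to `listId`,
// instead of fetching item ids page by page and deleting them in batches.
function buildDeleteByListIdRequest(index: string, listId: string): DeleteByListIdRequest {
  return {
    index,
    body: {
      query: {
        // Assumes `list_id` is mapped as a keyword so that `term` matches exactly.
        term: { list_id: listId },
      },
    },
  };
}
```

The whole deletion then becomes one round trip from Kibana, so a restart midway through cannot leave the client side of a batched loop half finished.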

Support update by query

Technically we do not use this yet within our data index implementation, but we would need it if we used a de-normalized format. We currently use a normalized format.

Use case:
As a list user, I will need to selectively update list items in bulk by their list_id, such as the names and descriptions of all the individual list items that belong to a particular list.

Possible technical solutions:
Support update by query.
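
For illustration, a request for Elasticsearch's update by query API could look roughly like the sketch below, which rewrites the duplicated name and description on every item of one list using a Painless script. The function name, the request type, and the field names `list_id`, `name`, and `description` are assumptions made for this example, not the actual mapping.

```typescript
// Hypothetical request shape for an Elasticsearch update-by-query call.
interface UpdateListMetaRequest {
  index: string;
  body: {
    query: { term: { list_id: string } };
    script: {
      lang: "painless";
      source: string;
      params: { name: string; description: string };
    };
  };
}

// Builds one request that updates the denormalized name and description
// on every list item whose `list_id` matches.
function buildUpdateListMetaRequest(
  index: string,
  listId: string,
  name: string,
  description: string
): UpdateListMetaRequest {
  return {
    index,
    body: {
      query: { term: { list_id: listId } },
      script: {
        lang: "painless",
        // Values are passed as params rather than concatenated into the script.
        source: "ctx._source.name = params.name; ctx._source.description = params.description;",
        params: { name, description },
      },
    },
  };
}
```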

@elasticmachine
Contributor

Pinging @elastic/siem (Team:SIEM)

@FrankHassanabad FrankHassanabad added the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Apr 28, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-platform (Team:Platform)

@FrankHassanabad FrankHassanabad added the Team:Endpoint Response Endpoint Response Team label Apr 28, 2020
@elasticmachine
Contributor

Pinging @elastic/endpoint-response (Team:Endpoint Response)

@kobelb
Contributor

kobelb commented Apr 29, 2020

Do we need these lists to be separate documents/saved-objects? Are we generally consuming these lists in their entirety?

@FrankHassanabad
Contributor Author

Do we need these lists to be separate documents/saved-objects?

The structure for lists is different from the structure for list items:
List mapping
List item mapping

If we want to de-normalize it and have name, description, tags on each list item, then the trade-off is that when someone sends a REST request to update a single list's name, description, or tags, we will have to do an "update by query" to update each list item's name, description, and tags. For displaying the top-level lists rather than list items, we would first aggregate on the "list_id" field, then grab the "name, description, tags, etc..." from the first record found, on the assumption that all of the list items carry the same denormalized, duplicated "name, description, tags, etc..."
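
A small in-memory sketch of that "aggregate on list_id, take the first record" step is below. The `DenormalizedItem` and `ListSummary` shapes and the `summarizeLists` function are hypothetical. Against Elasticsearch this would presumably be done with an aggregation rather than in application code.

```typescript
// Hypothetical denormalized list item: list-level fields are duplicated on every item.
interface DenormalizedItem {
  list_id: string;
  name: string;
  description: string;
  tags: string[];
  value: string;
}

interface ListSummary {
  list_id: string;
  name: string;
  description: string;
  tags: string[];
  item_count: number;
}

// Groups items by list_id and takes the list-level fields from the first item
// seen in each group, assuming every item in a list carries identical copies.
function summarizeLists(items: DenormalizedItem[]): ListSummary[] {
  const summaries: ListSummary[] = [];
  const byListId: Record<string, ListSummary | undefined> = {};
  for (const item of items) {
    const existing = byListId[item.list_id];
    if (existing !== undefined) {
      existing.item_count += 1;
      continue;
    }
    const summary: ListSummary = {
      list_id: item.list_id,
      name: item.name,
      description: item.description,
      tags: item.tags,
      item_count: 1,
    };
    byListId[item.list_id] = summary;
    summaries.push(summary);
  }
  return summaries;
}
```

If an update by query ever fails partway through, items of the same list can disagree on these fields, and the "first record" assumption would then return whichever copy happens to come first.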

...or...

We would use the mappings for list and list item as a single super set mapping, branching on a "type" field within the same saved object collection. Both lists and list items would be stored together within the super set, and we would query against "type" to tell a list apart from a list item that belongs to a list, all within a single saved object type/index.
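
On the TypeScript side, this second option maps naturally onto a discriminated union. The sketch below is only an illustration: the field names and the `"list"` / `"list_item"` type values are assumptions, not the actual mappings linked above.

```typescript
// Hypothetical documents sharing one super set collection, told apart by `type`.
interface ListDoc {
  type: "list";
  id: string;
  name: string;
  description: string;
}

interface ListItemDoc {
  type: "list_item";
  id: string;
  list_id: string;
  value: string;
}

type ListsDocument = ListDoc | ListItemDoc;

// Returns only the top-level lists; checking `type` narrows the union.
function getLists(docs: ListsDocument[]): ListDoc[] {
  const lists: ListDoc[] = [];
  for (const doc of docs) {
    if (doc.type === "list") {
      lists.push(doc);
    }
  }
  return lists;
}

// Returns only the list items that belong to the given list.
function getItemsOfList(docs: ListsDocument[], listId: string): ListItemDoc[] {
  const items: ListItemDoc[] = [];
  for (const doc of docs) {
    if (doc.type === "list_item" && doc.list_id === listId) {
      items.push(doc);
    }
  }
  return items;
}
```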

Either way we still need "delete by query" and "search after", but I can see either of those two options as a way to reduce down to a single super set mapping, or to reduce our existing data index implementation from the two data indexes we have right now to a single one.

Are we generally consuming these lists in their entirety?

If the question is whether we will generally iterate over all of the list items all the time, the answer is 'no'.

The user will have multiple lists and each list contains multiple list items. The user can do CRUD against any individual list or against any individual list item within a list or subsets of data.

Example CRUD operations against a single list are things like:

  • update a single list name
  • update a single list description
  • delete the entire list which will also delete all the list items

Example CRUD operations against a single list item are things like:

  • add a new list item to an existing list
  • update a single list item
  • query against a set of list items using CIDR if the list type is "ip"
  • delete a single list item, or a set of them using CIDR if the list type is "ip"
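
To make the CIDR cases above concrete, here is a small standalone sketch of IPv4 CIDR matching. The helper names are hypothetical, and this is not how the data index implementation does it: there the matching would be left to Elasticsearch, whose `ip` field type accepts CIDR notation in queries.

```typescript
// Converts a dotted IPv4 address into an unsigned 32-bit number.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4) {
    throw new Error("Invalid IPv4 address: " + ip);
  }
  let result = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part) || Number(part) > 255) {
      throw new Error("Invalid IPv4 address: " + ip);
    }
    result = result * 256 + Number(part);
  }
  return result;
}

// Returns true when `ip` falls inside the IPv4 range written as `cidr`,
// for example "192.168.0.0/16".
function ipv4InCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  if (slash < 0) {
    throw new Error("Invalid CIDR: " + cidr);
  }
  const prefixText = cidr.slice(slash + 1);
  if (!/^\d{1,2}$/.test(prefixText) || Number(prefixText) > 32) {
    throw new Error("Invalid CIDR prefix: " + cidr);
  }
  const prefix = Number(prefixText);
  // A /0 range matches everything; otherwise keep the top `prefix` bits.
  const mask = prefix === 0 ? 0 : (0xffffffff << (32 - prefix)) >>> 0;
  const base = ipv4ToInt(cidr.slice(0, slash));
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((base & mask) >>> 0);
}

// Selects the values of an "ip" list that fall inside the given CIDR range.
function filterValuesByCidr(values: string[], cidr: string): string[] {
  return values.filter((value) => ipv4InCidr(value, cidr));
}
```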

All of those operations work against the merged data index code we have right now, which is behind a feature flag. If you want to play around with it, see these curl scripts.

@FrankHassanabad
Contributor Author

"Reviewed by Frank Hassanabad on 7/29/2020, still valid as of this date" We have implemented several workarounds and TODO's on our side but still would like these things added for us to utilize so we can remove tech debt.

@rudolf
Contributor

rudolf commented Mar 15, 2021

Updated the issue now that we have search_after support #86301

@FrankHassanabad FrankHassanabad closed this as not planned Feb 8, 2023
5 participants