Don't index so many saved object fields #43673
Comments
Pinging @elastic/kibana-platform |
If we switch from |
In the packages, the saved objects are stored with decoded JSON fields. The reason is that this makes versioning of it much easier and the diffs are simpler. But inside Kibana some of the fields are stored as encoded json strings (this might change in the future elastic/kibana#43673). To not require special logic on the Package Manager to encode the strings, this is done directly during packaging. One thing not too nice about this PR is that it includes now a dependency on `common.MapStr` from Beats. Reason is that it makes the code much simpler. Part of elastic#42
@elastic/kibana-platform Any update on this? We are using decoded JSON in all our packages (elastic/package-registry#354) at the moment for versioning purposes but then encode them again during the packaging. It would be nice to align here if possible. |
Kibana migrations already take care of backwards compatibility. So in 8.0 the maps team could write a migration to store This doesn't solve for the unnecessary encoding/decoding you currently have in package-registry (at least not until 8.0), but I assume the pain isn't bad enough to justify adding a JSON encoding/decoding option to saved objects just to remove this serialization? |
My question is more about long term alignment. Is the plan in 8.0 to have all SO content decoded or will it stay as is today? |
Yes, all JSON strings should be stored decoded as objects in 8.x. This will ultimately be up to different teams to implement and I'm not sure if we could enforce this for 8.0 but the effort should be minimal so I don't see a reason teams wouldn't be able to comply. |
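A minimal sketch of what such a per-team migration might look like, assuming a simplified document shape and hypothetical attribute names (`stateJSON`, `state`); the real Kibana migration types and each plugin's attributes will differ:

```typescript
// Simplified stand-in for the document passed to a saved object migration.
// This is an assumption for illustration; the real Kibana types carry more fields.
interface SavedObjectDoc {
  id: string;
  type: string;
  attributes: Record<string, unknown>;
}

// Hypothetical attribute names, chosen only for this example.
const ENCODED_KEY = 'stateJSON';
const DECODED_KEY = 'state';

// Replaces a stringified JSON attribute with the decoded object.
// Documents without the encoded attribute, or with invalid JSON, are returned unchanged.
function decodeJsonAttribute(doc: SavedObjectDoc): SavedObjectDoc {
  const raw = doc.attributes[ENCODED_KEY];
  if (typeof raw !== 'string') {
    return doc;
  }

  let decoded: unknown;
  try {
    decoded = JSON.parse(raw);
  } catch {
    // Leave malformed data untouched rather than failing the whole migration.
    return doc;
  }

  // Copy every attribute except the encoded one, then add the decoded object.
  const attributes: Record<string, unknown> = {};
  for (const key of Object.keys(doc.attributes)) {
    if (key !== ENCODED_KEY) {
      attributes[key] = doc.attributes[key];
    }
  }
  attributes[DECODED_KEY] = decoded;

  return { ...doc, attributes };
}
```

Any code that reads the attribute would need to switch from `JSON.parse(attributes.stateJSON)` to reading `attributes.state` directly, which is the "consuming code" part of the effort.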
This is great! I'm wondering if the "owners" of each saved object know about this? If not, perhaps worth sending out a note or ping them here? |
Yes I agree, it's definitely worth coordinating this with all the teams. However, since 8.x is still a long way off I think teams would benefit by not having 7.x and master branches diverge until we're closer to 8.x. I will own giving teams an early heads up during 8.0-alpha1 |
@timroes, @rudolf The telemetry saved object has all of 8 fields and we use all of them to determine a few things:
|
@TinaHeiligers if you've audited your plugin and no fields can be removed you can just tick off the task in the issue, thanks! |
There are many plugins running taskManager tasks to calculate the telemetry object and storing them as savedObjects. So when the fetcher kicks in, they'll return the savedObject content. Maybe it's a good idea to use the type |
Makes sense 👍 Especially for 7.9 we need to focus on the low hanging fruit. |
I made the |
Struck out a few plugins that are deprecated. |
Closing this as the enhancements we've made to scale saved object migrations #144035 and serverless zdt migrations prevent us from removing existing fields. To mitigate the field growth we have split the .kibana saved objects into several smaller indices. |
Update 29 June 2020
With 7.9 currently having ~960 fields, we're fast approaching the default limit of 1000 fields. Please audit your plugin's mappings and remove any unnecessary fields. Link from your PR back to this issue and mark your plugin's task as complete once the PR has been merged.
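As a rough aid for such an audit, the field count of a mapping can be approximated by walking its `properties`. The sketch below is an assumption-laden illustration, not the command from this issue: `MappingNode`, `countFields` and the sample type are hypothetical, and Elasticsearch's own accounting (which also includes field aliases) may differ in details.

```typescript
// Minimal shape of a mapping node, limited to what this sketch needs.
interface MappingNode {
  type?: string;
  dynamic?: boolean | 'strict';
  properties?: Record<string, MappingNode>;
  fields?: Record<string, MappingNode>; // multi-fields
}

// Approximates the number of mapped fields: every entry under `properties`
// or `fields` counts as one, including object fields themselves.
function countFields(node: MappingNode): number {
  let count = 0;
  for (const group of [node.properties, node.fields]) {
    if (!group) continue;
    for (const key of Object.keys(group)) {
      count += 1 + countFields(group[key]);
    }
  }
  return count;
}

// Hypothetical saved object type with every attribute mapped.
const before: MappingNode = {
  properties: {
    timestamp: { type: 'date' },
    status: { type: 'keyword' },
    message: { type: 'text' },
    durationMs: { type: 'long' },
  },
};

// Same type with `dynamic: false` and only `timestamp` left in the mappings.
// The other attributes can still be stored, but are no longer mapped.
const after: MappingNode = {
  dynamic: false,
  properties: {
    timestamp: { type: 'date' },
  },
};

console.log(countFields(before), countFields(after)); // prints "4 1"
```

Running the same count against a plugin's mappings before and after a change gives a quick before/after comparison.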
Removing fields
Setting `index: false` and `doc_values: false` removes some of the overhead of a field, but doesn't reduce the field count. To reduce the field count, fields need to be removed from the mappings completely. This can be done by specifying `dynamic: false` on any level of your mappings.
For example, the following diff will remove three fields from the field count. The removed fields can still be stored in the Saved Object type, but searching and aggregation are only possible on the `timestamp` field. Note: this change also removes any validation on Elasticsearch, which will allow saved objects with unknown attributes to be saved. Because of this, we recommend starting only with low-risk saved object types like telemetry data.
You can use the following command to count the number of fields to do a before/after comparison (requires `brew install jq`):

Plugins:
[ ] plugins/timelion @elastic/kibana-app @flash1293
xpack/plugins/canvas @elastic/kibana-canvas
xpack/plugins/file_upload @elastic/kibana-gis
xpack/plugins/graph @elastic/kibana-app @flash1293
xpack/plugins/task_manager @elastic/kibana-alerting-services (Task Manager does not use the .kibana index)

Original issue
Looking at the current mappings for a lot of our saved objects, we're indexing a large number of unnecessary fields, i.e. fields we know we'll never want to search through or filter over. Indexing those just wastes heap in Elasticsearch and, if the field is unnecessarily analyzed, wastes a couple of milliseconds on every insert and thus every migration. We even use a lot of `text` fields in places where we store stringified JSON, which doesn't make any sense, since the analyzer won't end up with anything meaningful here.
This is not a huge problem, since the `.kibana` index is usually rather small, and a lot of those JSON fields might be over the default `ignore_above` value of 256 and thus not indexed in most documents. Despite it not being a huge problem, I discussed this with @joshdover, @tylersmalley and @rudolf, and we agreed that we should not waste heap and indexing performance on fields we know we'll never need indexed.
As the field count on `.kibana` is approaching the default limit of 1000 fields, we need to urgently evaluate whether or not all fields are really necessary for performing queries or filters.

Mapping recommendations
Here are a couple of general recommendations for how the mappings of a saved object should look:
type=text only for full text search on real text
A field with type `text` in the mapping will be analyzed and indexed. This makes sense only for fields we know we want to do full text search on, e.g. the `title` or `description` of a field. If you don't need the field value analyzed for full text search, don't index the field (see below) or use `keyword` with an appropriate `ignore_above` as a type instead. Good examples for a proper `keyword` field would be the `visType` or `language` of a query.

Don't index if not needed
Especially with `keyword` fields, we very often index a field without thinking about it (because it's the default option). If we know we'll never need to aggregate over that field or query for that field, but just need to have it available when retrieving the saved object, set `index: false` and `doc_values: false` (unless it's a `text` or `annotated_text` field) in the mapping for that field.
A couple of examples where it might make sense to have a (`keyword`) field indexed:
`visType`: we might want to filter on that later and thus need to be able to query by that field
`language` (of a query): even though we might never want to expose that in the UI, we might want to aggregate that field for telemetry data
A couple of examples where indexing doesn't make much sense:
`expression` (the "canvas" expression of a visualization): It doesn't make any sense to filter on the complex expression as a whole, nor to aggregate over it. If we wanted to build telemetry, we would need to look at each document individually anyway, e.g. parse it and count the functions it contains.

JSON fields
We have a couple of places where we use a `keyword` field (often even indexed) to store some JSON object, like the configuration of a visualization, or the state of a dashboard. As a first step, these fields should be set to `index: false`.
As a further optimization, this data can be saved as a field of type `object` with `enabled: false`. That way the content of that field will simply be ignored by Elasticsearch: it won't be indexed or analyzed, but it is still returned as it was indexed (as JSON) in the saved object. This removes an unnecessary `JSON.stringify` and `JSON.parse` when saving/loading those objects. Note: this will require writing a migration function for your saved object and changing any consuming code, so this is not an immediate need, but rather something to work towards for 8.0.

Consider using `type: 'flattened'` (license basic) if you need to search over many fields or an unknown number of fields
The flattened type uses a single field for the entire object. It comes with some limitations, but in many instances it can significantly reduce the field count while still making it possible to search/aggregate over the fields inside the object.
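To illustrate the difference in field count, here is a small sketch with a hypothetical `params` attribute; the attribute and sub-field names are assumptions made up for this example:

```typescript
// Hypothetical mapping where each sub-field of `params` is mapped on its own.
// This contributes the `params` object plus one field per sub-field.
const individuallyMapped = {
  properties: {
    params: {
      properties: {
        color: { type: 'keyword' },
        size: { type: 'keyword' },
        label: { type: 'keyword' },
      },
    },
  },
};

// The same attribute mapped as `flattened`: a single field for the whole
// object, regardless of how many keys the stored documents contain.
const flattened = {
  properties: {
    params: { type: 'flattened' },
  },
};

console.log(
  Object.keys(individuallyMapped.properties.params.properties).length, // mapped sub-fields: 3
  flattened.properties.params.type
);
```

With the second mapping, adding more keys to `params` in stored documents does not add entries to the mappings.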
Keep in mind that using the `flattened` field type will still index all data within this field. If you need just one specific sub-field to be aggregatable/searchable, but not the rest, the `dynamic: false` approach described above (where the parent key is `dynamic: false` and only the one sub-field you need search/aggregation on has an indexed mapping) is preferable. `flattened` is mostly preferred if you potentially need to search/aggregate through a larger number of sub-fields.

What happens after I changed my plugin's mappings?
If you switch a field from an indexed to a not-indexed state (e.g. with `enabled: false` or `index: false`), the migration system will automatically update the mappings when Kibana is upgraded; no further action is required. If your plugin has recently removed or renamed an entire Saved Object type, these old mappings might not have been cleaned up. Please reach out to @elastic/kibana-platform if you think this might be the case.
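A change from an indexed to a not-indexed state might look like the following sketch. The saved object type and the `configJSON` attribute are hypothetical, and `changedFields` is only a helper for this example, not part of Kibana:

```typescript
type FieldMapping = Record<string, unknown>;
type TypeMappings = { properties: Record<string, FieldMapping> };

// Hypothetical mappings before the change: everything is indexed by default.
const before: TypeMappings = {
  properties: {
    title: { type: 'text' },
    visType: { type: 'keyword' },
    configJSON: { type: 'keyword' },
  },
};

// After the change: `configJSON` is only ever retrieved with the saved object,
// so it is no longer indexed. Note that the field count stays the same.
const after: TypeMappings = {
  properties: {
    title: { type: 'text' }, // still used for full text search
    visType: { type: 'keyword' }, // still filtered on
    configJSON: { type: 'keyword', index: false, doc_values: false },
  },
};

// Lists the fields whose mapping differs between the two versions.
function changedFields(a: TypeMappings, b: TypeMappings): string[] {
  return Object.keys(b.properties).filter(
    (name) => JSON.stringify(a.properties[name]) !== JSON.stringify(b.properties[name])
  );
}

console.log(changedFields(before, after)); // prints [ 'configJSON' ]
```

Only the plugin's mapping definition changes here; per the paragraph above, applying it to the index is left to the migration system on upgrade.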