
[Fleet] Initiate Fleet setup on boot #111858

Closed
10 tasks done
joshdover opened this issue Sep 10, 2021 · 13 comments
Assignees
Labels
enhancement New value added to drive a business result required-for-8.0 This work is required to be done before 8.0 lands, bc it relates to a breaking change or similar. Team:Fleet Team label for Observability Data Collection Fleet team v8.0.0

Comments

@joshdover
Contributor

joshdover commented Sep 10, 2021

Blocked by:

In order to support smooth Stack upgrades, certain packages, if installed, need to be kept in sync with the Stack version. To accomplish this, the Fleet plugin should initiate its setup process when Kibana starts up rather than waiting for a user to visit the Fleet app in the UI.

Requirements:

  • Add logic to the Fleet plugin's start method to initiate the [setup process] - [Fleet] Move Fleet Setup to start lifecycle #117552
    • This logic should block Kibana startup until the setup process completes
    • We should ensure that we do not make the default on-prem or development boot time slow. To accomplish this we can either:
      • Separate the managed package upgrades from setup
        • During boot we only ensure that any managed packages are upgraded (if previously installed).
        • Preconfiguration and setup would still be a separate process that runs either via the API or when the user loads the Fleet or Integrations apps.
      • No longer install packages by default and run the full setup & preconfiguration process during boot
        • This would require fewer changes and result in a cleaner, clearer setup procedure that is simpler to maintain and debug.
        • Default on-prem configuration would be very fast.
        • We can then remove the setup APIs or make them no-ops. Removing them in 8.0 is preferable since it is a breaking change.
  • Upgrading packages should also upgrade all package policies (only for managed packages) - [Fleet] Update logic for "Keep policies up to date" defaults in 8.0 #119126
    • Add a package spec package_policy_upgrade_strategy field for specifying a package policy upgrade strategy
    • Add support using package_policy_upgrade_strategy to decide when to attempt policy upgrades: [Change Proposal] Add package_policy_upgrade_strategy field to support Fleet upgrade behavior package-spec#244
    • Update managed packages to use new field
    • Update: As of 2021-11-18, it seems like the change proposal to add this field to the package spec is not going to result in exactly the functionality we proposed. Discussions are still ongoing, so we've elected to unblock ourselves here and simply continue working with our "hardcoded package list" concept in Fleet. See new tasks below
    • Expand the list of packages for which Fleet automatically upgrades policies to include our AUTO_UPDATE_PACKAGES as well as the existing DEFAULT_PACKAGES
  • Add usage telemetry on upgrades
  • Add custom status to Kibana API - [Fleet] Wire Fleet setup status to core Kibana status API #120020
    • We should report Fleet's upgrade status to the Core status API using the core.status.set API
  • Verify that packages listed in the auto upgrade list are also downgraded if Kibana is rolled back. - [Fleet] Add tests for rolling back versions of managed packages. #118797
    • This is critical to support Kibana rollbacks
    • From my reading of the code, this should already be the behavior today, but we should add explicit test coverage if we do not already have this.
    • Update: We captured the testing work here: [Fleet] Initiate Fleet setup on boot #111858 (comment) as a manual test process, and the QAS team created a test ticket here: Test ticket for Initiating Fleet setup on boot #120726
    • Update: As of 2021-11-18, we've decided to punt this to the bottom of the list. Complications around writing automated test for this rollback case caused us to reevaluate. Since this task is mainly centered around tests, we're comfortable shipping it after FF for 8.0 if necessary.
  • Remove the /setup and /agents/setup APIs? (This was scrapped.)
    • They could be useful for troubleshooting purposes, but would have same effect as restarting Kibana and we'd prefer to have a single lifecycle that is done prior to the user using any Fleet features.
    • 8.0 is a good time to remove things
    • Would need to ensure that Agent is updated to remove these API calls
    • If we're not blocking Kibana boot, continuing to call this from the UI when the Fleet app is mounted gives us 'retries'
    • We should determine how slow the no-op scenario is, and if it's slow we can either:
      1. Remove the setup calls from the UI + add a retry button if failing; or
      2. Keep the setup state in memory to make the API faster + add a force option to bypass
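The status-reporting requirement above (wiring Fleet's setup status into the core Kibana status API) can be sketched as a mapping from setup state to a service status level. The `FleetSetupState` shape and the helper below are hypothetical stand-ins; Kibana's real `ServiceStatus` types live in core and differ in detail:

```typescript
// Hedged sketch: map a hypothetical Fleet setup state onto the level/summary
// shape that core.status.set expects. These types are minimal stand-ins,
// not Kibana's actual ServiceStatus definitions.
type ServiceStatusLevel = "available" | "degraded" | "unavailable";

interface ServiceStatus {
  level: ServiceStatusLevel;
  summary: string;
}

type FleetSetupState =
  | { status: "in_progress" }
  | { status: "complete" }
  | { status: "failed"; error: string };

function fleetSetupToServiceStatus(state: FleetSetupState): ServiceStatus {
  switch (state.status) {
    case "in_progress":
      // Report degraded (not unavailable) while setup runs, so the rest of
      // Kibana can still come up.
      return { level: "degraded", summary: "Fleet setup is in progress" };
    case "complete":
      return { level: "available", summary: "Fleet is available" };
    case "failed":
      return { level: "unavailable", summary: `Fleet setup failed: ${state.error}` };
  }
}
```

In the real plugin this mapping would be pushed through an observable passed to `core.status.set` so the `/api/status` endpoint reflects setup progress.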

Open questions

@joshdover joshdover added enhancement New value added to drive a business result Team:Fleet Team label for Observability Data Collection Fleet team required-for-8.0 This work is required to be done before 8.0 lands, bc it relates to a breaking change or similar. labels Sep 10, 2021
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@kpollich
Member

We need to force these packages to always upgrade their package policies and not let this be configurable by the user. @kpollich do we have a mechanism for doing this already?

This is captured as part of our top-level package policy upgrade issue under the final "Automatic package upgrades" bullet point.

#106048

The plan, for now, is to add a flag to integrations that denotes whether associated package policies should automatically be upgraded when the package is updated. This should eventually be replaced with a value that comes from the actual package spec instead, so that packages like APM can instruct Fleet to automatically upgrade policies instead of relying on user configuration.

We can introduce a piece of preconfiguration for these existing "auto-update" packages that includes this flag, as well. We set these packages up here:

/*
Package rules:
| | unremovablePackages | defaultPackages | autoUpdatePackages |
|---------------|:---------------------:|:---------------:|:------------------:|
| Removable | ❌ | ✔️ | ✔️ |
| Auto-installs | ❌ | ✔️ | ❌ |
| Auto-updates | ❌ | ✔️ | ✔️ |
`endpoint` is a special package. It needs to autoupdate, it needs to _not_ be
removable, but it doesn't install by default. Following the table, it needs to
be in `unremovablePackages` and in `autoUpdatePackages`, but not in
`defaultPackages`.
*/
export const unremovablePackages = [
  FLEET_SYSTEM_PACKAGE,
  FLEET_ELASTIC_AGENT_PACKAGE,
  FLEET_SERVER_PACKAGE,
  FLEET_ENDPOINT_PACKAGE,
];

export const defaultPackages = unremovablePackages.filter((p) => p !== FLEET_ENDPOINT_PACKAGE);

export const autoUpdatePackages = [FLEET_ENDPOINT_PACKAGE];

and install them as part of our preconfiguration process here:

packages = [
  ...packages,
  ...DEFAULT_PACKAGES.filter((pkg) => !preconfiguredPackageNames.has(pkg.name)),
  ...autoUpdateablePackages.filter((pkg) => !preconfiguredPackageNames.has(pkg.name)),
];

const { nonFatalErrors } = await ensurePreconfiguredPackagesAndPolicies(
  soClient,
  esClient,
  policies,
  packages,
  defaultOutput
);

export async function ensurePreconfiguredPackagesAndPolicies(

So, we could set a flag on some or all of these specific preconfigured packages, and if necessary another one to indicate that this piece of configuration is "frozen" and uneditable by the user. When the setup process saves these packages with these flags set, all should function as expected once the implementation specified in the above top-level issue is completed.
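The flag-merging idea above could look something like the following sketch. The `keepPoliciesUpToDate` field name and the `mergeManagedPackages` helper are hypothetical stand-ins for illustration, not Fleet's shipped API:

```typescript
// Hypothetical sketch: merge default and auto-update managed packages into
// the operator's preconfigured package list, marking the managed entries
// with a flag so their package policies are upgraded automatically.
// `keepPoliciesUpToDate` is an assumed field name, not Fleet's real one.
interface PreconfiguredPackage {
  name: string;
  keepPoliciesUpToDate?: boolean;
}

function mergeManagedPackages(
  preconfigured: PreconfiguredPackage[],
  defaults: string[],
  autoUpdate: string[]
): PreconfiguredPackage[] {
  const seen = new Set(preconfigured.map((p) => p.name));
  const merged = [...preconfigured];
  for (const name of [...defaults, ...autoUpdate]) {
    if (seen.has(name)) continue; // operator-supplied config wins
    seen.add(name);
    merged.push({ name, keepPoliciesUpToDate: true });
  }
  return merged;
}
```

The key property is that operator-supplied preconfiguration is never overwritten; the managed flag is only stamped onto entries Fleet adds itself.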

@joshdover
Contributor Author

joshdover commented Nov 10, 2021

One thing that has come up as part of moving the Fleet setup call to start on Kibana boot is the issue of multiple nodes running the setup concurrently. Today we have a naive guard that prevents this happening on a single node, but nothing that prevents it from happening concurrently on multiple nodes. My thinking is that by moving this to Kibana boot, it’s more likely that the multi-node scenario could happen during upgrades.

Questions:

  • Which pieces of Fleet setup are not idempotent?
  • Which should be safe?
    • Installing all Elasticsearch assets should not cause an issue (index templates, ingest pipelines, transforms, etc.)
  • How can we make sure that switching to deterministic IDs does not create problems for clusters that were setup prior to 8.0 where we weren't using deterministic IDs?
    • I think we can continue to use the same logic for checking if an existing object already exists

Given the above, I think this should be safe to run on all nodes if we can make the agent policy and package policy IDs deterministic and ensure that the create calls use overwrite: true to avoid conflict errors.

For Agent policies, we do require that preconfigured policies supply a name here. Would it be safe to use this name to seed a uuidv5 for a deterministic ID? If we did end up creating duplicates, are there any really bad side-effects?

  • It's worth noting that we don't require that an id is supplied if the policy is the default policy or default fleet server policy. Maybe instead of name we should use id and fall back to default_policy or default_fleet_server_policy in cases where there is no ID?

To make package policies deterministic, we can probably piggyback on the agent policy deterministic ID logic and simply append the name parameter to it. This should work because package policies must belong to a single agent policy AND because we enforce globally unique package policy names as of #115212
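The deterministic-ID scheme described above can be sketched with a name-based (v5) UUID. This is a minimal RFC 4122 uuidv5 implementation for illustration; the `FLEET_NAMESPACE` constant and the `agentPolicyId`/`packagePolicyId` helpers are hypothetical, not Fleet's actual code:

```typescript
import { createHash } from "crypto";

// Minimal RFC 4122 name-based (v5, SHA-1) UUID: hash the namespace UUID's
// bytes followed by the name, then stamp the version and variant bits.
function uuidv5(name: string, namespace: string): string {
  const nsBytes = Buffer.from(namespace.replace(/-/g, ""), "hex");
  const hash = createHash("sha1").update(nsBytes).update(name, "utf8").digest();
  const bytes = hash.subarray(0, 16);
  bytes[6] = (bytes[6] & 0x0f) | 0x50; // version 5
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC 4122 variant
  const hex = bytes.toString("hex");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

// Hypothetical namespace; any fixed UUID works as long as it never changes.
const FLEET_NAMESPACE = "f8742a34-1f0e-4bcb-9f8a-2f3c54c3f0b7";

// Deterministic agent policy ID: prefer an explicit preconfigured id,
// otherwise seed from the (required, unique) policy name.
function agentPolicyId(policy: { id?: string; name: string }): string {
  return policy.id ?? uuidv5(policy.name, FLEET_NAMESPACE);
}

// Package policy IDs piggyback on the agent policy ID plus the globally
// unique package policy name.
function packagePolicyId(agentPolicy: { id?: string; name: string }, name: string): string {
  return uuidv5(`${agentPolicyId(agentPolicy)}:${name}`, FLEET_NAMESPACE);
}
```

Because uuidv5 is a pure function of its inputs, every Kibana node computes the same IDs independently, which is what makes concurrent multi-node setup safe when combined with `overwrite: true` on the create calls.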

@kpollich
Member

Thanks, @joshdover, for the thorough explanation of our idempotency issues around setup. It seems to me that you've captured every concern I might've had and provided a path forward.

For Agent policies, we do require that preconfigured policies supply a name here. Would it be safe to use this name to seed a uuidv5 for a deterministic ID? If we did end up creating duplicates, are there any really bad side-effects?

I don't think there are any negative side effects in the case that we create two identical agent policies from preconfiguration. Outside of general confusion for the user, I don't think this would cause any breakdowns in Fleet's functionality.

It's worth noting that we don't require that an id is supplied if the policy is the default policy or default fleet server policy. Maybe instead of name we should use id and fallback to default_policy or default_fleet_server_policy in cases where there is no ID?

It does sound safest to me if we fall back to default_policy in cases like this.

@nchaulet
Member

Outputs uses uuidv5

Outputs can use uuidv5 if the user provides an id, but we do not provide an id for the default output (it should not be hard to change that).

I don't think there are any negative side effects in the case that we create two identical agent policies from preconfiguration. Outside of general confusion for the user, I don't think this would cause any breakdowns in Fleet's functionality.

I think it could be an issue, as agents could enroll based on is_default_fleet_server or is_default

For Agent policies, we do require that preconfigured policies supply a name here. Would it be safe to use this name to seed a uuidv5 for a deterministic ID? If we did end up creating duplicates, are there any really bad side-effects

Yes, I think it makes sense to seed a uuidv5 with the name, since we enforce names to be unique. For the cloud preconfigured policy they provide an id; should we instead make the id field mandatory in the preconfiguration? (That's what I did for preconfigured outputs.)

@joshdover
Contributor Author

@kpollich @nchaulet thanks for the feedback. I've summarized the discussion in this comment on the dedicated issue that Kyle created: #118423 (comment)

@kpollich
Member

kpollich commented Dec 1, 2021

Originally contained a WIP version of test instructions. See finalized version below

@joshdover
Contributor Author

Am I understanding the requirements correctly here and following the expected procedure for rolling back Kibana after an upgrade? I looked around and there's no official way to downgrade Kibana as far as I can tell, so I assume what we were looking for here was a rollback to a previous snapshot, and then confirmation that package versions are reset after that rollback. Does that sound correct?

Here are the full docs, sorry for not surfacing these sooner: https://www.elastic.co/guide/en/kibana/7.16/upgrade-migrations.html#upgrade-migrations-rolling-back

I think the key steps are:

  • Delete all saved object indices with DELETE /.kibana*
  • Restore the kibana feature state or all `.kibana*` indices and their aliases from the snapshot

So we don't want to restore the whole snapshot, just the `.kibana*` indices. It also makes a note about shutting down the Kibana nodes first, so you may have to scale Kibana to 0 on Cloud, then restore the snapshot manually via the ES REST API.

@kpollich
Member

kpollich commented Dec 2, 2021

It also makes a note about shutting down the Kibana nodes first, so you may have to scale Kibana to 0 on Cloud, then restore the snapshot manually via the ES REST API.

This definitely makes sense, but following these instructions requires step number 5, which I don't think is possible in Cloud:

Start up all Kibana instances on the older version you wish to rollback to.

I followed the other steps (deleting the .kibana* indices, restoring via the ES API) successfully, but I'm not able to terminate Kibana and restart an earlier version from the Cloud console, it seems.

@kpollich
Member

kpollich commented Dec 6, 2021

@EricDavisX - sharing the latest version of the manual testing steps for testing our managed packages as they relate to Kibana's upgrade/downgrade process. I just ran through these myself in a local dev environment, ping-ponging between 7.15 and 7.16 instances. Hopefully you are the right person to tag for developing a more robust test plan here with our QAS folks. Let me know if I can clarify anything here!


Managed Packages and Kibana Downgrades - Manual Test Instructions

Verify that packages listed in the auto upgrade list are also downgraded if Kibana is rolled back.

We investigated creating some automated tests to cover this case in #118797, but encountered some difficulties that prevented us from making much progress. We're instead opting to document manual testing procedures for Kibana downgrades and how they interact with our various "managed" integrations in Fleet.

Manual Testing Procedure

Goal: Ensure that managed packages that we consider as "default" or "auto update" packages are downgraded when Kibana is rolled back.

These steps are written to be performed on self-hosted Kibana, as downgrading Kibana in cloud is not currently supported.

List of packages under test:

  • Default Packages
    • System
    • Elastic Agent
    • Fleet Server
  • Additional Packages
    • APM
    • Endpoint
    • Synthetics

Start up a 7.15 environment

Start up a fresh instance of Elasticsearch on a 7.15 snapshot as well as a fresh Kibana 7.15 instance. Ensure Fleet setup is completed by visiting the Fleet application in Kibana and waiting for the loading indicator to disappear and for the Fleet UI to appear.

Ensure Kibana is set up for on-disk backups

Add a `path.repo` value to your elasticsearch.yml file or via command-line arguments to ensure Elasticsearch is configured to store on-disk snapshots, e.g.

path:
  repo: /tmp/es-backups

# Or from the CLI
$ bin/elasticsearch -E path.repo=/tmp/es-backups

Install additional managed packages

Install the following non-default managed packages. We don't need to create package policies here, so navigating to the integration's Settings tab and clicking Install [integration] assets should suffice.

  • APM
  • Endpoint
  • Synthetics

Confirm integration versions for managed packages

Confirm the versions of all managed packages. We'll reference these versions later when we upgrade and then eventually downgrade again.

| Integration | Version |
| --- | --- |
| APM | 0.4.0 |
| Elastic Agent | 1.2.1 |
| Endpoint | 1.1.1 |
| Fleet Server | 1.0.1 |
| Synthetics | 0.3.0 |
| System | 1.6.3 |

Snapshot your Kibana data

Register a repository

Via the Stack Management -> Data -> Snapshot and Restore UI, register a repository using the "Shared file system" option. Give it a name e.g. my-repository and provide the path you configured above: /tmp/es-backups.

Follow the docs to create a snapshot of your instance via Kibana dev tools, e.g.

PUT /_snapshot/my-repository/[timestamp]_snapshot?wait_for_completion=true

Upgrade to a 7.16 environment

Stop your 7.15 environment, and run a 7.16 environment in its place. Ensure you provide the same configuration values for the path.repo field as above. Start up a 7.16 instance of Elasticsearch and Kibana.

Run Fleet setup again

Navigate to /app/fleet and make sure the Fleet setup process has run successfully again.

Confirm integration versions for managed packages

Your integrations should have a few new versions in 7.16. These are called out in bold.

| Integration | Version |
| --- | --- |
| APM | 0.4.0 |
| Elastic Agent | **1.3.0** |
| Endpoint | **1.2.2** |
| Fleet Server | **1.1.0** |
| Synthetics | **0.5.0** |
| System | 1.6.3 |

Elastic Agent, Endpoint, Fleet Server, and Synthetics should all have upgraded to new versions, while APM and System should remain on their existing version through the upgrade process.

Roll back to 7.15

Follow the rollback documentation to roll back to 7.15. The key steps of this process should be:

  1. Stop your 7.16 Kibana instance

  2. Make a DELETE /.kibana* API request to your Elasticsearch instance

  3. Restore your previous 7.15 data using the Elasticsearch API, e.g.

    # You may need to do this first to allow wildcard deletes
    PUT _cluster/settings
    {
      "persistent" : {
        "action.destructive_requires_name" : false
      }
    }
    
    # Restore command
    POST _snapshot/my-repository/[timestamp]_snapshot/_restore
    {
      "indices": ".kibana*"
    }
    
  4. Start a 7.15 Kibana instance

Confirm integration versions for managed packages

Your packages should have downgraded to the versions they were prior to upgrading to 7.16. Kibana should not have maintained the newer versions of any packages.

| Integration | Version |
| --- | --- |
| APM | 0.4.0 |
| Elastic Agent | 1.2.1 |
| Endpoint | 1.1.1 |
| Fleet Server | 1.0.1 |
| Synthetics | 0.3.0 |
| System | 1.6.3 |

@EricDavisX
Contributor

@kpollich thanks, you can reach out to me, sure. I am going to pass this on to @dikshachauhan-qasource and @sagarnagpal-qasource to review and submit a broader testing assessment of what combinations are possible, based on your comment just above, #111858 (comment)

@joshdover
Contributor Author

Going to close this issue as the implementation work here is done. If needed, please open a new issue for testing or continue discussing right here.

@dikshachauhan-qasource

Hi @EricDavisX

We have attempted to validate it per the steps mentioned and shared our observations on the related testing ticket: #120726

Thanks
QAS
