provide caps on task manager configuration settings #56573

Closed
pmuellr opened this issue Feb 1, 2020 · 4 comments · Fixed by #85574
Labels: Feature:Alerting, Team:ResponseOps

Comments

pmuellr (Member) commented Feb 1, 2020

Task manager currently exposes a number of configuration properties that could cause systemic problems if set to extreme values.

Here are the current config properties:

export const configSchema = schema.object({
  enabled: schema.boolean({ defaultValue: true }),
  /* The maximum number of times a task will be attempted before being abandoned as failed */
  max_attempts: schema.number({
    defaultValue: 3,
    min: 1,
  }),
  /* How often, in milliseconds, the task manager will look for more work. */
  poll_interval: schema.number({
    defaultValue: DEFAULT_POLL_INTERVAL,
    min: 100,
  }),
  /* How many requests can Task Manager buffer before it rejects new requests. */
  request_capacity: schema.number({
    // a nice round contrived number, feel free to change as we learn how it behaves
    defaultValue: 1000,
    min: 1,
  }),
  /* The name of the index used to store task information. */
  index: schema.string({
    defaultValue: '.kibana_task_manager',
    validate: val => {
      if (val.toLowerCase() === '.tasks') {
        return `"${val}" is an invalid Kibana Task Manager index, as it is already in use by the ElasticSearch Tasks Manager`;
      }
    },
  }),
  /* The maximum number of tasks that this Kibana instance will run simultaneously. */
  max_workers: schema.number({
    defaultValue: DEFAULT_MAX_WORKERS,
    // disable the task manager rather than trying to specify it with 0 workers
    min: 1,
  }),
});

Issue #56478 specifically calls out max_workers and poll_interval as properties to whitelist on cloud, so these are the ones we should consider capping first, to keep customers from shooting themselves in the foot.

Suggestions:

  • max_workers: should have max: 100
  • poll_interval: currently has min: 100; suggest raising it to 1000 (1 second); both caps are sketched below
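
For illustration, a minimal sketch of what the two suggested caps could look like in the config schema. The default constants are placeholders standing in for the real values referenced above, and this reflects the suggestion in this issue, not the schema that actually shipped:

import { schema } from '@kbn/config-schema';

// Placeholder defaults standing in for the real constants in the Task Manager config.
const DEFAULT_POLL_INTERVAL = 3000;
const DEFAULT_MAX_WORKERS = 10;

// Hypothetical: only the two properties the suggestions touch, with the proposed caps applied.
export const cappedConfigSchema = schema.object({
  /* How often, in milliseconds, the task manager will look for more work. */
  poll_interval: schema.number({
    defaultValue: DEFAULT_POLL_INTERVAL,
    min: 1000, // suggested: raise the floor from 100ms to 1 second
  }),
  /* The maximum number of tasks that this Kibana instance will run simultaneously. */
  max_workers: schema.number({
    defaultValue: DEFAULT_MAX_WORKERS,
    min: 1,
    max: 100, // suggested cap
  }),
});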
pmuellr added the Feature:Alerting and Team:ResponseOps labels Feb 1, 2020
elasticmachine (Contributor) commented

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

gmmorris (Contributor) commented Feb 3, 2020

Not sure why we'd want to raise the min of poll_interval to 1000.
When running perf tests I see a lot of dead time waiting for the interval, as many queries often take far less than a second to run, so allowing this number to be dialed down makes sense to me.

pmuellr (Member, Author) commented Feb 3, 2020

Running the polling 10x/second seems like overkill if it's running at that speed all the time.

Seems like we're eventually going to want some kind of adaptive story. It would be fine to run 10x/second if there's a ton of short-running tasks to process and they need to be processed in a timely fashion, but if you don't have anything scheduled for the next 10 minutes, that's a lot of ES calls all returning empty results.

I've used polling systems that adapt to "idle" situations by polling differently or less often. We could also optimistically take into account the time the next task actually has to run. If it's an hour from now, we could certainly scale back the amount of polling.

I think we'll get a better sense for whether we need something like this once we instrument TM a bit more (telemetry and/or logging).

It's probably also true that the built-in "lag" from the polling interval provides a bit of "spreading out" for a stampeding herd of new tasks.
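
As a rough illustration of that adaptive idea, here is a minimal sketch that scales the poll delay with the time until the next scheduled task. The function and constants are hypothetical, not part of the actual Task Manager code:

// Hypothetical adaptive poll-delay calculation, for illustration only.
const MIN_POLL_INTERVAL_MS = 1000;  // poll at most once per second when work is due
const MAX_POLL_INTERVAL_MS = 60000; // back off to once a minute when idle

function nextPollDelay(nextTaskRunAt: Date | undefined, now: Date = new Date()): number {
  if (nextTaskRunAt === undefined) {
    // nothing scheduled at all: back off as far as we allow
    return MAX_POLL_INTERVAL_MS;
  }
  const msUntilNextTask = nextTaskRunAt.getTime() - now.getTime();
  // poll soon if work is due (or overdue), otherwise wait roughly until it is,
  // clamped so we never poll faster than MIN or slower than MAX
  return Math.min(MAX_POLL_INTERVAL_MS, Math.max(MIN_POLL_INTERVAL_MS, msUntilNextTask));
}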

mikecote (Contributor) commented Dec 8, 2020

We discussed deprecating values above our limit and then removing support in 8.0: log some sort of warning until 8.0, and create a follow-up issue for the cleanup.
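
A minimal sketch of that "warn now, enforce in 8.0" idea, assuming a generic logger callback; this is not the actual deprecation machinery that was implemented:

// Hypothetical check: log a deprecation warning above the limit,
// but keep honoring the configured value until 8.0.
const MAX_WORKERS_LIMIT = 100;

function warnIfMaxWorkersExceedsLimit(maxWorkers: number, warn: (message: string) => void): number {
  if (maxWorkers > MAX_WORKERS_LIMIT) {
    warn(
      `setting xpack.task_manager.max_workers (${maxWorkers}) greater than ${MAX_WORKERS_LIMIT} ` +
        `is deprecated; the limit will be enforced in 8.0`
    );
  }
  return maxWorkers;
}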

pmuellr self-assigned this Dec 8, 2020
pmuellr added a commit to pmuellr/kibana that referenced this issue Dec 10, 2020
…limit

resolves elastic#56573

In this PR we create a new task manager limit on the config property
`xpack.task_manager.max_workers` of 100, but only log a deprecation
warning if that property exceeds the limit.  We'll enforce the limit
in 8.0.

The rationale is that it's unlikely to be useful to run with
more than some number of workers, due to the amount of simultaneous
work that would end up happening.  In practice, too many workers can
slow things down more than speed them up.

We're setting the limit to 100 for now, but may increase / decrease it
based on further research.
pmuellr added a commit to pmuellr/kibana that referenced this issue Dec 10, 2020
…limit

pmuellr added a commit that referenced this issue Dec 14, 2020
…limit (#85574)

pmuellr added a commit to pmuellr/kibana that referenced this issue Dec 14, 2020
…limit (elastic#85574)

pmuellr added a commit that referenced this issue Dec 14, 2020
…limit (#85574) (#85872)
kobelb added the needs-team label Jan 31, 2022
botelastic bot removed the needs-team label Jan 31, 2022