
Share thread pools that have similar purposes. #12939

Closed

Conversation

@jpountz (Contributor) commented Aug 17, 2015

Because we have thread pools for almost everything, even if each of them has a
reasonable size, the total number of threads that elasticsearch creates is
high-ish. For instance, with 8 processors, elasticsearch creates between 58
(only fixed thread pools) and 111 threads (including fixed and scaling pools).
With this change, the numbers go down to 33/59.

Ideally the SEARCH and GET thread pools should be the same, but I couldn't do
it now given that some SEARCH requests block on GET requests in order to
retrieve indexed scripts or geo shapes. So they are still separate pools for
now.

However, the INDEX, BULK, REFRESH and FLUSH thread pools have been merged into a single WRITE thread pool, the SEARCH, PERCOLATE and SUGGEST pools have been merged into a single READ thread pool, and FETCH_SHARD_STARTED and FETCH_SHARD_STORE have been merged into FETCH_SHARD. Also the WARMER pool has been removed: it was useful to parallelize fielddata loading, but now that we have doc values by default, we can make things simpler by just loading them in the current thread.
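
To make the sharing concrete, here is a minimal sketch (plain JDK executors with made-up names, not the actual org.elasticsearch.threadpool.ThreadPool code) of several logical pool names resolving to one shared executor:

    import java.util.Map;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Minimal sketch of the idea behind this change: several logical pool
    // names resolve to one shared executor. Names and sizing here are
    // hypothetical, not the actual Elasticsearch implementation.
    public class SharedPools {
        public static void main(String[] args) {
            int procs = Runtime.getRuntime().availableProcessors();

            // One fixed pool backs all "write-like" work instead of four pools.
            ExecutorService write = Executors.newFixedThreadPool(procs);

            // INDEX, BULK, REFRESH and FLUSH all alias the same executor,
            // so callers keep using the pool name they already know.
            Map<String, ExecutorService> pools = Map.of(
                "index", write,
                "bulk", write,
                "refresh", write,
                "flush", write);

            pools.get("bulk").submit(() -> System.out.println("bulk task"));
            pools.get("refresh").submit(() -> System.out.println("refresh task"));

            write.shutdown();
        }
    }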

Close #12666

@nik9000 (Member) commented Aug 17, 2015

default, we can make things simpler by just loading them in the current thread.

That'd be on the merge or WRITE thread, right?

@jpountz (Contributor, author) commented Aug 17, 2015

Exactly.

    logger.warn("warming has been interrupted", e);
    }
    break;
    listener.warmNewReaders(indexShard, indexMetaData, context);
A Member commented on the diff:

I guess one sacrifice here is that the warmers will run in series instead of in parallel now. That is probably OK unless someone has thousands of the things and they take a long time to run, like on a newly merged segment. But I'm pretty sure the docs advise against having tons and tons of warmers anyway.
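
For illustration, a rough sketch of that serial-vs-parallel trade-off (hypothetical Warmer type, not the PR's actual classes):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Future;

    // Sketch of the behavioral change under discussion: before, warmers
    // fanned out to the dedicated WARMER pool; after, they run one after
    // another on the calling thread.
    public class WarmingSketch {
        interface Warmer { void warm(); }

        // Before: dispatch every warmer to a pool and wait for all of them.
        // Total time is roughly that of the slowest warmer.
        static void warmInParallel(List<Warmer> warmers, ExecutorService warmerPool) throws Exception {
            List<Future<?>> futures = new ArrayList<>();
            for (Warmer w : warmers) {
                futures.add(warmerPool.submit(w::warm));
            }
            for (Future<?> f : futures) {
                f.get();
            }
        }

        // After: run inline on the current (e.g. merge or write) thread.
        // Total time is the sum of all warmers, hence the concern about
        // having very many slow warmers.
        static void warmInline(List<Warmer> warmers) {
            for (Warmer w : warmers) {
                w.warm();
            }
        }
    }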

@jpountz (Contributor, author) replied:

Right, and recent changes will hopefully make warming faster, like doc values by default (ES 2.0) or disk-based norms (Lucene 5.3).

@nik9000 (Member) commented Aug 17, 2015

LGTM

@bleskes (Contributor) commented Aug 18, 2015

I'm +1 on most of these merges (because of the simplification). I would personally like to understand more about the concerns around the overhead of threads, especially for the scaling thread pools.

However, I am concerned that not having a dedicated BULK thread pool will cause operations to stall under heavy indexing load. I would suggest leaving BULK as a separate thread pool and having WRITE be used for all "lite" write operations.

@jpountz (Contributor, author) commented Aug 19, 2015

Thanks for taking a look @bleskes. There are several reasons for reducing the number of threads:

  • reducing context switching
  • reducing memory usage (see #9135)
  • speeding up elasticsearch startup

These pools are only one part of the threads that elasticsearch creates; we also have transport threads, merge threads, a scheduling thread pool, ...

I understand your concerns about BULK vs. WRITE, but the same could be said about SUGGEST vs. SEARCH or REFRESH vs. INDEX, or even long-running low-value search requests vs. short-running high-value search requests. If we want to give better latency to some operations, I think we should rather use priority queues than set up new thread pools?
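
As a rough illustration of the priority-queue alternative (plain JDK code, not part of this PR; task names and priorities are made up), a single pool can serve its queued tasks in priority order:

    import java.util.concurrent.PriorityBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    // Sketch of the priority-queue idea: one shared pool whose queued
    // tasks are served in priority order, so high-value operations jump
    // ahead of low-value ones while they are still waiting.
    public class PriorityPool {

        // A runnable with an explicit priority; lower value is served first.
        static final class PrioritizedTask implements Runnable, Comparable<PrioritizedTask> {
            final int priority;
            final Runnable delegate;

            PrioritizedTask(int priority, Runnable delegate) {
                this.priority = priority;
                this.delegate = delegate;
            }

            @Override public void run() { delegate.run(); }

            @Override public int compareTo(PrioritizedTask other) {
                return Integer.compare(priority, other.priority);
            }
        }

        public static void main(String[] args) throws InterruptedException {
            // One worker; use execute(), not submit(), because submit()
            // wraps tasks in a non-comparable FutureTask.
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<>());

            pool.execute(new PrioritizedTask(5, () -> sleep(100))); // occupies the worker
            pool.execute(new PrioritizedTask(10, () -> System.out.println("low")));
            pool.execute(new PrioritizedTask(0, () -> System.out.println("high")));
            // Prints "high" before "low": the queue reorders waiting tasks.

            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }

        static void sleep(long millis) {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

Note that priorities only reorder tasks that are still waiting in the queue; a heavy task that is already running is not preempted.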

@bleskes (Contributor) commented Aug 19, 2015

I understand your concerns about BULK vs. WRITE but the same could be said about SUGGEST vs. SEARCH or REFRESH vs. INDEX, or even long-running low-value search requests vs. short-running high-value search requests.

Agreed that these are the same tensions and it’s all about a good balance. I think the bulk thread pool is much more likely to be tasked with heavy load than the other ones. That dedicated bulk pool was added after users actually ran into this starvation issue. We have similar protections in other places (like a dedicated recovery channel for small files). I’d hate to see a regression here…

If we want to try to give better latency to some operations, I think we should rather use priority queues than set up new thread pools?

Agreed, that might be a good idea in general, but it still wouldn’t solve the case where all threads of the pool are busy with a heavy task when a light one comes in.


@jpountz (Contributor, author) commented Aug 19, 2015

I added the INDEX threadpool back.

@kimchy (Member) commented Aug 19, 2015

I am hesitant about this change, to be honest. A few examples:

  • Systems that rely on the fast id-based GET API, but also execute heavy search requests. This simple, supposedly fast get-by-id would be blocked by and compete with search requests.
  • Fetch thread pools: they were separated intentionally (and they are scaling; they mostly matter on cluster restarts or node join/leave). Fetching the store is typically slow, but fetching states is not, and fetching state is what allows a primary to be allocated. We can end up with primaries not being allocated because they are waiting on fetching the store of replicas.
  • Same reasoning from Boaz around index and bulk; I see it was added back, but I am confused, since it is called INDEX and it is used for BULK?

@jpountz (Contributor, author) commented Aug 19, 2015

But then how do we reduce the number of threads that elasticsearch starts? For instance, I started elasticsearch on my 8-core machine (single-node cluster) and even with moderate activity, I have:

  • 16 http_server_worker
  • 16 transport_client_worker
  • 16 transport_server_worker
  • 13 search
  • 8 get
  • 8 index
  • 8 bulk
  • 8 suggest
  • 8 percolate
  • 5 management
  • 5 warmer
  • 4 refresh
  • 4 flush
  • 4 listener
  • as well as ~10 more threads for various purposes (http_server_boss, transport_client_timer, ttl_expire, master_mapping_updater, timer, scheduler, transport_server_boss, transport_client_boss, discovery#multicast#receiver, clusterService#updateTask)

Overall, this is more than 16 times the number of cores I have on my machine, yet not all threadpools are active (e.g. fetch_shard_started, optimize).
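
Summing the counts listed above is a quick sanity check of that figure:

    // Quick arithmetic on the thread counts listed above (8-core machine).
    public class ThreadCountMath {
        public static void main(String[] args) {
            // http_server_worker, transport_client_worker, transport_server_worker,
            // search, get, index, bulk, suggest, percolate, management, warmer,
            // refresh, flush, listener, plus ~10 miscellaneous utility threads.
            int[] counts = {16, 16, 16, 13, 8, 8, 8, 8, 8, 5, 5, 4, 4, 4, 10};
            int total = 0;
            for (int c : counts) total += c;
            System.out.println(total + " threads / 8 cores = " + (total / 8.0) + "x");
            // ~133 threads, i.e. roughly 16.6 threads per core
        }
    }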

Same reasoning from Boaz around index and bulk; I see it was added back, but I am confused, since it is called INDEX and it is used for BULK?

The "write" pool is still used for bulk in the PR; I just revived the "index" threadpool so that index/delete/update operations are not delayed by heavy bulk requests, which I think addresses Boaz's concerns?

@kimchy (Member) commented Aug 19, 2015

First, I am not sure it is a problem. Many of our operations are IO-heavy, like refresh or flush; the cost of a thread in today's operating systems is light compared to the blocking IO of the actual operation. Having bulk operations compete with refresh doesn't sound right. Another example is the completion suggester, which is supposed to provide results extremely fast: should it compete with "regular" search requests?

If we do think it is a problem, then we should come up with a better solution than folding all those thread pools together. I am not sure what a better solution would be compared to what we have today.
