
Share thread pools that have similar purposes. #12939

Closed

Conversation

@jpountz (Contributor) commented Aug 17, 2015

Because we have thread pools for almost everything, even if each of them has a
reasonable size, the total number of threads that elasticsearch creates is
high-ish. For instance, with 8 processors, elasticsearch creates between 58
(only fixed thread pools) and 111 threads (including fixed and scaling pools).
With this change, the numbers go down to 33/59.

Ideally the SEARCH and GET thread pools should be the same, but I couldn't do
it now given that some SEARCH requests block on GET requests in order to
retrieve indexed scripts or geo shapes. So they are still separate pools for
now.

However, the INDEX, BULK, REFRESH and FLUSH thread pools have been merged into a single WRITE thread pool, the SEARCH, PERCOLATE and SUGGEST pools have been merged into a single READ thread pool, and FETCH_SHARD_STARTED and FETCH_SHARD_STORE have been merged into FETCH_SHARD. Also the WARMER pool has been removed: it was useful to parallelize fielddata loading, but now that we have doc values by default, we can make things simpler by just loading them in the current thread.
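
To make the sharing concrete, here is a minimal sketch (plain JDK executors with made-up names, not the actual org.elasticsearch.threadpool.ThreadPool code) of several logical pool names resolving to one shared executor:

    import java.util.Map;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Minimal sketch of the idea behind this change: several logical pool
    // names resolve to one shared executor. Names and sizing here are
    // hypothetical, not the actual Elasticsearch implementation.
    public class SharedPools {
        public static void main(String[] args) {
            int procs = Runtime.getRuntime().availableProcessors();

            // One fixed pool backs all "write-like" work instead of four pools.
            ExecutorService write = Executors.newFixedThreadPool(procs);

            // INDEX, BULK, REFRESH and FLUSH all alias the same executor,
            // so callers keep using the pool name they already know.
            Map<String, ExecutorService> pools = Map.of(
                "index", write,
                "bulk", write,
                "refresh", write,
                "flush", write);

            pools.get("bulk").submit(() -> System.out.println("bulk task"));
            pools.get("refresh").submit(() -> System.out.println("refresh task"));

            write.shutdown();
        }
    }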

Close #12666

@nik9000 (Member) commented Aug 17, 2015

default, we can make things simpler by just loading them in the current thread.

That'd be on the merge or WRITE thread, right?

@jpountz (Contributor, author) commented Aug 17, 2015

Exactly.

    logger.warn("warming has been interrupted", e);
    }
    break;
    listener.warmNewReaders(indexShard, indexMetaData, context);
A Member commented on the diff:

I guess one sacrifice here is that the warmers will run in series instead of in parallel now. That is probably OK unless someone has thousands of the things and they take a long time to run, like on a newly merged segment. But I'm pretty sure the docs advise against having tons and tons of warmers anyway.
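
For illustration, a rough sketch of that serial-vs-parallel trade-off (hypothetical Warmer type, not the PR's actual classes):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Future;

    // Sketch of the behavioral change under discussion: before, warmers
    // fanned out to the dedicated WARMER pool; after, they run one after
    // another on the calling thread.
    public class WarmingSketch {
        interface Warmer { void warm(); }

        // Before: dispatch every warmer to a pool and wait for all of them.
        // Total time is roughly that of the slowest warmer.
        static void warmInParallel(List<Warmer> warmers, ExecutorService warmerPool) throws Exception {
            List<Future<?>> futures = new ArrayList<>();
            for (Warmer w : warmers) {
                futures.add(warmerPool.submit(w::warm));
            }
            for (Future<?> f : futures) {
                f.get();
            }
        }

        // After: run inline on the current (e.g. merge or write) thread.
        // Total time is the sum of all warmers, hence the concern about
        // having very many slow warmers.
        static void warmInline(List<Warmer> warmers) {
            for (Warmer w : warmers) {
                w.warm();
            }
        }
    }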

@jpountz (Contributor, author) replied:

Right, and recent changes will hopefully make warming faster, like doc values by default (ES 2.0) or disk-based norms (Lucene 5.3).

@nik9000 (Member) commented Aug 17, 2015

LGTM

@bleskes (Contributor) commented Aug 18, 2015

I'm +1 on most of these merges (because of the simplification). I would personally like to understand more about the concerns around the overhead of threads, especially for the scaling thread pools.

However, I am concerned that not having a dedicated BULK thread pool will cause operations to stall under heavy indexing load. I would suggest leaving BULK as a separate thread pool and having WRITE be used for all "lite" write operations.

@jpountz (Contributor, author) commented Aug 19, 2015

Thanks for taking a look @bleskes. There are several reasons for reducing the number of threads:

  • reducing context switching
  • reducing memory usage (see #9135)
  • speeding up elasticsearch startup

These pools are only one part of the threads that elasticsearch creates; we also have transport threads, merge threads, a scheduling thread pool, ...

I understand your concerns about BULK vs. WRITE, but the same could be said about SUGGEST vs. SEARCH or REFRESH vs. INDEX, or even long-running low-value search requests vs. short-running high-value search requests. If we want to give better latency to some operations, I think we should rather use priority queues than set up new thread pools?
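
As a rough illustration of the priority-queue alternative (plain JDK code, not part of this PR; task names and priorities are made up), a single pool can serve its queued tasks in priority order:

    import java.util.concurrent.PriorityBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    // Sketch of the priority-queue idea: one shared pool whose queued
    // tasks are served in priority order, so high-value operations jump
    // ahead of low-value ones while they are still waiting.
    public class PriorityPool {

        // A runnable with an explicit priority; lower value is served first.
        static final class PrioritizedTask implements Runnable, Comparable<PrioritizedTask> {
            final int priority;
            final Runnable delegate;

            PrioritizedTask(int priority, Runnable delegate) {
                this.priority = priority;
                this.delegate = delegate;
            }

            @Override public void run() { delegate.run(); }

            @Override public int compareTo(PrioritizedTask other) {
                return Integer.compare(priority, other.priority);
            }
        }

        public static void main(String[] args) throws InterruptedException {
            // One worker; use execute(), not submit(), because submit()
            // wraps tasks in a non-comparable FutureTask.
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<>());

            pool.execute(new PrioritizedTask(5, () -> sleep(100))); // occupies the worker
            pool.execute(new PrioritizedTask(10, () -> System.out.println("low")));
            pool.execute(new PrioritizedTask(0, () -> System.out.println("high")));
            // Prints "high" before "low": the queue reorders waiting tasks.

            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }

        static void sleep(long millis) {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

Note that priorities only reorder tasks that are still waiting in the queue; a heavy task that is already running is not preempted.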

@bleskes (Contributor) commented Aug 19, 2015

I understand your concerns about BULK vs. WRITE but the same could be said about SUGGEST vs. SEARCH or REFRESH vs. INDEX, or even long-running low-value search requests vs. short-running high-value search requests.

Agreed that these are the same tensions and it’s all about a good balance. I think the bulk thread pool is much more likely to be tasked with heavy load than the other ones. That dedicated bulk pool was added after users actually ran into this starvation issue. We have similar protections in other places (like a dedicated recovery channel for small files). I’d hate to see a regression here…

If we want to try to give better latency to some operations, I think we should rather use priority queues than set up new thread pools?

Agreed, that might be a good idea in general, but it still wouldn’t solve the case where all threads of the pool are busy with a heavy task when a light one comes in.


@jpountz (Contributor, author) commented Aug 19, 2015

I added the INDEX threadpool back.

@kimchy (Member) commented Aug 19, 2015

I am hesitant about this change, to be honest. A few examples:

  • Systems that rely on the fast id-based GET API, but also execute heavy search requests. This simple, supposedly fast get-by-id would be blocked by and compete with search requests.
  • Fetch thread pools: they were separated intentionally (and they are scaling; they mostly matter on cluster restarts or node join/leave). Fetching the store is typically slow, but fetching states is not, and fetching state is what allows a primary to be allocated. We can end up with primaries not being allocated because they are waiting on fetching the store of replicas.
  • Same reasoning from Boaz around index and bulk; I see it was added back, but I am confused, since it is called INDEX and it is used for BULK?

@jpountz (Contributor, author) commented Aug 19, 2015

But then how do we reduce the number of threads that elasticsearch starts? For instance, I started elasticsearch on my 8-core machine (single-node cluster) and even with moderate activity, I have:

  • 16 http_server_worker
  • 16 transport_client_worker
  • 16 transport_server_worker
  • 13 search
  • 8 get
  • 8 index
  • 8 bulk
  • 8 suggest
  • 8 percolate
  • 5 management
  • 5 warmer
  • 4 refresh
  • 4 flush
  • 4 listener
  • as well as ~10 more threads for various purposes (http_server_boss, transport_client_timer, ttl_expire, master_mapping_updater, timer, scheduler, transport_server_boss, transport_client_boss, discovery#multicast#receiver, clusterService#updateTask)

Overall, this is more than 16 times the number of cores I have on my machine, yet not all threadpools are active (e.g. fetch_shard_started, optimize).
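
Summing the counts listed above is a quick sanity check of that figure:

    // Quick arithmetic on the thread counts listed above (8-core machine).
    public class ThreadCountMath {
        public static void main(String[] args) {
            // http_server_worker, transport_client_worker, transport_server_worker,
            // search, get, index, bulk, suggest, percolate, management, warmer,
            // refresh, flush, listener, plus ~10 miscellaneous utility threads.
            int[] counts = {16, 16, 16, 13, 8, 8, 8, 8, 8, 5, 5, 4, 4, 4, 10};
            int total = 0;
            for (int c : counts) total += c;
            System.out.println(total + " threads / 8 cores = " + (total / 8.0) + "x");
            // ~133 threads, i.e. roughly 16.6 threads per core
        }
    }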

Same reasoning from Boaz around index and bulk; I see it was added back, but I am confused, since it is called INDEX and it is used for BULK?

The "write" pool is still used for bulk in the PR; I just revived the "index" threadpool so that index/delete/update operations are not delayed by heavy bulk requests, which I think addresses Boaz's concerns?

@kimchy (Member) commented Aug 19, 2015

First, I am not sure it is a problem. Many of our operations are IO-heavy, like refresh or flush; the cost of a thread in today's operating systems is light compared to the blocking IO of the actual operation. Having bulk operations compete with refresh doesn't sound right. Another example is the completion suggester, which is supposed to provide results extremely fast: should it compete with "regular" search requests?

If we do think it is a problem, then we should come up with a better solution than folding all those thread pools together. I am not sure what a better solution would be compared to what we have today.
