The cluster scaler uses total memory instead of available memory. #2103

ejacox · 2018-03-02T21:05:35Z

The cluster scaler uses each machine type's total memory when it calculates which machine types to start. The Mesos slaves, however, report only their available memory, which can be lower and might not be sufficient to run the jobs. As a result, the cluster scaler does not spin up additional machines because it thinks there are enough, but some jobs hang forever because the memory offered by Mesos is insufficient. This occurred in issue #2078, which was resolved by lowering the memory requirements; that did not fix the underlying issue.
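To make the mismatch concrete, here is a minimal sketch in Python. It is not Toil's actual code: `NodeShape`, `jobs_that_fit`, and the 1 GiB overhead figure are all hypothetical, chosen only to show how planning against total memory can disagree with what a node actually offers.

```python
from dataclasses import dataclass

GiB = 2**30


@dataclass
class NodeShape:
    """Hypothetical machine type description (not Toil's real class)."""
    name: str
    total_memory: int  # bytes advertised for the machine type


def jobs_that_fit(job_memory: int, node_memory: int) -> int:
    """Number of identical jobs that fit into the given amount of node memory."""
    return node_memory // job_memory


shape = NodeShape(name="example-8g", total_memory=8 * GiB)
overhead = 1 * GiB      # assumed memory unavailable to jobs (OS, agent, etc.)
job_memory = 8 * GiB    # a job that requests the node's full advertised memory

# What a scaler planning against total memory expects to fit on one node.
planned = jobs_that_fit(job_memory, shape.total_memory)

# What can actually be scheduled if the node offers only its available memory.
actual = jobs_that_fit(job_memory, shape.total_memory - overhead)

print(f"planned jobs per node: {planned}, schedulable jobs per node: {actual}")
```

Under these assumptions the scaler counts one existing node as enough for the job, while the node can never accept it, so no new node is launched and the job waits indefinitely.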

┆Issue is synchronized with this Jira Story
┆friendlyId: TOIL-233

DailyDreaming added the roadmap label Apr 15, 2020

This was referenced Aug 17, 2022

Allow workflows to work with data from S3 without actually copying it into the AWS JobStore #4147

Closed

Optimizations on creation of nodes given jobs? #4162

Closed

unito-bot assigned adamnovak Aug 17, 2022

adamnovak mentioned this issue Nov 2, 2022

Account for overhead in Cluster Scaler #4267

Merged

19 tasks

adamnovak closed this as completed in #4267 Nov 8, 2022
