-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set celery task limits and fix worker timeout #1486
Conversation
@aarontp removing '--pool=solo' from worker.py:253 should enable time limits and avoid the blocking issues we were experiencing. |
Indeed, that worked, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, leaving this for you to make any changes based on our chat. Thanks for the PR!
PTAL, I added the soft limit exception to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Set celery task limits and fix worker timeout * Remove solo pool and change concurrency=1 * Add soft/hard limit buffers * Fix server timeout * Late import for psutil * small fixes, tests * fix run tests * test format string * Fix process_result test * fix execute test * revert unnecessary config changes * Handle soft timeout exception in execute() * update timeout message * Yaaaaaaaapf
* Added the TurbiniaRequest hashed object in Redis * Converted TurbiniaTasks to hash objects in Redis * Made get_request_data more efficient * Made get_requests_summary more efficient * Fixed some issues in request_status * Remove GCP dependencies (#1440) * Remove gcp dependencies * Update dockerfiles * Update dockerfiles * Update gcp error reporting * Updates to formatting * Add unit test * Update unit test * Clean up * Update unit test * Update error reporting * Update file * Update config template * Catch exception * Updates * fix lint * Lint fixes * Updates * Updates * Various updates and fixes * Updates * --- (#1483) updated-dependencies: - dependency-name: requests dependency-type: indirect dependency-group: pip - dependency-name: requests dependency-type: indirect dependency-group: pip ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Updates * Update typo * Updates to unit tests * Updates to unit tests and linter fixes * Update table width UI * Bump the pip group across 2 directories with 1 update (#1487) Bumps the pip group with 1 update in the / directory: [requests](https:/psf/requests). Bumps the pip group with 1 update in the /turbinia/api/cli directory: [requests](https:/psf/requests). Updates `requests` from 2.32.0 to 2.32.3 - [Release notes](https:/psf/requests/releases) - [Changelog](https:/psf/requests/blob/main/HISTORY.md) - [Commits](psf/requests@v2.32.0...v2.32.3) Updates `requests` from 2.32.0 to 2.32.3 - [Release notes](https:/psf/requests/releases) - [Changelog](https:/psf/requests/blob/main/HISTORY.md) - [Commits](psf/requests@v2.32.0...v2.32.3) --- updated-dependencies: - dependency-name: requests dependency-type: indirect dependency-group: pip - dependency-name: requests dependency-type: indirect dependency-group: pip ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Fix small UI bug * Minor typos/errors * Set celery task limits and fix worker timeout (#1486) * Set celery task limits and fix worker timeout * Remove solo pool and change concurrency=1 * Add soft/hard limit buffers * Fix server timeout * Late import for psutil * small fixes, tests * fix run tests * test format string * Fix process_result test * fix execute test * revert unnecessary config changes * Handle soft timeout exception in execute() * update timeout message * Yaaaaaaaapf * Review fixes * Updates and yapf fix * Minor updates * Update docstrings * Updates to evidence_size * Updates * Change log level for message * Lint * bug fixes * Address review comments * Fix docstrings * Minor UI update --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Igor Rodrigues <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Aaron Peterson <[email protected]>
* Added the TurbiniaRequest hashed object in Redis * Converted TurbiniaTasks to hash objects in Redis * Made get_request_data more efficient * Made get_requests_summary more efficient * Fixed some issues in request_status * Remove GCP dependencies (#1440) * Remove gcp dependencies * Update dockerfiles * Update dockerfiles * Update gcp error reporting * Updates to formatting * Add unit test * Update unit test * Clean up * Update unit test * Update error reporting * Update file * Update config template * Catch exception * Updates * fix lint * Lint fixes * Updates * Updates * Various updates and fixes * Updates * --- (#1483) updated-dependencies: - dependency-name: requests dependency-type: indirect dependency-group: pip - dependency-name: requests dependency-type: indirect dependency-group: pip ... * Updates * Update typo * Updates to unit tests * Updates to unit tests and linter fixes * Update table width UI * Bump the pip group across 2 directories with 1 update (#1487) Bumps the pip group with 1 update in the / directory: [requests](https:/psf/requests). Bumps the pip group with 1 update in the /turbinia/api/cli directory: [requests](https:/psf/requests). Updates `requests` from 2.32.0 to 2.32.3 - [Release notes](https:/psf/requests/releases) - [Changelog](https:/psf/requests/blob/main/HISTORY.md) - [Commits](psf/requests@v2.32.0...v2.32.3) Updates `requests` from 2.32.0 to 2.32.3 - [Release notes](https:/psf/requests/releases) - [Changelog](https:/psf/requests/blob/main/HISTORY.md) - [Commits](psf/requests@v2.32.0...v2.32.3) --- updated-dependencies: - dependency-name: requests dependency-type: indirect dependency-group: pip - dependency-name: requests dependency-type: indirect dependency-group: pip ... * Fix small UI bug * Minor typos/errors * Set celery task limits and fix worker timeout (#1486) * Set celery task limits and fix worker timeout * Remove solo pool and change concurrency=1 * Add soft/hard limit buffers * Fix server timeout * Late import for psutil * small fixes, tests * fix run tests * test format string * Fix process_result test * fix execute test * revert unnecessary config changes * Handle soft timeout exception in execute() * update timeout message * Yaaaaaaaapf * Review fixes * Updates and yapf fix * Minor updates * Update docstrings * Updates to evidence_size * Updates * Change log level for message * Lint * bug fixes * Address review comments * Fix docstrings * Minor UI update --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Igor Rodrigues <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Aaron Peterson <[email protected]>
Description of the change
SINGLE_RUN
feature and config option.'--pool=solo
to--concurrency=1
as a solo pool executes tasks directly instead of forking and cannot process celery timeouts.After this change there will be four different timeouts that can occur:
TurbiniaTask.execute()
and uses the timeout values set in the Job config in the turbinia config file.celery.exceptions.SoftTimeLimitExceeded
exception to the worker and should be caught in theTurbiniaTask.execute()
orTurbiniaTask.run_wrapper()
methods depending on the state of the Task. This is set slightly longer than the Turbinia workout timeout to allow the worker to clean-up and give results when possible. The buffer is defined bytask_manager.CELERY_SOFT_TIMEOUT_BUFFER
.task_manager.CELERY_HARD_TIMEOUT_BUFFER
), celery is configured to kill the worker running the task. This is not catchable so we will not get any results back from the worker so the server will also need to time out the task.task_manager.SERVER_TASK_TIMEOUT_BUFFER
seconds after the Task timeout. This buffer is large because the timer starts from when the task is enqueued and can't account for any scheduling and other wait time. This is to avoid timing out valid tasks due to worker congestion.Applicable issues
Additional information
Checklist