
Commit

Add CHANGELOG entry #130
Signed-off-by: Thomas Druez <[email protected]>
tdruez committed Nov 22, 2021
1 parent aad3d41 commit bff6e05
Showing 4 changed files with 41 additions and 4 deletions.
19 changes: 19 additions & 0 deletions CHANGELOG.rst
@@ -4,6 +4,25 @@ Changelog
Unreleased
----------

- Synchronize QUEUED and RUNNING pipeline runs with their related worker jobs during
  worker maintenance tasks, scheduled every 10 minutes.
  If a container was taken down while a pipeline was running, or if the pipeline
  process was killed unexpectedly, that pipeline run status will be updated to a
  FAILED state during the next maintenance tasks.
  QUEUED pipeline runs will be restored in the queue, since the worker Redis cache
  backend data is now persistent and reloaded when the image starts.
  Note that internally, a running job emits a "heartbeat" every 60 seconds to let all
  the workers know that it is properly running.
  After 90 seconds without any heartbeat, a worker will determine that the job is no
  longer active, and that job will be moved to the failed registry during the worker
  maintenance tasks. The pipeline run will be updated as well to reflect this failure
  in the Web UI, the REST API, and the command line interface.
  https://github.com/nexB/scancode.io/issues/130

- Enable Redis data persistence using the "Append Only File", with the default policy
  of fsync every second, in the docker-compose configuration (a verification sketch
  follows this changelog excerpt).
  https://github.com/nexB/scancode.io/issues/130

- Add a new tutorial chapter about license policies and compliance alerts.
  https://github.com/nexB/scancode.io/issues/337
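
For context on the Redis persistence entry above, the following minimal sketch shows
one way the resulting configuration could be verified from Python. It assumes
redis-py is installed and a Redis server is reachable on localhost:6379; neither
assumption is part of this commit.

# Illustrative only: check the append-only persistence described above.
# Assumes redis-py is installed and Redis listens on localhost:6379.
import redis

client = redis.Redis(host="localhost", port=6379)

# "appendonly" should report "yes" when the Append Only File is enabled.
print(client.config_get("appendonly"))   # e.g. {'appendonly': 'yes'}

# "appendfsync" is the fsync policy; "everysec" matches "fsync every second".
print(client.config_get("appendfsync"))  # e.g. {'appendfsync': 'everysec'}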

16 changes: 14 additions & 2 deletions scancodeio/worker.py
@@ -37,11 +37,23 @@ class ScanCodeIOWorker(Worker):
def run_maintenance_tasks(self):
"""
Add Runs and Jobs synchronization to the periodic maintenance tasks.
Maintenance tasks should run on the first worker startup and then every 10 minutes.
During maintenance, one of the workers will acquire a "cleaning lock" and
run the registries cleanup.
During that cleanup, started Jobs that haven't sent a heartbeat in the past 90
seconds (job_monitoring_interval + 60) will be considered failed and will be
moved to the FailedJobRegistry.
This happens when the Job process is killed (voluntarily or not); the heartbeat
is the RQ approach to determining whether a job is still active.
The `sync_runs_and_jobs` method will see this Job as failed and will update its
related Run accordingly.
"""
super().run_maintenance_tasks()

# The Runs and Jobs synchronization needs to be executed after the
# `self.clean_registries()` that takes place in the parent
# `super().run_maintenance_tasks()`.
scanpipe_app.sync_runs_and_jobs()
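
The docstring above spells out the timing used to declare a job dead: a running job
emits a heartbeat every 60 seconds, and the job is considered failed after 90 seconds
without one (job_monitoring_interval + 60, which implies a 30-second monitoring
interval). The sketch below only illustrates that arithmetic; the names are
hypothetical and belong neither to ScanCode.io nor to RQ.

# Hypothetical sketch of the staleness rule described in the docstring above.
# Not actual ScanCode.io or RQ code; the names are placeholders.
from datetime import datetime, timedelta, timezone

JOB_MONITORING_INTERVAL = timedelta(seconds=30)  # implied by 90 = interval + 60
HEARTBEAT_GRACE = JOB_MONITORING_INTERVAL + timedelta(seconds=60)  # 90 seconds

def job_is_stale(last_heartbeat, now=None):
    """Return True when the job missed its heartbeat for longer than the grace period."""
    now = now or datetime.now(timezone.utc)
    return now - last_heartbeat > HEARTBEAT_GRACE

# A job whose last heartbeat is two minutes old would be moved to the failed
# registry during the next maintenance tasks.
print(job_is_stale(datetime.now(timezone.utc) - timedelta(minutes=2)))  # True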


3 changes: 3 additions & 0 deletions scanpipe/apps.py
@@ -67,6 +67,9 @@ def ready(self):

# In SYNC mode, the Run instances cleanup is triggered on app.ready()
# only when the app is started through "runserver".
# This cleanup is required when a running pipeline process gets killed, since
# KeyboardInterrupt cannot be captured to properly update the Run instance
# before the process dies.
# In ASYNC mode, the cleanup is handled by the "ScanCodeIOWorker" worker.
if not settings.SCANCODEIO_ASYNC and "runserver" in sys.argv:
self.sync_runs_and_jobs()
7 changes: 5 additions & 2 deletions scanpipe/models.py
@@ -1052,11 +1052,14 @@ def execute_task_async(self):

def sync_with_job(self):
"""
Synchronise this Run instance with its related RQ Job.
This is required when a Run gets out of sync with its Job, which can happen
when the worker or one of its processes is killed: the Run status is then not
properly updated and may stay in a QUEUED or RUNNING state forever.
When the Run is out of sync with its related Job, the Run status will be
updated accordingly. When the run was in the queue, it will be enqueued again.
"""
RunStatus = self.Status
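
As a rough illustration of the policy this docstring describes (not the actual
sync_with_job() code), the out-of-sync resolution comes down to two cases: a RUNNING
run whose job failed or disappeared is marked FAILED, and a QUEUED run whose job is
gone is enqueued again. All names below are hypothetical.

# Hypothetical, simplified sketch of the resolution described above.
# Not the actual sync_with_job() implementation; names are placeholders.
def resolve_out_of_sync(run_status, job_status):
    """Return the action to take for a Run whose related Job got out of sync."""
    if run_status == "running" and job_status in ("failed", "missing"):
        return "mark_run_failed"  # surfaced in the Web UI, REST API, and CLI
    if run_status == "queued" and job_status == "missing":
        return "enqueue_again"    # restore the run in the queue
    return "no_action"            # statuses already agree

print(resolve_out_of_sync("running", "failed"))   # mark_run_failed
print(resolve_out_of_sync("queued", "missing"))   # enqueue_again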

