
Commit

Add CHANGELOG entry #130
Signed-off-by: Thomas Druez <[email protected]>
tdruez committed Nov 22, 2021
1 parent aad3d41 commit bff6e05
Showing 4 changed files with 41 additions and 4 deletions.
19 changes: 19 additions & 0 deletions CHANGELOG.rst
@@ -4,6 +4,25 @@ Changelog
Unreleased
----------

- Synchronize QUEUED and RUNNING pipeline runs with their related worker jobs during
  worker maintenance tasks, scheduled every 10 minutes.
  If a container was taken down while a pipeline was running, or if the pipeline
  process was killed unexpectedly, that pipeline run status will be updated to a
  FAILED state during the next maintenance tasks.
  QUEUED pipeline runs will be restored in the queue, since the worker Redis cache
  backend data is now persistent and reloaded when the image starts.
  Note that internally, a running job emits a "heartbeat" every 60 seconds to let all
  the workers know that it is properly running.
  After 90 seconds without any heartbeat, a worker will determine that the job is no
  longer active, and that job will be moved to the failed registry during the worker
  maintenance tasks. The pipeline run will be updated as well to reflect this failure
  in the Web UI, the REST API, and the command line interface.
  https://github.com/nexB/scancode.io/issues/130

- Enable Redis data persistence using the "Append Only File", with the default policy
  of fsync every second, in the docker-compose configuration (a verification sketch
  follows this changelog excerpt).
  https://github.com/nexB/scancode.io/issues/130

- Add a new tutorial chapter about license policies and compliance alerts.
  https://github.com/nexB/scancode.io/issues/337
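
For context on the Redis persistence entry above, the following minimal sketch shows
one way the resulting configuration could be verified from Python. It assumes
redis-py is installed and a Redis server is reachable on localhost:6379; neither
assumption is part of this commit.

# Illustrative only: check the append-only persistence described above.
# Assumes redis-py is installed and Redis listens on localhost:6379.
import redis

client = redis.Redis(host="localhost", port=6379)

# "appendonly" should report "yes" when the Append Only File is enabled.
print(client.config_get("appendonly"))   # e.g. {'appendonly': 'yes'}

# "appendfsync" is the fsync policy; "everysec" matches "fsync every second".
print(client.config_get("appendfsync"))  # e.g. {'appendfsync': 'everysec'}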

16 changes: 14 additions & 2 deletions scancodeio/worker.py
@@ -37,11 +37,23 @@ class ScanCodeIOWorker(Worker):
def run_maintenance_tasks(self):
"""
Add Runs and Jobs synchronization to the periodic maintenance tasks.
Maintenance tasks should run on the first worker startup and then every 10 minutes.
During maintenance, one of the workers will acquire a "cleaning lock" and
run the registries cleanup.
During that cleanup, started Jobs that haven't sent a heartbeat in the past 90
seconds (job_monitoring_interval + 60) will be considered failed and will be
moved to the FailedJobRegistry.
This happens when the Job process is killed (voluntarily or not); the heartbeat
is the RQ approach to determining whether a job is still active.
The `sync_runs_and_jobs` method will see this Job as failed and will update its
related Run accordingly.
"""
super().run_maintenance_tasks()

# The Runs and Jobs synchronization needs to be executed after the
# `self.clean_registries()` that takes place in the parent
# `super().run_maintenance_tasks()`.
scanpipe_app.sync_runs_and_jobs()
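
The docstring above spells out the timing used to declare a job dead: a running job
emits a heartbeat every 60 seconds, and the job is considered failed after 90 seconds
without one (job_monitoring_interval + 60, which implies a 30-second monitoring
interval). The sketch below only illustrates that arithmetic; the names are
hypothetical and belong neither to ScanCode.io nor to RQ.

# Hypothetical sketch of the staleness rule described in the docstring above.
# Not actual ScanCode.io or RQ code; the names are placeholders.
from datetime import datetime, timedelta, timezone

JOB_MONITORING_INTERVAL = timedelta(seconds=30)  # implied by 90 = interval + 60
HEARTBEAT_GRACE = JOB_MONITORING_INTERVAL + timedelta(seconds=60)  # 90 seconds

def job_is_stale(last_heartbeat, now=None):
    """Return True when the job missed its heartbeat for longer than the grace period."""
    now = now or datetime.now(timezone.utc)
    return now - last_heartbeat > HEARTBEAT_GRACE

# A job whose last heartbeat is two minutes old would be moved to the failed
# registry during the next maintenance tasks.
print(job_is_stale(datetime.now(timezone.utc) - timedelta(minutes=2)))  # True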


3 changes: 3 additions & 0 deletions scanpipe/apps.py
@@ -67,6 +67,9 @@ def ready(self):

# In SYNC mode, the Run instances cleanup is triggered on app.ready()
# only when the app is started through "runserver".
# This cleanup is required when a running pipeline process gets killed, since
# KeyboardInterrupt cannot be captured to properly update the Run instance
# before the process dies.
# In ASYNC mode, the cleanup is handled by the "ScanCodeIOWorker" worker.
if not settings.SCANCODEIO_ASYNC and "runserver" in sys.argv:
self.sync_runs_and_jobs()
7 changes: 5 additions & 2 deletions scanpipe/models.py
@@ -1052,11 +1052,14 @@ def execute_task_async(self):

def sync_with_job(self):
"""
Synchronise this Run instance with its related RQ Job.
This is required when a Run gets out of sync with its Job, which can happen
when the worker or one of its processes is killed: the Run status is then not
properly updated and may stay in a QUEUED or RUNNING state forever.
When the Run is out of sync with its related Job, the Run status will be
updated accordingly. When the run was in the queue, it will be enqueued again.
"""
RunStatus = self.Status
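
As a rough illustration of the policy this docstring describes (not the actual
sync_with_job() code), the out-of-sync resolution comes down to two cases: a RUNNING
run whose job failed or disappeared is marked FAILED, and a QUEUED run whose job is
gone is enqueued again. All names below are hypothetical.

# Hypothetical, simplified sketch of the resolution described above.
# Not the actual sync_with_job() implementation; names are placeholders.
def resolve_out_of_sync(run_status, job_status):
    """Return the action to take for a Run whose related Job got out of sync."""
    if run_status == "running" and job_status in ("failed", "missing"):
        return "mark_run_failed"  # surfaced in the Web UI, REST API, and CLI
    if run_status == "queued" and job_status == "missing":
        return "enqueue_again"    # restore the run in the queue
    return "no_action"            # statuses already agree

print(resolve_out_of_sync("running", "failed"))   # mark_run_failed
print(resolve_out_of_sync("queued", "missing"))   # enqueue_again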

