Tekton modifies objects in the informer cache #2734

mattmoor · 2020-06-03T02:14:02Z

Expected Behavior

Tekton should .DeepCopy() any resource that is fetched from an informer's Lister, otherwise the resource in the underlying cache is altered and bad things result (TBH, I'm not even sure of the extent, but this is a "no no").
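To make the hazard concrete, here is a minimal, self-contained sketch. `FooRun`, `toyLister`, and `newToyLister` are hypothetical stand-ins, not Tekton or client-go types. The only assumption carried over from the real system is that a lister's `Get` returns a pointer to the cached object rather than a copy.

```go
package main

import "fmt"

// FooRun is a hypothetical stand-in for a PipelineRun or TaskRun.
type FooRun struct {
	Name   string
	Labels map[string]string
}

// DeepCopy returns a copy that shares no memory with the receiver.
func (f *FooRun) DeepCopy() *FooRun {
	if f == nil {
		return nil
	}
	out := &FooRun{Name: f.Name}
	if f.Labels != nil {
		out.Labels = make(map[string]string, len(f.Labels))
		for k, v := range f.Labels {
			out.Labels[k] = v
		}
	}
	return out
}

// toyLister is a hypothetical lister whose Get hands out the cached pointer
// itself rather than a copy.
type toyLister struct {
	cache map[string]*FooRun
}

func newToyLister() *toyLister {
	return &toyLister{cache: map[string]*FooRun{
		"run-1": {Name: "run-1", Labels: map[string]string{"app": "demo"}},
	}}
}

func (l *toyLister) Get(name string) *FooRun { return l.cache[name] }

func main() {
	l := newToyLister()

	// Safe: mutate a private copy; the cached object keeps "demo".
	safe := l.Get("run-1").DeepCopy()
	safe.Labels["app"] = "private"
	fmt.Println("after copy+mutate:", l.Get("run-1").Labels["app"])

	// Unsafe: writing through the returned pointer changes the cache for
	// every other reader.
	l.Get("run-1").Labels["app"] = "mutated"
	fmt.Println("after direct mutate:", l.Get("run-1").Labels["app"])
}
```

In this toy the second write is visible to every later `Get`, which is the kind of cross-reader leakage the `DeepCopy()` rule is meant to prevent.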

Actual Behavior

In a few places Tekton updates the resource returned from the informer cache.

Steps to Reproduce the Problem

I outlined this a bit here: #2729 (comment), and also highlighted a few of the problem spots.

mattmoor · 2020-06-03T02:15:24Z

/kind bug

I'm currently looking at this in spare cycles, but would be open to handing off to someone with more bandwidth to run down the issues this has dislodged.

mattmoor · 2020-06-03T03:43:57Z

One complication here is that a single control loop performs multiple Lister read -> Client write sequences. Because the fakes don't update the informer's index during client writes, the second Lister read -> Client write reads a stale version of the object from the Lister cache.
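The stale-read sequence can be modelled with a toy, shown below as a sketch under stated assumptions. `toyServer`, `staleCache`, `updateStatus`, `updateLabel`, and `newFixture` are all hypothetical names; the integer `ResourceVersion` only imitates optimistic concurrency and is not how the API server encodes versions.

```go
package main

import (
	"errors"
	"fmt"
)

// FooRun is a hypothetical stand-in for a PipelineRun or TaskRun.
type FooRun struct {
	Name            string
	ResourceVersion int
	Status          string
	Label           string
}

var errConflict = errors.New("conflict: resourceVersion is stale")

// toyServer plays the API server; Update rejects writes built on an old version.
type toyServer struct {
	objs map[string]FooRun
}

func (s *toyServer) Update(fr FooRun) error {
	if fr.ResourceVersion != s.objs[fr.Name].ResourceVersion {
		return errConflict
	}
	fr.ResourceVersion++
	s.objs[fr.Name] = fr
	return nil
}

// staleCache plays a lister whose index is never refreshed by client writes.
type staleCache map[string]FooRun

// updateStatus reads from the cache (a value copy) and writes via the server.
func updateStatus(s *toyServer, c staleCache, name, status string) error {
	fr := c[name]
	fr.Status = status
	return s.Update(fr)
}

// updateLabel does the same read -> write round for a label.
func updateLabel(s *toyServer, c staleCache, name, label string) error {
	fr := c[name]
	fr.Label = label
	return s.Update(fr)
}

// newFixture seeds the server and the cache with the same starting object.
func newFixture() (*toyServer, staleCache) {
	seed := FooRun{Name: "run-1", ResourceVersion: 1}
	return &toyServer{objs: map[string]FooRun{"run-1": seed}}, staleCache{"run-1": seed}
}

func main() {
	s, c := newFixture()
	// The first write succeeds and bumps the server-side version.
	fmt.Println("status write:", updateStatus(s, c, "run-1", "Running"))
	// The second read still sees the seed version, so the write conflicts.
	fmt.Println("label write: ", updateLabel(s, c, "run-1", "demo"))
}
```

The second write fails here only because the cache is never refreshed between the two rounds, which mirrors the behaviour described for the fakes.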

mattmoor · 2020-06-03T04:02:01Z

One partial solution to the above (since the same will happen during normal reconciliation) is to use a Patch like this, which bypasses the stale read and optimistic concurrency checks to update the labels and annotations:

-               newPr.ObjectMeta.Labels = pr.ObjectMeta.Labels
-               newPr.ObjectMeta.Annotations = pr.ObjectMeta.Annotations
-               return c.PipelineClientSet.TektonV1beta1().PipelineRuns(pr.Namespace).Update(newPr)
+               mergePatch := map[string]interface{}{
+                       "metadata": map[string]interface{}{
+                               "labels":      pr.ObjectMeta.Labels,
+                               "annotations": pr.ObjectMeta.Annotations,
+                       },
+               }
+               patch, err := json.Marshal(mergePatch)
+               if err != nil {
+                       return nil, err
+               }
+               return c.PipelineClientSet.TektonV1beta1().PipelineRuns(pr.Namespace).Patch(pr.Name, types.MergePatchType, patch)

I'm working through some other test failures as well, at least one of which is a test bug that was masked by this bug.
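For reference, the payload built in the diff above can be reproduced in isolation. This is a sketch only: `buildMetadataPatch` is a hypothetical helper, and the example prints the JSON instead of sending it to a cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildMetadataPatch is a hypothetical helper that builds the same
// merge-patch shape as the diff: only metadata.labels and
// metadata.annotations, with no resourceVersion in the payload.
func buildMetadataPatch(labels, annotations map[string]string) ([]byte, error) {
	mergePatch := map[string]interface{}{
		"metadata": map[string]interface{}{
			"labels":      labels,
			"annotations": annotations,
		},
	}
	return json.Marshal(mergePatch)
}

func main() {
	patch, err := buildMetadataPatch(
		map[string]string{"app": "demo"},
		map[string]string{"note": "x"},
	)
	if err != nil {
		panic(err)
	}
	// encoding/json sorts map keys, so "annotations" precedes "labels".
	fmt.Println(string(patch))
}
```

One detail worth checking before relying on this shape: a nil map marshals to JSON `null`, and JSON Merge Patch treats `null` as a request to remove that key, so passing nil labels would likely clear all labels rather than leave them alone.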

In each of the `{Pipeline,Task}Run` reconcilers, the functions that update status and labels/annotations refetch the resource from the informer cache, check the field they want to update, and, if an update is needed, set the field on the informer's copy and call the appropriate update method. In pseudo-code:

```go
// Pseudo-code: lister and client stand in for the generated lister and clientset.
func update(fr *FooRun) (*FooRun, error) {
	newFr, err := lister.Get(fr.Name)
	if err != nil {
		return nil, err
	}
	if !reflect.DeepEqual(newFr.Field, fr.Field) {
		newFr.Field = fr.Field // This modifies the informer's copy!
		return client.Update(newFr)
	}
	return newFr, nil
}
```

I have worked around this in two different ways:

1. For the status updates I added a line like `newFr = newFr.DeepCopy()` immediately above the mutation to avoid writing to the informer's copy.
2. For the label/annotation updates, I changed the `Update` call to a `Patch` that bypasses optimistic concurrency checks. This last bit is important because otherwise the update above will lead to the first reconciliation *always* failing due to `resourceVersion` skew caused by the status update. This also works around some fun interactions with the test code (see fixed issue).

There are two other notable aspects to this change:

1. Test bugs! A good number of places assumed that the object stored in the informer was altered. I changed most of these to refetch through the client.
2. D-Fence! I added some logic to the common test setup code to `DeepCopy()` resources before feeding them to the fake clients, to try to keep assumptions about "same object" from creeping back in.

It is also worth calling out that this change will very likely destabilize the metric that I identified [here](tektoncd#2729) as racy, which is likely masked by the mutation of the informer copies.

Fixes: tektoncd#2734
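The defensive test setup described above might look roughly like the sketch below. `fakeClient` and `seedClient` are hypothetical names, not the helpers in Tekton's test packages. The idea is that the fake copies on every boundary, so a test can only observe a change by refetching through the client.

```go
package main

import "fmt"

// FooRun is a hypothetical stand-in for a PipelineRun or TaskRun.
type FooRun struct {
	Name   string
	Status string
}

// DeepCopy is a plain struct copy here, which is deep only because FooRun
// holds nothing but strings.
func (f *FooRun) DeepCopy() *FooRun {
	if f == nil {
		return nil
	}
	out := *f
	return &out
}

// fakeClient is a hypothetical in-memory client for tests.
type fakeClient struct {
	objs map[string]*FooRun
}

// seedClient copies each fixture before storing it, so the test's own
// pointers never alias what the fake holds.
func seedClient(fixtures ...*FooRun) *fakeClient {
	c := &fakeClient{objs: make(map[string]*FooRun, len(fixtures))}
	for _, f := range fixtures {
		c.objs[f.Name] = f.DeepCopy()
	}
	return c
}

// Update stores a copy of the caller's object.
func (c *fakeClient) Update(fr *FooRun) { c.objs[fr.Name] = fr.DeepCopy() }

// Get returns a copy of the stored object.
func (c *fakeClient) Get(name string) (*FooRun, bool) {
	fr, ok := c.objs[name]
	return fr.DeepCopy(), ok
}

func main() {
	fixture := &FooRun{Name: "run-1", Status: "Pending"}
	c := seedClient(fixture)

	// Code under test reads, mutates its copy, and writes back.
	got, _ := c.Get("run-1")
	got.Status = "Succeeded"
	c.Update(got)

	// The fixture is untouched; the new state is visible only via the client.
	fmt.Println("fixture:", fixture.Status)
	refetched, _ := c.Get("run-1")
	fmt.Println("refetched:", refetched.Status)
}
```

A test written against this fake that asserts on `fixture.Status` would fail, which is the point: it surfaces "same object" assumptions instead of hiding them.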


tekton-robot added the kind/bug label Jun 3, 2020

mattmoor mentioned this issue Jun 3, 2020

Avoid modifications to the informer's copy of resources. #2736

Merged

tekton-robot closed this as completed in #2736 Jun 3, 2020
