Fix issues with named fragment handling code #2619

pcmanus · 2023-06-07T14:59:05Z

This PR attempts to fix a number of remaining issues around the code to handle named fragments, namely at least:

the assertion failure reported in v1.20.0: Unexpected query planner behavior with fragments within an interface (Cannot add fragment of condition "..." to parent type "...") router#3227.
the assertion failure reported in v1.20.0: query planning of fragments yields Cannot add fragment of condition "..." to parent type "..." router#3217 (which is the same assertion failing than in the previous point, but on a slightly different code path and due to a different issue).
an issue whereby named fragments are reused in a rare situation where this is invalid, resulting in an invalid (in the sense of graphQL validations) subgraph fetch.

The 2 first point have repro on the linked tickets, but for the last point, the problem is that the graphQL validation for conflicting fields specified by FieldsInSetCanMerge traverses all fragments to look for conflicts, even "branches" that can be statically proven to be dead branches, but because the named fragment reuse code was taking those dead branch into account, it was sometimes using a named fragment that created a validation issue (in such a dead branch). See e1d1044 for an example of the problem with some explanations.

netlify · 2023-06-07T14:59:09Z

👷 Deploy request for apollo-federation-docs pending review.

Visit the deploys page to approve it

Name	Link
🔨 Latest commit	`28d9b92`

changeset-bot · 2023-06-07T14:59:10Z

🦋 Changeset detected

Latest commit: 28d9b92

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 7 packages

Name	Type
@apollo/query-planner	Patch
@apollo/federation-internals	Patch
@apollo/gateway	Patch
@apollo/composition	Patch
@apollo/query-graphs	Patch
@apollo/subgraph	Patch
apollo-federation-integration-testsuite	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

codesandbox-ci · 2023-06-07T14:59:57Z

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

clenfest · 2023-06-08T04:25:35Z

internals-js/src/operations.ts

+ // Returns a copy of this operation with the provided updated selection set. Optionally, a new set of fragments
+ // can be also provided. If `newFragments` is undefined, then the existing fragments will be reused, but if it
+ // `null`, then the new operation will have no fragments.
+ private withUpdatedSelectionSet(newSelectionSet: SelectionSet, newFragments?: NamedFragments | null): Operation {


I know we've done it a couple of times before, but I'm not totally sure how I feel about controlling the behavior of a function by passing null rather than undefined. It would be nice to make things more explicit.

Alright, I've split it into 2 separately named methods.

clenfest · 2023-06-08T04:29:42Z

internals-js/src/operations.ts

+ * any unecessary top-level inline fragments, possibly multiple layers of them, but we never recurse
+ * inside the sub-selection of an selection that is not removed by the normalization.
+ */
+ normalize(parentType: CompositeType, options?: { recursive? : boolean }): SelectionSet {


Rather than having named parameters for options, I would just make the whole function use named parameters and get rid of options.

I've changed it for the sake of not being too contrarian, but I'll admit not entirely seeing how this isn't almost entirely a stylistic preference.

clenfest · 2023-06-08T13:45:39Z

query-planner-js/src/buildPlan.ts

@@ -2806,7 +2788,6 @@ export class QueryPlanner {
 // going to expand everything during the algorithm anyway. We'll re-optimize subgraph fetches with fragments
 // later if possible (which is why we saved them above before expansion).
 operation = operation.expandAllFragments();


There is no reason to normalize at this point?

Not now that expandAllFragments always ensure it's output is normalized.

The naming of `trimUnsatisfiableBranches` was misleading because it does not just "remove unsatisfiable branches", it more generally remove things that are not-very-optimal (even if they are not full branches). So this rename this method to `normalize`, which more clearly explains then intent (on top of being shorter). Additionally, the method was not actually removing unsatisfiable branches in all cases, which made it a bit error prone to use in practice and led to some small inefficiencies in the `NamedFragmentDefinition.selectionAtType` method. So fixed that, making the method more regular and fixing the aforementioned inefficiency. This also force a normalisation after `expandAllFragments` because we were relying on it in practice so this is less fragile this way.

The QP code never truly rely on operation validation as it is assumed the operation passed to it have already been validated prior to being passed to the planner, but having the added validation helps a bit in other testing context when that pre-validation hasn't happened, and since this is fairly cheap...

The code for normalizing fields was not taking into account that if we normalize on a more precise type, this may make a field definition "more precise", which can impacts the type we should use to normalize the sub-selection of said field.

internals-js/src/__tests__/operations.test.ts

sachindshinde · 2023-06-10T08:09:26Z

internals-js/src/operations.ts

- notCoveredByFragments = notCoveredByFragments.minus(fragment.expandedSelectionSetAtType(parentType));
+ for (const { fragment, atType} of filteredApplyingFragments) {
+ const notCovered = subSelection.minus(atType.selectionSet);
+ if (atType.trimmed && !selectionSetsDoMerge(notCovered, atType.trimmed)) {


While this checks trimmed against the parts of the original sub-selection it hasn't already been checked against, presumably you also need to check the trimmed of each fragment against the trimmeds of other fragments you add.

That's a very good point. Added some testing for this case and fixed.

internals-js/src/operations.ts

sachindshinde · 2023-06-10T08:37:11Z

internals-js/src/operations.ts

+ selectionSet = typeRuntimes.length === conditionRuntimes.length
+ ? expandedSelectionSet
+ : expandedSelectionSet.filter((s) =>
+ s.kind !== 'FragmentSelection' || !s.element.typeCondition || typeRuntimes.some((t) => t.name == s.element.typeCondition?.name)


To check for intersection, shouldn't

typeRuntimes.some((t) => t.name == s.element.typeCondition?.name)

be

runtimeTypesIntersects(type, s.element.typeCondition)

instead? (Or at least, I don't think s.element.typeCondition is necessarily guaranteed to be an object type here.)

Yes, you're right; I brain-farted. I added test for this, and in the process realised the that I was using the filter method wrong as the one implemented was recursing inside sub-selection, which we don't want. Anyway, this was confusing and I actually renamed the existing filter into filterRecursiveDepthFirst to be extra clear and kept a simpler filter method which only apply to top-level (so this part of the code hasn't changed, but just explaining why there is some filter-related changes in the last commit).

sachindshinde · 2023-06-10T09:02:30Z

internals-js/src/operations.ts

 ));
 return newSet.toSelectionSet(this.parentType);
 }
 }

- return this.selectionSet === trimmedSelectionSet ? this : this.withUpdatedSelectionSet(trimmedSelectionSet);
+ return this.selectionSet === normalizedSelectionSet ? this : this.withUpdatedSelectionSet(normalizedSelectionSet);


I'm noticing that unlike FieldSelection.normalize() and InlineFragmentSelection.normalize() which update this.parentType to parentType, InlineFragmentSelection.normalize() doesn't seem to, e.g.:

The return this.selectionSet === normalizedSelectionSet ? this : this.withUpdatedSelectionSet(normalizedSelectionSet); doesn't update the element's parentType.

The return this.withUpdatedSelectionSet(selectionSetOfElement( ... ) above doesn't update the element's parentType.

The return newSet.toSelectionSet(this.parentType) above uses this.parentType instead of parentType.

There's a (this.element.typeCondition ?? this.parentType).typenameField()! call above instead of (thisCondition ?? parentType).typenameField()!,

Also I noticed that when you lift selections out of the fragment in this function, the parentType isn't accordingly updated for those selections.

Is this fine/expected? (It might be that the field is only used in certain cases, or maybe some other code requires it to not be changed, but I'm not familiar enough with the code here.)

No,I was being sloppy.

I mean, it is probably mostly fine in the grand scheme of stuff because when we build a SelectionSet with SelectionSetUpdates (which SelectionSet.normalize does through lazyMap), then it ensures the elements have the proper parent type by calling rebasOn, so I believe the selections still ended up with the proper parent type ultimately.

Still, that was sloppy and would be unexpected at least out of the Selection.normlize call and could cause issue someday, so fixed it.

you lift selections out of the fragment in this function, the parentType isn't accordingly updated for those selections

As mentioned above, the fact that we passed the lifted selections to a SelectionSetUpdates took care of updating the parent type of the elements (of course, as you pointed out, the toSelectionSet call for that SelectionSetUpdates was incorrect, so did fixed that, but just calling out that there is nothing additional to do).

pcmanus · 2023-06-12T14:03:11Z

Note: the node 20 circleCI check on this PR seems to have failed a bunch of times in a row, yet the error seems CI related. Check the link for detail, but the error is:

npm ERR! code 2
npm ERR! path /home/circleci/project/node_modules/@parcel/watcher
npm ERR! command failed
npm ERR! command sh -c node-gyp-build
npm ERR! sh: 1: node-gyp-build: Text file busy

during npm package installing, and for a PR that don't even change any dependency, so it screams "unrelated to the PR". Yet, I haven't (yet) seen this on other PRs and as said, I got this here a few time in a row now, so wondering if anyone has a clue.

sachindshinde

Thanks for making the fixes! There's a few small things below, but other than that LGTM

sachindshinde · 2023-06-13T03:53:14Z

internals-js/src/operations.ts

 }
 }

- return this.selectionSet === normalizedSelectionSet ? this : this.withUpdatedSelectionSet(normalizedSelectionSet);
+ return this.selectionSet === normalizedSelectionSet


I think this might need to be

this.parentType === parentType && this.selectionSet === normalizedSelectionSet

instead of

this.selectionSet === normalizedSelectionSet

if the first branch of the ternary conditional is going to return this.

Yes, good catch, thanks.

sachindshinde · 2023-06-13T04:22:27Z

internals-js/src/operations.ts

+ // have the same number of runtimes, then they have the same set of runtimes and no point doing any filtering.
+ const typeRuntimes = possibleRuntimeTypes(type);
+ const conditionRuntimes = possibleRuntimeTypes(this.typeCondition);
+ selectionSet = typeRuntimes.length === conditionRuntimes.length


selectionSet in the if block above will be rebased on type due to the SelectionSet.normalize() call, but it looks like we'd have to manually do that rebasing here. (Not sure if it really matters, since it looks like this function is only used by tryOptimizeSubselectionWithFragments().)

You're right, and this actually made me realise that I was making this more complicated than it needed to be.

Because calling rebaseOn(type) on the if block above is actually usually incorrect. Typically, that if block was taken when, say, type was some interface I1 while this.typeCondition was another interface I2 (assuming the runtimes match what expect). But in general, you can't just rebase fields of I2 into I1 directly, even it happens that it "would" be alright from the POV of the underlying possible runtimes.

So really, the issue was that the canApplyAtType method, which was guarding calls to this expandedSelectionSetAtType should just not allow those cases where calling normalize(type) does not work. So fixed that and simplified accordingly, and I've validated that fragments are still reused as well as before.

@sachindshinde since the change is probably a bit bigger than you intended, I'll give you the chance to have a look at the last commit/check my logic before merging.

I'll give you the chance to have a look at the last commit/check my logic before merging.

Slightly changed my mind given circumstance: I optimistically merged, but do raise any concern if you have and we'll address.

sachindshinde · 2023-06-13T04:48:48Z

Regarding the Node 20 issues, I couldn't immediately pinpoint a cause.

I tried to SSH into the box and look at the debug logs. Toward the end I saw:

4635 timing build:link Completed in 11ms
4636 info run @parcel/[email protected] install node_modules/@parcel/watcher node-gyp-build
4637 info run [email protected] install node_modules/dtrace-provider node-gyp rebuild || node suppress-error.js
4638 info run @parcel/[email protected] install { code: 2, signal: null }
4639 timing reify:rollback:createSparse Completed in 1708ms
4640 timing reify:rollback:retireShallow Completed in 0ms
4641 timing command:ci Completed in 19138ms
4642 verbose stack Error: command failed
4642 verbose stack     at ChildProcess.<anonymous> (/home/circleci/.nvm/versions/node/v20.3.0/lib/node_modules/npm/node_modules/@npmcli/promise-spawn/lib/index.js:53:27)
4642 verbose stack     at ChildProcess.emit (node:events:511:28)
4642 verbose stack     at maybeClose (node:internal/child_process:1098:16)
4642 verbose stack     at Socket.<anonymous> (node:internal/child_process:456:11)
4642 verbose stack     at Socket.emit (node:events:511:28)
4642 verbose stack     at Pipe.<anonymous> (node:net:334:12)
4643 verbose pkgid @parcel/[email protected]
4644 verbose cwd /home/circleci/project
4645 verbose Linux 5.15.0-1030-aws
4646 verbose node v20.3.0
4647 verbose npm  v9.7.1
4648 error code 2
4649 error path /home/circleci/project/node_modules/@parcel/watcher
4650 error command failed
4651 error command sh -c node-gyp-build
4652 error sh: 1: node-gyp-build: Text file busy
4653 verbose exit 2
4654 timing npm Completed in 19317ms
4655 verbose unfinished npm timer reify 1686630640693
4656 verbose unfinished npm timer reify:build 1686630657224
4657 verbose unfinished npm timer build 1686630657226
4658 verbose unfinished npm timer build:deps 1686630657227
4659 verbose unfinished npm timer build:run:install 1686630657258
4660 verbose unfinished npm timer build:run:install:node_modules/@parcel/watcher 1686630657259
4661 verbose unfinished npm timer build:run:install:node_modules/dtrace-provider 1686630657309
4662 verbose code 2

From some googling, sh: 1: node-gyp-build: Text file busy seems to mean that something was trying to overwrite node-gyp-build, but it was in the middle of being executed.

In any case, I changed my volta config in my package.json to the above (node 20.3.0 and npm 9.7.1) and ran tests, and they passed. So it's probably fine to ignore the Node 20 CI error (hopefully it won't re-appear on main).

abernix · 2023-06-13T09:36:49Z

@pcmanus:

Note: the node 20 circleCI check on this PR seems to have failed a bunch of times in a row, yet the error seems CI related. Check the link for detail, but the error is:

Just triangulating a theme I'm seeing around Node.js v20.3 with the Text file busy error, a very similar occurrence of that error happened with something totally different in this PR from @jerelmiller on another repository: apollographql/apollo-utils#302.

I think that either Node.js v20.3 OR CircleCI's Node.js v20.3 image are potentially the culprit. I don't yet know if that's foreshadowing something bigger that will break in that version or just a bug. Hoping for the latter.

jerelmiller · 2023-06-13T14:54:30Z

@abernix interestingly I was able to get it working again using the Node 20.2 image: apollographql/apollo-utils#306 so seems its specific to the 20.3 image. I hope this is just a temp fix and as you said, hope its not a foreshadowing of something worse.

…ches In apollographql#2619, code was added to avoid using a named fragment if some "trimmed" parts of that fragment may have field conflicts with the rest of the selection (or other trimmed part of other fragment at the same level). But the logic added is actually not generic enough: because field conflict validation traverses all fragments, it is not enough to check a fragment against the selection at the "current" fragment, we need to check againt every sub-selection that "merges" at the same level, which may avoid some "sibling" branch of the current one. To make this easier, the code adds some initial phase to the `Selection.optimize` method (that try reusing named fragment) which basically pre-compute the job of collecting the result of the 1st step of [`FieldsInSetCanMerge`](https://spec.graphql.org/draft/#FieldsInSetCanMerge()). The code from apollographql#2619 is then adapted to use that, thus taking "all branches" into account regardless of where the recursion of `optimize` is at.

#2627) In #2619, code was added to avoid using a named fragment if some "trimmed" parts of that fragment may have field conflicts with the rest of the selection (or other trimmed part of other fragment at the same level). But the logic added is actually not generic enough: because field conflict validation traverses all fragments, it is not enough to check a fragment against the selection at the "current" fragment, we need to check againt every sub-selection that "merges" at the same level, which may avoid some "sibling" branch of the current one. To make this easier, the code adds some initial phase to the `Selection.optimize` method (that try reusing named fragment) which basically pre-compute the job of collecting the result of the 1st step of [`FieldsInSetCanMerge`](https://spec.graphql.org/draft/#FieldsInSetCanMerge()). The code from #2619 is then adapted to use that, thus taking "all branches" into account regardless of where the recursion of `optimize` is at.

…ches In apollographql#2619, code was added to avoid using a named fragment if some "trimmed" parts of that fragment may have field conflicts with the rest of the selection (or other trimmed part of other fragment at the same level). But the logic added is actually not generic enough: because field conflict validation traverses all fragments, it is not enough to check a fragment against the selection at the "current" fragment, we need to check againt every sub-selection that "merges" at the same level, which may avoid some "sibling" branch of the current one. To make this easier, the code adds some initial phase to the `Selection.optimize` method (that try reusing named fragment) which basically pre-compute the job of collecting the result of the 1st step of [`FieldsInSetCanMerge`](https://spec.graphql.org/draft/#FieldsInSetCanMerge()). The code from apollographql#2619 is then adapted to use that, thus taking "all branches" into account regardless of where the recursion of `optimize` is at.

…with conflicting fields Trying to reuse fragments, if done without specific case, can in some context result in invalid selections due to fields conflicting. It is to avoid that problem that apollographql#2619 introduced the `FieldsConflictValidator` mechanism (later improved by apollographql#2635). Unfortunately, in some fairly specific setups with nested fragments, the code was missing some data in the validation mentioned above, which led to still having case where a subgraph fetch may be invalid due to some fields (within reusing fragments) conflicting at some point of the query. This commit fix that issue by ensuring we take everything we should into account when doing the aforementioned validation.

…with conflicting fields (#2740) Trying to reuse fragments, if done without specific case, can in some context result in invalid selections due to fields conflicting. It is to avoid that problem that #2619 introduced the `FieldsConflictValidator` mechanism (later improved by #2635). Unfortunately, in some fairly specific setups with nested fragments, the code was missing some data in the validation mentioned above, which led to still having case where a subgraph fetch may be invalid due to some fields (within reusing fragments) conflicting at some point of the query. This commit fix that issue by ensuring we take everything we should into account when doing the aforementioned validation.

pcmanus requested a review from a team as a code owner June 7, 2023 14:59

pcmanus force-pushed the no-fragment-reuse-if-create-invalid-op branch from f0d937e to a497b20 Compare June 7, 2023 15:05

clenfest reviewed Jun 8, 2023

View reviewed changes

pcmanus added 7 commits June 9, 2023 15:53

Add test to reproduce issue with current approach

302cf27

Check if fragment reuse is invalid

e0ee5ce

Fix computation of NamedFragmentDefinition.exapndedSelectionSetAtType

1cb47e0

Add changeset

ae97b8e

Fix normalization of field selections

405aa89

The code for normalizing fields was not taking into account that if we normalize on a more precise type, this may make a field definition "more precise", which can impacts the type we should use to normalize the sub-selection of said field.

pcmanus force-pushed the no-fragment-reuse-if-create-invalid-op branch 2 times, most recently from 9074cf6 to e8ede8a Compare June 9, 2023 14:47

Review feedback

b54033d

pcmanus force-pushed the no-fragment-reuse-if-create-invalid-op branch from e8ede8a to b54033d Compare June 9, 2023 14:51

pcmanus requested a review from clenfest June 9, 2023 15:01

sachindshinde requested changes Jun 10, 2023

View reviewed changes

Review feedback from Sachin

9959b19

pcmanus requested a review from sachindshinde June 12, 2023 14:03

sachindshinde approved these changes Jun 13, 2023

View reviewed changes

Fixup condition to check for applying fragment candidates

28d9b92

pcmanus force-pushed the no-fragment-reuse-if-create-invalid-op branch from 6740dba to 28d9b92 Compare June 13, 2023 14:20

pcmanus merged commit 7f1ef73 into apollographql:main Jun 13, 2023

github-actions bot mentioned this pull request Jun 13, 2023

release: on branch main #2625

Merged

pcmanus mentioned this pull request Jun 14, 2023

Field conflict detection on fragment reuse do not handle sibling bran… #2627

Merged

o0Ignition0o mentioned this pull request Jun 20, 2023

prep release: v1.21.0 apollographql/router#3280

Merged

pcmanus mentioned this pull request Aug 18, 2023

Fix case where reusing fragments can still lead to invalid selection … #2740

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix issues with named fragment handling code #2619

Fix issues with named fragment handling code #2619

pcmanus commented Jun 7, 2023

netlify bot commented Jun 7, 2023 •

edited

Loading

changeset-bot bot commented Jun 7, 2023 •

edited

Loading

codesandbox-ci bot commented Jun 7, 2023 •

edited

Loading

clenfest Jun 8, 2023

pcmanus Jun 9, 2023

clenfest Jun 8, 2023

pcmanus Jun 9, 2023

clenfest Jun 8, 2023

pcmanus Jun 9, 2023

sachindshinde Jun 10, 2023

pcmanus Jun 12, 2023

sachindshinde Jun 10, 2023

pcmanus Jun 12, 2023

sachindshinde Jun 10, 2023

pcmanus Jun 12, 2023

pcmanus commented Jun 12, 2023

sachindshinde left a comment

sachindshinde Jun 13, 2023

pcmanus Jun 13, 2023

sachindshinde Jun 13, 2023

pcmanus Jun 13, 2023

pcmanus Jun 13, 2023

sachindshinde commented Jun 13, 2023

abernix commented Jun 13, 2023

jerelmiller commented Jun 13, 2023

Fix issues with named fragment handling code #2619

Fix issues with named fragment handling code #2619

Conversation

pcmanus commented Jun 7, 2023

netlify bot commented Jun 7, 2023 • edited Loading

👷 Deploy request for apollo-federation-docs pending review.

changeset-bot bot commented Jun 7, 2023 • edited Loading

🦋 Changeset detected

codesandbox-ci bot commented Jun 7, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pcmanus commented Jun 12, 2023

sachindshinde left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sachindshinde commented Jun 13, 2023

abernix commented Jun 13, 2023

jerelmiller commented Jun 13, 2023

netlify bot commented Jun 7, 2023 •

edited

Loading

changeset-bot bot commented Jun 7, 2023 •

edited

Loading

codesandbox-ci bot commented Jun 7, 2023 •

edited

Loading