Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broaden set of theses returned by not_consented_to_proquest scope #1109

Merged
merged 1 commit into from
Jan 20, 2023

Conversation

jazairi
Copy link
Contributor

@jazairi jazairi commented Jan 19, 2023

Why these changes are being introduced:

This is a similar issue to the one solved in #1108, but in the not_consented_to_proquest scope. Here, the scope excludes all theses with an author ProQuest consent value of 'true'. This works in most cases, except when one author has opted in and another hasn't, that thesis should still be partially harvested. With the current logic, it will not show up in the export preview because the scope sees a 'true' consent value and thus excludes the thesis.

Relevant ticket(s):

How this addresses that need:

This inverts the logic of the not_consented_to_proquest scope. Instead of excluding theses with at least one ProQuest consent value of true, it includes theses with at least one ProQuest consent value of false or nil, using the 'or' method to return the union.

Side effects of this change:

Since 'includes' is not compatible with the 'or' method, the scope now uses 'left_joins' to load the author records. This also means that the the result will be an ActiveRecord assocation rather than an array. ActiveJob does not accept associations as params, so we are now casting the params as arrays. I've done with this with both the full and partial export sets to avoid confusion, even though it's only needed at this stage for the partial export set.

Developer

  • All new ENV is documented in README
  • All new ENV has been added to Heroku Pipeline, Staging and Prod
  • ANDI or Wave has been run in accordance to
    our guide and
    all issues introduced by these changes have been resolved or opened as new
    issues (link to those issues in the Pull Request details above)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer

  • The commit message is clear and follows our guidelines
    (not just this pull request message)
  • There are appropriate tests covering any new functionality
  • The documentation has been updated or is unnecessary
  • The changes have been verified
  • New dependencies are appropriate or there were no changes

Requires database migrations?

NO

Includes new or updated dependencies?

NO

@mitlib mitlib temporarily deployed to thesis-submit-pr-1109 January 19, 2023 17:50 Inactive
Why these changes are being introduced:

This is a similar issue to the one solved in #1108, but in the
not_consented_to_proquest scope. Here, the scope excludes all
theses with an author ProQuest consent value of 'true'. This
works in most cases, except when one author has opted in and
another hasn't, that thesis should still be partially harvested.
With the current logic, it will not show up in the export preview
because the scope sees a 'true' consent value and thus excludes
the thesis.

Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/ETD-603

How this addresses that need:

This inverts the logic of the not_consented_to_proquest scope.
Instead of excluding theses with at least one ProQuest consent
value of true, it includes theses with at least one ProQuest consent
value of false or nil, using the 'or' method to return the union.

Side effects of this change:

Since 'includes' is not compatible with the 'or' method, the scope
now uses 'left_joins' to load the author records. This also means
that the the result will be an ActiveRecord assocation rather than
an array. ActiveJob does not accept associations as params, so
we are now casting the params as arrays. I've done with this with
both the full and partial export sets to avoid confusion, even
though it's only needed at this stage for the partial export set.
@jazairi jazairi force-pushed the etd-603-not-consented-to-proquest-bug branch from ff9ed87 to 85ffa17 Compare January 19, 2023 17:51
@jazairi jazairi temporarily deployed to thesis-submit-pr-1109 January 19, 2023 17:51 Inactive
@coveralls
Copy link

coveralls commented Jan 19, 2023

Coverage Status

Coverage: 97.59%. Remained the same when pulling 85ffa17 on etd-603-not-consented-to-proquest-bug into 2212178 on main.

@jazairi
Copy link
Contributor Author

jazairi commented Jan 19, 2023

This bug should have been caught by our test suite, but there's a typo in the assertion that would've caught it, such that it instead looks for the opposite condition: https:/MITLibraries/thing/pull/1109/files#diff-d75b7faadae90a81e1e198252235e1650115d699a0291df2cb913164f25a797aL1325-L1328

This is now fixed, and I think all other edge cases are tested, but I'd like the reviewer to check my work on that as I should've caught this earlier.

@JPrevost JPrevost self-assigned this Jan 20, 2023
Copy link
Member

@JPrevost JPrevost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic is fine. We may benefit from reusing theses state less in assertions, but that is not blocking just something to ponder.

thesis = theses(:two)
assert_not_includes Thesis.not_consented_to_proquest, thesis
# multi-author thesis with one opt-in and one null is included
assert_equal 2, thesis.authors.count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like we are inheriting the thesis from the pervious set of assertions and not loading one intentionally. The test is likely valid as-is, but if someone inserted a different set of assertions between the previous and this one that loaded or changed state of a thesis/authors we'd be confused here. I'd suggest we load a fixture for a thesis for each case we are testing even if we end up reloading the same fixture multiple times. The prevent this potential leakage, it may (or may not!) be useful to use more explicit variable names than thesis. It's really easy to let data slip between assertions when we reuse the variable name (which I think we do quite frequently and is not limited to this work but I'm only noticing it now as a potential for introducing bugs).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JPrevost I totally hear you. In this case, the impetus was expediency. If I were to add a couple fixtures (or even change existing ones), I would have to update several tests in the report model that expect specific numbers of entries in a table. I fixed a few of those tests the last time around out of frustration, but there are still a handful of them, and my goal was to land this as quickly as possible.

Your point is well taken, though, and I opened this ticket to fix it up: https://mitlibraries.atlassian.net/browse/ETD-621

I also opened this ticket to refactor the bad tests in the report model: https://mitlibraries.atlassian.net/browse/ETD-620

@jazairi jazairi merged commit fe1de1b into main Jan 20, 2023
@jazairi jazairi deleted the etd-603-not-consented-to-proquest-bug branch January 20, 2023 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants