Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plugin upgrades #1543

Closed
itowlson opened this issue May 25, 2023 · 29 comments · Fixed by #1577
Closed

Plugin upgrades #1543

itowlson opened this issue May 25, 2023 · 29 comments · Fixed by #1577

Comments

@itowlson
Copy link
Contributor

This is a micro-SIP for informing Spin users when there are new versions of their favourite plugins. It is for discussion.

Principles:

  • Do not break the user's existing workflow (habits, scripts, etc.).
  • Do not block the user (make them wait on a prompt, make progress conditional on a download).
  • Do not surprise the user.
  • Do not badger the user.
  • Respect CI environments.
    • A specific consideration is that a GH action that installs an older plugin should continue to work.

Classification:

  • If the user is on a stable line of a plugin (>=1.x), and there is a higher version available within the same major version, that upgrade is eligible.
  • Upgrades within unstable lines (0.x) are questionable.
  • Upgrades across major versions (1.x -> 2.x) are questionable.
  • Upgrades with pre-release entries in their version (1.2.3-alpha.1) are ineligible.
    • We could refine this rule to make pre-releases eligible if the user is already on an earlier pre-release of that patch version, but pre-release text doesn't sort well and I'm not sure we have a sense of "earlier/later" other than the version.

When to notify:

  • When a user runs a plugin, we check for the availability of an upgrade.
  • If there is an eligible upgrade:.
    • Has the user already been notified about this version? If so, do nothing.
    • Has the user been notified about any upgrade from their current version of the plugin within the last B days? If so, do nothing.
      • The nuance here is for the following reason. Suppose the user is on 1.3. On Monday, the user runs the plugin, and we detect and offer an upgrade to 1.4. They do the upgrade. On Tuesday, 1.4.1 comes out. On Wednesday, the user runs the plugin again. We should probably mention 1.4.1 even though we prompted on Monday. But if the user chose not to do the upgrade to 1.4, then we should not keep badgering them. So we need a sense of "resetting" the badger clock when they do an upgrade.
      • There is still a risk of badger if a plugin has high churn. The user keeps running to keep up but there's always a new upgrade to be pestered about. We can probably cross this bridge if we come to it.
    • Otherwise, print a message informing the user that the upgrade is available and telling them how to get it.
  • If there is a questionable upgrade:
    • TBD. We could notify them using the same rules as eligible upgrades but with the message warning about breaking changes, and possibly with a longer badger clock. Or we could not notify them. Or something else.
  • If there is an ineligible upgrade, do nothing.
  • This proposal doesn't define the badger timeout B, but that is easy to tweak during implementation and rollout. My guess is we would start it at 7 or 14 days.

How to notify:

  • Spin could notify before handing off to the plugin. This gets it out of the way. But if the plugin produces non-trivial amounts of output, the message will disappear. It also means we have to either make the user wait while we check for availability, or do a background check and notify the next time they run the plugin.
  • Spin could notify after the plugin exits. This would allow Spin to do checks in the background while the plugin is running, avoiding any startup delay, ensures that the message is visible even if the plugin has produced a lot of output, and prints it at the moment the user/terminal comes free, so they can action the upgrade offer immediately. I am not sure if this increases complexity though.
  • My preference is to notify after the plugin exits.
  • In either case, notifications should be "print to terminal" only. Spin should not display a prompt that requires a response in order to proceed.
  • In either case, the upgrade check must be silent and errors must not be reported to the user.
    • Spin should skip the upgrade check If the environment is not a terminal, because there is no user to notify so it's a waste of bandwidth.

Privacy;

  • The user should be able to opt out of upgrade checks, via an environment variable or such like.
    • Perhaps the first time the user runs a plugin (any plugin), we should inform the user how to opt out. That doesn't give them an opportunity to back out, though, unless we have the more complex rule of "the first time, we print the opt out notification; only the second time do we actually perform an upgrade check." I am not sure how much of a concern this is.
  • The most information sent from the user's machine during the upgrade check should be the plugin name and version. (I suspect we won't need to send that because we will look at the git repo, but then we get into repeated pulls, rate limiting, and similar "the whole reason why the update command exists" issues. So I want to leave this much wiggle room.)

Future refinements:

  • It could be worth allowing authors to tag releases as urgent, e.g. security patches. Urgent releases would change the rules a bit:
    • Urgent releases within the 0.x line would be eligible. (Not sure about across stable major versions. In principle a security patch would be backported. In practice, that's not always the case.)
    • Urgent releases would be notified even if user had recently been badgered. (Not sure if we would repeatedly notify for the same urgent release. But if the user had recently declined an upgrade, and then there was an urgent release, that should be notified.)
@kate-goldenring
Copy link
Contributor

I'm aligned with all of this. We will probably want to add this mini-SIP to the spin-plugins repository so authors know how their versioning scheme affects auto-upgrades

@kate-goldenring
Copy link
Contributor

Also, I wonder whether we want to enable this for plugins that aren't from the spin-plugins repo. This would require changing our install system a little, maybe by storing a reference to where the plugin was installed from and expecting a specific repository layout (somewhat like Homebrew private taps).

@lann
Copy link
Collaborator

lann commented May 26, 2023

I feel like we could get away with slightly more badgeriness, like renotifying for eligible updates every badger timeout. It could also be frustrating to miss out on updates because you weren't staring with rapt attention at a command completing.

@itowlson
Copy link
Contributor Author

@kate-goldenring For checking upgrade availability, do you think it would be okay to effectively run spin update, or would you recommend something different? I am thinking about:

  1. Is it good or bad that they see an updated registry even though they never explicitly updated it themselves?
  2. What is the potential impact on GitHub throttling - this was the original motive for not checking on every upgrade I think? I don't know if pulling only the 'latest' file for a plugin would have a lower impact.

@kate-goldenring
Copy link
Contributor

@itowlson, I don't see a downside to (1), since the local registry will still contain older versions of the manifests but now just also have the new ones. For (2), wouldn't updating one file with git require fetching all files? I am not sure in git how to just fetch one file but we could effectively curl the latest file at the expected path.

I do feel concern in running update every time a plugin is executed. I wonder if we want an update interval along with a badger interval. IE if it has been over 12 hours since the last update, we do a spin plugin update.

@itowlson
Copy link
Contributor Author

@kate-goldenring Yeah curling the latest file was what I prototyped.

The way I have prototyped it does things in a different order, so it would only check for updates if:

  1. The user is now running a different plugin version from the last notification (i.e. they accepted the last badger); or
  2. The badger interval has expired.

So that should keep the GH load down even without a separate update interval; plugins in the registry are unlikely to churn that fast enough for 1 to be a big issue (given that it's a PR and manual approval process to update the registry).

@itowlson
Copy link
Contributor Author

Hmm, the downside of curling only the latest is that if latest is incompatible with the user's version of Spin, but an intermediate version is compatible, we would not prompt to upgrade to the intermediate version. So that might be a reason to do the full update.

(I forgot about Spin compatibility above... for completeness, upgrades where the Spin version requirement does not match are, naturally, ineligible.)

I am a bit concerned, though, that full update could be slow. For plugins that normally complete quickly, this could result in a lot of cancelled updates (because we don't want to unduly delay exiting just for the sake of the badger).

@lann
Copy link
Collaborator

lann commented May 31, 2023

I am a bit concerned, though, that full update could be slow. For plugins that normally complete quickly, this could result in a lot of cancelled updates (because we don't want to unduly delay exiting just for the sake of the badger).

Could fork a daemon to do the update, write the result to a file, and check the result file on plugin exit. If the update doesn't complete before the plugin exits this time you'll have the result ready for next time

@itowlson
Copy link
Contributor Author

I considered spinning off a background process, but was a bit intimidated by the additional moving parts (particularly for error cases, and edge cases like multiple instances running at once / user running update at the same time as the daemon is trying to / etc.). It's definitely an option if we go the full update route though. I'm not sure why a full daemon rather than a background process - would you see that as serialising updates or something?

It's probably worth aligning on perspective too. I have been thinking about this as an advisory feature, that isn't worth making too involved and doesn't need to be perfect. But if we regard it as a key servicing feature then it merits the extra level of robustness.

@lann
Copy link
Collaborator

lann commented Jun 1, 2023

why a full daemon rather than a background process

Ah yeah I guess I mean "short-lived daemon". Sometimes "daemon" means "system service" and sometimes it just means "that one weird double-fork trick".

@kate-goldenring
Copy link
Contributor

we would not prompt to upgrade to the intermediate version

This is a good point. I wonder if adding a subrepository to the plugins repository that just contains the manifests directory would reduce the fetch time.

I agree with the point on this not needing to be perfect on the first run. it may make sense to do full updates for now without a separate daemon process and then get feedback and potentially iterate to something more robust

@itowlson
Copy link
Contributor Author

itowlson commented Jun 1, 2023

The repo only contains about 60K of non-manifest data (which will update only infrequently) so I wouldn't worry too much about breaking out a sub-repository.

It sounds like you are proposing that we do go for the update rather than curl, and accept that we will hit more cases where we don't get the update in time. Is that correct? (I'm not disagreeing, just making sure I've understood correctly.)

@itowlson
Copy link
Contributor Author

itowlson commented Jun 2, 2023

I have an experimental branch using the "full update" strategy at https:/itowlson/spin/tree/plugins-badger-full-update.

At the moment it looks like it needs about 750ms to complete an update when the working copy is already up to date. For a fast-running plugin (which completes in say 10ms), that means a perceptible wait after the plugin exits. The timeout I used in the curl branch was 250ms, which was barely noticeable (at least to me, though I am admittedly less sensitive to this stuff than many people). So it depends on whether realistic plugins are likely to take 500ms+.

@lann
Copy link
Collaborator

lann commented Jun 2, 2023

it needs about 750ms to complete an update

...on your particular uplink to wherever GH has it hosted. We should probably assume that some users will see consistently higher latency and consider whether this would mean they would never see notifications.

@itowlson
Copy link
Contributor Author

itowlson commented Jun 5, 2023

Yep, sorry, should have made that explicit.

@kate-goldenring
Copy link
Contributor

The timeout I used in the curl branch was 250ms, which was barely noticeable (at least to me, though I am admittedly less sensitive to this stuff than many people)

Another option would be to add a versions.txt file to each plugin in the spin-plugins repo that contains all the existing versions of the plugin. Then on each update, we curl that file to check if an update is needed of it. Then if there are N latest updates to the plugin, it will then fetch the Nth. If its not compatible, it could then fetch each of the intermediate ones or do a full update at that point to not have too much of a curl train.

@itowlson
Copy link
Contributor Author

itowlson commented Jun 7, 2023

Okay there is a background processy branch at https:/itowlson/spin/tree/plugins-badger-background-process-eek-eek-eek and it seems to work but I have no real idea how to evaluate it for correctness, especially in the face of errors or concurrency.

@lann I don't know if this is what you were thinking of when you mentioned a daemon approach or a "weird double fork trick" - all it does at the moment is spawn a background process - were you proposing more than that?

@lann
Copy link
Collaborator

lann commented Jun 7, 2023

The "weird double fork trick" is about detaching the child process from the controlling terminal and process group, but now that I think about it I'm actually not sure whether you want the child to detach in this scenario... 🤔

@itowlson
Copy link
Contributor Author

itowlson commented Jun 7, 2023

Uhhhhhhhhhhhhhhhhh... well... with what I've done the child can continue to run after the parent exits... I have not tested what happens if I close the terminal... do we care? DON'T MAKE ME CARE LANN

@itowlson
Copy link
Contributor Author

itowlson commented Jun 8, 2023

How do we want to take this forward?

@kate-goldenring
Copy link
Contributor

Okay it sounds like we had three options here:

  1. Curl latest file (plugins-badger branch)
    • Concern: lack of intermediate version compat checking)
    • Consensus: do not choose
  2. Full update: Run a spin plugins update at appropriate times (plugins-badger-full-update brance)
    • Concern: extends the time of the plugin process potentially substantially
  3. Run full update as background process (...eek branch
    • Concern: Background say what?

@itowlson
Copy link
Contributor Author

itowlson commented Jun 8, 2023

Yes. I don't love any of them. The background process is probably the most robust in terms of getting updates in front of users, and I think avoids any UI speed bumps. But correctness is going to be hard to confirm, especially since pulled updates will often not be displayed until the next time the plugin runs. I am also not sure if there are any concurrency implications if two background checks kick off close together, or throttling implications since each plugin will trigger an update independently. We can address those latter two points but it will make it more complicated (like serialising update requests and/or or recording a last update time and not re-updating within a timeout). Which I guess is why large-scale servicing processes really do run as actual honest-to-goodness daemons...

@kate-goldenring
Copy link
Contributor

kate-goldenring commented Jun 8, 2023

cloud we do something where the upgrade background process grabs an flock() on some update file in the .spin directory so it always checks if its locked and if so doesn't run.

@itowlson
Copy link
Contributor Author

itowlson commented Jun 8, 2023

Yeah that was the sort of thing I was thinking of for concurrency (and throttling isn't really an issue, nobody will run that many different plugins frequently enough or for long enough and I should get over myself). We'd need some UI for "user tries to update while a background process is updating" but that's no biggie. And of course now I am going into agonies about the background process getting stuck or failing to clean up the lock.

But correctness remains the big worry.

@kate-goldenring
Copy link
Contributor

And of course now I am going into agonies about the background process getting stuck or failing to clean up the lock.

Good point. I think we will have some nice troubleshooting docs for this. I think its fair to proceed with the background process approach and we can iterate based on the behavior people experience.

@lann
Copy link
Collaborator

lann commented Jun 8, 2023

The background process only needs to sync the repo right? Seems like that would use an "update state" file to track last update time for rate limiting, which it can flock during the update process (which will always unlock on process exit). Now you "just" have to worry about the background process hanging...

@itowlson
Copy link
Contributor Author

itowlson commented Jun 8, 2023

@lann

image

@lann
Copy link
Collaborator

lann commented Jun 8, 2023

Don't you have to reboot windows every few hours anyway?

@itowlson
Copy link
Contributor Author

itowlson commented Jun 9, 2023

All right, Rust fs2 file locking uses LockFileEx on Windows, not the dreaded share mode APIs:

If a process terminates with a portion of a file locked or closes a file that has outstanding locks, the locks are unlocked by the operating system.

So we are gold--

However,

MUST WE MICROSOFT MUST WE

Anyway yep I can take a look at this. I'll do it as a separate PR to spin plugins update as it is arguably a general safety feature (and to avoid piling more gunk into the branch). Thanks all for the info, thought-sharing, and operating system banter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants