Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stabilize the build system by correctly house keeping the dirtykeys and rule values [flaky test #4185 #4093] #4190

Merged
merged 53 commits into from
May 10, 2024

Conversation

soulomoon
Copy link
Collaborator

@soulomoon soulomoon commented Apr 22, 2024

What is currently did up to 240254e

The main problem is the out of sync state in the build system. Several states involved, the shake database running result state for key. shake extra's dirty state for key and shake extra's rule values state.
To stablize the build system. This PR force some of the updates of these state into a single STM block.

  1. collect dirtykeys and ship it to session restart directly. No more invalid removal before session restart. Fixing Flaky test failure result in error of GetLinkable #4093 [Verified fixed locally 500 rounds]
  2. move the dirtykey removal and rules values updating to hls-graph by adding a call back fucntion to RunResult. Properly handle the dirtykeys and rule value state after session start and closely followed by another session restart Fixing ghcide-tests' addDependentFile test #4194 [Verified fixed locally 500 rounds]
  3. properly handle clean up by wrapping the asyncWithCleanUp to refreshDeps

possible future improvement (Should be in some other PR)

*. we can avoid giving multiple update to dirty states across multiple session if the key is still dirty but does not marked as new dirty, by sending the collected keys directly to newsession. But garbageCollectKeys is using it, so it is not a good idea to do it now. Maybe in the future.
*. Maybe we can schedule some debouncer to the session restart. I've observed some times the consective session restart might happen.

The problem

Some proof of concept for fixing problem at #4185 #4093 discovered and tracked down by @jhrcek with his impressive bug tracking technique.

Here should be what might be happeing for the problem:

=>IsEvaluating is running and already get the files for evaluations,
=>enqeueue file for evaluation
=>setSomethingModified dirty the IsEvaluating key in shakeExtre
=>IsEvaluating is finished, diry key removed from shakeExtra
=>session restarting and flushing the dirty keys of shakeExtra to hls-graph database
=>IsEvaluating is not updating since hls-graph database do not know it is dirty.

The essential problem is that a running key is being marked as dirty, and finished running and removed itself from the dirtyset in shakeExtra just before a sessionRestart flush it to rerun in the hls-graph database, and hence the dirty mark is incorrectly lost.

The worm bed is that we are bookkeeping two set of dirtiness, one in hls-graph database(has three states, running, dity, clean) and one in shake-extra(two states, dirty, clean).
It is nasty to keep them in sync.

Possible fixes

Some possible solutions:

  1. We are passing keys need to be updated directly to the restartShakeSession instead of just recording it in the dirtykeys of shakeExtra. Which keep them from being removed right before the restartShakeSession.
    But might need to track out each such case and safe guard them to restartShakeSession.

  2. when mark the dirtykeys in shakeExtra, we also mark them in shakeDatabase, but we might need to dirty all its reversedeps to make sure it would break dirtiness of them, I do not see how much better can this be better than shakerestart

  3. When keys being added to dirtykeys in shakeExtra, we restart the shake session at the same time, leaving no gap in between. @michaelpj 'idea that we can use VarT instead of MVar and wrap the key changed and session restart both togather.

  4. we add a running state for keys in shakeExtra, avoid remove the dirtykeys in shakeExtra if the running state is gone. Or some thing similar to expand the possible state that can be represent in shakeExtra. We might need extra book keeping a lot more keys. but simplestz not workable since the update have to be in hls-graph

  5. create a new temp dirtykeyset to record new dirty keys, and only flush it to dirtykeyset inside restartShakeSession

@soulomoon soulomoon marked this pull request as ready for review April 22, 2024 16:18
@soulomoon soulomoon marked this pull request as draft April 22, 2024 20:29
@soulomoon soulomoon closed this Apr 22, 2024
@soulomoon soulomoon changed the title mark the keys dirty before we do async runs to start the database passing keys need to be updated directly to restartShakeSession so it won't get lost Apr 22, 2024
@soulomoon soulomoon reopened this Apr 22, 2024
@michaelpj
Copy link
Collaborator

So basically we don't want to delete the dirty keys outside the withMVar lock on the shakeSession? That makes sense. I guess the thing that's a shame here is that these modifications don't compose - which suggests to me it would be nicer if we could use STM. Would it be possible to make shakeSession a TVar instead of an MVar?

@soulomoon
Copy link
Collaborator Author

soulomoon commented Apr 23, 2024

So basically we don't want to delete the dirty keys outside the withMVar lock on the shakeSession? That makes sense. I guess the thing that's a shame here is that these modifications don't compose - which suggests to me it would be nicer if we could use STM. Would it be possible to make shakeSession a TVar instead of an MVar?

We might have dirty key lost if we are marking them as dirty in shakeExtra when the shake session is running.

That is we are marking those keys that are already running in session as dirty. Then when they are done running, they remove themself from the dirty set.

@soulomoon
Copy link
Collaborator Author

soulomoon commented Apr 23, 2024

Another option would be to expand the state of keys in shakeExtra to have three state
shake-extra(running, dirty, clean), we can get rid of the dirty key lost problem.
Seems like it can fit better into our current code structure.

@wz1000
Copy link
Collaborator

wz1000 commented Apr 23, 2024

This PR seems to have a bunch of unrelated changes.

Could you also write down a commit message explaining the problem and solution when you find the time please?

@michaelpj
Copy link
Collaborator

Another option would be to expand the state of keys in shakeExtra to have three state
shake-extra(running, dirty, clean), we can get rid of the dirty key lost problem.

Yes, that could also work. So then if a rule finishes it updates the state to 'clean' if it was 'running', but not if it has been set back to 'dirty' while it was in progress.

@soulomoon
Copy link
Collaborator Author

soulomoon commented Apr 23, 2024

This PR seems to have a bunch of unrelated changes.

Could you also write down a commit message explaining the problem and solution when you find the time please?

This PR is a proof of concept to fix the problem we have encounter at #4185 for now. Sorry I forget to addresss that, addressed now #4190 (comment) and unrelated changes are removed.

@soulomoon soulomoon changed the base branch from jan/characterize-eval-plugin-race to master April 23, 2024 16:18
@soulomoon soulomoon changed the base branch from master to jan/characterize-eval-plugin-race April 23, 2024 16:18
@soulomoon soulomoon force-pushed the soulomoon/mark-dirty-keys-sync-to-hls-graph branch from 590e71b to 684a850 Compare April 23, 2024 16:27
@soulomoon soulomoon changed the base branch from jan/characterize-eval-plugin-race to master April 23, 2024 16:28
@soulomoon
Copy link
Collaborator Author

Would it be possible to make shakeSession a TVar instead of an MVar?

I review relevant part a bit, it seems to be hard to do so, since a lot of IO is entangling

@soulomoon soulomoon changed the title passing keys need to be updated directly to restartShakeSession so it won't get lost passing keys that need to be updated directly to restartShakeSession so it won't get lost Apr 23, 2024
@soulomoon soulomoon force-pushed the soulomoon/mark-dirty-keys-sync-to-hls-graph branch from 684a850 to 52f85b2 Compare April 23, 2024 20:06
@soulomoon
Copy link
Collaborator Author

soulomoon commented Apr 23, 2024

Another option would be to expand the state of keys in shakeExtra to have three state
shake-extra(running, dirty, clean), we can get rid of the dirty key lost problem.

Yes, that could also work. So then if a rule finishes it updates the state to 'clean' if it was 'running', but not if it has been set back to 'dirty' while it was in progress.

The third solution. This seems to have the least changed, I have pushed this branch. But a slightly altered version, book keeping an extra runningKeys TVar. Just a few lines of code have changed.

@soulomoon soulomoon changed the title passing keys that need to be updated directly to restartShakeSession so it won't get lost Fix dirty key being invalidly removed Apr 23, 2024
@soulomoon soulomoon changed the title Fix dirty key being invalidly removed Fix dirty key being invalidly removed due to race condition #4185 #4093 Apr 23, 2024
@soulomoon soulomoon changed the title Fix dirty key being invalidly removed due to race condition #4185 #4093 Fix dirty key being invalidly removed due to race condition [flaky test #4185 #4093] Apr 23, 2024
@jhrcek
Copy link
Collaborator

jhrcek commented Apr 24, 2024

tracked down by @jhrcek with his impressive bug tracking technique.

That's kind of you to say that, but it's just good old "printf debugging", I wish I knew how to do it more ergonomically 😄

Copy link
Collaborator

@michaelpj michaelpj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about writing a test for this, and that made me wonder why dirtiness checking isn't part of hls-graph, where it would be easier to test in isolation. Does anyone know?

ghcide/src/Development/IDE/Core/Shake.hs Outdated Show resolved Hide resolved
@michaelpj
Copy link
Collaborator

In fact hls-graph does have dirtiness-related code, e.g.: https:/haskell/haskell-language-server/blob/master/hls-graph/src/Development/IDE/Graph/Database.hs#L40

Why do we have two versions of this? :(

@michaelpj
Copy link
Collaborator

michaelpj commented Apr 24, 2024

@@ -474,10 +474,9 @@ loadSessionWithOptions recorder SessionLoadingOptions{..} dir = do
clientConfig <- getClientConfigAction
extras@ShakeExtras{restartShakeSession, ideNc, knownTargetsVar, lspEnv
} <- getShakeExtras
let invalidateShakeCache :: IO ()
invalidateShakeCache = do
let invalidateShakeCache = do
void $ modifyVar' version succ
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be moved out of the restart.

Copy link
Collaborator

@wz1000 wz1000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work and investigation! This should make HLS much more robust.

@michaelpj
Copy link
Collaborator

michaelpj commented May 8, 2024

A thought: can we delegate more to hls-graph? hls-graph does know what things it considers to be dirty at the moment. So, consider:

  • On the HLS side, we track "dirtied during this session" keys instead of "definitely dirty" keys.
  • When we restart the hls-graph session, we re-initialize it with a) the keys we think were made dirty during this session, and b) the keys that hls-graph thinks were dirty at the end of the previous session. We then set our "dirty keys this session" to empty.

That means we never need to "clean" a key on our side, and so we can drop (some of?) the ruleHook stuff and I think also the dirtyKeys TVar. The only risk would be that we might over-approximate, in the case that a key was dirtied and then cleaned before we restart the session. But we already don't want that to happpen! Part of the point of this PR is to ensure that in such cases we still consider the key dirty.

To elaborate a bit: the reason we need dirtyKeys to be persistent across sessions and to clean keys from it is that we are taking full responsibility for tracking (root) dirty keys across sessions. But we don't have to do that: we can delegate the job of telling which keys were dirty last session to hls-graph, and we can just keep track of newly dirty keys. That simplifies the state management.

If we need access to the set of "dirty keys this session" then we can also store what hls-graph tells us is dirty when we restart the session. There may indeed be a risk that we GC things that have been cleaned, though 🤔

@soulomoon
Copy link
Collaborator Author

Yes,this seems to be a potential improvement to do as a follow up to this PR. We can see how it would fit into the monitoring and garbage collect stuff by then

@michaelpj
Copy link
Collaborator

Okay, I'm happy for you to look into this more after this PR!

Copy link
Collaborator

@jhrcek jhrcek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few stylistic suggestions, otherwise LGTM

ghcide/src/Development/IDE/Core/Shake.hs Outdated Show resolved Hide resolved
ghcide/src/Development/IDE/Main.hs Outdated Show resolved Hide resolved
-- Note [Housekeeping rule cache and dirty key out side of hls-graph]
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- Hls-graph contains its own internal running state for each key in the shakeDatabase.
-- Rule result cache and dirty key are in ShakeExtras that is not visible to the hls-graph
Copy link
Collaborator

@jhrcek jhrcek May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me which field in ShakeExtras represents rule result cache.
Could you please mention it here?

Suggested change
-- Rule result cache and dirty key are in ShakeExtras that is not visible to the hls-graph
-- ShakeExtras contains ??? field (rule result cache) and dirtyKeys (keys that became dirty in between build sessions) that is not visible to the hls-graph

ghcide/src/Development/IDE/Core/Shake.hs Outdated Show resolved Hide resolved
ghcide/src/Development/IDE/Core/Shake.hs Outdated Show resolved Hide resolved
ghcide/src/Development/IDE/Core/Shake.hs Outdated Show resolved Hide resolved
ghcide/src/Development/IDE/Core/Shake.hs Outdated Show resolved Hide resolved
ghcide/session-loader/Development/IDE/Session.hs Outdated Show resolved Hide resolved
ghcide/src/Development/IDE/Core/Shake.hs Outdated Show resolved Hide resolved
soulomoon and others added 10 commits May 9, 2024 21:29
Co-authored-by: Jan Hrcek <[email protected]>
Co-authored-by: Jan Hrcek <[email protected]>
Co-authored-by: Jan Hrcek <[email protected]>
Co-authored-by: Jan Hrcek <[email protected]>
Co-authored-by: Jan Hrcek <[email protected]>
Co-authored-by: Jan Hrcek <[email protected]>
Co-authored-by: Jan Hrcek <[email protected]>
Co-authored-by: Jan Hrcek <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: high High priority item status: needs review This PR is ready for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants