Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve framework support for cached multi-thread-safe workers #3730

Open
lihaoyi opened this issue Oct 14, 2024 · 0 comments
Open

Improve framework support for cached multi-thread-safe workers #3730

lihaoyi opened this issue Oct 14, 2024 · 0 comments

Comments

@lihaoyi
Copy link
Member

lihaoyi commented Oct 14, 2024

Task.Workers in Mill are intended for managing mutable state, in-memory/on-disk/in-subprocesses. Before Mill was parallel that was easy, but Mill 0.12.0 is parallel-by-default. Thus defining a worker requires that the user return a multi-thread-safe object, but Mill provide no help in doing so.

The basic problem is that producing a multi-thread-safe object containing mutable state that is hard, especially if you don't just synchronize all its members (e.g. because you don't want to only allow only 1 scala module to be compiled at a time), and if you want proper lifecycle management (e.g. caching and re-use of classloaders to allow them to get warn, eviction of the cache to avoid memory leaks, proper closeing of autoclosable workers)

Different workers end up with different ad-hoc solutions, e.g.

  • ZincWorkerImpl with its javaOnlyCompilersCache: mutable.Map[Seq[String], SoftReference[Compilers]], classloaderCache: collection.mutable.LinkedHashMap[Long, SoftReference[ClassLoader]], compilerCache: KeyedLockedCache[Compilers]

  • ScalaJSWorkerImpl with its ScalaJSLinker.cache: mutable.Map[LinkerInput, SoftReference[(Linker, IRFileCache.Cache)]]

  • ScalaNativeWorker with its scalaInstanceCache: Option[(Long, workerApi.ScalaNativeWorkerApi)]

  • VisualizeModule.worker returns a tuple of (LinkedBlockingQueue[(Seq[NamedTask[Any]], Seq[NamedTask[Any]], os.Path)], LinkedBlockingQueue[Result[Seq[PathRef]]]) that it expects you to use to pass arguments to the worker and receive results in a thread-safe manner

These all solve the problems above to varying degrees: some disallow parallelism, some have concurrency bugs. Really none of the concurrency concerns should be handled by users, since they're the same for most workers, and Mill should provide the scaffolding to let users plug in their own logic without a care for concurrency

We should provide a more opinionated API to Task.Worker that allows users to delegate these "generic worker" problems to Mill: exactly what that API would look like is TBD, but perhaps something like

// Helper
def createMultiThreadSafeWorker[T](cacheKey: Any, factory: () => T)

// Usage
def myCompilerWorker = Task.Worker(
  createMultiThreadSafeWorker(
    cacheKey = scalaVersion()
    factory = compilerForVersion(scalaVersion())
  )
)

This would provide the proper concurrent-initialization, caching, re-use, and cache eviction so that the user can just worry about providing a cacheKey and factory, and not have to re-invent multi-threaded-safe caches every time they construct a worker

A generalization of #3641

@lihaoyi lihaoyi changed the title Improve framework support for workers Improve framework support for cached multi-thread-safe workers Oct 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant