`Task.Worker`s in Mill are intended for managing mutable state, whether in-memory, on-disk, or in subprocesses. Before Mill was parallel that was easy, but Mill 0.12.0 is parallel-by-default. Thus defining a worker requires that the user return a thread-safe object, but Mill provides no help in doing so.
The basic problem is that producing a thread-safe object containing mutable state is hard, especially if you don't just `synchronize` all its members (e.g. because you don't want to allow only one Scala module to be compiled at a time), and if you want proper lifecycle management (e.g. caching and re-use of classloaders to allow them to get warm, eviction of the cache to avoid memory leaks, proper closing of `AutoCloseable` workers).
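To make the trade-off concrete, here is a minimal sketch contrasting a worker that synchronizes everything with one that locks per key. `FakeCompiler`, `CoarseWorker`, and `PerKeyWorker` are hypothetical names invented for illustration; none of this is Mill code.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for an expensive, non-thread-safe resource such as a compiler.
final class FakeCompiler(val scalaVersion: String) {
  private var compiledCount = 0
  def compile(module: String): String = {
    compiledCount += 1
    s"$module compiled with $scalaVersion (#$compiledCount)"
  }
}

// Coarse-grained: one lock guards everything, so only one compile runs at a time.
final class CoarseWorker {
  private val compilers = scala.collection.mutable.Map.empty[String, FakeCompiler]
  def compile(version: String, module: String): String = synchronized {
    compilers.getOrElseUpdate(version, new FakeCompiler(version)).compile(module)
  }
}

// Finer-grained: the map handles concurrent initialization, and each compiler is
// locked individually, so compiles for different versions can run in parallel.
final class PerKeyWorker {
  private val compilers = new ConcurrentHashMap[String, FakeCompiler]()
  def compile(version: String, module: String): String = {
    val compiler = compilers.computeIfAbsent(version, (v: String) => new FakeCompiler(v))
    compiler.synchronized(compiler.compile(module))
  }
}
```

Even this small per-key version leaves eviction and closing unhandled, which is the part each worker tends to re-invent.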
Different workers end up with different ad-hoc solutions, e.g.

- `ZincWorkerImpl` with its `javaOnlyCompilersCache: mutable.Map[Seq[String], SoftReference[Compilers]]`, `classloaderCache: collection.mutable.LinkedHashMap[Long, SoftReference[ClassLoader]]`, and `compilerCache: KeyedLockedCache[Compilers]`
- `ScalaJSWorkerImpl` with its `ScalaJSLinker.cache: mutable.Map[LinkerInput, SoftReference[(Linker, IRFileCache.Cache)]]`
- `ScalaNativeWorker` with its `scalaInstanceCache: Option[(Long, workerApi.ScalaNativeWorkerApi)]`
- `VisualizeModule.worker`, which returns a tuple of `(LinkedBlockingQueue[(Seq[NamedTask[Any]], Seq[NamedTask[Any]], os.Path)], LinkedBlockingQueue[Result[Seq[PathRef]]])` that it expects you to use to pass arguments to the worker and receive results in a thread-safe manner
These all solve the problems above to varying degrees: some disallow parallelism, some have concurrency bugs. Really, none of the concurrency concerns should be handled by users, since they're the same for most workers. Mill should provide the scaffolding to let users plug in their own logic without having to care about concurrency.
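As an illustration of the queue-based style of hand-off, the following is a generic sketch, not the actual `VisualizeModule` implementation. `QueueWorker` and its members are hypothetical. A single background thread owns the state and serves requests from a pair of `LinkedBlockingQueue`s, which is thread-safe but processes only one request at a time.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical illustration: one background thread handles every request.
final class QueueWorker(handle: String => String) {
  private val in = new LinkedBlockingQueue[String]()
  private val out = new LinkedBlockingQueue[String]()

  private val thread = new Thread(() => {
    try {
      while (true) out.put(handle(in.take()))
    } catch {
      case _: InterruptedException => () // interrupted by close(): stop serving
    }
  })
  thread.setDaemon(true)
  thread.start()

  // The lock spans put + take; without it, concurrent callers could
  // receive each other's results from the shared output queue.
  def call(request: String): String = synchronized {
    in.put(request)
    out.take()
  }

  def close(): Unit = thread.interrupt()
}
```

The caller-side lock is exactly the kind of detail that is easy to forget when the raw queues are exposed directly.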
We should provide a more opinionated API to `Task.Worker` that allows users to delegate these "generic worker" problems to Mill. Exactly what that API would look like is TBD, but it would provide the proper concurrent initialization, caching, re-use, and cache eviction, so that the user only has to worry about providing a `cacheKey` and a `factory` and does not have to re-invent thread-safe caches every time they construct a worker.
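One possible shape for such scaffolding, as a rough sketch only: `KeyedWorkerCache` and `FakeResource` are hypothetical names and not an existing Mill API. The sketch assumes the cached values are themselves safe to share between threads, and it leaves the eviction policy (size limits, soft references) out entirely.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical helper: creates each value at most once per key, even under
// concurrent access, and closes values when they are evicted.
final class KeyedWorkerCache[K, V <: AutoCloseable](factory: K => V) extends AutoCloseable {
  private val cache = new ConcurrentHashMap[K, V]()

  // Returns the cached value for `cacheKey`, creating it if absent.
  // computeIfAbsent applies the factory atomically, at most once per absent key.
  def get(cacheKey: K): V = cache.computeIfAbsent(cacheKey, (k: K) => factory(k))

  // Removes the value for one key and closes it, if it was present.
  def evict(cacheKey: K): Unit = Option(cache.remove(cacheKey)).foreach(_.close())

  // Evicts and closes every cached value.
  override def close(): Unit = {
    val keys = cache.keySet().iterator()
    while (keys.hasNext) evict(keys.next())
  }
}

// Hypothetical resource used only to exercise the cache.
final class FakeResource(val key: String) extends AutoCloseable {
  @volatile var closed = false
  override def close(): Unit = closed = true
}
```

A worker built on this would construct `new KeyedWorkerCache[String, FakeResource](key => new FakeResource(key))` once and call `get(cacheKey)` from any task. A real design would also need to decide whether values are shared or borrowed exclusively, and when eviction happens.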
A generalization of #3641