SPICE-0009: External Readers #10

HT154 · 2024-08-06T23:50:21Z

Implemented in:

bioball

Some high level thoughts:

Using a persistent process with message passing isn't a bad idea--for processes with a higher startup cost (e.g. JVM-based processes), you pay that penalty once for the life of the evaluator, rather than every time a read() happens. However, there's an argument to be made for having the external reader binary be short-lived. For example, this makes it possible to write an external reader that's simply a bash script. And making these things bash-script-able is quite nice:

#!/usr/bin/env bash

set -e

URI="$1"

if [[ "$URI" != ldap:* ]]; then
  >&2 echo "Unexpected URI: $URI. Expected ldap scheme."
  exit 1
fi

ldapsearch -D "${URI:5}"

If it's a persistent process, it should probably be spawned the first time a module or resource is read, then kept alive while the evaluator session is alive (e.g. until it is closed).

I don't know if we need a DiscoverReadersRequest and DiscoverReadersResponse? Wouldn't the evaluator already know about the external readers it needs?

spices/SPICE-0009-external-readers.adoc

stackoverflow

I'm missing an example that shows the whole process. Your example just shows a Pkl file using the reader and a cli call, but it's not clear to me how these things glue together.

Also think having the processes being long-running would put too much hurdles for people who want to contribute a reader, so let's keep that as a future improvement.

For the package support:
That seems to complicated the design quite a bit. Now Pkl has to understand how to properly process packages-as-readers. If my reader is a Ruby script and I want to deploy it as a package, do I have to bundle the script together? Gets even more complicated with executables and multiplatform support. Or maybe I misunderstood you and package readers have to be Pkl-only.
I'm not sure how useful writing readers in Pkl instead of general languages will be. Though perhaps I'm wrong. My point is, maybe pkl packages are not the best distribution mechanism for readers, even though I can't think of a good solution for this.

spices/SPICE-0009-external-readers.adoc

Co-authored-by: Islon Scherer <[email protected]>

HT154 · 2024-08-08T19:16:14Z

Thanks for the review @bioball!

Using a persistent process with message passing isn't a bad idea--for processes with a higher startup cost (e.g. JVM-based processes), you pay that penalty once for the life of the evaluator, rather than every time a read() happens. However, there's an argument to be made for having the external reader binary be short-lived. For example, this makes it possible to write an external reader that's simply a bash script.

I definitely agree that there's utility in enabling the super simple scripting route! My primary motivation for a persistent subprocess is, as you've identified, startup cost.

Here's a real world example: I would like to be able to use an external reader to mediate access to a secret store in a large monorepo. I'd estimate that there are 80+ unique resource URIs that would be read during a "world eval". The process that would implement the external reader currently takes ~0.25 seconds to invoke from the command line (in the happy path), which would add ~20 seconds to the overall evaluation time. Most of that quarter second is process and initialization overhead and the actual read takes ~0.05 seconds, so if the reader process were persistent it would save ~15 seconds from the total evaluation time.

If persistent reader processes are implemented, it also becomes fairly trivial to implement one-shot readers. Spitballing, it might look something like this:

pkl eval <module.pkl> --external-reader 'pkl-cmd ldap=path/to/pkl-ldap.sh ldaps=path/to/pkl-ldap.sh' --allowed-resources ...

Where the pkl-cmd executable implements the external reader message passing API and registers a ResourceReader for each specified scheme->script mapping. I'd be interested in shipping pkl-cmd as part of the Pkl project.

If it's a persistent process, it should probably be spawned the first time a module or resource is read, then kept alive while the evaluator session is alive (e.g. until it is closed).

I don't know if we need a DiscoverReadersRequest and DiscoverReadersResponse? Wouldn't the evaluator already know about the external readers it needs?

The ergonomic motivation for spawning a persistent process on startup and having the readers be discovered at runtime would be to decouple which reader binaries are used from which schemes they provide. If one-shot reader processes are used, the scheme->binary registration is unavoidable, but persistent readers may implement many schemes. Having to perform the same mapping for persistent processes would be very verbose (since setting allowed modules is also required), but would be able to spawn the process on first use instead of during evaluator initialization. Another consideration here is user error: if a user registers a reader for a scheme that it does not provide, how should pkl respond?

HT154 · 2024-08-08T19:31:59Z

Thanks for the review @stackoverflow!

I'm missing an example that shows the whole process. Your example just shows a Pkl file using the reader and a cli call, but it's not clear to me how these things glue together.

I think I've addressed this in other comments, but please let me know if you still have questions.

Also think having the processes being long-running would put too much hurdles for people who want to contribute a reader, so let's keep that as a future improvement.

My main concerns with only targeting "one-shot" reader processes at this time are:

It's not clear how to support globbing (i.e. ListResourcesRequest/ListModulesRequest)
It may make "persistent" readers more complicated to implement in the future from a configuration/CLI perspective
Performance, as outlined above

Above I proposed a pkl-cmd persistent external reader that could function as an "adapter" of sorts to simpler one-shot reader processes. I definitely agree that the simplicity of writing a simple script should be a goal here, and I don't think the more involved persistent reader design rules that out.

For the package support: That seems to complicated the design quite a bit. Now Pkl has to understand how to properly process packages-as-readers.

I may be misinterpreting this concern, but I'm not sure this is so complicated. Reader executables would need to be supplied as package file URIs including the fragment #/path/to/binary.

If my reader is a Ruby script and I want to deploy it as a package, do I have to bundle the script together?

My thought here is to primarily target languages for which (message-passing) bindings already exist. Both Golang and Swift (at least on linux) can produce statically linked binaries that can be packaged dependency-free. If there are bindings for interpreted languages in the future, distribution would likely need to be via the language package manager, which this design does not rule out.

Gets even more complicated with executables and multiplatform support.

Definitely agree, but I think pkl:platform makes this reasonable. I show what this might look like in a PklProject here: #10 (comment)

stackoverflow · 2024-08-09T12:19:34Z

I think I've addressed this in other comments, but please let me know if you still have questions.

You addressed it in the comments, but someone reading the SPICE shouldn't have to read comments to understand it, so it would be nice if that was clear in the SPICE itself.

stackoverflow · 2024-08-09T12:29:45Z

About having readers being persistent:

I think you are approaching this from the wrong side. For Pkl, the reader should just be a process it spawns once it finds a read. If you want it to be persistent (which I consider an edge case) you can run a persistent background process and write a simple script for Pkl to call which would pipe the reads.
This would simplify your proposal and there wouldn't be a need for a pkl-cmd.

Also it's not clear to me how errors and the communication happens. I imagine we can keep the unix tradition of using return codes so 0 would mean success and other numbers would be errors. stderr could contain the error message in case things went wrong and stdout would contain the actual result of the read. But anyway, all of this should be in the SPICE.

bioball · 2024-08-09T22:54:13Z

Josh actually brings up a good point--the process of resolving a read can involve multiple calls to a reader, which can make a short-lived execution quite complicated. For example, resolving a glob star import (e.g. import*("foo:/**.pkl") is linear to the number of directories needed to read.

Additionally, Pkl wants to know three facts about each scheme:

Is it globbable?
Is it hierarchical?
Is it local?

It's quite a rough experience to provide all this to Pkl as CLI args: this is quite long: --external-reader ldap=pkl-ldap --globbable-reader=pkl-ldap --hierarchical-reader=pkl-ldap --local-reader=pkl-ldap.

I echo Islon's concerns about distributing binaries using Pkl packages. I see the value here, but this isn't what packages are designed for. Besides, there are already many ways to distribute binaries; languages typically already offer their own mechanisms for installing executables (go install, npm isntall -g, pip install, etc), and so do OS-level tools (apt-get, rpm, winget, yum, brew). Also: a tarball isn't a sufficient way to distribute executables; some executables need to link to libraries (e.g. glibc), and others require a runtime (e.g. Node.js). It's better to delegate this task to something else.

Some additional thoughts:

Right now, the message passing API is held outside of pkl-core. We did this to reduce the number of dependencies (no msgpack), and to minimize the amount of core that gets shipped in our core library. This feature implies that much of our message passing logic (e.g. MessageTransport) gets moved to pkl-core. However, I think we want this anyway--we've talked about moving BinaryEvaluator into pkl-core, and providing a serializer/deserializer to our Java library users (and perhaps even in-language).
I don't know if we want shelling out to executables to become the norm for Pkl. This feature makes sense to bring feature parity to the CLI, but library users may not want Pkl to spawn executables. However, it still makes sense that library users should be able to configure Pkl to do this if the CLI can do this too.

… future directions

HT154 · 2024-08-17T01:45:21Z

I've taken another pass on this. Here are the major changes:

Moved URI-based reader executables to future directions
Addressed exec-per-read and up-front scheme->reader registration under alternatives considered
Added significant detail to the detailed design section

spices/SPICE-0009-external-readers.adoc

This is preparatory work for [SPICE-0009](apple/pkl-evolution#10). It is being contributed in a separate pull request to ease review. The Message, Message(En|De)coder, and MessageTransport types have been ported to Java and moved to a new `org.pkl.core.messaging` package.

This is preparatory work for [SPICE-0009](apple/pkl-evolution#10). It is being contributed in a separate pull request to ease review. This is required to allow deferred launch of external reader processes without requiring up-front configuration of scheme->process mappings. More details on why this is required here: apple/pkl-evolution#10 (comment) One concern here (which applies before this change as well) is that the semantics of module key factories and resource readers is reversed. For module key factories, the first factory that answers for a URI scheme "wins", while for resource readers, the _last_ reader that answers for a URI scheme "wins". This PR preserves that behavior, but it may be worth reconsidering this design at this time. This behavior can be observed in practice in `ResourceReadersEvaluatorTest.\`module path\`` where `ResourceReaders.modulePath` is added after the pre-configured `ResourceReaders.classPath` and takes precedence.

This is preparatory work for [SPICE-0009](apple/pkl-evolution#10). It is being contributed in a separate pull request to ease review. This is required to allow deferred launch of external reader processes without requiring up-front configuration of scheme->process mappings. More details on why this is required here: apple/pkl-evolution#10 (comment) One concern here (which applies before this change as well) is that the semantics of module key factories and resource reader factories is reversed. For module key factories, the first factory that answers for a URI scheme "wins", while for resource reader factories, the _last_ reader factory that answers for a URI scheme "wins". This PR preserves that behavior, but it may be worth reconsidering this design at this time. This behavior can be observed in practice in <code>ResourceReadersEvaluatorTest.\`module path\`</code> where `ResourceReaders.modulePath` is added after the pre-configured `ResourceReaders.classPath` and takes precedence. This is a fairly large breaking API change, but in practice most clients will only need to replace a few lines in their calls to `EvaluatorBuilder`, eg. ```java .addResourceReader(ResourceReaders.environmentVariable()) // becomes .addResourceReaderFactory(ResourceReaderFactories.environmentVariable()) ```

[SPICE-0009](apple/pkl-evolution#10) * Add `EvaluatorOptions.ExternalModuleReaders` and `EvaluatorOptions.ExternalResourceReaders`. * Add `ExternalReaderRuntime` to host the child process side of the external reader workflow.

[SPICE-0009](apple/pkl-evolution#10)

[SPICE-0009](apple/pkl-evolution#10) * Add `EvaluatorOptions.ExternalModuleReaders` and `EvaluatorOptions.ExternalResourceReaders`. * Add `ExternalReaderRuntime` to host the child process side of the external reader workflow.

spices/SPICE-0009-external-readers.adoc

bioball · 2024-09-24T13:58:49Z

spices/SPICE-0009-external-readers.adoc

+externalModuleReaders: Mapping<String, ExternalReader>?
+
+externalResourceReaders: Mapping<String, ExternalReader>?
+
+class ExternalReader {
+ /// May be specified as an absolute path to an executable
+ /// May also be specified as just an executable name, in which case it will be resolved according to the PATH environment variable
+ executable: String
+
+ /// Command line arguments that will be passed to the reader process
+ arguments: Listing<String>
+}


I think let's be consistent with clientModuleReaders and clientResourceReaders, and make each of these a List<ExternalReader>, where ExternalReader is:

class ExternalReader { executable: String arguments: Listing<String> scheme: String }

Would this just be for CreateEvaluatorRequest? Or would it apply to pkl:EvaluatorSettings as well? I've got some usability concerns about the latter (mostly around easy reuse of reader configs for differ schemes or between resources/modules).

For CreateEvaluatorRequest specifically: what should the precedence order be? Should the first reader with the listed scheme win or the last? Twist: this behavior is currently different between clientModuleReaders (last wins) and clientResourceReaders (first wins), similar to how it works with EvaluatorBuilder.

bioball · 2024-09-24T14:01:02Z

spices/SPICE-0009-external-readers.adoc

+The arguments are passed as `=`-delimited key-value pairs where the key is the reader's URI scheme.
+The argument values may be passed as space-separated strings where the first element becomes `executable` and any remainder becomes `arguments`.
+
+TBD: It might be best if the argument value is link:https://docs.python.org/3/library/shlex.html#shlex.split[shlex'd] instead of split to support passing arguments to the reader process that contain spaces.


What does this look like?

Let's say that an external reader is configured like so:

--external-resource myScheme='myExecutable arg1 "arg2 with spaces"'

A naive white space trim+split yields

executable = "myExecutable" arguments { "arg1" "\"arg2" "with" "spaces\"" }

While a "shlex" over the input behaves much more similarly to what a user might expect and results in

arguments { "arg1" "arg2 with spaces" }

So without the shlex operator for the CLI args it's actually not possible to express the same configs on the CLI as it is via PklProject.

spices/SPICE-0009-external-readers.adoc

…->Initialize

bioball · 2024-09-24T15:53:04Z

spices/SPICE-0009-external-readers.adoc

 requestId: Int

- /// The scheme of the resource to discover the spec for
+ /// The scheme of the resource to initialize.
 scheme: String


The reader should already know what schemes it should read when it receives a Read*Request. Do we need scheme here?

The initialize messages only request the spec for a single scheme at a time. It would be possible to do an "all at once" sort of thing like the original proposal, but this method is actually super convenient from an implementation standpoint as it enables using the battle-tested ConcurrentHashMap response memoization already widely employed by the message passing API. My initial pass at the implementation that did the "all at once" discovery needed some messy locking; it would be great to pass the buck to the standard library instead.

[SPICE-0009](apple/pkl-evolution#10)

[SPICE-0009](apple/pkl-evolution#10) * Add `EvaluatorOptions.ExternalModuleReaders` and `EvaluatorOptions.ExternalResourceReaders`. * Add `ExternalReaderRuntime` to host the child process side of the external reader workflow.

[SPICE-0009](apple/pkl-evolution#10)

This is preparatory work for [SPICE-0009](apple/pkl-evolution#10). It is being contributed in a separate pull request to ease review. The Message, Message(En|De)coder, and MessageTransport types have been ported to Java and moved to a new `org.pkl.core.messaging` package.

[SPICE-0009](apple/pkl-evolution#10)

This is preparatory work for [SPICE-0009](apple/pkl-evolution#10). It is being contributed in a separate pull request to ease review. The Message, Message(En|De)coder, and MessageTransport types have been ported to Java and moved to a new `org.pkl.core.messaging` package.

[SPICE-0009](apple/pkl-evolution#10)

[SPICE-0009](apple/pkl-evolution#10) * Add `EvaluatorOptions.ExternalModuleReaders` and `EvaluatorOptions.ExternalResourceReaders`. * Add `ExternalReaderRuntime` to host the child process side of the external reader workflow.

[SPICE-0009](apple/pkl-evolution#10)

SPICE-0009: External Readers

198f6ff

HT154 marked this pull request as ready for review August 7, 2024 01:26

bioball reviewed Aug 7, 2024

View reviewed changes

spices/SPICE-0009-external-readers.adoc Outdated Show resolved Hide resolved

spices/SPICE-0009-external-readers.adoc Outdated Show resolved Hide resolved

spices/SPICE-0009-external-readers.adoc Show resolved Hide resolved

stackoverflow reviewed Aug 8, 2024

View reviewed changes

spices/SPICE-0009-external-readers.adoc Outdated Show resolved Hide resolved

spices/SPICE-0009-external-readers.adoc Outdated Show resolved Hide resolved

spices/SPICE-0009-external-readers.adoc Outdated Show resolved Hide resolved

HT154 and others added 2 commits August 8, 2024 10:44

Update spices/SPICE-0009-external-readers.adoc

d08001f

Co-authored-by: Islon Scherer <[email protected]>

Update spices/SPICE-0009-external-readers.adoc

bfc092d

Co-authored-by: Islon Scherer <[email protected]>

HT154 and others added 2 commits August 12, 2024 23:43

Document alternatives considered, move URI support for executables to…

ed2dcf3

… future directions

Expand detailed design

82ebff4

HT154 requested review from bioball and stackoverflow August 17, 2024 01:45

bioball reviewed Aug 19, 2024

View reviewed changes

Address review feedback

90261d9

HT154 mentioned this pull request Sep 9, 2024

Refactor message-passing logic into pkl-core apple/pkl#640

Closed

HT154 mentioned this pull request Sep 15, 2024

Refactor ResourceReader to support multiple schemes apple/pkl#646

Closed

HT154 mentioned this pull request Sep 15, 2024

Add ResourceReaderFactory abstraction apple/pkl#647

Closed

HT154 added a commit to HT154/pkl that referenced this pull request Sep 23, 2024

Implement SPICE-0009 External Readers

337ab5a

[SPICE-0009](apple/pkl-evolution#10)

bioball reviewed Sep 24, 2024

View reviewed changes

spices/SPICE-0009-external-readers.adoc Outdated Show resolved Hide resolved

bioball reviewed Sep 24, 2024

View reviewed changes

spices/SPICE-0009-external-readers.adoc Outdated Show resolved Hide resolved

Clarify that process termination happens on evaluator close, Discover…

939cc48

…->Initialize

bioball reviewed Sep 24, 2024

View reviewed changes

HT154 added a commit to HT154/pkl that referenced this pull request Sep 24, 2024

Implement SPICE-0009 External Readers

9d8a587

[SPICE-0009](apple/pkl-evolution#10)

HT154 added a commit to HT154/pkl-swift that referenced this pull request Sep 25, 2024

Implement SPICE-0009 External Readers

13ce398

[SPICE-0009](apple/pkl-evolution#10)

HT154 added a commit to HT154/pkl-swift that referenced this pull request Sep 25, 2024

Implement SPICE-0009 External Readers

9f74f3f

[SPICE-0009](apple/pkl-evolution#10)

HT154 added a commit to HT154/pkl-swift that referenced this pull request Sep 25, 2024

Implement SPICE-0009 External Readers

0140c17

[SPICE-0009](apple/pkl-evolution#10)

HT154 added a commit to HT154/pkl-swift that referenced this pull request Sep 25, 2024

Implement SPICE-0009 External Readers

a509e25

[SPICE-0009](apple/pkl-evolution#10)

HT154 mentioned this pull request Sep 25, 2024

Implement SPICE-0009 External Readers apple/pkl-swift#26

Open

Add CloseExternalProcess message, update flow

cd20d2e

HT154 force-pushed the external-readers branch from a237987 to cd20d2e Compare October 2, 2024 23:17

HT154 added a commit to HT154/pkl that referenced this pull request Oct 3, 2024

Implement SPICE-0009 External Readers

4fa8480

[SPICE-0009](apple/pkl-evolution#10)

HT154 added a commit to HT154/pkl that referenced this pull request Oct 7, 2024

Implement SPICE-0009 External Readers

e73c5e9

[SPICE-0009](apple/pkl-evolution#10)

HT154 added a commit to HT154/pkl-swift that referenced this pull request Oct 10, 2024

Implement SPICE-0009 External Readers

317def7

[SPICE-0009](apple/pkl-evolution#10)

Add ExternalReaderRuntime Java API

46ba868

HT154 force-pushed the external-readers branch from 75171b3 to 46ba868 Compare October 11, 2024 19:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SPICE-0009: External Readers #10

SPICE-0009: External Readers #10

HT154 commented Aug 6, 2024 •

edited

Loading

bioball left a comment •

edited

Loading

stackoverflow left a comment

HT154 commented Aug 8, 2024

HT154 commented Aug 8, 2024

stackoverflow commented Aug 9, 2024

stackoverflow commented Aug 9, 2024

bioball commented Aug 9, 2024

HT154 commented Aug 17, 2024

bioball Sep 24, 2024 •

edited

Loading

HT154 Sep 24, 2024

bioball Sep 24, 2024

HT154 Sep 24, 2024

bioball Sep 24, 2024

HT154 Sep 24, 2024

SPICE-0009: External Readers #10

Are you sure you want to change the base?

SPICE-0009: External Readers #10

Conversation

HT154 commented Aug 6, 2024 • edited Loading

bioball left a comment • edited Loading

Choose a reason for hiding this comment

stackoverflow left a comment

Choose a reason for hiding this comment

HT154 commented Aug 8, 2024

HT154 commented Aug 8, 2024

stackoverflow commented Aug 9, 2024

stackoverflow commented Aug 9, 2024

bioball commented Aug 9, 2024

HT154 commented Aug 17, 2024

bioball Sep 24, 2024 • edited Loading

Choose a reason for hiding this comment

HT154 Sep 24, 2024

Choose a reason for hiding this comment

bioball Sep 24, 2024

Choose a reason for hiding this comment

HT154 Sep 24, 2024

Choose a reason for hiding this comment

bioball Sep 24, 2024

Choose a reason for hiding this comment

HT154 Sep 24, 2024

Choose a reason for hiding this comment

HT154 commented Aug 6, 2024 •

edited

Loading

bioball left a comment •

edited

Loading

bioball Sep 24, 2024 •

edited

Loading