Engine/client API design: `constraint` and `required` validation #140

eyelidlessness · 2024-06-24T21:26:22Z

This issue is primarily aimed at supporting #130, handling constraint and required validation.

Potential overlap with other concerns

From a user perspective, these validation concerns may have some conceptual overlap with #80, as a broader bucket of other errors/warnings. As I explored possible directions for the core validation functionality, I've also tried to keep these other use cases in mind.

My expectation coming into this was that I would find these concerns have enough natural overlap that a clear API generalization would emerge. I feel less sure of that now than I did initially!

I now believe that there's enough potential complexity to these spec-driven aspects of validation that it'll be more prudent to address them in isolation, but to remain mindful that they may yet have commonality in the future.

Spec implications: `constraint` and `required`

Both constraint and required are specified as arbitrary boolean expressions. The two expressions determine the validity of a node's value:

expression	state	blank	non-blank
`constraint`	`true`*	✅	✅
`constraint`	`false`	✅	❌
`required`	`false`*	✅	✅
`required`	`true`	❌	✅

* = default (expression not defined)
✅ = 'valid'
❌ = 'violation'

Forms may optionally define messages corresponding to each condition, which should be presented to a user when the respective condition is violated:

jr:constraintMsg
jr:requiredMsg

Both cases may be defined either as a string literal, or as a translated expression (jr:itext('id')). Notably: these effectively constitute a large subset of the functionality specified for <label> and <hint> (string literal -> those elements with a bare text child node; ref="jr:itext('id')" otherwise). I don't know if/to what extent Markdown should be supported, but otherwise it seems quite likely the current TextRange APIs are already well suited for both of these message cases.

API directions

Minimal expansion, no new API concepts

A minimal expansion with existing API concepts would essentially add three state properties:

constraint
constraintMsg
requiredMsg

This would be a trivial change from an engine/client API perspective. It would also be a trivial change to implement on the engine side.

A type-level example of this change can be seen in this commit.

While this change would be trivial, it imposes considerable complexity on clients in terms of reasoning about validity:

Determining the actual validity of a node is complex. There is no single point of state which can be checked to determine this. There isn't even a clear combination of states which a client can use to determine it, without inferring some additional details not presently exposed in the engine's API. In more detail...
... while determining the satisfication of the constraint condition is relatively straightforward (is it true?), determining the actual validity of the required condition is much more involved. A client must determine whether the node is required, and if so whether its value is non-blank. For currently represented value node types, there are reasonable heuristics for this, but it's already error prone and could become substantially more so.
Reasoning about the documentation of these states is challenging! So much so that I took at least half a dozen passes to try to simplify their explanations, and I still find them quite convoluted.
There are clear use cases for determining validity not just at a leaf/value node level, but at least at the form level as well. It's quite likely we'll want to facilitate that determination at arbitrary subtree points as well. Doing this at the client level would be reasonable straightforward (tree walking), but it's something that would be disruptive (for multiple clients) to change without some additional engine-level consideration.
There is considerable API ambiguity about when/whether a message will be present for a given validation condition. It's fairly straightforward to express with prose: a message will be present... if the form defines a message for the condition, and if the condition is violated. But the actual implementation logic for consuming that API would be much more involved, particularly due to the complexity of determining whether the required condition is violated.
Violations of each condition are mutually exclusive (a constraint violation can only occur with a non-blank value; a required violation can only occur with a blank value), but this is not at all clear from an API perspective in the minimal approach. Again, this is straightforward to express with prose, but much more complex for a client consuming the API.

Validation-specific expansion, minimal API impact otherwise

A slightly less minimal approach, focused specifically on the validation use cases we want to achieve, could help to address all of the concerns discussed above. An example of this approach can be seen in this diff.

A couple of high-level details of this approach:

Each node has a new node-level validationState property, with validation details specific to the node type: direct validation state on leaf/value nodes, with parent/structural nodes providing reference access to violations of their descendants. This separation isn't strictly necessary! But I think it would be valuable for client implementations (there is one place on every node to look for all validation-related state). And I anticipate that it will have benefits for engine implementation as well (not that we should necessarily let that guide the design).
As mentioned in the previous point, each parent/structural node's validationState contains references to any violations present on any of its descendants (and a few mechanisms to reference the affected nodes from there; we can narrow these down if there's interest). This isn't particularly interesting from an API perspective, but I anticipate it would be helpful for some aspects of the visual design in the web-forms Vue UI client: whether the form itself has validation errors (RootNode.validationState), whether a group or repeat has validation errors (which would allow finer grained guidance/highlighting of those).

Leaf/value nodes' validationState is where this approach gets more interesting (IMO). In validation.ts source order (to cross-reference with the example if helpful):

Consistent presence of message: if a condition is violated, it has a message. If the form does not define one, the engine will provide some sentinel message in its place (distinguished by its messageSource property, which is additive to the underlying TextRange type).
Consistent, binary reasoning about validity state for each condition: if the condition is satisfied, its validity will be 'satisfied'; if it is violated, its validity is 'violation'.
Both conditions (constraint, required) have a validation state at all times, and its validity property can be used to determine the condition's state consistently as a result. (This also determines whether a message is present, per 1).
Clarity of node validity, and that violations are mutually exclusive: if the node has a violation, it is not valid; if its violation is null, the node is valid; the violation may be either a constraint violation or a required violation, never both.

Also notable: BaseNode.ts retains a required state getter, but does not add a constraint getter. There was already good reason for clients to want to know whether a field is required, and that does not change when we add validation logic. But it's unclear whether the result of evaluating the constraint expression has much value to clients (beyond maybe testing purposes, but I think this API approach satisfies those purposes just as well). Instead, clients will probably be more concerned with these questions: Is the condition valid? Is the node valid? Is the form submission valid? Is there a constraint violation message to display (and if so, is it form-defined)?

Consideration for setters: behavior

At present, nothing in either of these proposed directions suggests any change in behavior which would affect client-invoked state changes. More specifically, I assume that:

the engine will and should allow clients to set any value it can already set
the engine will not and should not prevent clients from setting a value if it would violate the node's constraint condition
the engine will not and should not prevent clients from clearing a value if it would violate the node's required condition

We may consider more nuance to these questions, and APIs to accommodate greater nuance, but I expect that these are good default assumptions to start with.

Consideration for setters: return types

Currently, nothing in either approach involves a change to the signature of any client-facing setter method. It does seem likely that we'll want to consider changes to setters' return types in the future. But I think it might be wise to hold off until we take on the broader set of errors (etc) in #80 and beyond.

I do want to take the opportunity to say that I've put considerable thought into where that aspect of the API might go. We've discussed some possibilities like:

Documenting fallibility of state setters at the type level, with Result-like types.
The fact that our current "return everything" approach (each setter method returns RootNode) has been a good forcing function, but doesn't seem to provide much value to clients in real world use, so a typed fallibility approach might be appropriate for multiple state-setting call sites, perhaps all of them.

I think it's very likely both of these instincts will still hold when we expand into other areas of error conditions. I think it's even possible that validation might eventually fit into aspects of these APIs (particularly where we consider other kinds of non-crashing/non-blocking errors).

Consideration for surfacing validation state to clients

Both approaches build on current assumptions about how clients access form state. Namely that:

Clients concerned with point-in-time state will read that state directly
Clients concerned with state changes over time may supply a reactive factory, and state changes will be made reactive by that mechanism

As such, neither approach adds any new validate() method. It is my belief that both approaches provide sufficient information for a client to know whatever it needs/wants to know about validity state throughout a form instance, at the time the client needs/wants to know it.

More succinctly, reading any of these properties should be functionally equivalent to a dedicated validate() method:

AnyNode.currentState.constraint[Msg], AnyNode.currentState.required[Msg]
AnyNode.validationState.*

Consideration for varied "validation modes", filling versus submission workflows

I think it's likely that a user-facing UI client will want to surface validation state (particularly violations and their messages) differently depending on the user's current workflow.

It may be a poor user experience to surface every required violation for every blank field on form load; it may be a good user experience to surface multiple such violations prior to submission
It may be a poor user experience to repeatedly toggle the visibility of a constraint violation message while the user is typing, even if the constraint's validity is actually changing on each key press
It may be a good user experience to surface form-level validation state when a user attempts to submit a form instance, but that probably won't be the best experience midway through filling the form

I've put some thought into a minor supplement for the more expansive API approach, which could potentially aid these distinctions by tracking what kind of action most recently established the validity state for a given node. I think this is possibly worth exploring in more depth. But it seems most likely that this will be best served as a client responsibility, at least to begin with. We can revisit this if there's interest, or if such distinctions seem to be valuable beyond the main web-forms UI client.

Consideration for performance, consistency, synchronous computational guarantees

It's worth noting that all of the APIs proposed here, in both approaches, have the same implied guarantees as all existing aspects of node state:

Validity computation can be read synchronously at any time
Validity messages can be read synchronously at any time
All aspects of validation are intended to be reactive for participating clients

Naively, this is more work that the engine is expected to do at any given time. I think that we have an opportunity to reduce the impact of this by embracing an earlier design decision: every aspect of validation can be lazily computed. It will still be synchronous, but all computation may be deferred until read by a client.

This is certainly possible in the minimal approach (though it would add considerable engine-side implementation complexity). It would be vastly simpler to achieve with the more expansive approach, on the basis that each node's validationState can have an entirely separate approach to reactive synchronization.

The text was updated successfully, but these errors were encountered:

sadiqkhoja · 2024-06-27T22:42:16Z

I like the second approach of having validationState for each node, also the idea of bubbling up the validationState to subtree and root level.

What do you think of calling it Valid with boolean type instead of Validity with 'satisfied' | 'violation' in ConditionSatisfied and ConditionViolation; and ConditionViolated instead of ConditionViolation?
How would validity will behave with relevance? I am assuming non-relevant nodes will always be valid - maybe it doesn't matter 🤷‍♂️

eyelidlessness · 2024-06-28T16:31:28Z

I like the second approach of having validationState for each node, also the idea of bubbling up the validationState to subtree and root level.

🎉

When we last discussed, it sounded like @lognaturel felt good about this approach as well. Sounds like we have a general consensus!

What do you think of calling it Valid with boolean type instead of Validity with 'satisfied' | 'violation' in ConditionSatisfied and ConditionViolation; and ConditionViolated instead of ConditionViolation?

I started here, but hesitated for a few reasons:

Anticipating the possibility that "validity" might actually take on other states beyond this binary.

The most likely prior art for this possibility is Enketo's spec extension to support more than one message for a validation condition. In Enketo, this concept is somewhat dynamic. A use case that I could easily imagine is some set of arbitrary (form-specified) constraint/message pairs. Validity of "constraint" in that framing would definitely be an enumeration (all of the form-defined possibilities, plus one non-violation case). Admittedly, this is all very speculative! But leaving open the possibility didn't seem too onerous either.

More likely, without going beyond spec in any way, it's easy to imagine validity conditions having a state like 'pending': we've deferred validation until some explicit invocation by some API to be determined. Or even 'stale': we previously computed the condition, kept its state in memory, but the value has/may have changed since (and we've deferred recomputation, similar to 'pending').
I was iffy on whether an actual boolean property would give us the type narrowing we'll want for the various API conveniences this design affords. Sometimes I just find it hard to predict when TypeScript will treat a union type as "tagged" for narrowing purposes, and when it will seem to ignore the "tag". But I just took a minute to check, and valid: true; message: ViolationMessage / valid: false; message: null does narrow exactly how we'd want when checking the valid property. So my concern here was moot.
In a vague sense, and with some hindsight, I kind of wish the API were less boolean-heavy in general. As with the first point, it's often the case that some binary state takes on additional enumerations over time. As a generality, this can happen as requirements evolve, as features glom on, or as ideal code gives way to realities like performance optimization.

All of that said, I am happy to go with valid: boolean for now for simplicity's sake. If either of the cases for enumeration become a priority, we can always revisit. I also verified that switching from a boolean to some enum would not be super risky (i.e. if (validationState.valid) { doSomethingWith(validationState.message) } will error if valid: boolean is changed to valid: some | always | truthy | union).

How would validity will behave with relevance? I am assuming non-relevant nodes will always be valid - maybe it doesn't matter 🤷‍♂️

I think from an API perspective, your always-valid instinct is right. And in the engine runtime, I think we can safely short-circuit any non-relevant node's validity computations entirely (which would also make this a reasonable assumption for clients). It probably doesn't matter for web-forms/Vue UI, but it may affect some tests in scenario (and it could definitely be meaningful in hypothetical future tooling-oriented clients).

As discussed in #140 Based on 4c38a22, with some refinements: - Each condition’s validity state is now represented by `valid: boolean` - Minor naming adjustments - Nailed down an initial pair of message constants for engine-fallback messages - Added a method to check whether a message is an engine-fallback, in a way that allows type narrowing and produces those message constants as types - Added a ton of JSDoc documentation clarifying the design, expected usage, some anticipated performance caveats, etc

lognaturel added this to the Next milestone Jun 26, 2024

eyelidlessness mentioned this issue Jul 2, 2024

Engine support for constraint, required validation #154

Merged

eyelidlessness closed this as completed in #154 Jul 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Engine/client API design: `constraint` and `required` validation #140

Engine/client API design: `constraint` and `required` validation #140

eyelidlessness commented Jun 24, 2024

sadiqkhoja commented Jun 27, 2024

eyelidlessness commented Jun 28, 2024

Engine/client API design: constraint and required validation #140

Engine/client API design: constraint and required validation #140

Comments

eyelidlessness commented Jun 24, 2024

Potential overlap with other concerns

Spec implications: constraint and required

API directions

Minimal expansion, no new API concepts

Validation-specific expansion, minimal API impact otherwise

Consideration for setters: behavior

Consideration for setters: return types

Consideration for surfacing validation state to clients

Consideration for varied "validation modes", filling versus submission workflows

Consideration for performance, consistency, synchronous computational guarantees

sadiqkhoja commented Jun 27, 2024

eyelidlessness commented Jun 28, 2024

Engine/client API design: `constraint` and `required` validation #140

Engine/client API design: `constraint` and `required` validation #140

Spec implications: `constraint` and `required`