Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add commit confirmed extension to SetRequest #196

Merged
merged 7 commits into from
Dec 19, 2023

Conversation

kidiyoor
Copy link
Contributor

No description provided.

@hellt
Copy link
Contributor

hellt commented Sep 21, 2023

Hi @kidiyoor
is there a proto defining this proposed extension?
We at Nokia created a custom extension for this as well, but if you're working on a registered extension then it makes sense to converge, I believe.

@kidiyoor
Copy link
Contributor Author

@hellt yes, the goal is to converge on the extension. You can find the proto in the proposed md file. Could you share how you r custom extension look like ?

@hellt
Copy link
Contributor

hellt commented Sep 22, 2023

@kidiyoor yes, we would like to collaborate on this one. I will have to gather some details internally before pasting our extension proto.

@earies
Copy link

earies commented Sep 22, 2023

A few initial comments (will just lump here vs. inline) as we start to breach upon more reinvention of long standing features from other well-defined protocols.

  • What is the motivation behind this being a completely separate extension vs. baked directly into gNMI? I will assume desired behavior for some but not all?
  • The behaviors described here are similar yet diverge RFC6241 style confirmed-commit capabilities. What is the motivation for divergence? While the feature itself is well understood to transactional models of the current/past implementations, can you elaborate on why the precise behaviors of RFC6241 are insufficient requiring divergence across all implementations that have provided this support for various years? (e.g. rational behind the reinvention to fit a concept into gNMI that was unaccounted for prior)
  • Section 2.2.2 states that during the confirm call, a client should send the same configuration followed by subsequent logic to compare the configuration to the currently active running configuration (that was previously committed with a rollback_duration. This is not how RFC6241 style confirmation works but rather is suggesting a new feature that imo wastes bytes on the wire and now requires additional logic and processing on the element. A confirmation in the NETCONF world is just "another commit" that can be performed by any client (either existing session or new) - this subsequent commit could be to confirm or introduce additional changes. (See <persist-id> here as well for the association to client sessions) - This similar logic could be used here to assign an identifier for later confirmation vs. resending an entire payload
  • Section 2.2.3 also diverges behaviors from RFC6241. Why is there a need to prevent another commit/Set whilst the duration timer is in effect?
  • Is the plan to rely solely off error-messaging to understand if an element is currently in a rollback_duration time window? This is where I think a state leaf would be helpful (or 2) to indicate if an element is currently in this state (e.g. commit-rollback=true) and how much time may be left (e.g. commit-rollback-timer=27)

In reality, gNMI Set() abstracts out important distinct transactional operations that the NETCONF world has had for some time. These operations have been part of operational workflows for some time (whether good or bad) and while this may introduce back a feature from this world, other gating factors for various operators have also been rollback as well as any diff capabilities in pre-staging environments (which relies off of datastores more and more). If there is ever consideration of introducing those features back either directly or via extensions, they should also be considered or a best practice guide indicating the reason why these features may never exist for the intentions of gNMI operations.

@robshakir
Copy link
Contributor

Ebben -- whilst I'm sympathetic to "some operators might need additional things", all of the OpenConfig work (and I'd say open source contributions in general) are going to be driven by a use case that exists today. If these operators that want to have a wider transactional framework in gNMI (which I'm unsure that I personally am supportive of, for the record), then can we go ahead and discuss that in the context of their use cases -- along with contributions that move this forward please?

I'm not clear that deviations from 6241 are a first-class problem -- these are different protocols with different designs. Implementation decisions that cause operational overhead/problems, and/or cannot be implemented in shipping implementations are more of concern. I do not consider "we baked this in an RFC" as a yardstick for excellent (or even implementable) technical design.

w.r.t the specific question of "why an extension?". Practically, it is extremely complex for us to keep a single specification that describes everything in gNMI. Extensions allow the protocol to be extended in a manner that allows new use cases to be covered that aren't necessarily relevant to all implementations (see master election for another example). On top of that, it allows us to provide subsets of the protocol that can be described in more constrained specifications. Practically, my preference is to extend gNMI primarily via extensions going forward. In some cases, this isn't possible (see union_replace as something that would be extremely difficult to add through an extension), so base specification changes are required, but generally, I feel if we can be more constrained that will be more comprehensible to users as the use cases gNMI covers expand.

thx,
r.

@dplore
Copy link
Member

dplore commented Oct 4, 2023

+1 on extension being the primary way to update the gNMI proto. The base gNMI proto should be considered a minimum implementation. With extensions we create additional features that are in addition to the base proto. The target for this extension is to be a well known extension

Regarding the requirements for the confirm message, I think we can have a debate about what is efficient and supports the operational use cases. We should avoid referencing prior art as the only justification. One should reference operational use cases (#1) , efficiency and existing implementations.

@hellt
Copy link
Contributor

hellt commented Oct 4, 2023

From Nokia perspective I am aiming to post our proposal and implementation example next week.

Will do this as a PR against the current master branch for an easier review process

rpc/gnmi/gnmi-commit-confirmed.md Outdated Show resolved Hide resolved
rpc/gnmi/gnmi-commit-confirmed.md Outdated Show resolved Hide resolved
@earies
Copy link

earies commented Oct 6, 2023

I do not consider "we baked this in an RFC" as a yardstick for excellent (or even implementable) technical design
...
We should avoid referencing prior art as the only justification

I think there is a misunderstanding of my point. My point is not a blind comparison to an RFC but rather drawing reference to existing standards, implementations and architectures that have been shipping for years since a majority of this work is reinventing the wheel often coming in reverse but choosing to deviate without clear reason.

IMO when reinventing the wheel, there should be an understanding of history, existing implementations and clear problem statements followed with proper justification as to the reason why a prior technology cannot suffice.

The concept in this PR is the same from prior technologies but there is a choice to change some behaviors in a less efficient fashion and that is what I am challenging here is all. You can choose to align to other like technologies or not but let's get some good reasoning behind such. A PR without description or prior reference is also not a good justification.

Many implementations will support all prior technologies concurrently thus obvious, the more alignment, the more likelyhood of desired outcomes. Any misalignment is new implementation which is fine but should be justified with an understanding of co-existance.

@kidiyoor
Copy link
Contributor Author

kidiyoor commented Oct 9, 2023

Above comments already address why this should be an extension over modifications to the core. The initial proposal did deviate from the RFC, as I was not aware of its existence. It makes sense to align with the RFC as much as possible. I have incorporated most of feedbacks and created a new proposal #197. We can probably continue discuss about the open questions in the new issue.

@hellt
Copy link
Contributor

hellt commented Oct 11, 2023

Hi all,
Internally we had developed an extension for commit-confirmed that we would like to share with the wider group #198

The main differences from the proposal in #196 and #197 is the usage of commit_id to provide better session management and prevent concurrent access issues.

We also used a set of actions, expressed as an enum to indicate the procedure call to use.

Updating the proposal as per the Proposal-1 of in [openconfig#197](openconfig#197)
Moving rollback_duration inside *CommitRequest* action eliminates wrong client usage of passing rollback_duration during confirm or cancel call.
@hellt
Copy link
Contributor

hellt commented Nov 15, 2023

I believe the reference should be more verbose. Maybe a good idea to look at #198 and try to minimize potential uncertainties by clarifying the exact workflows/call_flows expected.

@dplore dplore merged commit 5b6570a into openconfig:master Dec 19, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants