Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added in new blueprint for handling interruptions #356

Merged
merged 10 commits into from
Jun 1, 2023

Conversation

Ktmi
Copy link

@Ktmi Ktmi commented Apr 3, 2023

This is intended to resolve kytos-ng/maintenance#59. This provides a blueprint for a mechanism to handle interruptions such as maintenance windows.

@Ktmi Ktmi requested a review from a team as a code owner April 3, 2023 16:46
@Ktmi Ktmi added documentation Improvements or additions to documentation blueprint 2023.1 Kytos-ng 2023.1 labels Apr 3, 2023
Copy link
Member

@viniarck viniarck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done @Ktmi, great how this has evolved and emerged as interruptions.

Most of it looks good to me, asked minor implementation details clarifications though. Also, raised some questions that needs confirmation from network engineers to also confirm.

docs/blueprints/EP035.rst Outdated Show resolved Hide resolved
docs/blueprints/EP035.rst Outdated Show resolved Hide resolved
docs/blueprints/EP035.rst Outdated Show resolved Hide resolved
docs/blueprints/EP035.rst Outdated Show resolved Hide resolved
docs/blueprints/EP035.rst Outdated Show resolved Hide resolved
docs/blueprints/EP035.rst Outdated Show resolved Hide resolved
@viniarck
Copy link
Member

viniarck commented Apr 4, 2023

Another adjacent thing to consider, once a service has its primary path interrupted but then it eventually ends the interruption, is it expected that the service should be using its primary path if it's available again or it's OK to continue in a backup or failover path?

Currently, mef_eline won't switchover to the primary once it's available. In a production deployment, that means that network engineers would have to redeploy the EVCs if they want to force trying to use the primary path again.

docs/blueprints/EP035.rst Outdated Show resolved Hide resolved
@viniarck
Copy link
Member

Another adjacent thing to consider, once a service has its primary path interrupted but then it eventually ends the interruption, is it expected that the service should be using its primary path if it's available again or it's OK to continue in a backup or failover path?

Currently, mef_eline won't switchover to the primary once it's available. In a production deployment, that means that network engineers would have to redeploy the EVCs if they want to force trying to use the primary path again.

This thread is still unanswered.

docs/blueprints/EP035.rst Outdated Show resolved Hide resolved
docs/blueprints/EP035.rst Outdated Show resolved Hide resolved
docs/blueprints/EP035.rst Outdated Show resolved Hide resolved
@Ktmi
Copy link
Author

Ktmi commented May 8, 2023

Another adjacent thing to consider, once a service has its primary path interrupted but then it eventually ends the interruption, is it expected that the service should be using its primary path if it's available again or it's OK to continue in a backup or failover path?
Currently, mef_eline won't switchover to the primary once it's available. In a production deployment, that means that network engineers would have to redeploy the EVCs if they want to force trying to use the primary path again.

This thread is still unanswered.

I spoke with @italovalcy and the response is that for dynamic paths, when the original interruption that affected them ends, continue to use the recomputed path, unless the recomputed path was using flexible metrics and didn't satisfy all criteria. For static paths, we switch back to the original path.

docs/blueprints/EP035.rst Outdated Show resolved Hide resolved
docs/blueprints/EP035.rst Outdated Show resolved Hide resolved
docs/blueprints/EP035.rst Outdated Show resolved Hide resolved
@viniarck
Copy link
Member

viniarck commented May 8, 2023

For static paths, we switch back to the original path.

Cool. Please, later on map this requirement as an issue on mef_eline that's only desired circuits with static paths, that way one day it can be prioritized accordingly.

@viniarck
Copy link
Member

viniarck commented May 8, 2023

@Ktmi there's also a conflict with docs/blueprint/EP035.rst, can you bump this as EP037? I've also closed most of the threads that I opened, followed up with minor ones to confirm we're on the same page.

@viniarck
Copy link
Member

@Ktmi, nicely done consolidating this blueprint, it's getting very close to land, I'm going to approve once these minor threads are closed/resolved:

Thanks, David.

Copy link
Member

@viniarck viniarck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done consolidating this blueprint @Ktmi.

If anything else on mef_eline regarding provisioning paths during a maintenance is needed you can create a new issue, but since that won't impact much this blueprint, since the blueprint ended up very well agnostically solving the problem with interruptions, I'll go ahead and already leave it approved. If nobody has any extra comments or reviews we can merge this PR by Monday.

Copy link

@italovalcy italovalcy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I have a few questions/comments that I discussed with David/Vinicius on slack (specially the need for SERVICE_PROVIDERS setting versus using callback functions on top of existing events enhanced with a dry-run parameter). However, overall, the blueprint is good to go!

docs/blueprints/EP037.rst Outdated Show resolved Hide resolved
Co-authored-by: Italo Valcy S Brito <[email protected]>
@viniarck
Copy link
Member

viniarck commented Jun 1, 2023

Let's ship it then. Congrats @Ktmi, and thanks @italovalcy for you feedback.

If anything else minor is needed it'll be augmented in a next iteration.

@viniarck viniarck merged commit 0e48eda into master Jun 1, 2023
@viniarck viniarck deleted the blueprint/interruptions branch June 1, 2023 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2023.1 Kytos-ng 2023.1 blueprint documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refine existing blueprint
3 participants