design-proposal: VirtualMachineInstanceMigration - Live migration to a named node #320

Open

tiraboschi wants to merge 1 commit into base: main
Conversation

tiraboschi
Member

@tiraboschi tiraboschi commented Sep 3, 2024

What this PR does / why we need it:
Adding a design proposal to extend the VirtualMachineInstanceMigration
object with an additional API to let a cluster admin
try to trigger a live migration of a VM, injecting
an additional NodeSelector constraint on the fly.
The additional NodeSelector can only restrict the set
of Nodes that are valid targets for the migration
(potentially down to a single host).
All the affinity rules defined on the VM spec are still
going to be satisfied.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes https://issues.redhat.com/browse/CNV-7075

Special notes for your reviewer:
Something like this was directly proposed/implemented with kubevirt/kubevirt#10712 and was already discussed there.

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note:

design-proposal: VirtualMachineInstanceMigration - Live migration to a named node

@kubevirt-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign davidvossel for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the dco-signoff: yes (Indicates the PR's author has DCO signed all their commits.) and size/M labels on Sep 3, 2024
tiraboschi pushed a commit to tiraboschi/kubevirt that referenced this pull request Sep 3, 2024
Follow-up and derived from:
kubevirt#10712
Implements:
kubevirt/community#320

TODO: add functional tests

Signed-off-by: zhonglin6666 <[email protected]>
Signed-off-by: Simone Tiraboschi <[email protected]>
Member

@dankenigsberg dankenigsberg left a comment

Lovely to see this clear design proposal (even if I don't like anything that assumes a specific node is long-living). I have two questions, though.


## Goals
- A user allowed to trigger a live-migration of a VM and list the nodes in the cluster is able to rely on a simple and direct API to try to live migrate a VM to a specific node.
- The explicit migration target overrules a nodeSelector or affinity and anti-affinity rules defined by the VM owner.
Member

I find this odd, as the VM and the application in it may not function well (or at all) if affinity is ignored. Can you share more about the origins of this goal? I'd expect the target node to be ANDed with existing anti/affinity rules.

Member Author

I tend to think that, as a cluster admin trying to force a VM to migrate to a named node, this is the natural and expected behaviour:
if I explicitly select a named node, I expect that my VM will eventually be migrated there and nowhere else (such as to a different node selected by the scheduler according to a weighted combination of affinity criteria, resource availability and so on); I can then tolerate that the live migration fails because I chose a wrong node, but the controller should only try to live-migrate it according to what I'm explicitly asking for.
And by the way, this is absolutely consistent with the native k8s behaviour for pods.
spec.nodeName for pods is under spec for historical reasons but it's basically controlled by the scheduler:
when a pod is going to be executed, the scheduler checks it and, according to available cluster resources, nodeSelectors, weighted affinity and anti-affinity rules and so on, it selects a node and writes it to spec.nodeName on the pod object. At this point the kubelet on the named node will try to execute the Pod on that node.
If the user explicitly sets spec.nodeName on a pod (or in the template of a deployment and so on), the scheduler is not going to be involved in the process since the pod is basically already scheduled for that node and nothing else, and so the kubelet on that node will directly try to execute it there, possibly failing.
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodename explicitly states:

If the nodeName field is not empty, the scheduler ignores the Pod and the kubelet on the named node tries to place the Pod on that node.
Using nodeName overrules using nodeSelector or affinity and anti-affinity rules.

And this in my opinion is exactly how we should treat a Live migration attempt to a named node.
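For illustration, a minimal sketch (an assumption, not the agreed implementation) of what this option would mean for the migration controller, reusing the `templatePod`/`k8sv1` names from the pseudocode discussed later in this thread and the hypothetical `nodeName` field proposed below for the VMIM spec:

```go
// Hedged sketch: if an explicit nodeName is set on the VMIM (hypothetical field),
// the controller would copy it onto the target virt-launcher pod. With
// spec.nodeName set, kube-scheduler ignores the pod and the kubelet on that node
// tries to run it, mirroring the plain-pod behaviour quoted above.
if nodeName := migration.Spec.NodeName; nodeName != "" {
    templatePod.Spec.NodeName = nodeName
}
```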

Member

Let's take the following example (this is a real world use-case):

  1. An admin is adding a new node to a cluster to take it into prod. This node has a taint to prevent workloads from immediately landing there.
  2. The admin wants to migrate a VM to this node now to validate that it is working properly.

If we AND a new selector for this node, then the migration will not take place, because there is the taint. We'd also need to add a toleration to get the VM scheduled to that node.

With spec.nodeName it would be no issue - initially - it could become one if Require*atRuntime effects are used.
However, with spec.nodeName all other validations - CPU caps, extended, storage and local resources, etc. - will be ignored. We are effectively asking for a VM that will not start.
Worse: it would be really hard now to understand WHY the VM is not launching.

Thus I think we have to AND the node selector, but we need code to understand taints specifically (because taints keep workloads away).
Then we still need to think about a generic mechanism to deal with the reasons why a pod cannot be placed on the selected node.
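For concreteness, a hedged sketch of the extra piece the AND approach would need in this tainted-node example (the taint key is hypothetical; `templatePod` and `k8sv1` are the corev1-based names used in the pseudocode later in this thread):

```go
// Besides ANDing a node selector for the new node, the target pod would also
// need a toleration matching the onboarding taint, e.g.:
templatePod.Spec.Tolerations = append(templatePod.Spec.Tolerations, k8sv1.Toleration{
    Key:      "example.com/onboarding", // hypothetical taint keeping workloads off the new node
    Operator: k8sv1.TolerationOpExists,
    Effect:   k8sv1.TaintEffectNoSchedule,
})
```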

Member

I do not like taking examples from the historically-understandable Pod.spec.nodeName. Node identity is not something that should have typically been exposed to workload owners.

Can you summarize your reasoning into the proposal? I think I understand it now, but I am not at ease with it. For example, a cluster admin may easily violate anti/affinity rules that are important for app availability.

Member Author

@fabiand with taints it is a bit more complex: the valid effects for a taint are NoExecute, NoSchedule and PreferNoSchedule.
Bypassing the scheduler by directly setting spec.nodeName will allow us to bypass taints with the NoSchedule and PreferNoSchedule effects but, AFAIK, it will still be blocked by NoExecute, which is also enforced by the kubelet with eviction.

Member Author

@dankenigsberg yes, this is a critical aspect of this design proposal so we should carefully explore and weigh the different alternatives, tracking them in the design proposal itself as a future reference.

In my opinion the choice strictly depends on the use case and the power we want to offer to the cluster admin when creating a live migration request to a named node.

Directly setting spec.nodeName on the target pod will completely bypass all the scheduling hints (spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution) and constraints (spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution), meaning that the target pod will be started on the named node regardless of how the VM is actually configured.

Another option is trying to append/merge (this sub-topic deserves by itself another discussion) something like

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
            - key: metadata.name
              operator: In
              values:
              - <nodeName>

to the affinity rules already defined on the VM.
My concern with this choice is that the affinity/anti-affinity grammar is pretty complex so, if the VM owner already defined some affinity/anti-affinity rules, we can easily end up with a set of conflicting rules so that the target pod cannot be scheduled on the named node nor on any other node.

If the use case that we want to address is giving the cluster admin the right to try migrating a generic VM to a named node (for instance for maintenance/emergency reasons), this approach does not fully address it, with many possible cases where the only viable option is still manually overriding the affinity/anti-affinity rules set by the VM owner.

I still tend to think that always bypassing the scheduler with spec.nodeName is the K.I.S.S. approach here, if forcing a live migration to a named node is exactly what the cluster admin is trying to do.

Member Author

I summarized these considerations in the proposal itself, let's continue from there.


# Implementation Phases
A really close attempt was already tried in the past with https://github.com/kubevirt/kubevirt/pull/10712 but the PR got some pushback.
A similar PR should be reopened and refined, and we should implement functional tests.
Member

Would you outline the nature of the pushback? Do we currently have good answers to the issues raised back then?

Member Author

Trying to summarize (@EdDev please keep me honest on this), it was somehow considered a semi-imperative approach and it was pointed out that a similar behavior could already be achieved indirectly by modifying on the fly and then reverting the affinity rules on the VM object.
see: kubevirt/kubevirt#10712 (comment)
and: kubevirt/kubevirt#10712 (comment)

How much this is imperative is questionable: in the end we already have a VirtualMachineInstanceMigration object that you can use to declare that you want to trigger a live migration; this is only about letting you also declare that you want the live migration to go to a named host.

The alternative approach based on amending the affinity rules on the VM object and waiting for the LiveUpdate rollout strategy to propagate it to the VMI before trying a live migration is described, pointing out its main drawback, in the Alternative design section in this proposal.

Member

Can you inline this succinctly? E.g., that PR got some pushback because it was not clear why a new API for a one-off migration is needed. We give here a better explanation of why this one-off migration destination request is necessary.

Member Author

done

Member

  • The "one-time" operation convinced me.
  • The reasoning for the real need is hard for me, but I did feedback on this proposal what is convincing me.

@iholder101
Contributor

/cc

- Cluster-admin: the administrator of the cluster

## User Stories
- As a cluster admin I want to be able to try to live-migrate a VM to a specific node for maintenance reasons eventually overriding what the VM owner set
Member

I'd like to see more fleshed out user stories. It's unclear to me based on these user stories why the existing methods wouldn't suffice.

As a cluster admin I want to be able to try to live-migrate a VM to a specific node for maintenance reasons eventually overriding what the VM owner set

For example, why wouldn't the cluster admin taint the source node and live migrate the VMs away using the existing methods? Why would the admin need direct control over the exact node the VM goes to? I'd like to see a solid answer for why this is necessary over existing methods.

That's where this discussion usually falls apart and why it hasn't seen progress through the years. I'm not opposed to this feature, but I do think we need to articulate clearly why the feature is necessary.

Member Author

I expanded this section

Comment on lines 38 to 63
## User Stories
- As a cluster admin I want to be able to try to live-migrate a VM to a specific node for various possible reasons such as:
- I just added to the cluster a new powerful node and I want to migrate a selected VM there without trying more than once according to scheduler decisions
- I'm not using any automatic workload rebalancing mechanism and I periodically want to manually rebalance my cluster according to my observations
- Foreseeing a peak in application load (e.g. new product announcement), I'd like to balance in advance my cluster according to my expectation and not to current observations
- During a planned maintenance window, I'm planning to drain more than one node in a sequence, so I want to be sure that the VM is going to land on a node that is not going to be drained in a near future (needing then a second migration) and being not interested in cordoning it also for other pods
- I just added a new node and I want to validate it trying to live migrate a specific VM there
Member

nice! these are good reasons that hadn't been explored during previous discussions, thanks

When a pod is going to be executed, the scheduler is going to check it and, according to available cluster resources, nodeselectors, weighted affinity and anti-affinity rules and so on,
the scheduler is going to select a node and write its name on `spec.nodeName` on the pod object. At this point the kubelet on the named node will try to execute the Pod on that node.

If `spec.nodeName` is already set on a pod object as in this approach, the scheduler is not going to be involved in the process since the pod is basically already scheduled for that node and only for that named node, and so the kubelet on that node will directly try to execute it there, possibly failing.
Member

I think using pod.spec.nodeName is likely the most straightforward approach. This does introduce some new failure modes that might not be obvious to admins.

For example, today if a target pod is unschedulable due to lack of resources, the migration object will time out due to the pod being stuck in "pending". This information is fed back to the admin as a k8s event associated with the migration object.

However, by setting the pod.spec.NodeName directly, we'd be bypassing the checks that ensure the required resources are available on the node (like the node having the "kvm" device available for instance), and the pod would likely get scheduled and immediately fail. I don't think we are currently bubbling up these types of errors to the migration object, so this could leave admins wondering why their migration failed.

I guess what I'm trying to get at here is, I like this approach, let's make sure the new failure modes get reported back on the migration object so the Admin has some sort of clue as to why a migration has failed.

Member

@davidvossel We already report the failure reason on the VMIM. This is part of the VMIM status.

pod.spec.nodeName entirely bypasses the scheduler, making AAQ unusable as it relies on "pod scheduling readiness".

From my pov, bypassing the scheduler is a no go.

Member Author

From my pov, bypassing the scheduler is a no go.

luckily we also have another option, as described in:
### B. appending/merging an additional nodeAffinity rule on the target virt-launcher pod (merging it with VM owner set affinity/anti-affinity rules)

This will add an additional constraint for the scheduler summing it up with existing constraints/hints.
In case of mismatching/opposing rules, the destination pod will not be scheduled and the migration will fail.

Contributor

@vladikr @davidvossel +1.

spec.nodeName is a horrible field that is kept in Kubernetes only for backward compatibility and causes a lot of trouble. I agree that it should be considered as a no-go.

Member

@vladikr vladikr left a comment

I understand the intention behind introducing the nodeName field, but I fail to see how something like this may work at scale. It seems to me that most, if not all, of the user stories listed in the proposal can already be achieved through existing methods. Adding this field could potentially cause confusion for admins and lead to unnecessary friction with the Kubernetes scheduler and descheduler flows. I'd prefer the solutions to the user stories to be aligned closely with established patterns (descheduler policies or scheduler plugins).


## User Stories
- As a cluster admin I want to be able to try to live-migrate a VM to a specific node for various possible reasons such as:
- I just added to the cluster a new powerful node and I want to migrate a selected VM there without trying more than once according to scheduler decisions
Member

I wonder what would be so special about these VMs that they cannot be handled by a descheduler?
Also, how would the admin know that the said descheduler did not remove these VMs at a later time?

- I'm not using any automatic workload rebalancing mechanism and I periodically want to manually rebalance my cluster according to my observations
- Foreseeing a peak in application load (e.g. new product announcement), I'd like to balance in advance my cluster according to my expectation and not to current observations
- During a planned maintenance window, I'm planning to drain more than one node in a sequence, so I want to be sure that the VM is going to land on a node that is not going to be drained in a near future (needing then a second migration) and being not interested in cordoning it also for other pods
- I just added a new node and I want to validate it trying to live migrate a specific VM there
Member

This could be achieved today by modifying the VM's node selector or creating a new VM. New nodes will very likely be the scheduler's target for new pods already.

Member Author

Right,
from a purely technical perspective this feature can already be achieved by directly manipulating the node affinity rules on the VM object. Now we have the LiveUpdate rollout strategy, so the new affinity rules will be quickly propagated to the VMI and consumed on the target pod of the live migration.
No doubt, on the technical side it will work.

But the central idea of this proposal is about allowing a cluster admin to do that without touching the VM object.
This is for two main reasons:

  • separation of personas: the VM owner can set rules on their VM; a cluster admin could still be interested in migrating a VM without messing up or altering the configuration set by the owner on the VM object.
  • separating what is a one-off configuration for a single migration attempt (so set on the VirtualMachineInstanceMigration object), relevant only to this migration attempt and with no future side effects, from what is a long-term configuration that is going to stay there and also be applied later on (future live migrations, restarts).

This comment applies to all the user stories here.

## User Stories
- As a cluster admin I want to be able to try to live-migrate a VM to a specific node for various possible reasons such as:
- I just added to the cluster a new powerful node and I want to migrate a selected VM there without trying more than once according to scheduler decisions
- I'm not using any automatic workload rebalancing mechanism and I periodically want to manually rebalance my cluster according to my observations
Member

This is also doable today as the default scheduler will try to choose the least busy node to schedule the target pod.

- As a cluster admin I want to be able to try to live-migrate a VM to a specific node for various possible reasons such as:
- I just added to the cluster a new powerful node and I want to migrate a selected VM there without trying more than once according to scheduler decisions
- I'm not using any automatic workload rebalancing mechanism and I periodically want to manually rebalance my cluster according to my observations
- Foreseeing a peak in application load (e.g. new product announcement), I'd like to balance in advance my cluster according to my expectation and not to current observations
Member

Could you please elaborate on this?
What would the cluster look like according to the admin's expectations?
Couldn't a taint be placed on some nodes to resolve capacity before the new product announcement?

Member

Same, I do not want to argue with an admin on how the cluster should be managed, but this is surely not a recommended way we want to encourage/support.

@tiraboschi
Member Author

It seems to me that most, if not all, of the user stories listed in the proposal can already be achieved through existing methods.

Right, I also added this note:

Note

technically all of this can already be achieved by manipulating the node affinity rules on the VM object, but as a cluster admin I want to keep a clear boundary between what is a long-lasting setting for a VM, defined by the VM owner, and what is a single-shot requirement for a one-off migration

@tiraboschi tiraboschi force-pushed the migration_target branch 3 times, most recently from 63818ed to a937ba2 on September 6, 2024 16:53
@vladikr
Member

vladikr commented Sep 7, 2024

I spoke with @fabiand offline.
Perhaps we can simply copy any provided Affinity and/or Tolerations set by the admin on the VMIM to the target pod -
instead of offering a dedicated API field.

My main concern with this proposal is that it may promote a wrong assumption that manual cluster balancing is preferred instead of relying on the scheduler/descheduler - while this is just a local minimum.

@tiraboschi
Member Author

I spoke with @fabiand offline. Perhaps we can simply copy any provided Affinity and/or Tolerations set by the admin on the VMIM to the target pod - instead of offering a dedicated API field.

I think that exposing the whole node affinity/anti-affinity (+ tolerations + ...) grammar on the VirtualMachineInstanceMigration object is by far too much.
In the end, as a cluster admin I only want to try to migrate that VM to a named node. All the other use cases are out of scope and should be addressed by correctly setting/amending the node affinity on the VM.
I still think that exposing an optional nodeName string on the VirtualMachineInstanceMigration spec is all we need to accomplish all the use cases here.

My main concern with this proposal is that it may promote a wrong assumption that manual cluster balancing is preferred instead of relying on the scheduler/descheduler - while this is just a local minimum.

I think it's up to us to emphasize this assumption in the API documentation, making it absolutely clear that the nodeName field is optional and that we recommend keeping it empty to let the scheduler find the best node (if trying to migrate to a specific named node is not strictly needed).

I'm proposing something like:

// NodeName is a request to try to migrate this VMI to a specific node.
// If it is non-empty, the migration controller will simply try to configure the target VMI pod to be started on that node,
// assuming that it fits resources, limits and other node placement constraints; it will override nodeSelector and affinity
// and anti-affinity rules set on the VM.
// If it is empty (recommended), the scheduler becomes responsible for finding the best Node to migrate the VMI to.
// +optional
NodeName string `json:"nodeName,omitempty"`

I'm adding it to this proposal.
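For context, a hedged sketch of where such a field would sit, next to the existing `vmiName` field on the VMIM spec (exact naming and placement are up to the implementation PR; the real VirtualMachineInstanceMigrationSpec lives in the KubeVirt API package):

```go
// Sketch only, not the authoritative API definition.
type VirtualMachineInstanceMigrationSpec struct {
    // The name of the VMI to perform the migration on.
    VMIName string `json:"vmiName,omitempty"`

    // NodeName is a request to try to migrate this VMI to a specific node (proposed, optional).
    NodeName string `json:"nodeName,omitempty"`
}
```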

@vladikr
Member

vladikr commented Sep 9, 2024

I spoke with @fabiand offline. Perhaps we can simply copy any provided Affinity and/or Tolerations set by the admin on the VMIM to the target pod - instead of offering a dedicated API field.

I think that exposing the whole node affinity/anti-affinity (+ tolerations + ...) grammar on the VirtualMachineInstanceMigration object is by far too much. In the end, as a cluster admin I only want to try to migrate that VM to a named node.

Setting affinity and toleration is exactly what any other user would need to do to allow scheduling a workload on a tainted node; not sure why we need to facilitate this in the migration case.
Also, taking this route would not require us to add any new logic to the migration controller.

Generally speaking, Affinity and nodeSelector are the most acceptable ways to influence scheduling decisions.

All the other use cases are out of scope and should be addressed by correctly setting/amending the node affinity on the VM. I still think that exposing an optional nodeName string on the VirtualMachineInstanceMigration spec is all we need to accomplish all the use cases here.

My main concern with this proposal is that it may promote a wrong assumption that manual cluster balancing is preferred instead of relying on the scheduler/descheduler - while this is just a local minimum.

I think it's up to us to emphasize this assumption in the API documentation, making it absolutely clear that the nodeName field is optional and that we recommend keeping it empty to let the scheduler find the best node (if trying to migrate to a specific named node is not strictly needed).

From my pov, we could get away without any API changes and without advertising this option at all - making it available for special cases and not a mainstream.

I'm proposing something like:

// NodeName is a request to try to migrate this VMI to a specific node.
// If it is non-empty, the migration controller will simply try to configure the target VMI pod to be started on that node,
// assuming that it fits resources, limits and other node placement constraints; it will override nodeSelector and affinity
// and anti-affinity rules set on the VM.
// If it is empty (recommended), the scheduler becomes responsible for finding the best Node to migrate the VMI to.
// +optional
NodeName string `json:"nodeName,omitempty"`

I'm adding it to this proposal.

@tiraboschi
Member Author

tiraboschi commented Sep 9, 2024

Setting affinity and toleration is exactly what any other user would need to do to allow scheduling a workload on a tainted node; not sure why we need to facilitate this in the migration case. Also, taking this route would not require us to add any new logic to the migration controller.
...
From my pov, we could get away without any API changes and without advertising this option at all - making it available for special cases and not a mainstream.

I'm sorry but now I'm a bit confused.
As per the KubeVirt documentation,
in order to initiate a live migration I'm supposed to create a VirtualMachineInstanceMigration (VMIM) object like:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: migration-job
spec:
  vmiName: vmi-fedora

or, more imperatively, execute something like:

$ virtctl migrate vmi-fedora

that under the hood is going to create a VirtualMachineInstanceMigration for me.

This proposal is now about extending it with the optional capability to try to live migrate to a named node.
So this is proposing to allow the creation of something like:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: migration-job
spec:
  vmiName: vmi-fedora
  nodeName: my-new-target-node

or executing something like:

$ virtctl migrate vmi-fedora --nodeName=my-new-target-node

and this is because one of the key points here is that the cluster admin should not be required to amend the spec of VMs owned by other users in order to try to migrate them to named nodes.

The migration controller will simply notice that nodeName on the VirtualMachineInstanceMigration is not empty and it will inject/replace (still under discussion, we have two alternatives here) something like:

spec:
  nodeName: <nodeName>

or

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchFields:
              - key: metadata.name
                operator: In
                values:
                  - <nodeName>

on the target virt-launcher pod.

Can you please summarize what you exactly mean by

Perhaps we can simply copy any provided Affinity and/or Tolerations set by the admin on the VMIM to the target pod - instead of offering a dedicated API field.

?

@fabiand
Member

fabiand commented Sep 16, 2024

  1. Migration Policies are something that we can look at. Let's not make them mandatory for this functionality
  2. Within KubeVirt we should clarify the persona which is doing VMIM. Can we explore making this an explicit admin feature?

@tiraboschi
Member Author

  1. Migration Policies are something that we can look at. Let's not make them mandatory for this functionality

Unfortunately a MigrationPolicy is not really an option to solve this:
here we are talking about configuring a NodeSelector on the specific VirtualMachineInstanceMigration to restrict the list of nodes that are suitable as migration targets.
And this is fully declarative and it will affect just that one-off migration attempt with no future side effects.

A migration policy is actually designed to be matched to VMs using NamespaceSelector and/or VirtualMachineInstanceSelector LabelSelectors.
This means that the LabelSelectors used to match the selected VM before the migration are still going to match it after the one-off migration attempt; and since we'd like to target down to a named node, if the migration to that node was successful, the VM will no longer be migratable until the MigrationPolicy is removed/amended or the labels on the VM are altered so they no longer match that MigrationPolicy.

So in order to achieve this, the cluster admin would have to define a MigrationPolicy, trigger the live-migration, wait for the migration controller and then remove the MigrationPolicy to prevent future unwanted side effects.
So this would be a three-step imperative approach with potential risks, while having a NodeSelector directly on the one-off VirtualMachineInstanceMigration is a single-shot, fully declarative attempt with no future side effects by design.

I'll track this in the alternative design section, noting that we considered it but then discarded it and explaining why, as a future reference.

@tiraboschi
Member Author

@tiraboschi You're right that it is currently possible to set affinities to VMIs, but using them in order to pick a specific node by name to schedule to is a very bad practice (in Kubernetes as well). In your comment you mainly explained that "the strongest reason behind this proposal is that as a cluster admin I want to be able to "force" a VM to migrate to a specific named node", however to explain the use-case we need to understand why the admin would want to do so in the first place.

Every user can already restrict a VM or even a pod to run on a restricted set of nodes, possibly even a single one.

Regarding this topic, the k8s documentation simply states:

Often, you do not need to set any such constraints; the scheduler will automatically do a reasonable placement (for example, spreading your Pods across nodes so as not to place Pods on a node with insufficient free resources). However, there are some circumstances where you may want to control which node the Pod deploys to

In my opinion we should simply take this as our starting point.
And indeed we have the affinity concept on the VM as it exists for generic k8s pods.
Now for VMs we also have an additional capability that doesn't really exist for generic pods, the live migration, and a live migration can be declaratively triggered with a VirtualMachineInstanceMigration.
Now I don't see why having affinity on the VirtualMachine makes sense while we need to identify specific different use cases when talking about VirtualMachineInstanceMigration.

Comment on lines 106 to 147
From a coding perspective the additional logic required on the migration controller is pretty simple.
Pseudocode:
```go
if migration.Spec.AddedNodeSelector != nil {
    if templatePod.Spec.Affinity.NodeAffinity == nil {
        templatePod.Spec.Affinity.NodeAffinity = &k8sv1.NodeAffinity{
            RequiredDuringSchedulingIgnoredDuringExecution: migration.Spec.AddedNodeSelector,
        }
    } else if templatePod.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution == nil {
        templatePod.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution = migration.Spec.AddedNodeSelector
    } else {
        mergedNodeAffinity := &k8sv1.NodeSelector{
            NodeSelectorTerms: append(templatePod.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms, migration.Spec.AddedNodeSelector.NodeSelectorTerms...),
        }
        templatePod.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution = mergedNodeAffinity
    }
}
```
Member Author

Blame on me,
I already pointed it out here: #320 (comment)
but then I somehow underestimated it.

k8s doc clearly states:

If you specify multiple terms in nodeSelectorTerms associated with nodeAffinity types, then the Pod can be scheduled onto a node if one of the specified terms can be satisfied (terms are ORed).

If you specify multiple expressions in a single matchExpressions field associated with a term in nodeSelectorTerms, then the Pod can be scheduled onto a node only if all the expressions are satisfied (expressions are ANDed).

So if we simply append an additional nodeSelectorTerm, it will be ORed, producing wrong results.
Since we want it to be ANDed, the only option is taking the expressions of the nodeSelectorTerm added to the VMIM and adding them to all the existing nodeSelectorTerms (which are already ORed!)
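A minimal sketch (assuming the proposed added-term field on the VMIM and the `k8sv1` corev1 alias from the pseudocode above) of the ANDed merge described here: the added term's requirements are copied into every existing term instead of being appended as a new, ORed term.

```go
// andNodeSelectorTerm ANDs the admin-provided term with whatever the VM owner
// already required: expressions/fields inside a single term are ANDed by the
// scheduler, so adding them to every ORed branch applies the constraint globally.
func andNodeSelectorTerm(existing *k8sv1.NodeSelector, added k8sv1.NodeSelectorTerm) *k8sv1.NodeSelector {
    if existing == nil || len(existing.NodeSelectorTerms) == 0 {
        // No pre-existing required terms: the added term becomes the only one.
        return &k8sv1.NodeSelector{NodeSelectorTerms: []k8sv1.NodeSelectorTerm{added}}
    }
    merged := existing.DeepCopy()
    for i := range merged.NodeSelectorTerms {
        merged.NodeSelectorTerms[i].MatchExpressions = append(merged.NodeSelectorTerms[i].MatchExpressions, added.MatchExpressions...)
        merged.NodeSelectorTerms[i].MatchFields = append(merged.NodeSelectorTerms[i].MatchFields, added.MatchFields...)
    }
    return merged
}
```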

tiraboschi pushed a commit to tiraboschi/kubevirt that referenced this pull request Sep 16, 2024
Follow-up and derived from:
kubevirt#10712
Implements:
kubevirt/community#320

TODO: add functional tests

Signed-off-by: zhonglin6666 <[email protected]>
Signed-off-by: Simone Tiraboschi <[email protected]>
@xpivarc
Member

xpivarc commented Sep 17, 2024

2. Within KubeVirt we should clarify the persona which is doing VMIM. Can we explore making this an explicit admin feature?

@fabiand +1

We need to acknowledge that, if we agree that VMIM should have been cluster-wide and we want to mitigate that, adding new functionality will make maintainability harder.

tiraboschi pushed a commit to tiraboschi/kubevirt that referenced this pull request Sep 23, 2024
Follow-up and derived from:
kubevirt#10712
Implements:
kubevirt/community#320

TODO: add functional tests

Signed-off-by: zhonglin6666 <[email protected]>
Signed-off-by: Simone Tiraboschi <[email protected]>
@fabiand
Member

fabiand commented Sep 24, 2024

How about addressing 2 by a tightened RBAC only - without API changes.
Admins could still decide to permit regular users to use it.

@fabiand
Member

fabiand commented Sep 24, 2024

If these cases are so exceptional, having to perform a couple more steps sounds reasonable to me.

Today there is simply no way for an admin to achieve this functionality without accessing the VM.

@vladikr @davidvossel I'd appreciate your concern or approval for this design.

my point is: this is SUCH a common functionality, this is to me not undermining any existing scheduling work and management patterns (aka existing migration continues to work as it does), thus to me this is to take away pain from administrators.

I'd really like to converge this.

@iholder101
Contributor

this is SUCH a common functionality

FWIW, from my perspective, this is indeed the only "use-case" / reason to support this, thus:

  1. The proposal should clearly outline that this is the only use-case. Except for admins being used to it, I don't see another compelling use-case.
  2. We should think whether this alone is a compelling argument for supporting this. After all, Kubernetes is different from other platforms, which has many upsides. I wonder if we should "take away the pain" of getting used to something new, or rather educate the admins to use Kubevirt in a way that is more beneficial to them.

This is just my pov though.

@vladikr
Member

vladikr commented Sep 24, 2024

If these cases are so exceptional, having to perform a couple more steps sounds reasonable to me.

Today there is simply no way for an admin to achieve this functionality without accessing the VM.

@vladikr @davidvossel I'd appreciate your concern or approval for this design.

my point is: this is SUCH a common functionality, this is to me not undermining any existing scheduling work and management patterns (aka existing migration continues to work as it does), thus to me this is to take away pain from administrators.

If I understand correctly we are only speaking about which object to edit: whether to edit the VM and trigger the migration (restoring it after the migration) or to adjust the migration object directly.
However, the functionality is there already, to me, the level of pain is not that high.

I'd really like to converge this.

I still don't see a clear definition of a use case, stating that this is an emergency feature.

Also, I am very much puzzled about the API.
I don't see a reason why not to add a nodeSelector to the VMIM that will later be handled in the controller - the same way as it is exposed on the VM/VMI
I don't see any reference in the proposal to tolerations. If we are speaking about an emergency feature should this be added to the VMIM API?

I think before we add more complexity to the migration controller we need to adjust the control over who can trigger a migration.
Migrations should be an admin only feature. This should be part of the proposal.

@tiraboschi
Member Author

tiraboschi commented Sep 24, 2024

If I understand correctly we are only speaking about which object to edit: whether to edit the VM and trigger the migration (restoring it after the migration) or to adjust the migration object directly. However, the functionality is there already, to me, the level of pain is not that high.

Today a cluster admin has to edit the VM spec, wait for it to be propagated down to the VMI, trigger a live migration, wait for its result and then restore the original node affinity that was on the VM to have it migratable again.
This is a long, cumbersome, error prone and imperative flow that motivates this design proposal.

I still don't see a clear definition of a use case, stating that this is an emergency feature.

Avoiding a long, cumbersome, error-prone and imperative flow by offering a purely declarative API for a one-off migration, only restricting the pool of candidate nodes.

Also, I am very much puzzled about the API. I don't see a reason why not to add a nodeSelector to the VMIM that will later be handled in the controller - the same way as it is exposed on the VM/VMI

I added an additional section named Why addedNodeSelectorTerm and not simply addedNodeSelector that clearly explains it.

I don't see any reference in the proposal to tolerations. If we are speaking about an emergency feature should this be added to the VMIM API?

Injecting additional tolerations can enlarge the pool of candidate nodes, selecting something that was not originally planned (although we could argue that the VM owner might not be aware of/interested in node taints), so I'm not in favor of adding it here.
Then, an additional toleration can be added to the VM in a single shot and there is no need to remove it right after the migration, so from this point of view it is less problematic.
I added a section about that in the design proposal.

I think before we add more complexity to the migration controller we need to adjust the control over who can trigger a migration. Migrations should be an admin only feature. This should be part of the proposal.

In my opinion this is a completely separate discussion that deserves its own independent design proposal.

@tiraboschi
Member Author

this is SUCH a common functionality

FWIW, from my perspective, this is indeed the only "use-case" / reason to support this, thus:

  1. The proposal should clearly outline that this is the only use-case. Except for admins being used to it, I don't see another compelling use-case.

Today a cluster admin has to edit the VM spec, wait for it to be propagated down to the VMI, trigger a live migration, wait for its result and then restore the original node affinity that was on the VM to have it migratable again.
This is a long, cumbersome, error prone and imperative flow that motivates this design proposal.

@tiraboschi
Member Author

We need to acknowledge that, if we agree that VMIM should have been cluster-wide and we want to mitigate that, adding new functionality will make maintainability harder.

Can you please elaborate on it?
From my point of view, these are two orthogonal discussions (about the same object) and this one is not preventing the second nor making it any harder.

tiraboschi pushed a commit to tiraboschi/kubevirt that referenced this pull request Sep 24, 2024
Follow-up and derived from:
kubevirt#10712
Implements:
kubevirt/community#320

TODO: add functional tests

Signed-off-by: zhonglin6666 <[email protected]>
Signed-off-by: Simone Tiraboschi <[email protected]>
@iholder101
Contributor

iholder101 commented Sep 24, 2024

this is SUCH a common functionality

FWIW, from my perspective, this is indeed the only "use-case" / reason to support this, thus:

  1. The proposal should clearly outline that this is the only use-case. Except for admins being used to it, I don't see another compelling use-case.

Today a cluster admin has to edit the VM spec, wait for it to be propagated down to the VMI, trigger a live migration, wait for its result and then restore the original node affinity that was on the VM to have it migratable again. This is a long, cumbersome, error prone and imperative flow that motivates this design proposal.

I agree.
The question is why the admin would want to do that in the first place.

I've outlined above why I'm not sure the use-cases listed in this proposal are compelling. I think that the fact that it's a common functionality is the best argument to adopt this feature, I'm just not entirely convinced it's enough.

Migrations should be an admin only feature. This should be part of the proposal.

+1
I'm not even sure it makes sense that it's not an admin feature, even before this proposal.

@tiraboschi
Member Author

I don't see a reason why not to add a nodeSelector to the VMIM that will later be handled in the controller

I also added a sub-section about the advantages of *k8sv1.NodeSelectorTerm over NodeSelector map[string]string

@tiraboschi
Member Author

Today a cluster admin has to edit the VM spec, wait for it to be propagated down to the VMI, trigger a live migration, wait for its result and then restore the original node affinity that was on the VM to have it migratable again. This is a long, cumbersome, error prone and imperative flow that motivates this design proposal.

I agree. The question is why the admin would want to do that in the first place.

I've outlined above why I'm not sure the use-cases listed in this proposal are compelling. I think that the fact that it's a common functionality is the best argument to adopt this feature, I'm just not entirely convinced it's enough.

For instance, the standard k8s scheduler is not really aware of the gap between resource allocation and actual resource utilization (the k8s scheduler is not looking at real-time usage metrics but only at planned capacity), and maybe I don't want to reconfigure my whole cluster to use real load aware scheduling like https://github.com/kubernetes-sigs/scheduler-plugins/tree/master/kep/61-Trimaran-real-load-aware-scheduling, but as an expert cluster admin I simply want to live migrate a couple of selected VMs according to my observations of the KubeVirt metrics on VMs.

The descheduler, too, is not really acting according to actual node utilization but only to resource requirements, see: kubernetes-sigs/descheduler#225
and maybe this is enough for regular pods, but the VM lifecycle is ideally longer and we have live migration, so why not offer an additional, safe, and easy to achieve capability?

@fabiand
Member

fabiand commented Sep 24, 2024

@iholder101 sorry for being picky, but it is not "used to it" - there is simply no way to address the outlined use-cases from an admin perspective. And we should recognize that the use-cases are simply problems that administrators face when running virtualization platforms. Thus to some extent we could say that KubeVirt is only concerned about parts of the use-case, because some of the causes are actually platform driven, and KubeVirt is just a platform add-on.

Existing functionality allows VM owners to place workloads.
There is no functionality on the VM level that allows an administrator to place a workload.

- Workload balancing solutions don't always work as expected
> I have configured my cluster with the descheduler and a load aware scheduler (trimaran), thus by default my VMs will be regularly descheduled if utilization is not balanced, and trimaran will ensure that my VMs will be scheduled to underutilized nodes. Often this works; however, in exceptional cases, e.g. if the load changes too quickly, or only one VM is suffering and I want to avoid that all VMs on the cluster are moved, I need - as an exception - a tool to move one VM, once, to deal with this exceptional situation.
- Troubleshooting a node
- Validating a new node by migrating a specific VM there
Member

Such an admin should probably run a checkup [1].

[1] https://github.com/kiagnose/kubevirt-dpdk-checkup

Member Author

I still see this proposed API as an enabler for such an eventual automated check.
Without this additional API, the checkup code would have to handle an imperative flow, amending the spec of the named VM a few times in sequence to trigger the behaviour to be verified.


## Goals
- A user allowed to trigger a live-migration of a VM and list the nodes in the cluster is able to rely on a simple and direct API to try to live migrate a VM to a specific node (or a node within a set of nodes identified by adding node affinity constraints).
- The live migration then can successfully complete or fail for various reasons exactly as it can succeed or fail today for other reasons.
Member

I do not understand this goal.
It is already the case today, is there a new goal beyond what exists today?

Member Author

This is stating that, in terms of exit status, a migration to a selected node via the optional API should behave exactly as a migration behaves today when the target node is selected by the scheduler: no side effects, no "try this first and then eventually let the scheduler decide", or anything like that.

## Goals
- A user allowed to trigger a live-migration of a VM and list the nodes in the cluster is able to rely on a simple and direct API to try to live migrate a VM to a specific node (or a node within a set of nodes identified by adding node affinity constraints).
- The live migration then can successfully complete or fail for various reasons exactly as it can succeed or fail today for other reasons.
- The target node that is explicitly required for the actual live migration attempt should not influence future live migrations or the placement in case the VM is restarted. For long-lasting placement, nodeSelectors or affinity/anti-affinity rules directly set on the VM spec are the only way to go.
Member

For long-lasting placement, nodeSelectors or affinity/anti-affinity rules directly set on the VM spec are the only way to go.

As a goal, there is no need to explain how to do something else. I think this is already mentioned well in the motivation.

- I'm not using any automatic workload rebalancing mechanism and I periodically want to manually rebalance my cluster according to my observations
- Foreseeing a peak in application load (e.g. new product announcement), I'd like to balance in advance my cluster according to my expectation and not to current observations
- During a planned maintenance window, I'm planning to drain more than one node in a sequence, so I want to be sure that the VM is going to land on a node that is not going to be drained in a near future (needing then a second migration) and being not interested in cordoning it also for other pods
- I just added a new node and I want to validate it trying to live migrate a specific VM there
Member

I recall this from memory, but it is a real world request from multiple end users.

The need of the admin is clear.
But we have better tooling for the admin to use: https://github.com/kiagnose/kiagnose

- I just added to the cluster a new powerful node and I want to migrate a selected VM there without trying more than once according to scheduler decisions
- I'm not using any automatic workload rebalancing mechanism and I periodically want to manually rebalance my cluster according to my observations
- Foreseeing a peak in application load (e.g. new product announcement), I'd like to balance in advance my cluster according to my expectation and not to current observations
- During a planned maintenance window, I'm planning to drain more than one node in a sequence, so I want to be sure that the VM is going to land on a node that is not going to be drained in a near future (needing then a second migration) and being not interested in cordoning it also for other pods
Member

I second @iholder101 .

I do not think Kubevirt is interested in suggesting such an action to a cluster admin.
While I have no interest in arguing with an admin on how one should manage the cluster, I prefer not to try and convince ourselves that this makes sense and feel it is a valid scenario.

- I just added a new node and I want to validate it trying to live migrate a specific VM there
> [!NOTE]
> technically all of this can already be achieved by manipulating the node affinity rules on the VM object, but as a cluster admin I want to keep a clear boundary between what is a long-lasting setting for a VM, defined by the VM owner, and what is a single-shot requirement for a one-off migration
- As a VM owner I don't want to see my VM object getting amended by another user just for maintenance reasons
Member

I do not understand this one.

Member Author

Let's think about dev-ops, infrastructure as code or platform engineering: as a VM owner (or as the owner of a complex application composed of VMs, regular pods, services...) I want to store and control its configuration by external means, and I do not want the admin of the KubeVirt cluster interfering with my objects to configure the target for a live migration: this should rather be done on the migration object, not on my own object.
It's a matter of separation of boundaries.

- VM owner: the user who owns a VM in his namespace on a Kubernetes cluster with KubeVirt
- Cluster-admin: the administrator of the cluster

## User Stories
Member

I find the user-stories always a challenge.
As I see it, the stories (and even the goals) are supposed to describe the need of the user and attempt to avoid providing the solution. Here, the stories provide the solution upfront: migrate a VM to a selected node.

If we drop that solution, you may see that stories can be resolved by other means, weakening the solution to migrate to a specific node.

Said all that, there is one user story which is not specified here and is convincing me:

As a cluster admin that managed various VMM systems, I have well defined processes and steps that proved useful when managing VMs.
One of the basic actions I require to continue following my processes and steps is to have control over the node target when migrating a VM.
With time, I expect to learn and trust Kubevirt to achieve my needs through other means, but until that time, I would like to keep the capabilities I learned to trust so far.

At its base, I acknowledge that Kubevirt is not a classic VMM, it does things very differently, however it has interest to make it easy for VMs from other platforms to be migrated to it. That means making the admins happy.

Contributor

I agree, @EdDev. This indeed sounds much more compelling than the other use-cases.
I do think that, if that's the rationale, we should make it clear documentation-wise.

Member Author

No, this is not simply about laziness in learning something new.
Many if not all of the user stories presented here are not directly achievable without installing optional (or yet-to-be-developed) scheduler and descheduler plugins able to observe real-time metrics, or without resorting to imperative multi-step flows.

Comment on lines +90 to +91
On the other hand, `pod.spec.nodeSelector` only matches labels, and the predefined `kubernetes.io/hostname` [label is not guaranteed to be reliable](https://kubernetes.io/docs/reference/node/node-labels/#preset-labels).
`NodeSelectorTerm` offers more options, and in particular:
Member

  1. The reliability concern can be mitigated by using a label we control.
  2. Having more options is also a con. It opens up options which may be too wide, increasing the chance that we will need to help users use them correctly.
    While I agree we need to keep this as generic as possible, having a simple solution is an advantage.

@iholder101, you mention that nodeSelector is bad and is left there for backward-compatibility reasons. Is this written down in any formal location? If it is, then this section can reference it as a strong no-go reason.

Anyway, I would add these points even if the previous option is chosen/better.
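
For readers following the thread, a minimal sketch of the extra option being referred to: besides label matching, a `NodeSelectorTerm` can use `matchFields` on `metadata.name`, which identifies a node by its object name rather than by the preset hostname label (the node name below is hypothetical):

```yaml
# Fragment of nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution (core/v1):
nodeSelectorTerms:
- matchFields:
  - key: metadata.name       # matches the Node object's name, not a label
    operator: In
    values:
    - node-07                # hypothetical node
```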

Contributor

@EdDev I think you're referring to what I wrote about .spec.nodeName. Unlike node affinities or nodeSelectors, it is used to bypass the scheduler, which is a huge problem and breaks stuff very quickly. It's almost always a bad idea to use it, except for very extreme cases.

You can look at this conversation, which is interesting regarding the topic: kubernetes/enhancements#3521 (comment)
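
For context, `spec.nodeName` is a plain pod field rather than a scheduling constraint; setting it binds the pod to a node directly, skipping the scheduler entirely. A minimal sketch with hypothetical names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod                        # hypothetical
spec:
  nodeName: node-07                        # binds directly to this node, no scheduling decision is made
  containers:
  - name: app
    image: quay.io/example/app:latest      # hypothetical image
```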

Member

@EdDev EdDev Sep 25, 2024

That’s it, thanks. This should be added here as the main con then.

Edit: strike that, now I see my confusion

Member Author

This is already tracked in the "Exposing spec.nodeName and directly setting it to the target pod bypassing the k8s scheduler" section of the proposal.

design-proposals/migration-target.md (outdated, resolved review thread)

# Implementation Phases
A really close attempt was already tried in the past with https://github.com/kubevirt/kubevirt/pull/10712 but the PR got some pushback.
A similar PR should be reopened and refined, and we should implement functional tests.
Member

  • The "one-time" operation convinced me.
  • The reasoning for the real need is hard for me, but I have given feedback on this proposal about what does convince me.

@EdDev
Member

EdDev commented Sep 24, 2024

I forgot to add to my review that I really like this design proposal. The investment in the motivation and the alternative solutions is impressive. 💪

@iholder101
Contributor

@tiraboschi

but as an expert cluster admin I simply want to live migrate a couple of selected VMs according to my observations of the KubeVirt metrics on VMs.

I feel we're always circling back to "the admin just wants to do that".
When I talk about use-cases, this is not what I mean.
I understand that some admins ask for this, but I think the more important thing is to understand why they want it and for which purpose. This is important, since then we can think about different mechanisms for achieving the end goal they are after. That should be the basis of this discussion.

Nor is the descheduler really acting according to actual node resources; it acts only on resource requirements,

A descheduler plugin could support that.
As a continuation of the above: if the admin wants to re-balance the cluster according to a certain logic, and this is the reason they want to migrate VMs to a named node in the first place, then this mechanism is a good alternative. Even if, as you outline, the current plugins aren't good enough, it might be in the interest of Kubevirt to write one.


@fabiand

There is simply no way to address the outlined use-cases from an admin perspective

I've added comments on how I think we should address these use-cases above.

And we should recognize that the use-cases are simply problems that administrators face when running virtualization platforms.

This is correct; however, sometimes an admin wants to achieve a certain goal but chooses the wrong mechanisms to do so. This is more likely to happen when the admin comes from different virt platforms, which makes them less experienced with the Kubernetes world.


@fabiand
If I try to stay practical about this proposal, I think these points need to be resolved in order to move forward:

  • @EdDev's reason to support this is the most compelling in my opinion. The proposal should get rid of the other use-cases which are not compelling IMO.
  • We should document that, and perhaps even be more vocal (i.e. using alerts, events, etc) about the fact that using this mechanism is discouraged and encourage alternative mechanisms to achieve the same goals.
  • We should resolve who should be allowed to trigger a migration, i.e. if it should be an admin-only operation.
  • We should consider introducing an alpha non-API mechanism first (e.g. through annotations?) in order to present this to some users and have an opportunity to gain feedback, ensuring the door stays open to make drastic changes in the future or even drop this if it proves not to be useful.

I believe my views are thoroughly expressed here already, so I'll shut up now and let others express their opinions as well.
@alicefr @jean-edouard @xpivarc @davidvossel @rmohr

…specific nodes

Adding a design proposal to extend VirtualMachineInstanceMigration
object with an additional API to let a cluster admin
try to trigger a live migration of a VM injecting
on the fly an additional NodeSelectorTerm constraint.
The additional NodeSelectorTerm can only restrict the set
of Nodes that are valid targets for the migration
(eventually down to a single host); in order to
achieve this, all the `NodeSelectorRequirements`
defined on the additional `NodeSelectorTerm` set
on the VMIM object should be appended to
all the `NodeSelectorTerms` already defined on
the VM object.
All the affinity rules defined on the VM spec are still
going to be satisfied.

Signed-off-by: Simone Tiraboschi <[email protected]>
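
For illustration only, a sketch of how the extended VirtualMachineInstanceMigration object described above could look; the `addedNodeSelectorTerm` field name is hypothetical and not settled by this proposal:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: migrate-my-vm
  namespace: my-namespace
spec:
  vmiName: my-vm
  addedNodeSelectorTerm:               # illustrative field name, not final
    matchExpressions:
    - key: kubernetes.io/hostname
      operator: In
      values:
      - node-07                        # hypothetical target node
```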
@tiraboschi
Member Author

tiraboschi commented Sep 25, 2024

  • @EdDev's reason to support this is the most compelling in my opinion. The proposal should get rid of the other use-cases which are not compelling IMO.

Other cases are here because they are valid by themselves and are not directly addressable without relying on multi-step imperative flows.
The only use case that is not really helpful in the discussion is "let's make the admins happy because they are used to it".

  • We should document that, and perhaps even be more vocal (i.e. using alerts, events, etc) about the fact that using this mechanism is discouraged and encourage alternative mechanisms to achieve the same goals.

same.

  • We should resolve who should be allowed to trigger a migration, i.e. if it should be an admin-only operation.

I don't see any real implication of that regarding this API.
That topic deserves an independent discussion.

  • We should consider introducing an alpha non-API mechanism first (e.g. through annotations?) in order to present this to some users and have an opportunity to gain feedback, ensuring the door stays open to make drastic changes in the future or even drop this if it proves not to be useful.

Honestly, abusing annotations to avoid defining APIs is a bad practice.
I think this deserves to be a first-class citizen.
Let's agree on how to define a good API, but then let's properly implement it.

I believe my views are thoroughly expressed here already, so I'll shut up now and let others express their opinions as well. @alicefr @jean-edouard @xpivarc @davidvossel @rmohr

@EdDev
Member

EdDev commented Sep 26, 2024

  • @EdDev's reason to support this is the most compelling in my opinion. The proposal should get rid of the other use-cases which are not compelling IMO.

Other cases are here because they are valid by themselves and are not directly addressable without relying on multi-step imperative flows.

I think there is a disagreement here on the solution, not on the need.
You have provided use cases where choosing a target node solves needs.
Most feedback I saw on this proposal is about solving the need through other means.

Perhaps you can emphasize the troubleshooting need. I guess it will be hard to find alternative solutions to it.
The problem is that some may not see it as worth the effort and API change.

The only use case that is not really helpful in the discussion is "let's make the admins happy because they are used to it".

It is helpful if it convinces others that there is no other alternative solution to it.
I agree it would be better to find a need for which the only solution (or the best solution among the alternatives) is this target-node option. And that need should be convincing enough to justify investing in adding it to the project.

@tiraboschi
Member Author

I think there is a disagreement here on the solution, not on the need.
You have provided use cases where choosing a target node solves needs.
Most feedback I saw on this proposal is about solving the need through other means.

Personally I still think that letting the cluster admin inject a NodeSelectorTerm that is valid only for a specific one-off migration attempt is a simple and smart solution: it is simple and smart precisely because the same solution applies in exactly the same way to all the use cases detailed in this proposal.
We can for sure assume that each individual use case tracked here could otherwise be solved with a different ad-hoc solution.
But I like the proposed solution because it is the most intuitive one (at least for cluster admins with a solid background in traditional virtualization systems) and it covers all the use cases mentioned here uniformly.

Perhaps you can emphasize the troubleshooting need. I guess it will be hard to find alternative solutions to it.
The problem is that some may not see it as worth the effort and API change.

I also want to better understand the reasons why we should not accept it.
Are we talking about a crazy new API that doesn't exist in the Kubernetes ecosystem?
No, NodeSelectorTerm is part of Kubernetes core v1.

Are we saying that propagating that NodeSelectorTerm is too complex/risky for the KubeVirt controllers?
But the KubeVirt code is already using exactly this pattern to handle HostModelCpuModel and ForbiddenFeaturePolicy: here we are only talking about using an existing mechanism to let the user influence the target node of a migration attempt, but from a coding perspective the logic is already there.
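
To make the appending semantics discussed above concrete, here is a rough sketch (not taken from the actual implementation) of how the extra requirement carried by the migration object would narrow the VM's existing terms; labels and node names are hypothetical. Since requirements inside a single term are ANDed while separate terms are ORed, appending the same requirement to every term can only restrict the set of eligible nodes:

```yaml
# NodeSelectorTerms already defined on the VM (ORed with each other):
nodeSelectorTerms:
- matchExpressions:
  - key: topology.kubernetes.io/zone
    operator: In
    values: ["zone-a"]
- matchExpressions:
  - key: topology.kubernetes.io/zone
    operator: In
    values: ["zone-b"]
---
# Extra requirement coming from the migration object (illustrative):
#   key: kubernetes.io/hostname, operator: In, values: ["node-07"]
#
# Resulting terms on the target pod: the requirement is appended to every term,
# so whichever term matches, only node-07 remains an eligible target.
nodeSelectorTerms:
- matchExpressions:
  - key: topology.kubernetes.io/zone
    operator: In
    values: ["zone-a"]
  - key: kubernetes.io/hostname
    operator: In
    values: ["node-07"]
- matchExpressions:
  - key: topology.kubernetes.io/zone
    operator: In
    values: ["zone-b"]
  - key: kubernetes.io/hostname
    operator: In
    values: ["node-07"]
```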
