(aws-eks): Neuron device plugin is not installed when instance type is Trainium #29131
Closed
freschri opened this issue on Feb 16, 2024 · 2 comments
Fixed by #29155 · May be fixed by NOUIY/aws-solutions-constructs#98, NOUIY/aws-solutions-constructs#99 or NOUIY/aws-solutions-constructs#101
Labels
@aws-cdk/aws-eks: Related to Amazon Elastic Kubernetes Service
bug: This issue is a bug.
effort/medium: Medium work item – several days of effort
p2
Comments
freschri added the bug and needs-triage labels on Feb 16, 2024
github-actions bot added the @aws-cdk/aws-eks label on Feb 16, 2024
Yes, we could add it to the instance types. We welcome any PRs for this.
pahud added the p2 and effort/medium labels and removed the needs-triage label on Feb 16, 2024
This was referenced May 23, 2024
Describe the bug
If the instance type is Trainium, the Neuron device plugin is incorrectly not installed.
Expected Behavior
If the instance type is Trainium, the Neuron device plugin is installed.
Current Behavior
If the instance type is Trainium, the Neuron device plugin is NOT installed.
Reproduction Steps
Add node group capacity to an EKS cluster using a Trainium instance type (for example, trn1.2xlarge).
Possible Solution
No response
Additional Information/Context
Instance types of the Trainium family were recently added to aws-ec2: https://github.com/aws/aws-cdk/blame/main/packages/aws-cdk-lib/aws-ec2/lib/instance-types.ts
However, packages/aws-cdk-lib/aws-eks/lib/instance-types.ts does not include them:
export const INSTANCE_TYPES = {
  gpu: ['p2', 'p3', 'g2', 'g3', 'g4'],
  inferentia: ['inf1', 'inf2'],
  graviton: ['a1'],
  graviton2: ['c6g', 'm6g', 'r6g', 't4g'],
  graviton3: ['c7g'],
};
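A minimal standalone sketch of why the lookup misses Trainium, assuming plain strings stand in for ec2.InstanceType (whose toString() yields names such as 'trn1.2xlarge'). It copies the table above and applies the same 2-character (GPU) and 4-character (Inferentia) prefix checks that nodeTypeForInstanceType uses; the helper matchingFamilies is hypothetical and not part of aws-cdk-lib.

```typescript
// The aws-eks table as quoted in this issue (no Trainium entry).
const INSTANCE_TYPES = {
  gpu: ['p2', 'p3', 'g2', 'g3', 'g4'],
  inferentia: ['inf1', 'inf2'],
  graviton: ['a1'],
  graviton2: ['c6g', 'm6g', 'r6g', 't4g'],
  graviton3: ['c7g'],
};

// Hypothetical helper: lists the accelerator families an instance type matches,
// using the same prefix lengths as nodeTypeForInstanceType (2 for GPU, 4 for Inferentia).
function matchingFamilies(instanceType: string): string[] {
  const matches: string[] = [];
  if (INSTANCE_TYPES.gpu.includes(instanceType.substring(0, 2))) matches.push('gpu');
  if (INSTANCE_TYPES.inferentia.includes(instanceType.substring(0, 4))) matches.push('inferentia');
  return matches;
}

console.log(matchingFamilies('inf2.xlarge'));  // [ 'inferentia' ]
console.log(matchingFamilies('trn1.2xlarge')); // [] (falls through to STANDARD)
```

Since 'trn1' is in neither list, a Trainium node group is classified as a standard node.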
Because Trainium is missing from this table, the check in packages/aws-cdk-lib/aws-eks/lib/cluster.ts fails and the plugin is not installed:
function nodeTypeForInstanceType(instanceType: ec2.InstanceType) {
  return INSTANCE_TYPES.gpu.includes(instanceType.toString().substring(0, 2)) ? NodeType.GPU :
    INSTANCE_TYPES.inferentia.includes(instanceType.toString().substring(0, 4)) ? NodeType.INFERENTIA :
      NodeType.STANDARD;
}

public addNodegroupCapacity(id: string, options?: NodegroupOptions): Nodegroup {
  const hasInferentiaInstanceType = [
    options?.instanceType,
    ...options?.instanceTypes ?? [],
  ].some(i => i && nodeTypeForInstanceType(i) === NodeType.INFERENTIA);
  if (hasInferentiaInstanceType) {
    this.addNeuronDevicePlugin();
  }
  ...
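One possible shape for a fix, sketched under assumptions: add a hypothetical trainium entry (prefix 'trn1') to the table, a hypothetical NodeType.TRAINIUM value, and let the plugin check accept either Neuron-based family. Strings again stand in for ec2.InstanceType, NodeType is modeled as a plain constant object rather than the library's enum, and needsNeuronDevicePlugin is an illustrative helper; this is not necessarily what #29155 merged.

```typescript
// Sketch only: plain strings stand in for ec2.InstanceType.
const INSTANCE_TYPES = {
  gpu: ['p2', 'p3', 'g2', 'g3', 'g4'],
  inferentia: ['inf1', 'inf2'],
  trainium: ['trn1'], // hypothetical entry; the 4-char prefix also matches 'trn1n'
  graviton: ['a1'],
  graviton2: ['c6g', 'm6g', 'r6g', 't4g'],
  graviton3: ['c7g'],
};

// Modeled as a constant object; TRAINIUM is a hypothetical addition.
const NodeType = {
  STANDARD: 'Standard',
  GPU: 'GPU',
  INFERENTIA: 'INFERENTIA',
  TRAINIUM: 'TRAINIUM',
} as const;
type NodeType = (typeof NodeType)[keyof typeof NodeType];

// Same prefix logic as cluster.ts, plus a Trainium branch.
function nodeTypeForInstanceType(instanceType: string): NodeType {
  const prefix2 = instanceType.substring(0, 2);
  const prefix4 = instanceType.substring(0, 4);
  if (INSTANCE_TYPES.gpu.includes(prefix2)) return NodeType.GPU;
  if (INSTANCE_TYPES.inferentia.includes(prefix4)) return NodeType.INFERENTIA;
  if (INSTANCE_TYPES.trainium.includes(prefix4)) return NodeType.TRAINIUM;
  return NodeType.STANDARD;
}

// Hypothetical replacement for hasInferentiaInstanceType: true if any
// requested type runs on Neuron hardware (Inferentia or Trainium).
function needsNeuronDevicePlugin(instanceTypes: Array<string | undefined>): boolean {
  return instanceTypes.some((i) => {
    if (!i) return false;
    const nodeType = nodeTypeForInstanceType(i);
    return nodeType === NodeType.INFERENTIA || nodeType === NodeType.TRAINIUM;
  });
}

console.log(nodeTypeForInstanceType('trn1.32xlarge'));                           // TRAINIUM
console.log(needsNeuronDevicePlugin([undefined, 'm5.large', 'trn1n.32xlarge'])); // true
console.log(needsNeuronDevicePlugin(['m5.large']));                              // false
```

Because 'trn1n' also starts with 'trn1', the 4-character prefix check covers trn1n instances without a separate entry.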
CDK CLI Version
2.128.0
Framework Version
No response
Node.js Version
v21.6.1
OS
macOS Sonoma 14.3
Language
TypeScript
Language Version
No response
Other information
No response