Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SageMaker support #2493

Closed
jckuester opened this issue Nov 30, 2017 · 53 comments
Closed

Add SageMaker support #2493

jckuester opened this issue Nov 30, 2017 · 53 comments
Labels
enhancement Requests to existing resources that expand the functionality or scope. service/sagemaker Issues and PRs that pertain to the sagemaker service.

Comments

@jckuester
Copy link
Contributor

jckuester commented Nov 30, 2017

Amazon has announced the new service Sage​Maker to build, train, and deploy machine learning models at scale.

Prerequisite

aws-sdk-go v1.12.36 (#2474)

Affected Resource(s)

The following resources are needed.

For building model and training:

For the deployment/hosting part:

Expected Behavior

Create, update, delete, and import for above listed SageMaker resources of deployment part:

resource "aws_sagemaker_endpoint" "foo" {
	name = "endpoint-foo"
	endpoint_config_name = "${aws_sagemaker_endpoint_configuration.ec.name}"
}

resource "aws_sagemaker_endpoint_configuration" "ec" {
	name = "endpoint-config-foo"

	production_variants {
                variant_name            = "variant-1"
		model_name = "${aws_sagemaker_model.m.name}"
                initial_instance_count  = 1
                instance_type           = "m3.xlarge"
                initial_variant_weight  = 1
	}
}

resource "aws_sagemaker_model" "m" {
    name = "my-model"

    primary_container {
       image = "111111111111.ecr.us-west-2.amazonaws.com/my-docker-image:latest"
       model_data_url  = "s3://111111111111-foo/model.tar.gz"
    }
}

Example

A terraform example that shows how to use the above resources to deploy your own model is here: #2585

References

@randomcamel randomcamel added the enhancement Requests to existing resources that expand the functionality or scope. label Nov 30, 2017
@darrenhaken
Copy link
Contributor

Has anyone picked this one up yet?

@jckuester
Copy link
Contributor Author

@darrenhaken yes, I am working on it. See the linked PRs in the issue :-)

@darrenhaken
Copy link
Contributor

@jckuester I can't see a PR for the notebook instances, are you working on that too?

@jckuester
Copy link
Contributor Author

No, please start working on them if you like :)

@ddcprg
Copy link
Contributor

ddcprg commented Dec 20, 2017

Hi @jckuester @darrenhaken , anyone working on the training job resource? I'd like to contribute to this if possible

@jckuester
Copy link
Contributor Author

@ddcprg I am only working on the linked PRs above (i.e. for deployment). So please contribute the training job resource if you like.

@ddcprg
Copy link
Contributor

ddcprg commented Jan 4, 2018

cool @jckuester ! I'll be taking a look soon and comment back on this ticket with an initial prototype

@ddcprg
Copy link
Contributor

ddcprg commented Jan 10, 2018

@jckuester I now you've done this in your PR but I've raised #2924 just to add the vendor as per the README

I have the training job nearly done, I'm working on testing and I'll add the PR to this ticket as soon as I consider it ready

@ddcprg
Copy link
Contributor

ddcprg commented Jan 12, 2018

Training Job PR -> #2955

If no one has picked up the notebook resource yet I'll take it

@ddcprg
Copy link
Contributor

ddcprg commented Jan 15, 2018

Notebook Instance PR -> #2999

@radeksimko radeksimko added the service/sagemaker Issues and PRs that pertain to the sagemaker service. label Jan 28, 2018
@ddcprg
Copy link
Contributor

ddcprg commented May 16, 2018

How can we help to put some traction on these PR's? I'm going to go over the code and add any new features included by AWS recently if needed

@ddcprg
Copy link
Contributor

ddcprg commented May 16, 2018

@jckuester would you mind adding a new item to the list above to include support for "lifecycle configurations"? I'll take a look and post the PR number later

@ddcprg
Copy link
Contributor

ddcprg commented Jun 13, 2018

@randomcamel @radeksimko how can we help to push these PR's forward?

@smrtslckr
Copy link

+1, very interested in this functionality

@ashleyjkell
Copy link

Not been any movement on this for a while (looks like failed integration tests), any chance of getting it pushed on?

@ddcprg
Copy link
Contributor

ddcprg commented Aug 20, 2018

This has been open for long time, we are still waiting feedback from Hashicorp. The failures in the build are due to other issues in master at the time of merging, they may well have gone away by now

@jckuester
Copy link
Contributor Author

jckuester commented Aug 21, 2018

@ashleyjkell from time to time, I rebased my PRs (#2477 - #2479) to make integration test green again (I guess they still are). But as long as there is no feedback coming from HashiCorp I will not wasting any more energy here to keeping PRs uptodate/resolving conflicts with main branch.

@rainmanh
Copy link

rainmanh commented Oct 5, 2018

Any updates on this one?
Anyone from Hashicorp?

@chasmosis
Copy link

I am interested in this functionality as well. Updates?

@ddcprg
Copy link
Contributor

ddcprg commented Nov 1, 2018

I've just updated my PR's so they are green again. It'd be nice to have some feedback from @bflad or other TF committers. The PR's have been opened for about 10 months now

@bcatubig
Copy link
Contributor

bcatubig commented Nov 5, 2018

@bflad Is there any way to get these PR's approved and merged?

@t-lanigan
Copy link

Also adding my voice to this one. Interested in getting this merged

@mbfrahry
Copy link
Member

mbfrahry commented Nov 8, 2018

Hello! I'm sorry for the delay and I'm actively looking at getting SageMaker support merged into the aws provider.

@mbfrahry
Copy link
Member

mbfrahry commented Nov 8, 2018

I've just finished my review of the instance and training job resources and will be moving down the list noted here shortly.

@ddcprg
Copy link
Contributor

ddcprg commented Nov 9, 2018

awesome @mbfrahry ! I'll try to go through you review today and make the necessary changes

@mbfrahry
Copy link
Member

Hey @reynico, you won't have to do anything with modifying the version of Terraform. If you're pinned to a specific version of the aws provider, then you'll have to modify that to get these changes when they're released.

@rainmanh
Copy link

Any estimate at all when this will be released?

@ddcprg
Copy link
Contributor

ddcprg commented Jan 4, 2019

Hello everyone. Unfortunately I've not been able to do the modifications as per @mbfrahry 's review and right now my access to AWS is limited and won't be able to test the code with the changes.

If anyone is willing to pick the first two bullets in the description up that'd be great, the PR is #2999 Perhaps @jckuester ? Whoever takes this on can squash my commits or start from scratch if you prefer.

@mbfrahry
Copy link
Member

mbfrahry commented Jan 7, 2019

Hey @ddcprg, sorry to hear about your troubles. I'll take this on with @tracypholmes if it hasn't been picked up yet

@ddcprg
Copy link
Contributor

ddcprg commented Jan 8, 2019

Sure @mbfrahry pick it up, apologies for the inconveniences

@bflad
Copy link
Contributor

bflad commented Jan 16, 2019

The new aws_sagemaker_notebook_instance resource has been released in version 1.56.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

@smrtslckr
Copy link

That is a great milestone! I am sorry to be "that person", but I have been more interested in the 3 outstanding deployment and hosting PRs: 2477, 2478, and 2479. Is there a current ETA on those now too?

@jckuester
Copy link
Contributor Author

@Erstwild we are very close :) I addressed all comments on the sagemaker PRs (which are in the 2nd feedback round now) and am now waiting for approval.

@rainmanh
Copy link

@julesjcraske
Is there any plan for implementing the ability of disabling the Direct internet access
And what about of attaching Lifecycle configurations ?

Many thanks

@bcatubig
Copy link
Contributor

bcatubig commented Jan 22, 2019

I would say the following are required in a followup pr.

  • LifecycleConfigName
  • DirectInternetAccess
  • DefaultCodeRepository
  • AdditionalCodeRepositories
  • VolumeSizeInGB (?)
  • AcceleratorTypes (?)

@saritajoshi9389
Copy link

Hi there,
Do we have any specific roadmap or timelines by which we are certain about PRs: 2477, 2478, and 2479? Along with attaching Lifecycle configurations ?

@bflad
Copy link
Contributor

bflad commented Feb 8, 2019

The new aws_sagemaker_model resource has been released in version 1.58.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

@jckuester
Copy link
Contributor Author

jckuester commented Feb 15, 2019

Added a new PR for the aws_sagemaker_lifecycle_config resource (#7585). I also have updates for the aws_sagemaker_notebook_instance in the pipeline (@saritajoshi9389, @bcatubig), as we need those resources in our company @yoyolabsio.

@bcatubig
Copy link
Contributor

Added a PR for DirectInternetAccess support #7884 . Would love some feedback to see if things could be improved.

@jckuester
Copy link
Contributor Author

jckuester commented Mar 19, 2019

Here is another PR (#8011) that adds all missing attributes to aws_sagemaker_instance, ie. adapts the resource to the latest AWS API.

@bflad
Copy link
Contributor

bflad commented Mar 29, 2019

The aws_sagemaker_endpoint_configuration resource has been released in version 2.4.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

@gattil
Copy link

gattil commented Apr 4, 2019

It would be really great to have the notebook lifecycle implemented. This is crucial for one of my clients.

@bflad
Copy link
Contributor

bflad commented Apr 5, 2019

Hi folks 👋 The following will be releasing in version 2.5.0 of the Terraform AWS Provider shortly:

  • New Resource: aws_sagemaker_endpoint
  • New Resource: aws_sagemaker_notebook_instance_lifecycle_configuration
  • resource/aws_sagemaker_notebook_instance: Add lifecycle_config_name argument

@rverma-nikiai
Copy link

@bflad Is this released?

@tdmalone
Copy link
Contributor

@rverma-nikiai The resources available for the AWS provider can be found at https://www.terraform.io/docs/providers/aws/ - if you type 'sagemaker' in the search box at the top left the resources mentioned above (and the new argument) have indeed been released.

@bcatubig
Copy link
Contributor

Created a new PR for DirectInternetAccess support as I nuked my previous PR by mistake. The new PR is #8618

@LarsNeR
Copy link

LarsNeR commented Aug 1, 2019

Is it possible to change the timeout for creating the sagemaker endpoint? For me it stops after 1m saying

ResourceNotReady: failed waiting for successful resource state

Usually creating the endpoint I have takes around 15 minutes. Adding
timeouts { create = "30m" delete = "30m" }
like it is possible on other resources gives me:

Error decoding timeout: Timeout Key (create) is not supported

@bflad
Copy link
Contributor

bflad commented Aug 1, 2019

Hi @LarsNeR 👋 My recommendation would be to open submit new GitHub issues following the bug report and feature request templates for existing functionality. These large "support service X" GitHub issues tend to not have a clear definition of done and usually get closed out in preference of more targeted issues for tracking individual asks.

@LarsNeR
Copy link

LarsNeR commented Aug 1, 2019

I've just seen it was an error on my site. Works as expected now.

@bflad
Copy link
Contributor

bflad commented Jan 29, 2020

Hi folks 👋

We just merged direct_internet_access argument support into the aws_sagemaker_notebook_instance resource. It will release with version 2.47.0 of the Terraform AWS Provider, tomorrow. Thanks to @bcatubig for the implementation. 👍

Please note that I'm going to close this GitHub issue since the majority of the original work for it has been completed and since there is not a clear definition of done over time. If you have specific bug reports with existing functionality or feature requests for additional functionality please use those issue templates. Thanks.

@bflad bflad closed this as completed Jan 29, 2020
@ghost
Copy link

ghost commented Mar 27, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Mar 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement Requests to existing resources that expand the functionality or scope. service/sagemaker Issues and PRs that pertain to the sagemaker service.
Projects
None yet
Development

No branches or pull requests