Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request for Comments: Design of new sam build command #743

Merged
merged 18 commits into from
Nov 19, 2018

Conversation

sanathkr
Copy link
Contributor

@sanathkr sanathkr commented Oct 30, 2018

💻 Rendered Design Doc is here

🗓 RFC Open for feedback until: 5 Nov, 2018

Description of changes:

Adding a design document for discussion. In this document, I explain the motivation and technical design behind a new command to create deployable artifacts for Lambda functions from source - sam build.

Requesting comments on every aspect of the design, from user experience to implementation details. Once the discussion is closed, I will update the Task Breakdown section with actual list of tasks to implement this feature and create corresponding Github Issues.

This document is based on the design document template I added in #742

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link
Contributor

@heitorlessa heitorlessa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some UX comments, additional task for sam init to create a consistent experience and brought in some DotNet Serverless folks

designs/sam_build_cmd.rst Show resolved Hide resolved
designs/sam_build_cmd.rst Show resolved Hide resolved
designs/sam_build_cmd.rst Show resolved Hide resolved
designs/sam_build_cmd.rst Show resolved Hide resolved
designs/sam_build_cmd.rst Outdated Show resolved Hide resolved
designs/sam_build_cmd.rst Show resolved Hide resolved
designs/sam_build_cmd.rst Show resolved Hide resolved
+=============+========================================================================================+
| Resolve | No Op |
+-------------+----------------------------------------------------------------------------------------+
| Compile | ``dotnet lambda package --configuration release --output-package $BUILD/package.zip`` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @sliedig, @blomskog and @pcolazurdo to share their DotNet experience.

I remember having a chat with @sliedig and @pcolazurdo once that dotnet lambda package includes additional files that are not necessarily needed for Lambda to function -- Perhaps there's a flag or something well known in that space that we can include

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feedback here would be super helpful!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding Mark @Ubiguchi as he also has great experience in this area for comments

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest version of the Amazon.Lambda.Tools creates a very slim version of the package so I would trust this. The tool creates a zip with all the required dependencies. If project has static files and they are included in the csproj definition, they will be automatically included in the zip

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use the standard microsoft/dotnet docker image to compile. It would need to add dependencies to work with Lambda so a Dockerfile can be included that would add these dependencies on package creation. Not sure what is the model with other languages, but I think this would be a good solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pcolazurdo Thanks for the review. Its great to know we are in the right direction here.

Can you elaborate on your comment about using standard microsoft/dotnet image? With native dependencies (like C libraries), we need to compile against a close version of Amazon Linux that Lambda runs. Doesn't dotnet have a similar requirement?

Also, for dependencies, wouldn't you be adding them to your .csproj file or something? Why do you need the Dockerfile?

Copy link
Contributor

@jfuss jfuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I really like this direction. Building is a pain in some languages and this helps complete the workflow for testing locally and being able to deploy to the cloud seamlessly.

designs/sam_build_cmd.rst Show resolved Hide resolved
designs/sam_build_cmd.rst Show resolved Hide resolved

#. Each Lambda function in SAM template gets built

#. Produce stable builds (best effort): If the source files did not change, built artifacts should not change.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Are we building when the source files did not change?

Maybe a better phrasing is, Identical sources file will produce identical build artifacts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Taking a step further is how are we going to decide on that? Hashing? Hashing stored locally?

Copy link
Contributor

@benbridts benbridts Nov 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identical sources file will produce identical build artifacts

This can be confusing for dependency updates, which you may want to include even of your source code stays the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this for more details: https:/sanathkr/aws-sam-cli/blob/build-cmd-design/designs/sam_build_cmd.rst#stable-builds

In the initial release, I don't see much value in using hashing to control artifacts because most of the build process is outside of SAM CLI's control (NPM/Maven etc does the job). But it will change in future as we understand how ppl use this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stable build should be based only on source changes. The user could run clean/build to force dependencies update

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.. This is the intention too.

designs/sam_build_cmd.rst Show resolved Hide resolved
Out-of-Scope
------------
#. Supports adding data files ie. files that are not referenced by the package manager (ex: images, css etc)
#. Support to exclude certain files from the built artifact (ex: using .gitignore or using regex)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assumption: Customers will be using these commands together sam build | sam package.

Given my assumption, would this create larger packages that customers deploy? Or is this just stating arbitrary files will not be excludable?

Are there any files we will be excluding by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will exclude certain files & folders by default like *.pyc that makes sense, but we won't be able to support parsing an arbitrary .gitignore file. So it is possible that we don't create the slimmest zip file. But this could be a fast follow feature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another benefit of using NPM for node, is that this is automatically handled by NPM.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting. It also creates a problem where we don't have a consistent experience across different programming languages (ie. Python won't respect .gitignore but Nodejs builds will). May be that is acceptable?

designs/sam_build_cmd.rst Show resolved Hide resolved
designs/sam_build_cmd.rst Show resolved Hide resolved
designs/sam_build_cmd.rst Outdated Show resolved Hide resolved
designs/sam_build_cmd.rst Show resolved Hide resolved
designs/sam_build_cmd.rst Outdated Show resolved Hide resolved
designs/sam_build_cmd.rst Show resolved Hide resolved
designs/sam_build_cmd.rst Show resolved Hide resolved
designs/sam_build_cmd.rst Outdated Show resolved Hide resolved
designs/sam_build_cmd.rst Show resolved Hide resolved
designs/sam_build_cmd.rst Show resolved Hide resolved
+-------------+--------------------------------------+
| Action | Command |
+=============+======================================+
| Resolve | ``npm install`` |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the way you're going with this, so we (from Claudia.js) would like to ask if you'd like us to share the code from our claudia pack command?

As just an npm install with excluding node_modules is not going to be enough. You will have other files, such as .eslintrc, still present everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YES YES YES 🔥Thank you so much for offering. This is exciting to get deeper Nodejs build support into the command. The current technical design does not support such integrations. I will rework and send an updated PR.

Broadly, here are some questions I am trying to answer:

  • Can we support build actions written natively in other languages (JS/Golang/C#) etc?
  • How can we auto-detect certain frameworks and invoke appropriate build actions for that framework because SAM CLI's built-in functionality is never going to be as deep and opinionated as the framework's? (ex: call claudia pack for ClaudiaJS apps, or zappa package for Zappa apps)
  • How can I let other non-SAM CLI developers write custom build actions?
  • How can we build all the custom build actions into SAM CLI so customers don't have to install a plugin?
  • If the build command is a collection of 20-30 different build actions for various frameworks, does it make sense to create a separate repository and vend it as a library that SAM CLI and other tools can consume? I know we are never going to ever support all frameworks/opinions applying the 80:20 rule, what would a good 80% look like?

I might be digressing from the original intent of the build command, but seeing the community interest and going through other build/package implementations, I feel like we can all benefit from each other if we collect our efforts.

Thoughts?

Copy link
Contributor

@gojko gojko Nov 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanathkr claudia pack will do the proper packaging for any nodejs runtime (dependency installation & rewriting for local deps and removing test resources), so it's not just for projects made with claudia. eg for video puppet, I'm using cloudformation to deploy lambdas with nodejs (not even SAM, because of nested templates not working), but I'm using claudia pack to create deployable zips.

you could use directly claudia pack for any lambda that is marked with a nodejs runtime, or we could transfer the relevant parts of the code to a SAM sub-project. The only real dependency there is the NPM executable. Just let us know what you'd prefer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I didn't know claudia pack is generic. Yes, it would be fantastic to be able to adopt that functionality into SAM CLI.

We have two options:

Option 1: Include a JS module in SAM CLI's Python source
To build Node projects, SAM CLI just invoke the JS module node pack.js.

Pros:

  • Easy to lift & shift
  • Easy to use language specific libraries that can support deeper integrations in future like webpack build or running gulp scripts
  • Sets precedence for other runtimes like Java which might need reflexion to create the package
  • Easier to get help from JS community who is more familiar with building JS packages.

Cons:

  • Vending JS files in Python package
  • Might take dependency on certain version of Node. We can't enforce that customers have this version of Node on their system.
  • Might have to webpack all dependencies, minify and vend one file that we just run using node pack.js.
  • Need to spawn & manage processes
  • Could become a tech debt if this approach doesn't scale.

Option 2: Rewrite the claudia pack functionality in Python

Pros:

  • Native implementation.
  • Implementation might create general features that can be used by build actions for other runtimes
  • Faster

Cons:

  • Takes a lot more work. Can't lift & shift
  • Won't get to use language native features.

What do you guys think?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can start with Node.js (option 1), and rewrite in Python if needed later.

For Node.js version, with Claudia.js we try to fully support versions supported by Lambda, but it works in all major versions of Node. Also it’s easy to have multiple versions of Node.js with tools like nvm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's safe to assume that people deploying node.js lambda projects have node.js installed locally. If not, then neither of the two solutions will work, unless you want to fully rewrite NPM in python as well.

Regarding your option 2 and second con, I don't think that is relevant. if you're just thinking of packaging up lambdas, there are no language-native features required for it (apart from loading JSON, which I assume you can do with Python as well). It's just a matter of moving files around, zipping them up, and invoking NPM a few times with the right options.

How about having an install and package steps for language-dependent builders? install could prepare the builder from python, by effectively creating an local executable in a runtime-specific way (for example, in case of node.js, this would run npm install on a subdirectory where you have the packaging script, to pull the right dependencies; in case of Java it could grab the maven shade plugin...). If that step fails, SAM can complain and tell the user that the local environment needs aren't met, so it can't build the project, and potentially direct them to a URL documenting what local tools are required for building that runtime. If that step succeeds, you know that it's possible to build it correctly, and SAM would just need to invoke a system executable produced/configured by the first step.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stojanovic Yes, let's do that. The one thing I underestimated is the level of help existing Javascript libraries will provide to package JS apps. Ex: You could just import a gulpfile as module and run actions without treating it like a CLI.

This creates an interesting state where SAM CLI's Python package will contain JS libraries, but I think that is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gojko I like your idea of splitting the install & package phases. I would even split the install phase into prepare, resolve-dependencies, copy-source phases where we need to call NPM to simply resolve-dependencies and for everything else we will have default Python implementations.

When we consume Claudia's pack functionality, we can work through the details of where to draw the line for each phase. But I think it is worth keeping the pack code in Javascript so we can add support for advanced building like running Gulpscripts or Webpack etc without re-implementing all these libraries in Python.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added information about how this will be implemented to the design doc: https:/sanathkr/aws-sam-cli/blob/build-cmd-design/designs/sam_build_cmd.rst#implementation-of-build-actions

designs/sam_build_cmd.rst Outdated Show resolved Hide resolved
@sanathkr
Copy link
Contributor Author

sanathkr commented Nov 5, 2018

I have addressed all comments and updated design document to reflect the latest discussion. Can you guys approve the PR so I can start implementation?

@jfuss @thesriram @simalexan @gojko @stojanovic

@willyg302
Copy link
Contributor

See below the notes from our security-focused review.

Background notes

  • .samrc and custom build actions are out of scope. This removes a large portion of the attack surface.
  • The library of build actions will start with the defaults documented. In the future, the hope is to grow to 20-30 builders covering 90% of use cases (for example, a webpack builder). Tentatively, SAM CLI will use a combination of runtime and manifest to decide which builder takes precedence. Because the library is curated, we can guarantee precedence behavior.
  • For now, data files and ignoring can be delegated to the "native" command such as the npm install script.
  • Consider moving built-template.yaml to build/template.yaml.

Discussion 1: protecting customers

We want to prevent customers from shooting themselves in the foot. Consider the following scenario:

  1. A malicious actor publishes an open-source Node.js SAM app to GitHub, complete with a package.json that contains a malicious command in the install script (for example, rm -rf /).
  2. A customer hears about SAM and the app from 1. They install SAM CLI, clone the app, and read that to deploy the app to their own AWS account they must sam build/package/deploy.
  3. Upon running sam build, something bad happens.

Note that 3 would have also happened if the customer ran just npm install, and is a general problem with package managers that support arbitrary logic. However, there is a layer of indirection and it may not be immediately obvious what sam build will do without reading documentation beforehand.

Mitigations:

  • Print/log commands that SAM CLI runs as it runs them.
  • Check for sudo. Only run as root if --force or --native (since that's in a container already) are present.
  • Recommend that customers sandbox builds by using --native.

Other ideas:

  • Sanitizing commands before running them. This is hard and brittle, and could lead to other unintended consequences.
  • Prompting customers before running commands and supporting e.g. sam build --yes for automation. This is annoying. Customers will learn to pass --yes, giving them no added protection.
  • Make --native the default. This is a breaking change since it assumes Docker is installed, which currently isn't necessary for the package/deploy lifecycle.

Discussion 2: exec logic

The library of builders will bundle all dependencies, so that there is no post-install needed. For initial launch there should be no bundled dependencies. In the future the tenet is to assume library-specific dependencies (such as webpack) are installed and available on SAM CLI's path -- this path will be forwarded.

There are two points of execution:

  • SAM CLI --> builder script. Using Python's subprocess module. Already used in SAM CLI here, will follow similar approach (no shell=True, etc.).
  • builder script --> native (npm, etc.). Logic here is builder-dependent. The strategy going forward is likely to treat the builder library package as a true dependency, with separate reviews. SAM CLI will pin to every major release of that package and pull in added builders as a batch each time.

These are reasonable approaches. If a future pen test of SAM CLI is conducted, remember to call out these integration points to focus on.

@sanathkr
Copy link
Contributor Author

sanathkr commented Nov 7, 2018

Created a separate Github Repository to implement each build action: https:/awslabs/aws-lambda-builders. This helps evolve the builders independent of the CLI. Contributors writing build actions need not know/use SAM CLI.

@gojko
Copy link
Contributor

gojko commented Nov 8, 2018

Created a separate Github Repository to implement each build action: https:/awslabs/aws-lambda-builders. This helps evolve the builders independent of the CLI. Contributors writing build actions need not know/use SAM CLI.

Do you want us to start migrating claudia pack Code there, or do you want to prototype the python builder first to get the interface right?

@sanathkr
Copy link
Contributor Author

sanathkr commented Nov 8, 2018

@gojko Actually hang on for a few days. I want to prototype Python first so we have the right scaffolding before we take the JS lib.

@sanathkr
Copy link
Contributor Author

sanathkr commented Nov 9, 2018

Security Review Notes

Threats:

  1. Abusing sam build to escalate privileges
  2. Sam build relies on third party dependencies with known security vulnerabilities.
  3. External customers having a bad experience with sam build
    1. Locally (Unexpected behavior)
    2. The generated lambda artifacts are invalid or don’t work

Recommendations:

  1. Implement sudo check when sam build is invoked. When sudo rights are detected sam does a hard fail. The hard fail behavior can be overwritten by an environment variable.
  2. Writing temp folders with 700
  3. Provide mechanism to inform customers about security vulnerabilities in sam cli (probably thru a message when CLI is used)
  4. When using sam build to compile native dependencies in amazon linux we use docker. In order to move the build artifacts out of docker we recommend to perform an explicit copy from inside the container to the workstation using docker cp.
  5. Proper unit & integration tests on various platforms to ensure customers don't get a bad experience w.r.t the generated artifacts.

@edtbl76
Copy link

edtbl76 commented Dec 6, 2018

Any update on this?

@sriram-mv
Copy link
Contributor

@edtbl76 sam build is available today for Python Runtimes, try giving the latest version of aws-sam-cli a whirl.

@edtbl76
Copy link

edtbl76 commented Dec 6, 2018 via email

@aws aws locked as resolved and limited conversation to collaborators Dec 7, 2018
@jfuss
Copy link
Contributor

jfuss commented Dec 7, 2018

@edtbl76 You can find other language support in our issues. Those issues will be closed after we release support. You can track progress there.

Locking this issue to reduce noise and asks about language support. This was a design for sam build and will track further language supports through issues.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.