This repository has been archived by the owner on Jul 12, 2023. It is now read-only.

Log into Server via Http #95

Closed
colbymorrison opened this issue Jul 5, 2020 · 17 comments · Fixed by #115
@colbymorrison commented Jul 5, 2020

Question

Hi, I'm volunteering over at Covid Watch. As mentioned in my colleague @ibeckermayer's PR, we have a web app from which we'd like to request verification codes from this server. I wrote a solution for our short-term/demo use that integrates our app with the server as it exists now, using HTTP requests. Ideally, though, we'd like to request verification codes via a programmatic interface rather than via axios requests. In my short-term solution, we create one user in the verification server for our web app. When a user in our app requests a code, our backend (Firebase Cloud Functions) logs into the server as that user via HTTP and gets a new verification code. Relevant code in our repo is here.

As far as we can tell, the username/password are effectively acting as an API key, and we can use them to generate verification codes programmatically. We are generally curious what differentiates this approach from the API-key approach, particularly from a security perspective. Wouldn't our ability to get codes this way let someone generate new verification codes relatively rapidly, in the manner @mikehelmick was concerned about in his response to our API-key based approach? Thanks so much!
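For readers following along, here is a minimal sketch of that short-term flow as a Firebase callable function using axios. The login path, code-issuing path, request fields, and session handling below are assumptions for illustration only, not the verification server's actual interface; the real implementation is the Covid Watch code linked in the comment.

```typescript
// Hypothetical sketch only: paths, field names, and cookie handling are assumptions.
import * as functions from "firebase-functions";
import axios from "axios";

const SERVER = process.env.VERIFICATION_SERVER_URL ?? "https://verification.example.org";

export const getVerificationCode = functions.https.onCall(async (data, context) => {
  // Only "human-level" authenticated users of the web app may request codes.
  if (!context.auth) {
    throw new functions.https.HttpsError("unauthenticated", "Sign in first.");
  }

  // Step 1 (assumed): log in over HTTP as the single service user created in
  // the verification server, capturing the session cookie it returns.
  const login = await axios.post(`${SERVER}/signin`, {
    email: process.env.VERIFICATION_USER,
    password: process.env.VERIFICATION_PASSWORD,
  });
  const cookie = (login.headers["set-cookie"] ?? []).join("; ");

  // Step 2 (assumed): request a new verification code as that user.
  const issue = await axios.post(
    `${SERVER}/issue`,
    { testType: data.testType, symptomDate: data.symptomDate },
    { headers: { Cookie: cookie } }
  );
  return { code: issue.data.code };
});
```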

@colbymorrison colbymorrison added the kind/question Questions about the project label Jul 5, 2020
@mikehelmick
Contributor

/cc @ibeckermayer

Thanks - I think the use case for unattended access makes sense. It sounds like in your console, there is an authenticated and trusted human who is logged in and responsible for issuing diagnosis verification codes to end users.

One concern: having all users from your frontend come in under a single API key. We're scheduling additional quota and abuse-detection work, and in that model, having your frontend communicate with our frontend API via an API key would mean that quota would all get bundled up under that one "user."

The other thing that we're doing is expanding this implementation to be multi-tenant - allowing for multiple PHAs to be present in a single installation.

This just means that these API keys would need to be scoped correctly (just like user accounts will need to be scoped correctly).

So, if we go this route, I think it needs to be a different API key than the one we're using for exchanges.

Two questions:

  1. What's the reason not to direct case workers to the UI for this server? This seems like an unnecessary layer of indirection, given that our API will constrain you to exactly the type of information we accept.

  2. Do you plan to operate this verification server yourself? If so, are there other methods we could employ? For example, Cloud Run supports Google IAM credentials for invocation access. We could have a separate issue API server that uses that external authentication instead of application-level authentication. The same could be done by deploying that service as a cluster-local k8s service.

@mikehelmick mikehelmick self-assigned this Jul 7, 2020
@ibeckermayer

Thanks - I think the use case for unattended access makes sense. It sounds like in your console, there is an authenticated and trusted human who is logged in and responsible for issuing diagnosis verification codes to end users.

Yes. We have our own human authentication system, and the verification code cloud function will only be invoke-able by "human-level" authenticated users.

One concern: having all users from your frontend come in under a single API key. We're scheduling additional quota and abuse-detection work, and in that model, having your frontend communicate with our frontend API via an API key would mean that quota would all get bundled up under that one "user."

The other thing that we're doing is expanding this implementation to be multi-tenant - allowing for multiple PHAs to be present in a single installation.

This just means that these API keys would need to be scoped correctly (just like user accounts will need to be scoped correctly).

So, if we go this route, I think it needs to be a different API key than the one we're using for exchanges.

Apologies, I'm not quite up to speed on what PHA means, and couldn't determine it from the context + google-search. (If it's not super relevant to our work then no need to get into the gritty details on my behalf, see response to other q's below.)

  1. What's the reason not to direct case workers to the UI for this server? This seems like an unnecessary layer of indirection, given that our API will constrain you to exactly the type of information we accept.

Our primary reason is that we already have our own custom-branded Covid Watch (React) app that we've been building out for several months now for this purpose: source code. For a variety of reasons it's important to our organization to send case workers to this branded interface. We could of course fork this repo and build our custom branding into this UI, but that would mean simultaneously throwing out a substantial chunk of existing work and taking on a substantial chunk of new work, and so long as there's a secure way to strap it into what we've already built we'd much prefer to go that route.

  2. Do you plan to operate this verification server yourself? If so, are there other methods we could employ? For example, Cloud Run supports Google IAM credentials for invocation access. We could have a separate issue API server that uses that external authentication instead of application-level authentication. The same could be done by deploying that service as a cluster-local k8s service.

Yes, we do plan to operate this server ourselves, and in the future possibly multiple other instances. Just to give some context: the first customer we are rolling out with is the University of Arizona, for whom we'll run an instance of the server. In the future, we'll likely roll out with other regions/organizations far from Arizona and give them their own separate instances. If there's some simpler path along these IAM/Cloud Run lines then I'm open to it; I'm not an expert in Google Cloud's various offerings, so I'm largely deferring to your (at least relative) expertise on precisely how to go about this.

A top concern I have with whichever route we choose is that it doesn't diverge so much from your development plan that it becomes a massive chore for either of our teams to maintain. As much as we can, we'd like to keep the codebase of our live instances up to date with yours so that we can easily integrate updates/new-features/etc.

FWIW my intuition based on living in this "competing contact tracing apps milieu" for the past few months is that there are likely other groups out there that will want to take a similar approach to the one we're taking.

@mikehelmick
Contributor

mikehelmick commented Jul 8, 2020

OK - if you're operating the verification server yourself, I think that opens up a few more options.

My recommendation would be to run this in a mode where the API for issuing verification codes is not on the public internet. In our model, where we're running the frontend and making the AJAX calls to the apiserver, it needs to be public.

@ibeckermayer - re your PR. I think the best path is to create a 4th server that runs just the /api/issue endpoint (for discussion, call it the issue server).

Then in your configuration, you could run the issue server (private) and the apiserver (public).

As for authentication on that, two options that I think are viable:

  1. I think it's fine to document that this server is intended to be on a private network and/or have external authentication.

  2. Use API keys as you have proposed, but add a column on the API key table that indicates whether it is usable for generating codes or not. I think it would be a good idea to have separation between the device API keys (used for the token/certificate API calls) and the keys that can issue codes. The on-device API key is at more risk of being compromised, and if that were to happen and that key could be used to generate codes, you'd be in a bad state. (See the sketch below.)

Let me know what you think.
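For illustration, here is a minimal sketch of the check described in option 2, written as Express middleware in TypeScript with hypothetical key names and routes. The verification server itself is written in Go, and its real per-key-type implementation arrived later via the PRs referenced further down this thread, so treat this only as a language-agnostic sketch of gating the issue endpoint by API key type.

```typescript
// Illustrative sketch only - not the repo's Go implementation.
// Each API key carries a type, and only issue-capable ("admin") keys
// may reach the code-issuing endpoint. Names here are hypothetical.
import express from "express";

type APIKeyType = "device" | "admin";

// Stand-in for the API key table; a real server would look these up in its database.
const apiKeys = new Map<string, APIKeyType>([
  ["device-key-123", "device"], // held by the mobile app for token/certificate calls
  ["admin-key-456", "admin"],   // held by a trusted backend allowed to issue codes
]);

function requireAPIKey(allowed: APIKeyType): express.RequestHandler {
  return (req, res, next) => {
    const type = apiKeys.get(req.header("X-API-Key") ?? "");
    if (!type) {
      res.status(401).json({ error: "unauthorized" });
      return;
    }
    if (type !== allowed) {
      // A compromised on-device key must not be able to mint new codes.
      res.status(403).json({ error: "API key not valid for this endpoint" });
      return;
    }
    next();
  };
}

const app = express();
app.use(express.json());
app.post("/api/issue", requireAPIKey("admin"), (_req, res) => res.json({ code: "00000000" }));
app.listen(8080);
```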

@mikehelmick
Contributor

I think option 2 that I mentioned would be good for enabling e2e test scenarios as well - so probably a point in favor of that choice.

@ibeckermayer

@mikehelmick Thanks for this advice, it's extremely helpful. Between those two options, I'm inclined to go with option 1 in order to prevent the probably-inevitable headache where somebody mistakenly exposes an API key in one of our open-source repos and we wind up compromising the notification system for an entire state or country. Plus, in general it makes sense to keep private APIs private.

Looking through Google Cloud's documentation, I came up with the following configuration:

  1. Create a Virtual Private Cloud (VPC) network
  2. Run this repo's database on a VM that allows only internal connections (default-allow-internal)
  3. Run the issue server on a VM that allows only internal connections, that talks to the database
  4. Run the apiserver on a VM exposed to the internet (with API key protection, exactly how it's already built), that talks to the database
  5. Connect our Firebase cloud functions to the VPC so that our auth-protected getVerificationCode cloud function can make calls to the issue server (see the sketch after this list).
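A rough sketch of step 5, assuming the issue server is reachable only inside the VPC and that a Serverless VPC Access connector has been created; the connector name, internal address, and request fields below are placeholders.

```typescript
// Sketch of step 5: route this function's egress through a Serverless VPC
// Access connector so it can reach the issue server, which is only exposed
// on the internal network. Connector name, internal address, and request
// fields are placeholders.
import * as functions from "firebase-functions";
import axios from "axios";

export const getVerificationCode = functions
  .runWith({
    vpcConnector: "verification-connector", // placeholder connector name
    vpcConnectorEgressSettings: "PRIVATE_RANGES_ONLY",
  })
  .https.onCall(async (data, context) => {
    if (!context.auth) {
      throw new functions.https.HttpsError("unauthenticated", "Sign in first.");
    }
    // Internal-only address of the issue server (placeholder).
    const resp = await axios.post("http://10.128.0.10:8080/api/issue", {
      testType: data.testType,
      symptomDate: data.symptomDate,
    });
    return { code: resp.data.code };
  });
```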

Assuming that all checks out, how would you prefer we go about creating the issue server? Is your team willing to add that to the main branch of this repo in the near future? Or would you prefer that I do it myself and make a PR? Or should Covid Watch just keep that to our fork?

@mikehelmick
Contributor

We're going to end up implementing option 2 for end to end tests internally. We just won't deploy that configuration in production.

I think it ends up being the same thing - new server that supports just the issue API and optionally has APIKey middleware installed.

Re: setup. Our default setup (See /terraform/ directory) uses Cloud Run instead of VMs. We do connect to the CloudSQL database over VPC, using serverless VPC connectors.

We chose Cloud Run over VMs because this workload lends itself well to autoscaling (including scale to zero). Cloud Run supports service-to-service auth via IAM control: https://cloud.google.com/run/docs/authenticating/service-to-service

Getting IAM authenticators is easier within GCP (metadata server available on all compute), but can be done externally as well.
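A minimal sketch of that service-to-service pattern from a Node/TypeScript caller (for example, a Firebase Cloud Function), using google-auth-library to fetch an IAM identity token for a private Cloud Run issue server. The service URL and request fields below are placeholders, and the calling service account would need invoker permission on the service.

```typescript
// Sketch: call a private (IAM-protected) Cloud Run issue server with an
// identity token. On GCP the token comes from the metadata server, so no
// key file is needed; the URL and request fields are placeholders.
import { GoogleAuth } from "google-auth-library";

const ISSUE_SERVER_URL = "https://issueserver-abc123-uc.a.run.app"; // placeholder

export async function issueCode(testType: string, symptomDate: string): Promise<string> {
  const auth = new GoogleAuth();
  // The audience for the identity token is the receiving service's URL.
  const client = await auth.getIdTokenClient(ISSUE_SERVER_URL);
  const resp = await client.request<{ code: string }>({
    url: `${ISSUE_SERVER_URL}/api/issue`,
    method: "POST",
    data: { testType, symptomDate },
  });
  return resp.data.code;
}
```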

@ibeckermayer

We're going to end up implementing option 2 for end to end tests internally. We just won't deploy that configuration in production.

I think it ends up being the same thing - new server that supports just the issue API and optionally has APIKey middleware installed.

Right, in that case we'll await your updates and just unplug the API key middleware.

Re: setup. Our default setup (See /terraform/ directory) uses Cloud Run instead of VMs. We do connect to the CloudSQL database over VPC, using serverless VPC connectors.

We chose Cloud Run over VMs because this workload lends itself well to autoscaling (including scale to zero). Cloud Run supports service-to-service auth via IAM control: https://cloud.google.com/run/docs/authenticating/service-to-service

Getting IAM authenticators is easier within GCP (metadata server available on all compute), but can be done externally as well.

Thanks for pointing that out - I hadn't looked at it. I think we'll try to mimic your setup then; my suggestion of VMs over Cloud Run was based on ignorance of the latter.

I think this settles all of our open questions on the Covid Watch side, feel free to close this issue and we'll keep an eye out for the impending option 2 update.

@ibeckermayer

@mikehelmick do you have an estimate for when you think this setup will be implemented in the main branch?

We're hoping to roll out a production-ready server within the next 2 weeks for the University of Arizona. Totally understandable if it's lower priority for your team, just trying to decide whether I should keep watching for PRs, or whether we should just go ahead and spin it up ourselves in our fork.

@mikehelmick
Contributor

Right now google/exposure-notifications-server#663 is the overall top priority.

I can get someone on this after that. If you want to revise your PR along these lines, that would be most welcome.
Just let me know so we don't duplicate efforts.

@ibeckermayer

Got it, in that case I’ll take another stab at it.

@mikehelmick
Contributor

@ibeckermayer - I'll have a PR shortly that will add a second API key type - so that's part way there.

@ibeckermayer

Ok good to know. I was working on that but it’s slow going, being new to Go and the codebase generally. I’ll wait for your PR before going any further, thank you for keeping me in the loop.

@mikehelmick
Contributor

SG. If you want me to finish this, I can probably have it done today.

First PR is out for review. #112

@ibeckermayer

If you can tackle it that would be amazing. I was taking it on because Covid Watch is coming up on a hard deadline and wasn't sure how long your top priority might take. Based on your PR I was on the right track, but painfully slow going.

@mikehelmick
Contributor

k. working on it now.

@mikehelmick
Contributor

FYI - I have this working; I want to add some more tests / clean things up a bit before I send the PR.

@mikehelmick
Contributor

mikehelmick commented Jul 16, 2020

PR is out to create the new server.

Still TODO

  • add terraform config
  • update build/deploy/promote scripts for new service

@google google locked and limited conversation to collaborators Oct 6, 2020
flagxor pushed a commit that referenced this issue Aug 11, 2021