This repository has been archived by the owner on Jul 12, 2023. It is now read-only.

Log into Server via Http #95

Closed
colbymorrison opened this issue Jul 5, 2020 · 17 comments · Fixed by #115
@colbymorrison commented Jul 5, 2020

Question

Hi, I'm volunteering over at Covid Watch. As mentioned in my colleague @ibeckermayer's PR, we have a web app from which we'd like to request verification codes from this server. I wrote a solution for our short-term/demo use that integrates our app with the server as it exists now, using HTTP requests. Ideally, though, we'd like to request verification codes via a programmatic interface rather than via axios requests. In my short-term solution, we create one user in the verification server for our web app. When a user in our app requests a code, our backend (Firebase Cloud Functions) logs into the server as that user via HTTP and gets a new verification code. Relevant code in our repo is here.

As far as we can tell, the username/password are effectively acting as an API key, and we can use them to generate verification codes programmatically. We are generally curious what differentiates this approach from the API-key approach, particularly from a security perspective. Wouldn't our ability to get codes this way let someone generate new verification codes relatively rapidly, in the manner @mikehelmick was concerned about in his response to our API-key based approach? Thanks so much!
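For readers following along, here is a minimal sketch of that short-term flow as a Firebase callable function using axios. The login path, code-issuing path, request fields, and session handling below are assumptions for illustration only, not the verification server's actual interface; the real implementation is the Covid Watch code linked in the comment.

```typescript
// Hypothetical sketch only: paths, field names, and cookie handling are assumptions.
import * as functions from "firebase-functions";
import axios from "axios";

const SERVER = process.env.VERIFICATION_SERVER_URL ?? "https://verification.example.org";

export const getVerificationCode = functions.https.onCall(async (data, context) => {
  // Only "human-level" authenticated users of the web app may request codes.
  if (!context.auth) {
    throw new functions.https.HttpsError("unauthenticated", "Sign in first.");
  }

  // Step 1 (assumed): log in over HTTP as the single service user created in
  // the verification server, capturing the session cookie it returns.
  const login = await axios.post(`${SERVER}/signin`, {
    email: process.env.VERIFICATION_USER,
    password: process.env.VERIFICATION_PASSWORD,
  });
  const cookie = (login.headers["set-cookie"] ?? []).join("; ");

  // Step 2 (assumed): request a new verification code as that user.
  const issue = await axios.post(
    `${SERVER}/issue`,
    { testType: data.testType, symptomDate: data.symptomDate },
    { headers: { Cookie: cookie } }
  );
  return { code: issue.data.code };
});
```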

@colbymorrison colbymorrison added the kind/question Questions about the project label Jul 5, 2020
@mikehelmick
Contributor

/cc @ibeckermayer

Thanks - I think the use case for unattended access makes sense. It sounds like in your console, there is an authenticated and trusted human who is logged in and responsible for issuing diagnosis verification codes to end users.

One concern: having all users from your frontend come in under a single API key. We're scheduling additional quota and abuse-detection work, and in that model, having your frontend communicate with our frontend API via an API key would mean that quota would all get bundled up under that one "user."

The other thing that we're doing is expanding this implementation to be multi-tenant - allowing for multiple PHAs to be present in a single installation.

This just means that these API keys would need to be scoped correctly (just like user accounts will need to be scoped correctly).

So, if we go this route, I think it needs to be a different API key than the one we're using for exchanges.

Two questions:

  1. What's the reason not to direct case workers to the UI for this server? This seems like an unnecessary layer of indirection, given that our API will constrain you to exactly the type of information we accept.

  2. Do you plan to operate this verification server yourself? If so, are there other methods we could employ? For example, Cloud Run supports Google IAM credentials for invocation access. We could have a separate issue API server that uses that external authentication instead of application-level authentication. The same could be done by deploying that service as a cluster-local k8s service.

@mikehelmick mikehelmick self-assigned this Jul 7, 2020
@ibeckermayer

Thanks - I think the use case for unattended access makes sense. It sounds like in your console, there is an authenticated and trusted human who is logged in and responsible for issuing diagnosis verification codes to end users.

Yes. We have our own human authentication system, and the verification code cloud function will only be invoke-able by "human-level" authenticated users.

One concern: having all users from your frontend come in under a single API key. We're scheduling additional quota and abuse-detection work, and in that model, having your frontend communicate with our frontend API via an API key would mean that quota would all get bundled up under that one "user."

The other thing that we're doing is expanding this implementation to be multi-tenant - allowing for multiple PHAs to be present in a single installation.

This just means that these API keys would need to be scoped correctly (just like user accounts will need to be scoped correctly).

So, if we go this route, I think it needs to be a different API key than the one we're using for exchanges.

Apologies, I'm not quite up to speed on what PHA means, and couldn't determine it from the context + google-search. (If it's not super relevant to our work then no need to get into the gritty details on my behalf, see response to other q's below.)

  1. What's the reason not to direct case workers to the UI for this server? This seems like an unnecessary layer of indirection, given that our API will constrain you to exactly the type of information we accept.

Our primary reason is that we already have our own custom-branded Covid Watch (React) app that we've been building out for several months now for this purpose: source code. For a variety of reasons it's important to our organization to send case workers to this branded interface. We could of course fork this repo and build our custom branding into this UI, but that would mean simultaneously throwing out a substantial chunk of existing work and taking on a substantial chunk of new work, and so long as there's a secure way to strap it into what we've already built we'd much prefer to go that route.

  2. Do you plan to operate this verification server yourself? If so, are there other methods we could employ? For example, Cloud Run supports Google IAM credentials for invocation access. We could have a separate issue API server that uses that external authentication instead of application-level authentication. The same could be done by deploying that service as a cluster-local k8s service.

Yes, we do plan to operate this server ourselves, and in the future possibly multiple other instances. Just to give some context: the first customer we are rolling out with is the University of Arizona, for whom we'll run an instance of the server. In the future, we'll likely roll out with other regions/organizations far from Arizona and give them their own separate instances. If there's some simpler path along these IAM/Cloud Run lines then I'm open to it; I'm not an expert in Google Cloud's various offerings, so I'm largely deferring to your (at least relative) expertise on precisely how to go about this.

A top concern I have with whichever route we choose is that it doesn't diverge so much from your development plan that it becomes a massive chore for either of our teams to maintain. As much as we can, we'd like to keep the codebase of our live instances up to date with yours so that we can easily integrate updates/new-features/etc.

FWIW my intuition based on living in this "competing contact tracing apps milieu" for the past few months is that there are likely other groups out there that will want to take a similar approach to the one we're taking.

@mikehelmick
Contributor

mikehelmick commented Jul 8, 2020

OK - if you're operating the verification server yourself, I think that opens up a few more options.

My recommendation would be to run this in a mode where the API for issuing verification codes is not on the public internet. In our model, where we're running the frontend and making the AJAX calls to the apiserver, it needs to be public.

@ibeckermayer - re your PR. I think the best path is to create a 4th server that runs just the /api/issue endpoint (for discussion, call it the issue server).

Then in your configuration, you could run the issue server (private) and the apiserver (public).

As for authentication on that, two options that I think are viable:

  1. I think it's fine to document that this server is intended to be on a private network and/or have external authentication.

  2. Use API keys as you have proposed, but add a column on the API key table that indicates whether it is usable for generating codes or not. I think it would be a good idea to have separation between the device API keys (used for the token/certificate API calls) and the keys that can issue codes. The on-device API key is at more risk of being compromised, and if that were to happen and that key could be used to generate codes, you'd be in a bad state. (See the sketch below.)

Let me know what you think.
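For illustration, here is a minimal sketch of the check described in option 2, written as Express middleware in TypeScript with hypothetical key names and routes. The verification server itself is written in Go, and its real per-key-type implementation arrived later via the PRs referenced further down this thread, so treat this only as a language-agnostic sketch of gating the issue endpoint by API key type.

```typescript
// Illustrative sketch only - not the repo's Go implementation.
// Each API key carries a type, and only issue-capable ("admin") keys
// may reach the code-issuing endpoint. Names here are hypothetical.
import express from "express";

type APIKeyType = "device" | "admin";

// Stand-in for the API key table; a real server would look these up in its database.
const apiKeys = new Map<string, APIKeyType>([
  ["device-key-123", "device"], // held by the mobile app for token/certificate calls
  ["admin-key-456", "admin"],   // held by a trusted backend allowed to issue codes
]);

function requireAPIKey(allowed: APIKeyType): express.RequestHandler {
  return (req, res, next) => {
    const type = apiKeys.get(req.header("X-API-Key") ?? "");
    if (!type) {
      res.status(401).json({ error: "unauthorized" });
      return;
    }
    if (type !== allowed) {
      // A compromised on-device key must not be able to mint new codes.
      res.status(403).json({ error: "API key not valid for this endpoint" });
      return;
    }
    next();
  };
}

const app = express();
app.use(express.json());
app.post("/api/issue", requireAPIKey("admin"), (_req, res) => res.json({ code: "00000000" }));
app.listen(8080);
```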

@mikehelmick
Contributor

I think option 2 that I mentioned would be good for enabling e2e test scenarios as well - so probably a point in favor of that choice.

@ibeckermayer

@mikehelmick Thanks for this advice, it's extremely helpful. Between those two options, I'm inclined to go with option 1 in order to prevent the probably-inevitable headache where somebody mistakenly exposes an API key in one of our open-source repos and we wind up compromising the notification system for an entire state or country. Plus, in general it makes sense to keep private APIs private.

Looking through Google Cloud's documentation, I came up with the following configuration:

  1. Create a Virtual Private Cloud (VPC) network
  2. Run this repo's database on a VM that allows only internal connections (default-allow-internal)
  3. Run the issue server on a VM that allows only internal connections, that talks to the database
  4. Run the apiserver on a VM exposed to the internet (with API key protection, exactly how it's already built), that talks to the database
  5. Connect our Firebase cloud functions to the VPC so that our auth-protected getVerificationCode cloud function can make calls to the issue server (see the sketch after this list).
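A rough sketch of step 5, assuming the issue server is reachable only inside the VPC and that a Serverless VPC Access connector has been created; the connector name, internal address, and request fields below are placeholders.

```typescript
// Sketch of step 5: route this function's egress through a Serverless VPC
// Access connector so it can reach the issue server, which is only exposed
// on the internal network. Connector name, internal address, and request
// fields are placeholders.
import * as functions from "firebase-functions";
import axios from "axios";

export const getVerificationCode = functions
  .runWith({
    vpcConnector: "verification-connector", // placeholder connector name
    vpcConnectorEgressSettings: "PRIVATE_RANGES_ONLY",
  })
  .https.onCall(async (data, context) => {
    if (!context.auth) {
      throw new functions.https.HttpsError("unauthenticated", "Sign in first.");
    }
    // Internal-only address of the issue server (placeholder).
    const resp = await axios.post("http://10.128.0.10:8080/api/issue", {
      testType: data.testType,
      symptomDate: data.symptomDate,
    });
    return { code: resp.data.code };
  });
```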

Assuming that all checks out, how would you prefer we go about creating the issue server? Is your team willing to add that to the main branch of this repo in the near future? Or would you prefer that I do it myself and make a PR? Or should Covid Watch just keep that to our fork?

@mikehelmick
Contributor

We're going to end up implementing option 2 for end to end tests internally. We just won't deploy that configuration in production.

I think it ends up being the same thing - new server that supports just the issue API and optionally has APIKey middleware installed.

Re: setup. Our default setup (See /terraform/ directory) uses Cloud Run instead of VMs. We do connect to the CloudSQL database over VPC, using serverless VPC connectors.

We chose Cloud Run over VMs because this workload lends itself well to autoscaling (including scale to zero). Cloud Run supports service-to-service auth via IAM control: https://cloud.google.com/run/docs/authenticating/service-to-service

Getting IAM authenticators is easier within GCP (metadata server available on all compute), but can be done externally as well.
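A minimal sketch of that service-to-service pattern from a Node/TypeScript caller (for example, a Firebase Cloud Function), using google-auth-library to fetch an IAM identity token for a private Cloud Run issue server. The service URL and request fields below are placeholders, and the calling service account would need invoker permission on the service.

```typescript
// Sketch: call a private (IAM-protected) Cloud Run issue server with an
// identity token. On GCP the token comes from the metadata server, so no
// key file is needed; the URL and request fields are placeholders.
import { GoogleAuth } from "google-auth-library";

const ISSUE_SERVER_URL = "https://issueserver-abc123-uc.a.run.app"; // placeholder

export async function issueCode(testType: string, symptomDate: string): Promise<string> {
  const auth = new GoogleAuth();
  // The audience for the identity token is the receiving service's URL.
  const client = await auth.getIdTokenClient(ISSUE_SERVER_URL);
  const resp = await client.request<{ code: string }>({
    url: `${ISSUE_SERVER_URL}/api/issue`,
    method: "POST",
    data: { testType, symptomDate },
  });
  return resp.data.code;
}
```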

@ibeckermayer

We're going to end up implementing option 2 for end to end tests internally. We just won't deploy that configuration in production.

I think it ends up being the same thing - new server that supports just the issue API and optionally has APIKey middleware installed.

Right, in that case we'll await your updates and just unplug the API key middleware.

Re: setup. Our default setup (See /terraform/ directory) uses Cloud Run instead of VMs. We do connect to the CloudSQL database over VPC, using serverless VPC connectors.

We chose Cloud Run over VMs because this workload lends itself well to autoscaling (including scale to zero). Cloud Run supports service-to-service auth via IAM control: https://cloud.google.com/run/docs/authenticating/service-to-service

Getting IAM authenticators is easier within GCP (metadata server available on all compute), but can be done externally as well.

Thanks for pointing that out - I hadn't looked at it. I think we'll try to mimic your setup then; my suggestion of VMs over Cloud Run was based on ignorance of the latter.

I think this settles all of our open questions on the Covid Watch side, feel free to close this issue and we'll keep an eye out for the impending option 2 update.

@ibeckermayer

@mikehelmick do you have an estimate for when you think this setup will be implemented in the main branch?

We're hoping to roll out a production-ready server within the next 2 weeks for the University of Arizona. Totally understandable if it's lower priority for your team, just trying to decide whether I should keep watching for PRs, or whether we should just go ahead and spin it up ourselves in our fork.

@mikehelmick
Contributor

Right now google/exposure-notifications-server#663 is the overall top priority.

I can get someone on this after that. If you want to revise your PR along these lines, that would be most welcome.
Just let me know so we don't duplicate efforts.

@ibeckermayer

Got it, in that case I’ll take another stab at it.

@mikehelmick
Contributor

@ibeckermayer - I'll have a PR shortly that will add a second API key type - so that's part way there.

@ibeckermayer

Ok good to know. I was working on that but it’s slow going, being new to Go and the codebase generally. I’ll wait for your PR before going any further, thank you for keeping me in the loop.

@mikehelmick
Contributor

SG. If you want me to finish this, I can probably have it done today.

First PR is out for review. #112

@ibeckermayer

If you can tackle it that would be amazing. I was taking it on because Covid Watch is coming up on a hard deadline and wasn't sure how long your top priority might take. Based on your PR I was on the right track, but painfully slow going.

@mikehelmick
Contributor

k. working on it now.

@mikehelmick
Contributor

FYI - I have this working; I want to add some more tests / clean things up a bit before I send the PR.

@mikehelmick
Contributor

mikehelmick commented Jul 16, 2020

PR is out to create the new server.

Still TODO

  • add terraform config
  • update build/deploy/promote scripts for new service

@google google locked and limited conversation to collaborators Oct 6, 2020
flagxor pushed a commit that referenced this issue Aug 11, 2021