Patch for knative istio probe issue #6962 #1137

Merged on May 8, 2020 (5 commits)

yuzisun commented Apr 26, 2020

Which issue is resolved by this Pull Request:
Resolves kserve/kserve#760

Description of your changes:
Import the knative patch for the istio probe issue. I have tested it on the GCP IAP kfserving test cluster and confirmed that probe failures such as HTTP 401/403 no longer block knative from marking the service as ready.

Checklist:

  • Unit tests have been rebuilt:
    1. cd manifests/tests
    2. make generate-changed-only
    3. make test

yuzisun commented Apr 26, 2020

@jlewi Do you have any instructions on how to get a programmatic token and send a request to KFServing on GCP IAP? I verified that the knative istio probes are succeeding now, but I still need to send a request to the protected endpoint to test.

jlewi commented Apr 29, 2020

Here are docs on programmatic authentication
https://cloud.google.com/iap/docs/authentication-howto

There is some example code here
https:/kubeflow/kfctl/blob/466bd223cda33eb85a11a136b18b366ac6372802/py/kubeflow/kfctl/testing/util/gcp_util.py#L81

This is making a simple http request to the central dashboard over IAP and verifying we get a 200.
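
As a rough sketch of the flow described above, the snippet below builds a request for an IAP-protected URL using only the Python standard library. It assumes an OpenID Connect token for the IAP OAuth client ID has already been obtained by following the linked docs; `build_iap_request` and `check_ok` are illustrative names, not functions from kfctl or KFServing.

```python
import json
import urllib.request


def build_iap_request(url, oidc_token, payload=None):
    """Return a urllib Request carrying an IAP bearer token.

    Assumption: `oidc_token` is an OpenID Connect token whose audience is
    the IAP OAuth client ID; obtaining it is out of scope for this sketch.
    """
    headers = {"Authorization": "Bearer {}".format(oidc_token)}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    # urllib sends a POST when `data` is set and a GET otherwise.
    return urllib.request.Request(url, data=data, headers=headers)


def check_ok(request, timeout=30):
    """Send the request and report whether the endpoint answered 200.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    404 or 403 surfaces as an exception rather than a False return.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status == 200
```

Keeping request construction separate from sending makes it possible to inspect the headers without network access.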

Jeffwan commented May 1, 2020

Do we still need kfserving-gateway anymore with this change? Maybe we can delete them as well

yuzisun commented May 2, 2020

@jlewi thanks for the links! I think I am able to authenticate now, but I am getting the following routing issue; it seems the request did not hit the istio ingress gateway.

python iap_request.py https://kfserving.endpoints.kfserving-248310.cloud.goog/v1/models/sklearn-iris:predict 165686043417-vq3k1nd60m4svttrj8f1r749augem3fl.apps.googleusercontent.com --input=./docs/samples/sklearn/iris-input.json

default backend - 404
Traceback (most recent call last):
  File "iap_request.py", line 141, in <module>
    main()
  File "iap_request.py", line 135, in main
    resp.status_code, resp.headers, resp.text))
Exception: Bad response from application: 404 / {'Date': 'Sat, 02 May 2020 20:37:08 GMT', 'Content-Length': '21', 'Content-Type': 'text/plain; charset=utf-8', 'Via': '1.1 google', 'Alt-Svc': 'clear'} / 'default backend - 404'

I set the host header sklearn-iris.kubeflow-yuzi-dan.example.com on the request. Below is KFServing's virtual service; does IAP support host-based routing?

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"serving.kubeflow.org/v1alpha2","kind":"InferenceService","metadata":{"annotations":{},"name":"sklearn-iris","namespace":"kubeflow-yuzi-dan"},"spec":{"default":{"predictor":{"sklearn":{"storageUri":"gs://kfserving-samples/models/sklearn/iris"}}}}}
  creationTimestamp: "2020-05-02T18:34:53Z"
  generation: 1
  name: sklearn-iris
  namespace: kubeflow-yuzi-dan
  ownerReferences:
  - apiVersion: serving.kubeflow.org/v1alpha2
    blockOwnerDeletion: true
    controller: true
    kind: InferenceService
    name: sklearn-iris
    uid: 109cfee2-8ca3-11ea-97be-42010a8e005d
  resourceVersion: "16200"
  selfLink: /apis/networking.istio.io/v1alpha3/namespaces/kubeflow-yuzi-dan/virtualservices/sklearn-iris
  uid: 9d95ae41-8ca3-11ea-97be-42010a8e005d
spec:
  gateways:
  - kubeflow-gateway.kubeflow
  hosts:
  - sklearn-iris.kubeflow-yuzi-dan.example.com
  http:
  - match:
    - uri:
        prefix: /v1/models/sklearn-iris:predict
    route:
    - destination:
        host: istio-ingressgateway.istio-system.svc.cluster.local
        port: {}
      headers:
        request:
          set:
            Host: sklearn-iris-predictor-default.kubeflow-yuzi-dan.example.com
      weight: 100

yuzisun commented May 2, 2020

Hmm, I have now switched to path-based routing and am able to hit the ingress gateway, but I get a 403. That seems to come from the istio rbac rules, since istio sidecar injection is turned on by default in kubeflow user namespaces.

[2020-05-02T21:04:42.804Z] "POST /kfserving/kubeflow-yuzi-dan/sklearn-iris HTTP/1.1" 403 - "-" 82 19 4 2 "70.23.46.131, 34.102.252.219,10.0.1.1" "python-requests/2.22.0" "b827cc94-59a5-9bd3-8e57-45ffc50f1899" "sklearn-iris-predictor-default.kubeflow-yuzi-dan.example.com" "10.0.1.11:80" outbound|80||istio-ingressgateway.istio-system.svc.cluster.local - 10.0.1.11:80 10.0.1.1:64793 -
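
For reading a line like the one above, a small sketch that pulls out the request line, status code, and Host may help. The field positions are an assumption read off this one sample, not taken from a documented log format, and `summarize_access_log` is a hypothetical helper name.

```python
import shlex

# Access-log line copied from the comment above.
LINE = (
    '[2020-05-02T21:04:42.804Z] "POST /kfserving/kubeflow-yuzi-dan/sklearn-iris HTTP/1.1" '
    '403 - "-" 82 19 4 2 "70.23.46.131, 34.102.252.219,10.0.1.1" '
    '"python-requests/2.22.0" "b827cc94-59a5-9bd3-8e57-45ffc50f1899" '
    '"sklearn-iris-predictor-default.kubeflow-yuzi-dan.example.com" "10.0.1.11:80" '
    'outbound|80||istio-ingressgateway.istio-system.svc.cluster.local - '
    '10.0.1.11:80 10.0.1.1:64793 -'
)


def summarize_access_log(line):
    """Extract a few fields from one access-log entry.

    Assumption: positions 1, 2, 12 and 14 match the sample line only;
    a differently configured log format would need other indices.
    """
    fields = shlex.split(line)  # keeps double-quoted fields together
    method, path, protocol = fields[1].split(" ")
    return {
        "method": method,
        "path": path,
        "protocol": protocol,
        "status": int(fields[2]),
        "host": fields[12],
        "upstream_cluster": fields[14],
    }


summary = summarize_access_log(LINE)
print(summary["status"], summary["host"])
```

On the sample line this reports status 403 with the Host already set to the predictor's hostname.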

So, as is, kubeflow does not work with KFServing out of the box; I had to add the following virtual service to get it working.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: kfserving-kubeflow-yuzi-dan-kfserving-test
  namespace: kubeflow-yuzi-dan
spec:
  gateways:
  - kubeflow/kubeflow-gateway
  hosts:
  - '*'
  http:
  - match:
    - uri:
        prefix: /kfserving/kubeflow-yuzi-dan/sklearn-iris
    rewrite:
        uri: /v1/models/sklearn-iris:predict
    route:
    - destination:
        host: istio-ingressgateway.istio-system.svc.cluster.local
      headers:
        request:
          set:
            Host: sklearn-iris-predictor-default.kubeflow-yuzi-dan.example.com
      weight: 100
    timeout: 300s
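
To make the difference between the two virtual services in this thread concrete, here is a toy model of host matching, prefix matching, and prefix rewrite. It is a deliberately simplified sketch: real Istio routing also considers gateways, ports, and other match types, and the dictionaries below are flattened stand-ins for the YAML rather than a real API. It also assumes the request reaches the gateway carrying the IAP endpoint hostname.

```python
PREDICTOR_HOST = "sklearn-iris-predictor-default.kubeflow-yuzi-dan.example.com"
IAP_HOST = "kfserving.endpoints.kfserving-248310.cloud.goog"

# Flattened stand-ins for the two VirtualServices quoted in this thread.
HOST_BASED = {
    "hosts": ["sklearn-iris.kubeflow-yuzi-dan.example.com"],
    "http": [{"prefix": "/v1/models/sklearn-iris:predict",
              "rewrite": None, "set_host": PREDICTOR_HOST}],
}
PATH_BASED = {
    "hosts": ["*"],
    "http": [{"prefix": "/kfserving/kubeflow-yuzi-dan/sklearn-iris",
              "rewrite": "/v1/models/sklearn-iris:predict",
              "set_host": PREDICTOR_HOST}],
}


def host_matches(pattern, host):
    """Simplified host match: exact, "*", or a "*.suffix" wildcard."""
    if pattern == "*":
        return True
    if pattern.startswith("*."):
        return host.endswith(pattern[1:])
    return pattern == host


def route(virtual_service, host, path):
    """Return (rewritten_path, upstream_host_header), or None if unmatched."""
    if not any(host_matches(p, host) for p in virtual_service["hosts"]):
        return None
    for rule in virtual_service["http"]:
        prefix = rule["prefix"]
        if path.startswith(prefix):
            rewrite = rule.get("rewrite")
            # A prefix rewrite replaces only the matched prefix.
            new_path = path if rewrite is None else rewrite + path[len(prefix):]
            return new_path, rule["set_host"]
    return None


print(route(HOST_BASED, IAP_HOST, "/v1/models/sklearn-iris:predict"))
print(route(PATH_BASED, IAP_HOST, "/kfserving/kubeflow-yuzi-dan/sklearn-iris"))
```

Under these assumptions the host-based service does not match a request addressed to the IAP hostname, while the wildcard-host, path-based service matches and rewrites the path to the predict URL.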

yuzisun commented May 2, 2020

Finally, after I disabled the istio sidecar, I got KFServing working on GCP IAP e2e!

python iap_request.py https://kfserving.endpoints.kfserving-248310.cloud.goog/kfserving/kubeflow-yuzi-dan/sklearn-iris 165686043417-vq3k1nd60m4svttrj8f1r749augem3fl.apps.googleusercontent.com --input=./docs/samples/sklearn/iris-input.json
/opt/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
{"predictions": [1, 1]}
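
A minimal sketch of the request and response shapes involved, assuming the TensorFlow-Serving-style `instances`/`predictions` JSON layout. The feature rows below are hypothetical placeholders, not the contents of iris-input.json, and both helper names are illustrative.

```python
import json


def build_predict_body(rows):
    """Serialize feature rows into an {"instances": [...]} request body."""
    return json.dumps({"instances": rows})


def parse_predictions(response_text, expected_count):
    """Read the "predictions" list and check one result per input row."""
    predictions = json.loads(response_text)["predictions"]
    if len(predictions) != expected_count:
        raise ValueError(
            "expected {} predictions, got {}".format(expected_count, len(predictions))
        )
    return predictions


# Hypothetical feature rows, used only to illustrate the shape.
rows = [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]]
body = build_predict_body(rows)

# Response text copied from the output above.
predictions = parse_predictions('{"predictions": [1, 1]}', len(rows))
print(predictions)
```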

@krishnadurai
/assign

yuzisun commented May 4, 2020

> Do we still need kfserving-gateway anymore with this change? Maybe we can delete them as well

Yes, I have removed them.

yuzisun commented May 4, 2020

@krishnadurai @jlewi @Jeffwan Can you help review this?

kfdef/kfctl_aws.v1.0.2.yaml (outdated)
@@ -221,7 +221,7 @@ spec:
 sidecar.istio.io/inject: "false"
 labels:
 app: networking-istio
-serving.knative.dev/release: "v0.11.1"
+serving.knative.dev/release: "v0.11.2"
Member

I don't see a knative serving version bump in this PR. Is it in scope?

Member Author

It is a minor bump within the 0.11 release, and only the networking-istio deployment image is changed, to pick up the probing fix.

@krishnadurai krishnadurai left a comment

LGTM other than changing pinned manifests.

@krishnadurai krishnadurai left a comment

/lgtm

jlewi commented May 6, 2020

@ellis-bigelow could you take a look at this please on my behalf?

@@ -89,8 +89,8 @@ data:
 }
 ingress: |-
 {
-"ingressGateway" : "knative-ingress-gateway.knative-serving",
-"ingressService" : "kfserving-ingressgateway.istio-system.svc.cluster.local"
+"ingressGateway" : "kubeflow-gateway.kubeflow",

Why is this ingressgateway called the kubeflow-gateway.kubeflow?

Isn't the istio gateway the only one we need?

Member

kubeflow uses this kubeflow-gateway in the kubeflow namespace. The istio-ingress gateway is not being used as it's in istio-system.

This PR reverts some changes in #949 and bumps the version.

@ellistarn

/lgtm
/approve

jlewi commented May 7, 2020

Adding the approved label
/hold
I added a hold in case someone else still needs to review this. If not @yuzisun you can go ahead and remove it to merge it.

yuzisun commented May 7, 2020

I will leave it until tomorrow; if no one else comments by then, I am going to unhold.

yuzisun commented May 8, 2020

/unhold

jlewi commented May 8, 2020

/lgtm
/approve

@yuzisun Thank you so much for driving this!

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ellis-bigelow, jlewi

Jeffwan commented May 8, 2020

Some error messages:

INFO     root:util.py:72 failed to apply:  (kubeflow.error): Code 500 with message: coordinator Apply failed for gcp:  (kubeflow.error): Code 400 with message: gcp apply could not update deployment manager: could not update deployment manager entries; Creating kfctl-835d error(400): BAD REQUEST

yuzisun commented May 8, 2020

/retest

yuzisun commented May 8, 2020

INFO     root:util.py:72 time="2020-05-08T18:34:35Z" level=error msg="Creating kfctl-e1e9 error: &{Code:RESOURCE_ERROR Location:/deployments/kfctl-e1e9/resources/kfctl-e1e9 Message:{\"ResourceType\":\"gcp-types/container-v1beta1:projects.locations.clusters\",\"ResourceErrorCode\":\"403\",\"ResourceErrorMessage\":{\"code\":403,\"message\":\"Insufficient regional quota to satisfy request: resource \\\"IN_USE_ADDRESSES\\\": request requires '2.0' and is short '1.0'. project has a quota of '69.0' with '1.0' available. View and manage quotas at https://console.cloud.google.com/iam-admin/quotas?usage=USED&project=kubeflow-ci-deployment.\",\"status\":\"PERMISSION_DENIED\",\"statusMessage\":\"Forbidden\",\"requestPath\":\"https://container.googleapis.com/v1beta1/projects/kubeflow-ci-deployment/locations/us-central1-a/clusters\",\"httpMethod\":\"POST\"}} ForceSendFields:[] NullFields:[]}" filename="gcp/gcp.go:388"
INFO     root:util.py:72 Error: failed to apply:  (kubeflow.error): Code 500 with message: coordinator Apply failed for gcp:  (kubeflow.error): Code 400 with message: gcp apply could not update deployment manager: could not update deployment manager entries; Creating kfctl-e1e9 error(400): BAD REQUEST
INFO     root:util.py:72 Usage:

yuzisun commented May 8, 2020

/retest

Successfully merging this pull request may close these issues.

Protect KFServing endpoint
8 participants