
[BUG] too many (unnecessary) requests to the metadata URL #1842

Closed
pc-jedi opened this issue May 17, 2022 · 10 comments
Labels
bug (Something isn't working) · help wanted (Community contributions are especially encouraged for these issues)

Comments

@pc-jedi

pc-jedi commented May 17, 2022

What is the bug?
Whenever there is a write request (POST, PUT, DELETE) towards the /_plugins/_security/ API, each OpenSearch node requests a fresh copy of the SAML metadata from the IdP.
This happens no matter whether it is a role, tenant, user, etc. that gets created/updated/deleted.

Since we have a lot of OpenSearch clusters and we manage the internal users, roles, tenants, etc. for them, there are thousands of requests towards the IdP's metadata URL.

E.g. if we rotate the passwords for 100 technical users in a cluster with 15 nodes, 1,500 requests get fired towards the same metadata URL.

How can one reproduce the bug?

  1. Use the SAML example setup from here
  2. Bump the OpenSearch version to 1.3.1 and run docker-compose up -d.
  3. Run docker logs -f SAML-IdP to see only the logs of the IdP.
  4. Execute a request against the /_plugins/_security/ API.
    E.g. create a new internal user via /_plugins/_security/api/internalusers/foobar (see the sketch after this list).
  5. Check the IdP logs and you will see something like this:
    localhost:80 172.19.0.5 - - [17/May/2022:14:26:50 +0000] "GET /simplesaml/saml2/idp/metadata.php HTTP/1.1" 200 1737 "-" "Apache-HttpClient/4.5.13 (Java/11.0.14.1)"
    localhost:80 172.19.0.4 - - [17/May/2022:14:26:50 +0000] "GET /simplesaml/saml2/idp/metadata.php HTTP/1.1" 200 1737 "-" "Apache-HttpClient/4.5.13 (Java/11.0.14.1)"

  6. A single request gets fired for each OpenSearch node.
  7. Try it with other requests against the /_plugins/_security/ API.
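
A minimal way to trigger step 4 from the shell, assuming the example setup's self-signed certificates and the default admin:admin demo credentials (both assumptions, not spelled out above):

    # Create a new internal user "foobar"; a single PUT triggers one
    # metadata fetch from every node in the cluster.
    curl -k -u admin:admin \
      -X PUT "https://localhost:9200/_plugins/_security/api/internalusers/foobar" \
      -H 'Content-Type: application/json' \
      -d '{"password": "correct-horse-battery-staple"}'

    # In a second terminal, watch the IdP receive the metadata requests.
    docker logs -f SAML-IdP 2>&1 | grep metadata.php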

What is the expected behavior?
A single request towards the metadata URL, cached for 60 minutes (?!). The cache should only be invalidated by an update of the metadata URL or of the IdP config (entity_id, etc.) (?!)

Some thoughts
I'm not sure there has to be a request towards the metadata URL at all when a role or role mapping gets updated. Is this really necessary?
Could the metadata be stored in the cluster somehow, so that not every node has to fetch it on its own?
Can we have a proper cache?

Suspected root cause
I suspect that the "clear cache" method is called, which might also invalidate the metadata cache and therefore cause the requests.

What is your host/environment?

  • OS: k8s with OpenSearch images
  • Version: 1.3.1
  • Plugins: The default ones that are in the docker images.
pc-jedi added the bug and untriaged labels May 17, 2022
@pc-jedi
Author

pc-jedi commented May 18, 2022

@peternied I'm very sorry to just ping you here, but this bug is bothering us quite a lot.

@peternied
Member

Thanks for reaching out. @opensearch-project/security, could we have someone look into this?

@DarshitChanpura
Member

[Triage] Thank you for filing this. We'll get to this as soon as we can. In the meantime, any additional information you can provide would be very helpful.

DarshitChanpura removed the untriaged label May 20, 2022
@pc-jedi
Author

pc-jedi commented May 23, 2022

In the meantime, any additional information you can provide would be very helpful.

Is there any particular information you would like to see?

Meanwhile, I will test the issue with the 2.0 RC.

@pc-jedi
Author

pc-jedi commented May 24, 2022

After some intensive testing with the 2.0 RC I can confirm it shows the same behaviour.

Whenever you issue one of the following requests, the metadata is requested again (see the example below):

PUT    /_plugins/_security/api/internalusers/<user>
DELETE /_plugins/_security/api/cache
PUT    /_plugins/_security/api/roles/<role>
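
For example, the role update can be reproduced from the shell like this (the admin:admin demo credentials and the example role body are assumptions, for illustration only):

    # Update/create a role; each such write makes every node fetch
    # the SAML metadata from the IdP again.
    curl -k -u admin:admin \
      -X PUT "https://localhost:9200/_plugins/_security/api/roles/test_role" \
      -H 'Content-Type: application/json' \
      -d '{"cluster_permissions": ["cluster_composite_ops_ro"]}'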

@peternied
Member

@pc-jedi Thank you for your patience while we looked into this. I have confirmed this behavior: the SAML metadata endpoint is queried again on any configuration change. The easiest way to confirm it is a cache flush, docs.

This is the design of the security configuration index: upon any change, all nodes refresh and rebuild the configuration objects associated with the security configuration. While a more nuanced refresh pattern would be possible, the current one ensures the configuration is always verified. During the SAML initialization process a new HTTP client is created and the metadata is queried; there is no persistent cache, so the request is issued every time.

I briefly looked into how caching could be added, but I am concerned about our lack of testing in this space, where over-caching would be a considerable risk. While 1,500 requests is no small number, could you help me understand how this is impacting the stability of the cluster or of the SAML services?
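
The cache-flush confirmation looks like this from the shell (demo credentials and container name follow the reproduction steps above; the --since flag is just one way to narrow the log window):

    # Flush the security cache, then count fresh metadata fetches;
    # expect one GET per OpenSearch node.
    curl -k -u admin:admin -X DELETE "https://localhost:9200/_plugins/_security/api/cache"
    docker logs --since 1m SAML-IdP 2>&1 | grep -c "GET /simplesaml/saml2/idp/metadata.php"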

peternied added the help wanted label May 27, 2022
@pc-jedi
Author

pc-jedi commented Jun 3, 2022

I saw that the SamlHTTPMetadataResolver uses the cache-setting functions of the HTTPMetadataResolver it extends.

Nevertheless, we have 500-5000 clusters in our company, most of them using the internal IdP.
Since IdPs usually have something like intrusion detection in place, all alarms go off when there are around 100,000 requests per second towards the IdP.

Wouldn't it be possible to only invalidate/re-request the metadata when the metadata URL changes?
I see absolutely no reason to request the metadata again on all nodes just because a new role was introduced.

Besides that, we provision new internal tech users on those clusters quite a lot. That's why the number of changes to the .opendistro-security index is high.

A different question: is a high frequency of updates to the .opendistro-security index considered a "normal" use case? Will it impact performance a lot? And how do the "old versions" of that document get cleaned up? I think you always UPDATE the document, which will leave old versions in the file structure, right?
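
As a side note on that last question: superseded document versions show up as deleted docs in the index stats until Lucene's background segment merges reclaim them. One way to observe this (demo credentials assumed, for illustration):

    # docs.deleted counts the old versions still held in segments;
    # segment merges clean them up over time.
    curl -k -u admin:admin "https://localhost:9200/.opendistro-security/_stats/docs?pretty"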

@peternied
Member

Wouldn't it be possible to only invalidate/request the metadata if the metadata URL changes?

Unfortunately, with the architecture of the security configuration, any update triggers a full recreation of all configuration-derived objects. This is an issue because the internal IdP shares the same configuration as the rest of the security config. Believe me, I would very much like to fix this problem, but it's a large body of work.

A different question: is a high frequency of updates to the .opendistro-security index considered a "normal" use case? Will it impact performance a lot? And how do the "old versions" of that document get cleaned up? I think you always UPDATE the document, which will leave old versions in the file structure, right?

@cliu123 do you have context around the scale capabilities for the internal idP?

@pc-jedi
Author

pc-jedi commented Jun 29, 2022

Any news here?

We mitigated the issue by running a SAML metadata cache as a separate component: we rewrote the SAML metadata URL so it points to our cache, which then responds to the OpenSearch nodes.
That is far from perfect; a minimal sketch of the idea follows.
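
For illustration only, here is a sketch of such a caching proxy using nginx; the hostnames, port, and 60-minute cache duration are hypothetical, not our actual setup:

    # Hypothetical nginx config: cache the IdP metadata so OpenSearch
    # nodes hit the proxy instead of the IdP directly.
    cat > nginx.conf <<'EOF'
    events {}
    http {
      proxy_cache_path /var/cache/nginx keys_zone=saml_md:1m;
      server {
        listen 8080;
        location /simplesaml/saml2/idp/metadata.php {
          proxy_pass http://idp.internal.example:80;
          proxy_cache saml_md;
          proxy_cache_valid 200 60m;
        }
      }
    }
    EOF
    docker run -d -p 8080:8080 \
      -v "$PWD/nginx.conf:/etc/nginx/nginx.conf:ro" nginx
    # Then point idp.metadata_url in the SAML auth config at this proxy.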

Besides that, after changing the SAML metadata URL we still see some OpenSearch nodes trying to connect to the previous URL.
We could not find a clear indication of when this happens.
Are only clusters affected that were migrated/upgraded from OpenDistro, or does it only happen when a cluster had lots of changes on the .opendistro-security index?

@peternied
Member

No news on this front. At this point I am inclined to close this issue as by-design. I know this isn't the desired outcome, but it is a reality of the current implementation. I am open to examining proposals/pull requests that resolve these fundamental issues.
