Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add caching headers for posts #786

Open
igalic opened this issue Jun 19, 2020 · 13 comments
Open

add caching headers for posts #786

igalic opened this issue Jun 19, 2020 · 13 comments

Comments

@igalic
Copy link
Contributor

igalic commented Jun 19, 2020

until we've switched to using the async version of Rocket, we'll have serious performance issues.

Most installations of Plume are already behind a (caching) proxy server. we can use this to fix the current performance issues by sending caching headers for posts pages:

How?

the posts page should send an ETag with each response.
The advantage here is that we can have updates, and comments, since this means that we send new ETag header.
to spare a lookup in the database, we could store data about whether the post's cache is invalidated within rocket (that would be a 304 response to a HEAD request)

n.b.: The (caching) proxy server admin could adjust the time themselves, we wouldn't be sending any ourselves.

Alternatives

are currently a lot more effort:

and there's also: rwf2/Rocket#1343
which is similar to #777, but on rocket level

those aren't really alternatives, since this suggestion is operating at HTTP level, it's more orthogonal.

@elegaanz
Copy link
Member

elegaanz commented Jun 19, 2020

We already have some kind of caching for static assets : https:/Plume-org/Plume/blob/master/src/routes/mod.rs#L207-L270

But adding it to templates and/or federation pages could improve performances too.

@igalic
Copy link
Contributor Author

igalic commented Jun 19, 2020

does rocket derive #[head()] for these routes itself?

@elegaanz
Copy link
Member

Yes

@igalic
Copy link
Contributor Author

igalic commented Jun 19, 2020

if we need a DB lookup to construct a body which might still be in a downstream cache, then we should define HEAD ourselves

@elegaanz
Copy link
Member

Maybe we can have a Vec<String> storing the slugs of the posts that need an update in Rocket's state, and only show an updated version when needed.

Another solution would be to render each post to a given HTML or AP file when the post or comments change, and serve this files statically, using the modification date to generate the cache headers.

@trinity-1686a
Copy link
Contributor

trinity-1686a commented Jun 21, 2020

There is already caching for any templated page, with the following limitations :

  • reply that page is cached only for GET request (should probably be HEAD too actually)
  • page must not contains form (due to csrf tokens, but actually this should be removed due to next point)
  • it actually does db lookup and generate the page, then hash it, so it saves only bandwidth, not cpu/io

This should be removed when more intelligent caching get implemented

Implementation is in src/template_utils.rs

@igalic
Copy link
Contributor Author

igalic commented Jun 21, 2020

can you propose a patch out of this situation?

and do you believe it would help us with our current performance issues, before we switch to async rocket?

@trinity-1686a
Copy link
Contributor

trinity-1686a commented Jun 21, 2020

I'll look into it if I got some time.

Honestly I have no idea, I'm not operating any instances so I don't have neither logs nor a flamegraph of an instance under non-synthetic load, both of which would help with a diagnosis. If someone have a day or a week worth of logs from fediverse.blog that would be interesting

I just noticed the Subscribe button on any post is actually a form, so it is never cached by clients 😐
I also noticed that for some reason, caching, for static or templated resources, seems broken, so it might be a good first step to find why and how much it helps
So it's broken on most public servers I found, but not when running a local instance, without reverse proxy or with Caddy. It might be related to this discussion

@igalic
Copy link
Contributor Author

igalic commented Jun 21, 2020

https://mailman.nginx.org/pipermail/nginx/2013-September/040425.html

The ETag header is removed if nginx changes a response returned.
That is, if you have gzip, gunzip, sub, addition, ssi or xslt
filters applied to responses returned.

so… instead of adding it's own ETag for such filters, nginx just drops them?

n.b.: Apache HTTPd does not do that.

Subscribe button on any post is actually a form,

this is bad, probably. let's open a bug for that.

@trinity-1686a
Copy link
Contributor

Apparently it does not really remove etags, but change them from strong to weak (see rfc for the exact meaning), which means it matching for "tag" or W/"tag" should maybe work.
See patch if someone want to test

patch
From 9916d1692ed8ef12fd1d3ee941ed5796345a2b97 Mon Sep 17 00:00:00 2001
From: Trinity Pointard <[email protected]>
Date: Sun, 21 Jun 2020 21:17:33 +0200
Subject: [PATCH] try fixing cache with nginx

---
 src/template_utils.rs | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/template_utils.rs b/src/template_utils.rs
index 26bf58b..5af9f76 100644
--- a/src/template_utils.rs
+++ b/src/template_utils.rs
@@ -63,7 +63,10 @@ impl<'r> Responder<'r> for Ructe {
         let etag = format!("{:x}", hasher.finish());
         if r.headers()
             .get("If-None-Match")
-            .any(|s| s[1..s.len() - 1] == etag)
+            .any(|s| {
+                s[1..s.len() - 1] == etag ||
+                s[3..s.len() - 1] == etag
+            })
         {
             Response::build()
                 .status(Status::NotModified)
-- 
2.27.0

@elegaanz
Copy link
Member

I only have the last six hours of the access logs of fediverse.blog, but it case it helps, I can send them to you (and I can configure nginx to keep them longer).

Also, do you want me to deploy the above patch on fediverse.blog to see if it helps?

@trinity-1686a
Copy link
Contributor

if it's only 6 hours, could you take them around 8-10pm cest, so that there is afternoon and evening? That's when I guess there is the most load.

You can deploy it, it won't break anything for sure, but I don't know if it'll help, it should only reduce bandwidth usage

@elegaanz
Copy link
Member

I'll give you the logs this evening then. I'll deploy the patch after that, to see if there is a difference. Thank you for your help!

elegaanz added a commit that referenced this issue Jun 25, 2020
elegaanz added a commit that referenced this issue Jun 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants