Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow bypassing proxy for some domains with ProxyAllImages #4874

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sunaurus
Copy link
Collaborator

@sunaurus sunaurus commented Jun 28, 2024

This PR introduces a new settings parameter, proxy_bypass_domains, which allows admins to configure some domains for which image proxying will be skipped.

The skipping logic is within the image_proxy/ endpoint itself, so URLs do not have to change whenever the bypass list is modified by admins. If any domain is on this bypass list, then instead of trying to proxy the image through pict-rs, Lemmy will just issue a HTTP redirect to the source image.

There are a couple of reasons such configuration can be useful. For example, imgur will very quickly start blocking all requests from Lemmy when running with ProxyAllImages, due do its rate limits. Additionally, a few Lemmy instances provide their own external image hosting service, this setting will allow them to not duplicate the same images in two places.

.pictrs_config()?
.proxy_bypass_domains
.iter()
.any(|s| url.domain().expect("Invalid URL").starts_with(s))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Panicing for non-domain hosts is unnecessary

Suggested change
.any(|s| url.domain().expect("Invalid URL").starts_with(s))
.any(|s| url.domain().is_some_and(|domain| domain.starts_with(s)))

Copy link
Member

@dessalines dessalines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've moved a lot of settings like this to the DB, so that they can be live updated in UIs. Then a lemmy restart wouldn't be required if you want to add one later.

But that can always be done later, and this is good for now.

.iter()
.any(|s| url.domain().expect("Invalid URL").starts_with(s))
{
return Ok(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can just do Ok(....)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return is required here because the if statement is not the last expression

@Nutomic
Copy link
Member

Nutomic commented Jul 2, 2024

I would consider bypassing some domains by default, such as imgur you mentioned. Because most admins wont know which ones can break. Though it should still be possible to remove the default values. If thats too complicated, its also good to list problematic domains somewhere in documentation.

@dullbananas
Copy link
Collaborator

There should be a change to how requests for proxying are done:

  • Retry requests when a rate limit duration is returned
  • In a horizontally scaled setup, if the current process is being affected by a rate limit that will end later than for another process that has a different IP address, then make that other process do the request
  • Maybe allow a rate limiting domain to temporarily bypass the proxy if it has enough reputation, which can be determined by how many existing posts link to the domain
    • Only count local posts, since other instances could easily exceed the limit if they are malicious or have a weaker post creation rate limit
    • I think it should be allowed for domains that are used in at least 100 posts and at least 5% of all posts (the 100 posts requirement makes a difference if there's less than 2000 posts)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants