
Chunked uploads do not deduplicate the Host header field when provided by the user #3637

Closed
persiaAziz-zz opened this issue Oct 25, 2016 · 10 comments


@persiaAziz-zz

requests.Session() sends an extra Host field in the request, even if that field is already specified in the headers dictionary passed as an argument. This happens when Transfer-Encoding is specified in the headers dictionary. Having a duplicate Host field in the request confuses Apache Traffic Server.

@Lukasa
Member

Lukasa commented Oct 25, 2016

Requests should not send an extra Host header field. I can't reproduce this bug at all. Can you provide reproduction code?

@persiaAziz-zz
Author

persiaAziz-zz commented Oct 25, 2016

```python
import sys

import requests


def gen():
    # Generator body: with no Content-Length, Requests sends it chunked.
    yield 'pforpersia,champaignurbana'.encode('utf-8')
    yield 'there'.encode('utf-8')


def txn_replay():
    try:
        request_session = requests.Session()
        hostname = "127.0.0.1"
        port = "8080"
        # Route the request through the local Apache Traffic Server proxy.
        request_session.proxies = {"http": "http://{0}:{1}".format(hostname, port)}
        # Headers copied from the ATS log, including Host and Transfer-Encoding.
        hdr = {'Host': 'www.blabla.com', 'content-type': 'application/json',
               'Content-MD5': '5f4308e950ab4d7188e96ddf740855ec',
               'Transfer-Encoding': 'Chunked'}
        body = gen()
        response = request_session.get('http://blabla.com/blabla', headers=hdr,
                                       stream=True, data=body)
        print(response.status_code)
    except UnicodeEncodeError:
        print("UnicodeEncodeError exception")
    except requests.exceptions.ContentDecodingError as e:
        print("ContentDecodingError", e)
    except Exception:
        print("ERROR in requests: ", sys.exc_info())


def main():
    txn_replay()


if __name__ == '__main__':
    main()
```

@persiaAziz-zz
Author

persiaAziz-zz commented Oct 25, 2016

Wireshark:

```
1a
pforpersia,champaignurbana
5
there
GET http://blabla.com/blabla HTTP/1.1
Host: blabla.com
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
content-type: application/json
Transfer-Encoding: chunked
Content-MD5: 5f4308e950ab4d7188e96ddf740855ec
Host: www.blabla.com

1a
pforpersia,champaignurbana
5
there
0

HTTP/1.1 400 Invalid HTTP Request
Date: Tue, 25 Oct 2016 20:25:16 GMT
Connection: keep-alive
Server: ATS/7.1.0
Cache-Control: no-store
Content-Type: text/html
Content-Language: en
Content-Length: 220
```
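The `1a` and `5` lines in the capture are chunk-size prefixes written in hexadecimal: `pforpersia,champaignurbana` is 26 (0x1a) bytes and `there` is 5 bytes, and the final `0` line is the zero-length chunk that ends the body. A minimal sketch of that HTTP/1.1 chunked framing, using a hypothetical `encode_chunked` helper rather than Requests' own code:

```python
def encode_chunked(chunks):
    """Frame byte chunks using HTTP/1.1 chunked transfer coding."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            # An empty chunk would be read as the terminator, so skip it.
            continue
        # Chunk size in hexadecimal, CRLF, the data itself, then CRLF.
        out += b"%x\r\n" % len(chunk)
        out += chunk + b"\r\n"
    # Last chunk (size 0) followed by the empty line that ends the body.
    out += b"0\r\n\r\n"
    return bytes(out)


def gen():
    yield "pforpersia,champaignurbana".encode("utf-8")
    yield "there".encode("utf-8")


print(encode_chunked(gen()))
# b'1a\r\npforpersia,champaignurbana\r\n5\r\nthere\r\n0\r\n\r\n'
```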

@persiaAziz-zz
Author

There are now two Host fields in the request headers.

@Lukasa
Member

Lukasa commented Oct 25, 2016

Some notes on your code:

  • Please don't provide the Host header, especially as you're just providing the header Requests would set.
  • Doubly, don't provide the Transfer-Encoding header under any circumstances. Requests will not obey it, so it's better to let Requests set its own.
  • Where is that User-Agent coming from?

I continue to be unable to reproduce this locally. Can you tell me more about what your environment is? Requests version, Python version, OS.
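Applied to a log-replay script, the first two points amount to removing Host and Transfer-Encoding from the logged headers before passing them to Requests, which (together with the HTTP stack beneath it) then supplies both; a generator body gets chunked transfer coding, as the capture shows. A minimal sketch, assuming the logged headers arrive as a plain dict; `headers_for_requests` is a hypothetical helper, and names are compared case-insensitively because a log may record any casing:

```python
# Hypothetical set of headers that Requests should be left to set itself.
HEADERS_REQUESTS_SETS = {"host", "transfer-encoding"}


def headers_for_requests(logged_headers):
    """Return a copy of the logged headers without Host/Transfer-Encoding."""
    return {
        name: value
        for name, value in logged_headers.items()
        # Case-insensitive match: 'Transfer-Encoding' == 'transfer-encoding'.
        if name.lower() not in HEADERS_REQUESTS_SETS
    }


hdr = {
    "Host": "www.blabla.com",
    "content-type": "application/json",
    "Content-MD5": "5f4308e950ab4d7188e96ddf740855ec",
    "Transfer-Encoding": "Chunked",
}
print(headers_for_requests(hdr))
# {'content-type': 'application/json', 'Content-MD5': '5f4308e950ab4d7188e96ddf740855ec'}
```

If the logged Host differs from the host in the URL, as `www.blabla.com` versus `blabla.com` does here, the logged value belongs in the URL passed to Requests rather than in the headers.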

@persiaAziz-zz
Author

persiaAziz-zz commented Oct 25, 2016

Please ignore the User-Agent field. I am gathering data from the Apache Traffic Server log; the logged requests have all those fields, and I am replaying them using the Python requests library. I have requests 2.10.0 installed, running on Ubuntu 16.04 with Python 3.5.

@Lukasa
Member

Lukasa commented Oct 26, 2016

So, we should stop for a moment.

If your goal is to replay a log as accurately as possible, Requests is a bad choice for you. Requests will try to do a lot of things to be helpful, and all of those things have the potential to change the framing of the request. In particular, you cannot just set the Transfer-Encoding field and expect Requests to obey you: that's not how Requests works.

Regardless, there is a bug here, and it has to do with how Requests sends generators. Right now Requests has its own code for doing chunked uploads, and that code is clearly not hitting the "deduplicate Host header" path that it should. Probably this means we should start using `request_chunked` from urllib3, except that also doesn't strip the Host header. So, two bugs really.
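The duplicate can be reproduced with the standard library alone. `http.client.HTTPConnection.putrequest()` adds its own Host header, taken from an absolute URL like the one sent to the proxy (hence `Host: blabla.com` in the capture), unless `skip_host=True` is passed; a Host the caller then adds with `putheader()` goes out as a second line. The sketch assumes the chunked upload path drives the connection this way, as the comment above suggests; `CapturingConnection` and `header_block` are hypothetical names, and overriding `send()` keeps everything off the network:

```python
import http.client


class CapturingConnection(http.client.HTTPConnection):
    """HTTPConnection that records outgoing bytes instead of using a socket."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.sent = b""

    def send(self, data):
        # Replacing send() means connect() is never called.
        self.sent += data


def header_block(user_headers, dedupe):
    conn = CapturingConnection("127.0.0.1", 8080)
    # Deduplication: skip the automatic Host header only when the caller
    # already supplied one (compared case-insensitively).
    skip_host = dedupe and any(k.lower() == "host" for k in user_headers)
    conn.putrequest("GET", "http://blabla.com/blabla",
                    skip_host=skip_host, skip_accept_encoding=True)
    for name, value in user_headers.items():
        conn.putheader(name, value)
    conn.endheaders()
    return conn.sent.decode("latin-1")


user = {"Host": "www.blabla.com", "Transfer-Encoding": "chunked"}
print(header_block(user, dedupe=False).count("Host:"))  # 2
print(header_block(user, dedupe=True).count("Host:"))   # 1
```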

@Lukasa Lukasa changed the title from "Host field problem" to "Chunked uploads do not deduplicate the Host header field when provided by the user" Oct 26, 2016
@Lukasa
Member

Lukasa commented Oct 26, 2016

The urllib3 issue is tracked in urllib3/urllib3#1009.

@nateprewitt
Member

@Lukasa, it looks like this was wrapped up in urllib3/urllib3#1018. Are there any outstanding pieces left here?

@Lukasa
Member

Lukasa commented May 1, 2017

I don't think so!

@Lukasa Lukasa closed this as completed May 1, 2017
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021