Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to execute POST request with JSON payload #560

Open
Mantisus opened this issue Oct 2, 2024 · 2 comments · May be fixed by #542
Open

Unable to execute POST request with JSON payload #560

Mantisus opened this issue Oct 2, 2024 · 2 comments · May be fixed by #542
Assignees
Labels
bug Something isn't working. t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@Mantisus
Copy link
Contributor

Mantisus commented Oct 2, 2024

Example

async def main() -> None:
    crawler = HttpCrawler()

    # Define the default request handler, which will be called for every request.
    @crawler.router.default_handler
    async def request_handler(context: HttpCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')
        response = context.http_response.read().decode('utf-8')
        context.log.info(f'Response: {response}')  # To see the response in the logs.

    # Prepare a POST request to the form endpoint.
    request = Request.from_url(
        url='https://httpbin.org/post',
        method='POST',
        headers = {"content-type": "application/json"},
        data={
            'custname': 'John Doe',
            'custtel': '1234567890',
            'custemail': '[email protected]',
            'size': 'large',
            'topping': ['bacon', 'cheese', 'mushroom'],
            'delivery': '13:00',
            'comments': 'Please ring the doorbell upon arrival.',
        },
    )
    await crawler.run([request])

Current response format

{
  "args": {}, 
  "data": "custname=John+Doe&custtel=1234567890&custemail=johndoe%40example.com&size=large&topping=bacon&topping=cheese&topping=mushroom&delivery=13%3A00&comments=Please+ring+the+doorbell+upon+arrival.", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7", 
    "Accept-Encoding": "gzip, deflate, br", 
    "Accept-Language": "en-US,en;q=0.9", 
    "Content-Length": "190", 
    "Content-Type": "application/json", 
    "Host": "httpbin.org", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36", 
    "X-Amzn-Trace-Id": "Root=1-66fc90cf-11644d4f5483e1096211b721"
  }, 
  "json": null, 
  "origin": "91.240.96.149", 
  "url": "https://httpbin.org/post"
}

Expected response format

{
  "args": {}, 
  "data": "{\"custname\": \"John Doe\", \"custtel\": \"1234567890\", \"custemail\": \"[email protected]\", \"size\": \"large\", \"topping\": [\"bacon\", \"cheese\", \"mushroom\"], \"delivery\": \"13:00\", \"comments\": \"Please ring the doorbell upon arrival.\"}", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate, br", 
    "Content-Length": "221", 
    "Content-Type": "application/json", 
    "Host": "httpbin.org", 
    "User-Agent": "python-httpx/0.27.2", 
    "X-Amzn-Trace-Id": "Root=1-66fc91c2-6db9989347fef25b150615e2"
  }, 
  "json": {
    "comments": "Please ring the doorbell upon arrival.", 
    "custemail": "[email protected]", 
    "custname": "John Doe", 
    "custtel": "1234567890", 
    "delivery": "13:00", 
    "size": "large", 
    "topping": [
      "bacon", 
      "cheese", 
      "mushroom"
    ]
  }, 
  "origin": "91.240.96.149", 
  "url": "https://httpbin.org/post"
}

Both HTTPX and curl_impersonate allow creating a “POST” request with JSON payload, in two ways

data={
    'custname': 'John Doe',
    'custtel': '1234567890',
    'custemail': '[email protected]',
    'size': 'large',
    'topping': ['bacon', 'cheese', 'mushroom'],
    'delivery': '13:00',
    'comments': 'Please ring the doorbell upon arrival.',
    }

response = httpx.post(url, json=data)

response = httpx.post(url, data=json.dumps(data))

But we can't reproduce this behavior in Crawlee, because the json parameter is not passed when the request is created and the data parameter cannot be a string

@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Oct 2, 2024
@belloibrahv
Copy link

Hi @Mantisus,

I'm interested in working on this issue regarding the JSON payload handling in POST requests. I have a few questions to ensure I understand the requirements correctly:

  1. It seems the current behavior differs from HTTPX in how it handles the json parameter and string data. Are we aiming to make Crawlee's behavior match HTTPX exactly, or are there specific adjustments needed for the Crawlee context?

  2. For the implementation, should we:

    • Add support for both methods shown in the example (json=data and data=json.dumps(data))?
    • Focus on one specific approach?
    • Or is there a different preferred solution?
  3. Are there any specific test cases you'd like to see implemented alongside the fix?

I have some experience with Python HTTP clients and would be happy to contribute to this issue. Let me know if you need any clarification or have additional requirements before I start working on it.


References:

@vdusek
Copy link
Collaborator

vdusek commented Oct 2, 2024

Hi, thanks for reporting this. We already know about this, it is related to the #542.

@vdusek vdusek linked a pull request Oct 2, 2024 that will close this issue
1 task
@vdusek vdusek self-assigned this Oct 2, 2024
@vdusek vdusek added the bug Something isn't working. label Oct 2, 2024
@vdusek vdusek added this to the 99th sprint - Tooling team milestone Oct 2, 2024
vdusek added a commit that referenced this issue Oct 2, 2024
vdusek added a commit that referenced this issue Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working. t-tooling Issues with this label are in the ownership of the tooling team.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants