[BUG] When calling Opensearch Bulk API, host and port are called in a non-combined form. #824

Open
fast-coding opened this issue Sep 20, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@fast-coding

fast-coding commented Sep 20, 2024

What is the bug?

Creating the client with host and port in the form shown in the official documentation raises an error.
https://opensearch.org/docs/latest/clients/python-low-level/

client = OpenSearch(
    hosts = [{'host': host, 'port': 443}],

How can one reproduce the bug?

import boto3
from requests_aws4auth import AWS4Auth
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

host = '<opensearch_domain>/_bulk'
region = 'ap-northeast-2'  
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWSV4SignerAuth(credentials, region, service)

index = 'movies'
datatype = '_doc'

client = OpenSearch(
    # hosts=[{"host": host, "port": 9200}],
    hosts=[host+"9200"],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class = RequestsHttpConnection,
    pool_maxsize = 20
)
movies = '{ "index" : { "_index" : "movies", "_id" : "2" } } \n { "title" : "Interstellar", "director" : "Christopher Nolan", "year" : "2014"} \n { "create" : { "_index" : "movies", "_id" : "3" } } \n { "title" : "Star Trek Beyond", "director" : "Justin Lin", "year" : "2015"} \n { "update" : {"_id" : "3", "_index" : "movies" } } \n { "doc" : {"year" : "2016"} }'

client.bulk(movies)

Error

requests.exceptions.InvalidURL: Failed to parse: https://[<opensearch_domain>/_bulk]:9200/_bulk

What is the expected behavior?

I re-set the host in the following format and it works normally.

client = OpenSearch(
    hosts=[host+"9200"],
    http_auth=awsauth,

Do you have any additional context?

In summary, it seems that when I create the OpenSearch client, I need to build the host and port as a single string and put that string in the list.
However, the official documentation says to put them in the list as a dictionary. Please check this part and fix the bug.

fast-coding added the bug (Something isn't working) and untriaged (Need triage) labels on Sep 20, 2024
@dblock
Member

dblock commented Sep 23, 2024

I believe host should be just the host, i.e. <opensearch_domain>, not <opensearch_domain>/_bulk. I checked the docs but I am not seeing anything that implies otherwise. Can you help me find what needs changing, or contribute to https://github.com/opensearch-project/documentation-website directly?
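
To make the host-versus-URL distinction concrete, here is a minimal sketch that reduces a full endpoint URL to the bare host and port that the dictionary form of hosts expects. It uses only Python's standard urllib.parse. The helper name to_host_entry, the default port of 443, and the example domain are assumptions for illustration and are not part of opensearch-py.

```python
from urllib.parse import urlsplit


def to_host_entry(endpoint, default_port=443):
    """Return {'host': ..., 'port': ...} from a bare host or a full URL.

    Hypothetical helper for illustration; not part of opensearch-py.
    """
    # urlsplit only recognizes the network location after "//",
    # so add it when the caller passed a bare host name.
    if "://" not in endpoint:
        endpoint = "//" + endpoint
    parts = urlsplit(endpoint)
    if not parts.hostname:
        raise ValueError(f"no host found in {endpoint!r}")
    # The scheme and any path (such as /_bulk) are deliberately discarded.
    return {"host": parts.hostname, "port": parts.port or default_port}


# Placeholder domain, assumed for the example.
entry = to_host_entry("https://search-example.ap-northeast-2.es.amazonaws.com/_bulk")
print(entry)
```

The resulting dictionary could then be passed as hosts=[entry] when constructing the client, instead of a value that still carries the scheme and the /_bulk path.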

dblock removed the untriaged (Need triage) label on Sep 23, 2024
@fast-coding
Author

Yes, thank you for your reply. As you said, _bulk can be removed from the host. However, the code below still needs to be modified.

An error occurs when executing the code below.

hosts=[{"host": host, "port": 9200}],
Failed to parse: https://[https://.....ap-northeast-2.es.amazonaws.com/]:9200/_bulk

If you put hosts in the list as a string, it works normally.

hosts=[host+"9200"],

@dblock
Member

dblock commented Sep 24, 2024

I think this is by design.

An error occurs when executing the code below.
hosts=[{"host": host, "port": 9200}],

this produces

hosts = [{"host":"https://.....ap-northeast-2.es.amazonaws.com/_bulk", "port":9200}]

which is incorrect: this is not a host, it is a URL. The error is expected.

If you put hosts in the list as a string, it works normally.
hosts=[host+"9200"],

produces

hosts=["https://.....ap-northeast-2.es.amazonaws.com/_bulk9200"]

There's code in the client that allows you to specify a URL in hosts that contains both a host and a port. This translates to host = 'ap-northeast-2.es.amazonaws.com' and port = 443. The port is not 9200: because the ':' is missing, the 9200 is appended to the path, giving _bulk9200. The path is just dropped when the URL is parsed.
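
The effect of the missing colon can be checked with Python's standard urllib.parse, independently of the client. This is only an illustration of generic URL parsing under that assumption, not the client's own code, and example-domain is a placeholder.

```python
from urllib.parse import urlsplit

# Without a colon, "9200" is glued onto the path and no port is parsed.
no_colon = urlsplit(
    "https://example-domain.ap-northeast-2.es.amazonaws.com/_bulk" + "9200"
)
print(no_colon.hostname, no_colon.port, no_colon.path)

# With ":9200" placed directly after the host, it is parsed as the port.
with_port = urlsplit("https://example-domain.ap-northeast-2.es.amazonaws.com:9200")
print(with_port.hostname, with_port.port, with_port.path)
```

In the first case the parsed port is empty, which is consistent with the client falling back to 443 for an https URL as described above.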

Is there still a scenario that doesn't behave as you'd expect?
