Fix sqlalchemy for postgres Unix sockets #761

Merged
merged 10 commits into from
Nov 11, 2021

Contributor

@mattoberle mattoberle commented Oct 19, 2021

Description

This PR addresses an issue that breaks opentelemetry-instrumentation-sqlalchemy when connecting to Postgres via Unix socket.

The replaced code contained a type inconsistency:

attrs[SpanAttributes.NET_PEER_PORT] = int(data.get("port"))
# data.get("port") returns Optional[str]
# int(None) raises TypeError

When connecting to PostgreSQL via a Unix socket, the dsn looks something like this:

'user=postgres host=/tmp/socket dbname=postgres'

The parse_dsn function returns this:

{'user': 'postgres', 'dbname': 'postgres', 'host': '/tmp/socket'}
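
To make the failure concrete, here is a minimal sketch that starts from the parsed dictionary above, so no database is needed. `peer_port` is a hypothetical helper used only for illustration; it is not a function in the instrumentation.

```python
from typing import Optional

# What parse_dsn returns for the unix-socket DSN above: there is no "port" key.
data = {"user": "postgres", "dbname": "postgres", "host": "/tmp/socket"}

# The original line: data.get("port") is None, and int(None) raises TypeError.
try:
    int(data.get("port"))
    failed = False
except TypeError:
    failed = True


def peer_port(parsed: dict) -> Optional[int]:
    """Hypothetical helper: return the port as an int, or None when it is absent."""
    port = parsed.get("port")
    return int(port) if port else None
```

With the guard in place, a missing port simply means the attribute is skipped instead of the span setup failing.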

Type of change


  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

I applied the patch locally and then did this:

  1. Set a SQLAlchemy URL like this: postgresql+psycopg2://{user}:@/{dbname}?host={socket_path}.
  2. Apply the instrumentation to the engine.
  3. Execute a query through the ORM and commit.

Note: I'm happy to write a test for this as well, but there aren't currently tests against specific engines or private functions. I wasn't sure if that was by design.

Does This PR Require a Core Repo Change?

  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

@mattoberle mattoberle requested a review from a team October 19, 2021 15:58
@mattoberle mattoberle force-pushed the bugfix/sqlalchemy/unix-sockets branch from aa4b1bb to 610ee1e Compare October 19, 2021 16:10
- attrs[SpanAttributes.NET_PEER_PORT] = int(data.get("port"))
+ # parse_dsn may omit port when connecting via unix socket
+ if data.get("port"):
+     attrs[SpanAttributes.NET_PEER_PORT] = int(data["port"])
Member

If it is a unix socket path, we can add different span attributes:

attributes[SpanAttributes.NET_PEER_NAME] = conn_kwargs.get("path", "")
attributes[
    SpanAttributes.NET_TRANSPORT
] = NetTransportValues.UNIX.value

Contributor Author

In this case the host attribute will be the path of the socket (which aligns with the way psql takes it as a parameter).
I am open to differentiating it though if NET_PEER_NAME is not an accurate representation in that context.

Contributor Author

@mattoberle mattoberle Oct 19, 2021

It looks like NET_PEER_NAME should be the path to the socket:

If net.transport is "unix" or "pipe", the absolute path to the file representing it should be used as net.peer.name (net.host.name doesn't make sense in that context).

If we are setting NET_TRANSPORT anywhere, it should be unix.
With the values we are setting now I think this should be okay, unless I'm missing a spot where other net.* values get set.

Edit: Sorry, misread! I think we are good on NET_PEER_NAME but you are right, the transport could be set as well.

Contributor

Makes sense. Let's add NET_TRANSPORT conditionally when connecting via unix socket.

Contributor

Is the absence of a port guaranteed to mean a unix socket? Could it be something else?

Contributor Author

@mattoberle mattoberle Oct 19, 2021

@owais that's a great point.

It looks like the TypeError could apply to non-socket cases too.

from psycopg2.extensions import parse_dsn
from sqlalchemy import create_engine

engine = create_engine('postgresql://postgres@localhost/postgres')
conn = engine.raw_connection()
cur = conn.cursor()

cur.connection.dsn  # 'host=localhost user=postgres dbname=postgres'
parse_dsn(cur.connection.dsn)  # {'user': 'postgres', 'dbname': 'postgres', 'host': 'localhost'}

However, there is cur.connection.info which provides:

cur.connection.info.port  # 5432
cur.connection.info.host  # 'localhost'

When connecting to a socket it provides:

cur.connection.info.port  # 5432
cur.connection.info.host  # '/var/run/postgresql/'

psycopg2 makes the assumption that every socket is named .s.PGSQL.{port}, meaning it only stores the absolute path of the socket directory. Mirroring that assumption here seems risky.

The info object seems like the safer bet since it always provides certain values.
By contrast, if you initialize a SQLAlchemy engine with engine = create_engine('postgresql://'), the .dsn attribute will be an empty string.

I think at the very least the presence of a filesystem path in info.host is a clue we are using sockets... but I'll dig around the psycopg2 source a bit to see if I can find any better guarantees.

Edit: That being said, there seems to be a bug present now that people will encounter if they aren't setting an explicit port (i.e. relying on the default). The other function that sets attributes simply omits attributes that aren't present. Is it worth splitting the work into a bugfix and a follow-up that improves the attributes?

Edit 2: Mirroring the .s.PGSQL.{port} convention to construct the absolute path seems fine actually; that's a convention enforced by PostgreSQL itself. And since socket support isn't available on Windows, a simple check for '/' at the start of the host should be enough to determine whether we are using a socket or not.
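
A rough sketch of that idea, assuming the host and port come from cur.connection.info as shown earlier. `net_attributes` is a hypothetical name, the attribute keys are written as plain strings rather than SpanAttributes constants, and the TCP branch deliberately leaves net.transport unset since that part is still under discussion.

```python
import posixpath
from typing import Dict, Union


def net_attributes(host: str, port: int) -> Dict[str, Union[str, int]]:
    """Hypothetical helper: derive net.* attributes from info.host and info.port."""
    if host.startswith("/"):
        # host is the socket directory; the socket file is named .s.PGSQL.{port}
        return {
            "net.peer.name": posixpath.join(host, f".s.PGSQL.{port}"),
            "net.transport": "unix",
        }
    # Not a filesystem path: treat it as a network host.
    return {"net.peer.name": host, "net.peer.port": port}
```

For example, `net_attributes("/var/run/postgresql/", 5432)` produces a net.peer.name of `/var/run/postgresql/.s.PGSQL.5432`.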

Contributor

Makes sense. Checking if the host starts with unix://, unix:/// or / makes more sense to me than checking for absence of port.
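
As a sketch, the prefix check could look like this. `is_unix_socket_host` is a hypothetical name, and whether psycopg2 ever reports a unix:// style host is an assumption not verified here.

```python
from typing import Optional


def is_unix_socket_host(host: Optional[str]) -> bool:
    """Hypothetical check: True when host looks like a unix socket location."""
    # "unix://" also matches "unix:///", so two prefixes cover all three cases.
    return bool(host) and host.startswith(("unix://", "/"))
```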

Contributor Author

@mattoberle mattoberle Oct 20, 2021

I pushed up a change that implements the NET_TRANSPORT logic for sockets, although I found there is unreachable code.

I wrote a quick test locally.
The _get_attributes_from_cursor function is only called if host is not in the sqlalchemy URL.
This is the case for all unix socket connections (for postgres).

On the other hand, host is always present when connecting via TCP (it's a requirement; if it is omitted, the default is a unix socket).
The code that derives attributes from the URL depends on explicit args, so TCP connections will never get their net.transport (and won't get things like net.peer.port if using defaults).

We'd have to inspect the cursor for every vendor to get any information not inside the URL.
In the commit I just pushed up we are doing that for postgresql, but I feel like with this much divergent logic the test suite would basically need access to every supported SQLAlchemy engine.
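
The control flow described above can be modeled roughly as follows. This is a simplified, hypothetical sketch and not the instrumentation's actual code: `collect_attributes` and its parameters are invented for illustration, and `from_cursor` stands in for the vendor-specific cursor inspection.

```python
from typing import Callable, Dict, Optional


def collect_attributes(
    url_host: Optional[str],
    url_port: Optional[int],
    from_cursor: Callable[[], Dict[str, object]],
) -> Dict[str, object]:
    """Hypothetical model: use the URL first, the cursor only when the URL has no host."""
    if url_host:
        attrs: Dict[str, object] = {"net.peer.name": url_host}
        # Only explicit URL arguments are recorded; defaults such as 5432 are not.
        if url_port is not None:
            attrs["net.peer.port"] = url_port
        return attrs  # the cursor is never inspected on this path
    return from_cursor()
```

In this model a TCP URL without an explicit port yields only net.peer.name, which matches the gap described above.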

Revisiting the original scope of this PR:

There is a bug when using sockets for postgres specifically, caused by vendor-specific code that the tests never reach.
I think the initial commit b03b44f addresses the bug without losing anything useful or introducing anything dangerous. Maybe enhancing the attribute recognition is better suited for a bigger PR?

    def test_netransport_attributes_from_cursor_unix_minimal_args(self):
        engine = create_engine("postgresql://")
        ...
        self.assertEqual(spans[0].attributes['net.peer.name'], '/var/run/postgresql/.s.PGSQL.5432')
        self.assertEqual(spans[0].attributes['net.transport'], 'unix')

    def test_netransport_attributes_from_cursor_unix_explicit_args(self):
        engine = create_engine("postgresql://postgres@:5432/postgres?host=/var/run/postgresql")
        ...
        self.assertEqual(spans[0].attributes['net.peer.name'], '/var/run/postgresql/.s.PGSQL.5432')
        self.assertEqual(spans[0].attributes['net.transport'], 'unix')

    def test_netransport_attributes_from_cursor_tcp_minimal_args(self):
        engine = create_engine("postgresql://postgres:postgres@localhost")
        ...
        self.assertEqual(spans[0].attributes['net.peer.name'], 'localhost')
        # self.assertEqual(spans[0].attributes['net.transport'], 'tcp')
        # self.assertEqual(spans[0].attributes['net.peer.port'], 5432)

    def test_netransport_attributes_from_cursor_tcp_explicit_args(self):
        engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
        ...
        self.assertEqual(spans[0].attributes['net.peer.name'], 'localhost')
        # self.assertEqual(spans[0].attributes['net.transport'], 'tcp')
        self.assertEqual(spans[0].attributes['net.peer.port'], 5432)

@mattoberle mattoberle force-pushed the bugfix/sqlalchemy/unix-sockets branch from aa6f951 to c4e9be1 Compare October 20, 2021 15:26
@owais owais enabled auto-merge (squash) November 11, 2021 11:43
@owais owais merged commit 10d8e26 into open-telemetry:main Nov 11, 2021