Got rid of redundant JOINs/WHEREs of WhereInWalker #10917

SerheyDolgushev · 2023-08-23T16:46:49Z

Doctrine\ORM\Tools\Pagination\Paginator is an amazing tool that lets you fetch entities in chunks. To do so, it sends two separate SQL queries:

The first query fetches the unique entity IDs.
The second query fetches the actual entity data. It uses the result of the first query as an additional filter.
For simplicity, let's call them the id and data queries.

To demonstrate this, let's use the CmsUser and CmsAddress entities:

...
$query = new Query($this->em);
$query->setDQL('SELECT u
    FROM Doctrine\\Tests\\Models\\CMS\\CmsUser u
    JOIN Doctrine\\Tests\\Models\\CMS\\CmsAddress a
    WHERE a.city = :filterCity');
$query->setParameters(['filterCity' => 'London']);
$query->setMaxResults(1);
$paginator = (new Paginator($query, true))->setUseOutputWalkers(false);
$paginator->getIterator();

The code above will trigger two SQL queries:
id

SELECT DISTINCT c0_.id AS id_0
FROM cms_users c0_
INNER JOIN cms_addresses c1_
WHERE c1_.city = ? LIMIT 1

data

SELECT
    c0_.id AS id_0,
    c0_.status AS status_1,
    c0_.username AS username_2,
    c0_.name AS name_3,
    c0_.email_id AS email_id_4
FROM cms_users c0_
INNER JOIN cms_addresses c1_
WHERE 
    c1_.city = ?
    AND c0_.id IN (?)

The problem is that the data query has:

unused join: INNER JOIN cms_addresses c1_
redundant where condition c1_.city = ?
This PR tries to optimize the data query.

Basically, this PR does the following:

Removes all WHERE conditions from the data query, except the condition that comes from the id query results. Please note that this change is not applied to queries with a subselect: the subselect might use some of the params, so we cannot skip the original params in the data query. It should be possible to optimize subselect cases in the future, but that would require a more sophisticated solution: for the data query, remove only the original params that are not used in the subquery (instead of skipping all of them).
Skips all JOINs in the data query if none of them is used in the SELECT, GROUP BY, or ORDER BY clauses. This is done only if no HAVING clause is specified (mainly for simplicity; it can be optimized in the future).
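
To make the two rules above concrete, here is a minimal Python sketch of the decision logic as this description states it. It is not the actual WhereInWalker code: QueryShape, can_drop_where and can_drop_joins are hypothetical names used only for illustration. Keeping the JOINs whenever the WHERE conditions are kept is a conservative assumption of the sketch, not something stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryShape:
    """Hypothetical summary of a paginated query (not a Doctrine type)."""

    # Aliases introduced by JOIN clauses, e.g. {"a"} for "JOIN ... a".
    join_aliases: frozenset = frozenset()
    # Aliases referenced in the SELECT, GROUP BY or ORDER BY clauses.
    aliases_in_select_group_order: frozenset = frozenset()
    has_subselect: bool = False
    has_having: bool = False


def can_drop_where(query: QueryShape) -> bool:
    # Rule 1: the original WHERE conditions are dropped from the data query
    # unless the query contains a subselect (which may still need the params).
    return not query.has_subselect


def can_drop_joins(query: QueryShape) -> bool:
    # Rule 2: JOINs are dropped only when no HAVING clause is present and no
    # joined alias is used in SELECT, GROUP BY or ORDER BY.
    if query.has_having:
        return False
    # Assumption of this sketch: if WHERE is kept, it may reference a join,
    # so the joins are kept as well.
    if not can_drop_where(query):
        return False
    return query.join_aliases.isdisjoint(query.aliases_in_select_group_order)


# The CmsUser/CmsAddress example: "a" is joined but only "u" is selected.
example = QueryShape(
    join_aliases=frozenset({"a"}),
    aliases_in_select_group_order=frozenset({"u"}),
)
print(can_drop_where(example), can_drop_joins(example))
```

For the CmsUser/CmsAddress query shown earlier, both functions return True, which matches the goal of reducing the data query to a plain lookup by id.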

I want to clarify that the EasyAdmin createIndexQueryBuilder triggered this PR, and all the index actions in EasyAdmin should benefit from it.

derrabus · 2023-08-23T21:49:38Z

Please explain what problem exactly you're attempting to solve with your change. And please have a look at the failing tests.

SerheyDolgushev · 2023-08-24T13:00:03Z

@derrabus please check the updated PR description. I also pushed some code changes that fix the coding standards/test failures and added the testRedunandQueryPartsAreRemovedForWhereInWalker test, which demonstrates the changes in this PR.

cjunge · 2023-09-11T21:35:36Z

@SerheyDolgushev do you have any benchmarks to show the performance improvement from this PR? I'm interested to see if there is any real-world benefit besides tweaking the generated query.
My concern is that it's introducing more complexity to the query building without any/much gain in performance.


SerheyDolgushev · 2023-09-13T13:44:17Z

> @SerheyDolgushev do you have any benchmarks to show the performance improvement from this PR? I'm interested to see if there is any real-world benefit besides tweaking the generated query. My concern is that it's introducing more complexity to the query building without any/much gain in performance.

@cjunge I see your concerns, but the performance improvement depends on the actual query. The more joins and conditions the original query has, the more its performance will improve with this change.

I just used a simple user/addresses example to illustrate my point:

CREATE TABLE cms_users (
  id BIGINT UNSIGNED AUTO_INCREMENT NOT NULL,
  name VARCHAR(32) NOT NULL,
  PRIMARY KEY(id)
);

CREATE TABLE cms_addresses (
  id BIGINT UNSIGNED AUTO_INCREMENT NOT NULL,
  user_id BIGINT UNSIGNED NOT NULL,
  country CHAR(2) NOT NULL,
  city VARCHAR(32) NOT NULL,
  street VARCHAR(32) NOT NULL,
  PRIMARY KEY(id)
);
ALTER TABLE cms_addresses ADD CONSTRAINT FK_user FOREIGN KEY (user_id) REFERENCES cms_users (id);
	
INSERT INTO cms_users (`name`)
VALUES ('John'), ('Nicole'), ('David');

INSERT INTO cms_addresses (`user_id`, `country`, `city`, `street`)
VALUES
	(1, 'UK', 'London', 'Street #1'),
	(1, 'US', 'Tampa', 'Street #2'),
	(2, 'FR', 'Paris', 'Street #3'),
	(3, 'US', 'Miami', 'Street #4')
;

And let's assume that in the original query, you are getting users with US addresses. In this case, the first query to get unique entity IDs will be:

mysql> SELECT DISTINCT c0_.id as id_0
    -> FROM cms_users c0_
    -> INNER JOIN cms_addresses c1_ ON c1_.user_id = c0_.id
    -> WHERE c1_.country = 'US';
+------+
| id_0 |
+------+
|    1 |
|    3 |
+------+
2 rows in set (0.00 sec)

Without the changes in this PR, the second query, which fetches the actual user data, will be:

SELECT
    c0_.id AS id_0,
    c0_.name AS name_1
FROM cms_users c0_
INNER JOIN cms_addresses c1_ ON c1_.user_id = c0_.id
WHERE 
    c1_.country = 'US'
    AND c0_.id IN (1, 3);

With the changes in this PR, the second query, which fetches the actual user data, will be:

SELECT
    c0_.id AS id_0,
    c0_.name AS name_1
FROM cms_users c0_
WHERE c0_.id IN (1, 3);

The difference between these two queries is:

mysql> EXPLAIN SELECT
    ->     c0_.id AS id_0,
    ->     c0_.name AS name_1
    -> FROM cms_users c0_
    -> INNER JOIN cms_addresses c1_ ON c1_.user_id = c0_.id
    -> WHERE 
    ->     c1_.country = 'US'
    ->     AND c0_.id IN (1, 3);
+----+-------------+-------+------------+-------+---------------+---------+---------+-------------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref         | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------------+------+----------+-------------+
|  1 | SIMPLE      | c0_   | NULL       | range | PRIMARY       | PRIMARY | 8       | NULL        |    2 |   100.00 | Using where |
|  1 | SIMPLE      | c1_   | NULL       | ref   | FK_user       | FK_user | 8       | test.c0_.id |    1 |    25.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

VS

mysql> EXPLAIN SELECT
    ->     c0_.id AS id_0,
    ->     c0_.name AS name_1
    -> FROM cms_users c0_
    -> WHERE c0_.id IN (1, 3);
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | c0_   | NULL       | range | PRIMARY       | PRIMARY | 8       | NULL |    2 |   100.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

This will become more noticeable the more joins/conditions/rows are used in the query.
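
The comparison above can also be reproduced without a MySQL server. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in, so the column types are simplified and the query plans are SQLite's, not MySQL's. The helper names open_sample_db and plan_details, and the ORDER BY added to make the row comparison deterministic, are assumptions of the sketch.

```python
import sqlite3

# SQLite stand-in for the MySQL schema and sample rows shown above.
SCHEMA = """
CREATE TABLE cms_users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE cms_addresses (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES cms_users (id),
    country TEXT NOT NULL,
    city TEXT NOT NULL,
    street TEXT NOT NULL
);
CREATE INDEX FK_user ON cms_addresses (user_id);
INSERT INTO cms_users (name) VALUES ('John'), ('Nicole'), ('David');
INSERT INTO cms_addresses (user_id, country, city, street) VALUES
    (1, 'UK', 'London', 'Street #1'),
    (1, 'US', 'Tampa', 'Street #2'),
    (2, 'FR', 'Paris', 'Street #3'),
    (3, 'US', 'Miami', 'Street #4');
"""

# Data query without the changes in this PR (join and country filter kept).
DATA_QUERY_BEFORE = """
SELECT c0_.id AS id_0, c0_.name AS name_1
FROM cms_users c0_
INNER JOIN cms_addresses c1_ ON c1_.user_id = c0_.id
WHERE c1_.country = 'US' AND c0_.id IN (1, 3)
ORDER BY c0_.id
"""

# Data query with the changes in this PR (only the id filter remains).
DATA_QUERY_AFTER = """
SELECT c0_.id AS id_0, c0_.name AS name_1
FROM cms_users c0_
WHERE c0_.id IN (1, 3)
ORDER BY c0_.id
"""


def open_sample_db():
    """Create an in-memory database holding the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def plan_details(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    # Each plan row is (id, parent, notused, detail).
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = open_sample_db()
rows_before = conn.execute(DATA_QUERY_BEFORE).fetchall()
rows_after = conn.execute(DATA_QUERY_AFTER).fetchall()

print("same rows:", rows_before == rows_after)
print("plan before:", plan_details(conn, DATA_QUERY_BEFORE))
print("plan after:", plan_details(conn, DATA_QUERY_AFTER))
```

On this sample data both queries return the same two users, and only the first plan touches cms_addresses (alias c1_). The equality check relies on each selected user having exactly one US address in the sample rows.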

Regarding introducing additional complexity with this PR, I tried to cover just the most used and basic cases to make this as simple as possible. Also, all the changes in the unit tests should give a good perspective on the context here.

Please let me know if there are any other questions you would like to discuss.

github-actions · 2024-10-19T03:06:02Z

There hasn't been any activity on this pull request in the past 90 days, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 7 days.
If you want to continue working on it, please leave a comment.

derrabus · 2024-10-19T10:09:33Z

@SerheyDolgushev Looks like this PR did not get much traction, which is probably due to its complexity and the niche problem that it solves. We don't introduce this kind of improvement to ORM 2 anymore, so the PR would need to be reworked for ORM 3. Do you want to do that or shall we close the PR?

SerheyDolgushev · 2024-10-20T09:41:11Z

@derrabus to be honest, I had to go through the PR description to refresh my memory about it. It still seems to be a useful change that brings value, so I merged it with 3.3.x and updated the target branch. Please let me know if there is anything else I can help with.

SerheyDolgushev force-pushed the fix/paginator-skip-redunand-joins branch from e5d9d71 to e60ba0f on August 23, 2023 16:49

Got rid of redundant JOINs/WHEREs of WhereInWalker

2d74ff0

SerheyDolgushev force-pushed the fix/paginator-skip-redunand-joins branch from e60ba0f to 2d74ff0 on August 23, 2023 16:51

derrabus added the Failing Test and Improvement labels Aug 23, 2023

derrabus changed the base branch from 2.16.x to 2.17.x August 23, 2023 21:46

derrabus changed the base branch from 2.17.x to 2.16.x August 23, 2023 21:46

derrabus changed the base branch from 2.16.x to 2.17.x August 23, 2023 21:48

SerheyDolgushev added 3 commits August 24, 2023 09:26

Fixed Coding Standards issues

2e6b5fa

Removed outdated phpstan errors

926c04d

Got rid of redundant JOINs/WHEREs of WhereInWalker

a99b11f

derrabus removed the Failing Test label Sep 11, 2023

derrabus reviewed Sep 12, 2023


SerheyDolgushev added 2 commits September 13, 2023 14:46

Applied request review changes

54b3827

Applied request review changes doctrine#2

e928402

github-actions bot added the Stale label Oct 19, 2024

github-actions bot removed the Stale label Oct 20, 2024

Merge branch '3.3.x' into fix/paginator-skip-redunand-joins

d94992a

SerheyDolgushev changed the base branch from 2.17.x to 3.3.x October 20, 2024 09:27

Made phpcs happy

5743753
