-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance issues with advanced search queries #2040
Comments
Hmm this certainly isn't generally the case with all installs: just as a data point on a local install on a laptop with ~30,000 items, the time of a 4-clause OR "contains" search isn't anywhere near 75 seconds here. The version of Omeka S and of MySQL or MariaDB you're seeing this on would be relevant, probably. "contains" searches are never going to be particularly efficient by their nature, but if we're talking a minute-plus, that sounds like some sort of pathological mis-optimization going on somewhere. |
I'm using Omeka S 4.0.1 with no modules or themes, 200 items, and MySQL 8.0.32 on an M1 Mac. I also tried MariaDB 10.11.2 but it didn't make any difference in performance times. I have been testing using the "is exactly" query, which is of course an improvement over wildcard searches, but it's still awfully sluggish. Similarly, specifying the metadata term seems to help too but only to a degree. |
4.0.2 makes a small change to indexing for the resources table which could make some relevant difference here. |
I compared 4.0.1 against the latest stable version as well as #2096 using MariaDB 11.1.2, here are my results:
*Kept crashing due to segmentation fault I also noticed that the database doesn't respond to any new queries while the search is being run, so this could be used to very easily carry out a denial-of-service attack. |
Thanks for the research. Can I ask, were these queries performed on the admin or public sides (or rather, as a logged-in or logged-out user, more relevantly)? Since you mention the recent PR, I'm wondering about my comment there that the inclusion of is_public as the first component of the key could cause the index to be not used when is_public isn't part of the query (i.e., when the query is performed by a user that has permission to see all the items). We had trouble before reproducing your performance results... I wonder if you could possibly share a dump of your database so we could try to investigate possible changes and/or compare results to yours? |
These queries were performed from the admin panel, but we originally came across the issue while logged out on front-end, so my initial inclination is that it's no-factor, but I'll look into it more. To populate items on my local machine for these tests, I used the OSII module to pull a duplicate of https://arepr.org/api using all default settings. Let me know if there's anything I can do to lend a hand. |
OK. Specifically the logged-in/logged-out aspect should only really strongly come into play for the PR with the new index, since that index includes is_public. |
Okay got it. I just ran a quick test using the PR while logged out and got similar results to above. |
Can you try out the branch |
Just from initial tinkering, this looks great – huge improvement. I'll test it out more rigorously and report back. |
I ran some more tests and everything works well, within reasonable limits. Thanks again for working on this, I support it being merged |
Thanks, this is very helpful feedback. I'll update when we've moved on this... probably a merge of that branch, though we'll probably squash it or otherwise clean it up a little before then. |
OK, it's merged (d013400). |
Running an advanced search query with more than 3
OR
clauses takes over 75s to complete even with dedicated, high-performance hardware. Additional clauses increase the query time exponentially. I've tested on small installations with few items and populated metadata fields, but it doesn't make much difference. We're trying use this to search for a handful of metadata keywords, which I'd imagine is a relatively common use case.The text was updated successfully, but these errors were encountered: