-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Keyword Field and Refactor FieldMapper with ParameterizedFieldMapper #3
Conversation
Signed-off-by: Mingshi Liu <[email protected]>
Signed-off-by: Mingshi Liu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am running into issues with FlatObjectFieldMapperTests
class.
The following is a recreation CLI command.
Navigate to <project_root>/plugins/flat-object
folder and run:
../../gradlew ':plugins:flat-object:test' --tests "org.opensearch.flatobject.mapper.FlatObjectFieldMapperTests"
...s/flat-object/src/test/java/org/opensearch/flatobject/mapper/FlatObjectFieldMapperTests.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Mingshi Liu <[email protected]>
Signed-off-by: Mingshi Liu <[email protected]>
ba34d41
to
c05cb46
Compare
Signed-off-by: Mingshi Liu <[email protected]>
@lukas-vlcek Thanks for the doc! I am able to run the plugin now. This Pr now can create the flat-object field type index, upload the document and search on string on the example: —————————Able to create "flat-object" index
—————————Able to load doc to "flat-object" index
———————Able to search and fetch
————— Able to search and fetch
————— Fetch result
The next step would work on to parse the json object into a string into "key value" pair format. |
@mingshl That is nice progress!
|
Finally, I found the reason why The reason is explained here: Simply put, the test will need to depend on @mingshl I will be pushing fix for this tomorrow. |
Signed-off-by: Mingshi Liu <[email protected]>
lukas-vlcek I made the KeyValueJsonXContentParser in the commit
This is the test case that I am using and it's working properly for create index, query filed name, query field value, and update docs, and query again, it's working properly. Requesting review from @froh Next step is to enable search for dot path. Here is the logging that I have for testings: ----create index ---
---upload doc---
---check mapping --
---query on field name and field value ---
-- update doc ---
--- query on added sub field name --
|
@mingshl Great! I will have a look tmrw. |
Signed-off-by: Mingshi Liu <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
@lukas-vlcek Made another commit I made for flat multiple nested fields which should flat multiple layers, so in regarding to how many inner layer of nested JSON, it should be handed in the KeyValueJsonXContentParser. I tested with the following nested Json documents, this is a three levels nested documents:
``
In the past commits, I can achieve the following functionalities:
But there are a couple things in my list that I didn't go through yet
.... I am still working on the dot.path notation query, so might keep the querybuilder file until that. |
the plugin gradle file is updated in the commit Enable flat-object Plugin to run
plugins/flat-object/src/main/java/org/opensearch/flatobject/mapper/FlatObjectFieldMapper.java
Outdated
Show resolved
Hide resolved
plugins/flat-object/src/main/java/org/opensearch/flatobject/mapper/FlatObjectFieldMapper.java
Outdated
Show resolved
Hide resolved
...object/src/test/java/org/opensearch/flatobject/xcontent/KeyValueJsonXContentParserTests.java
Outdated
Show resolved
Hide resolved
...flat-object/src/main/java/org/opensearch/flatobject/xcontent/KeyValueJsonXContentParser.java
Outdated
Show resolved
Hide resolved
1) Fixing failing test `FlatObjectFiledMapperTests.testExistsQueryMinimalMapping`: ``` Can not write a field name, expecting a value com.fasterxml.jackson.core.JsonGenerationException: Can not write a field name, expecting a value at __randomizedtesting.SeedInfo.seed([163E47FC8A06B681:6F031AC0DD5FD2A1]:0) at app//com.fasterxml.jackson.core.JsonGenerator._reportError(JsonGenerator.java:2733) at app//com.fasterxml.jackson.core.json.UTF8JsonGenerator.writeFieldName(UTF8JsonGenerator.java:217) at app//org.opensearch.common.xcontent.json.JsonXContentGenerator.writeFieldName(JsonXContentGenerator.java:195) at app//org.opensearch.common.xcontent.XContentBuilder.field(XContentBuilder.java:298) at app//org.opensearch.flatobject.mapper.FlatObjectFieldMapperTests.writeFieldValue(FlatObjectFieldMapperTests.java:45) at app//org.opensearch.index.mapper.MapperTestCase.writeField(MapperTestCase.java:87) at app//org.opensearch.index.mapper.MapperServiceTestCase.source(MapperServiceTestCase.java:184) at app//org.opensearch.index.mapper.MapperTestCase.assertExistsQuery(MapperTestCase.java:108) at app//org.opensearch.index.mapper.MapperTestCase.testExistsQueryMinimalMapping(MapperTestCase.java:103) ``` The `writeFieldValue` needs to construct a real object. 2) Add "missing" GeoShape field mapper plugin for `FlatObjectQueryBuilderTests`: This is really a hack, which is unfortunately required. For further info see the following ticket I opened: <opensearch-project#6322> We are able to start fixing individual failing tests. Signed-off-by: Lukáš Vlček <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
@mingshl I pushed a commit to fix some of the failing tests. Specifically:
|
Signed-off-by: Mingshi Liu <[email protected]>
I fixed a naming bug with yaml resource folder that was preventing the testcluster to find the yaml tests. We can create and run real yaml tests now! You can use |
We have the first working yaml test! Signed-off-by: Lukáš Vlček <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
- Add modification to CHANGELOG.md to pass CHANGELOG check - Fix `javadoc` and `precommit` issues (mostly formatting and naming) Signed-off-by: Lukáš Vlček <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
- Fixing testingConventions no.1 Not extending OpenSearchTestCase (or respectively LuceneTestCase) was causing this error: ``` > Task :plugins:flat-object:testingConventions FAILED Expected at least one test class included in task :plugins:flat-object:integTest, but found none. Tests classes with suffix `Tests` should extend org.apache.lucene.tests.util.LuceneTestCase but the following classes do not: * org.opensearch.flatobject.xcontent.KeyValueJsonXContentParserTests ``` - Fixing testingConventions no.2 Custom support of `gradle run` task was causing the following error: ``` > Task :plugins:flat-object:testingConventions FAILED Expected at least one test class included in task :plugins:flat-object:integTest, but found none. FAILURE: Build failed with an exception. * What went wrong: Execution failed for task ':plugins:flat-object:testingConventions'. > Testing conventions [Expected at least one test class included in task :plugins:flat-object:integTest, but found none.] are not honored ``` We fixed this by making part of `build.gradle` script conditional. This means that from now the `run` task needs to be called like this: `gradle -PisLocal run` Signed-off-by: Lukáš Vlček <[email protected]>
e4f576b
to
c0e2337
Compare
Gradle Check (Jenkins) Run Completed with:
|
@peterzhuamazon I am trying to understand more about Let me add more details here.
|
@mingshl However, if this was the only workflow check that start |
Thanks @peterzhuamazon for the suggestions. Hi @lukas-vlcek since we are not pushing over the upstream, that's fine to remove the gradle check yml file. We will include the gradle check when we are ready to push to main. |
Signed-off-by: Mingshi Liu <[email protected]>
Hi @lukas-vlcek To test the performance of flatObject, I spent some time added a new class FlatObjectMappingBenchmark, I tested with a document with 100 nested level , each with 10 subfields, in using Dynamic Mapping, it gets exception of fields over 1000. BUT it works pretty nicely with Flat-Object Mapping. here are some benchmark outcomes: I shall also test some edge case like 999 subfields in the future commits in using DynamicMapping and FlatObjectMapping, please expect them in the coming week. Thanks @dai-chen for introducing micro-benchmark tool. Requesting review from @dai-chen about this Add FlatObjectMappingBenchmark class. |
Providing more implementation for FlatObjectQueryBuilder so that more tests from AbstractQueryTestcase are passing. One particular issue is that several internal methods assume access to package restricted members or methods. We provided workaround by introducing TempWorkaround class that is a member for this specific package and exposes required methods in a more broad manner. This is a temporary solution and should go away once we decide whether we need the QueryBuilder or not. And if we need it then we can move it to specific package or consider different solution. However, we need revisit this in the future. Signed-off-by: Lukáš Vlček <[email protected]>
Hi, I was not aware I was tagged here as well. Thanks. |
@mingshl Just FYI, I have been looking at some options how we could manage more than a single field to store the content to (using different "analyzers") and one particular option I like a lot is using the multi-field field type. The multi-field is already used by several existing field types under the hood, for instance by autocomplete field types – so that one is a nice source of inspiration atm. |
@lukas-vlcek I successfully created two subfields as keyword in the new commit!!!! The lesson learn I have is that each field would need a fieldmapper, a field type and a field. So I created the ValueFieldMapper and ValueAndPathFieldMapper, and also their field type and fields. It works nicely in the benchmarking class. Please see the new commit in detail I think we have enough to put up a draft PR to the core. I am about to create a new PR again core as API just as we discuss in the issue. |
Signed-off-by: Mingshi Liu <[email protected]>
Signed-off-by: Mingshi Liu <[email protected]>
This test was failing because the builder instance that is passed into multi-fields has the DocValues set to true by default. By setting it to false made the test passing successfully. But this needs to be investigated further. Also fixed the yamlTest to pass successfully, however, it required to call the internal `_value` field in the query explicitly. This can mean that we will need to implement and employ the QueryBuilder(?). Signed-off-by: Lukáš Vlček <[email protected]>
Signed-off-by: Lukáš Vlček <[email protected]>
Explicitly verify that no dynamic fields are created, hence the flat-object field helps reduce the mapping explosion. Signed-off-by: Lukáš Vlček <[email protected]>
Signed-off-by: Mingshi Liu <[email protected]>
After talking to the maintainers, because we are going to release in 2.7, we don't need a feature branch. I got the suggestions that we should start raising a pull request to the main branch. Please see the draft PR 6507 @lukas-vlcek I don't have permission at assign you, maybe you can try assign yourself in the draft PR? |
@mingshl Looking at new commits now. A real quick note - do you think you can remove binary |
Sure. We don’t need a new querybuilder, I will remove it in today’s commit.
From: Lukáš Vlček ***@***.***>
Reply-To: mingshl/OpenSearch-Mingshl ***@***.***>
Date: Tuesday, March 21, 2023 at 7:02 AM
To: mingshl/OpenSearch-Mingshl ***@***.***>
Cc: "Liu, Mingshi" ***@***.***>, Mention ***@***.***>
Subject: Re: [mingshl/OpenSearch-Mingshl] Add Keyword Field and Refactor FieldMapper with ParameterizedFieldMapper (PR #3)
@mingshl<https:/mingshl> Looking at new commits now. A real quick note - do you think you can remove binary QueryFlatObjectTests.class file? I am not sure if QueryFlatObjectTests.java file would pass style check (formatting).
—
Reply to this email directly, view it on GitHub<#3 (comment)>, or unsubscribe<https:/notifications/unsubscribe-auth/A3BBKSRTP5UTMCPU4AP7ZSDW5GYFRANCNFSM6AAAAAAUWHIDTI>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
Description
This is an WIP towards approach 1. Adding this PR to further discuss about the format of parsing dot path notation approach for key value pairs in keyword field.
In this PR,
-Added keyword field,
-refactor FieldMapper with ParameterizedFieldMapper
-refactor FieldMapperTest with ParameterizedFieldMapper
Next step,
To be determined to write a new parser to read context and tokenized into key-value pairs
* option 1:catalog.title=Lucene in Action
* option 2: catalog.title=Lucene, catalog.title=in, catalog.title=Action
Issues Resolved
#1018
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.