Several issues around the flat object type #16061

bugmakerrrrrr · 2024-09-24T12:41:52Z

Is your feature request related to a problem? Please describe

In #6507, we added the flat_object type. The current implementation processes a flat_object field in a document in two stages. First, JsonToStringXContentParser collects all the keys and values in the field (keyList, valueList and valueAndPathList) and returns them wrapped in a new XContentParser. The Lucene fields are then constructed by parsing the fields from that XContentParser.

OpenSearch/server/src/main/java/org/opensearch/common/xcontent/JsonToStringXContentParser.java

Lines 80 to 85 in d6bda7d

    builder.field(this.fieldTypeName, new HashSet<>(keyList));
    builder.field(this.fieldTypeName + VALUE_SUFFIX, new HashSet<>(valueList));
    builder.field(this.fieldTypeName + VALUE_AND_PATH_SUFFIX, new HashSet<>(valueAndPathList));
    builder.endObject();
    String jString = XContentHelper.convertToJson(BytesReference.bytes(builder), false, MediaTypeRegistry.JSON);
    return JsonXContent.jsonXContent.createParser(this.xContentRegistry, this.deprecationHandler, String.valueOf(jString));

For a field of flat_object type in a document, the following internal fields will be created by default:

root: a StringField and a SortedSetDocValuesField for each subfield key (prefixed by the root field name);
value: a StringField and a SortedSetDocValuesField for each subfield value;
valueAndPath: a StringField and a SortedSetDocValuesField for each subfield;
_field_name: a StringField for each value and valueAndPath.

PUT test
{
  "mappings": {
    "properties": {
      "field1": {
        "properties": {
          "field2": {
            "type": "flat_object"
          }
        }
      }
    }
  }
}

PUT test/_bulk
{"index": {}}
{"field1": {"field2": {"a": "1", "b": "2"}}}

For example, the request above generates the fields listed below.
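As a rough illustration only (not the mapper's actual code), the sketch below derives the three collections described above for the sample document in a single traversal. It assumes paths are joined with '.' and path/value pairs with '=', as described in this issue; the class and method names are hypothetical, and the exact strings OpenSearch stores may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of OpenSearch: walks a parsed flat_object value once
// and collects the strings the three internal fields would be built from.
class FlatObjectSketch {
    final List<String> keys = new ArrayList<>();          // full dotted path of each leaf
    final List<String> values = new ArrayList<>();        // leaf values only
    final List<String> valueAndPaths = new ArrayList<>(); // "path=value" pairs

    // Recursively visits nested maps; 'path' is the dotted path of the current node.
    void collect(String path, Object node) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                collect(path + "." + e.getKey(), e.getValue());
            }
        } else {
            String value = String.valueOf(node);
            keys.add(path);
            values.add(value);
            valueAndPaths.add(path + "=" + value); // assumed '=' delimiter
        }
    }

    public static void main(String[] args) {
        // Mirrors {"field1": {"field2": {"a": "1", "b": "2"}}} with field2 as the flat_object root.
        Map<String, Object> field2 = new LinkedHashMap<>();
        field2.put("a", "1");
        field2.put("b", "2");
        FlatObjectSketch sketch = new FlatObjectSketch();
        sketch.collect("field1.field2", field2);
        System.out.println(sketch.keys);
        System.out.println(sketch.values);
        System.out.println(sketch.valueAndPaths);
    }
}
```

Because the leaves are emitted during the traversal itself, no intermediate JSON string is needed in this sketch.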

There are several issues around the flat object field.

If a subfield in the flat_object field is suffixed with VALUE_SUFFIX (._value) or VALUE_AND_PATH_SUFFIX (._valueAndPath), some extra, unexpected fields may be created.

OpenSearch/server/src/main/java/org/opensearch/index/mapper/FlatObjectFieldMapper.java

Lines 660 to 669 in d6bda7d

    if (valueType.equals(VALUE_SUFFIX)) {
        if (valueFieldMapper != null) {
            valueFieldMapper.addField(context, value);
        }
    }
    if (valueType.equals(VALUE_AND_PATH_SUFFIX)) {
        if (valueAndPathFieldMapper != null) {
            valueAndPathFieldMapper.addField(context, value);
        }
    }

We use '=' to concatenate the subfield key and value. If a subfield key contains '=', a prefix query may return wrong results.
The root field is confusing and unnecessary. AFAIK, the root field is used to execute exists queries and to build fielddata, but it is not generated correctly. For example, suppose we have the document {"field1": {"field2": {"field3": {"a": "1", "b": "2"}}}}, where field2 is a flat_object field. After processing, the root field contains the values field1.field2.a, field1.field2.b and field1.field2.field3, so an exists query on field1.field2.field3.a does not return the correct result. In addition, I don't see what it would mean to aggregate or sort on the subfield keys. In fact, I don't think we need to support aggs on the flat_object field itself: it is an object, not a scalar value. If we do need them, we should aggregate on the subfield values, not the subfield keys. It still makes sense to aggregate on individual subfields, and we can use the valueAndPath field to support this.
Creating the _field_name field for value and valueAndPath is meaningless. The _field_name field is used by the exists query, so we only need to create it for the full leaf path of each subfield.
The values of the SortedSetDocValuesField for value and valueAndPath carry an unnecessary prefix: when creating the SortedSetDocValuesField, we use the root field name as a prefix of the value.
Two-stage processing is unnecessary. Converting to a JSON string consumes a lot of additional resources for no benefit; we can add the corresponding fields to the parse context directly while parsing.
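The '=' delimiter problem from the list above can be reproduced without OpenSearch. The following is a minimal sketch with hypothetical names: it only shows that two different key/value pairs collapse to the same "path=value" string, so a prefix match on "a=" cannot tell them apart.

```java
// Hypothetical illustration of the delimiter ambiguity; names are not from OpenSearch.
class DelimiterAmbiguity {
    // Joins a subfield path and its value with '=', as this issue describes.
    static String join(String path, String value) {
        return path + "=" + value;
    }

    // True if the term would be matched by a prefix lookup for "<path>=".
    static boolean matchesPathPrefix(String term, String path) {
        return term.startsWith(path + "=");
    }

    public static void main(String[] args) {
        String plain = join("a", "b=c");   // key "a", value "b=c"
        String tricky = join("a=b", "c");  // key "a=b", value "c"
        // Both pairs yield the same term, so the key/value boundary is lost.
        System.out.println(plain.equals(tricky));
        // A prefix lookup for key "a" also matches the term that belongs to key "a=b".
        System.out.println(matchesPathPrefix(tricky, "a"));
    }
}
```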

Describe the solution you'd like

Use one-stage processing;
Remove the FlatObjecField;
Support aggs on the subfields but not on the root field, which means fielddata is supported on the subfields but not on the root field;
For indices created after 2.18.0, remove the prefix from the values of SortedSetDocValuesField.
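To make the subfield aggregation idea more concrete, here is a small sketch of how the values of one subfield could be recovered from "path=value" terms and counted. It is an assumption-laden illustration, not the proposed implementation: the class and method are hypothetical, the terms are plain strings rather than Lucene doc values, and it inherits the '=' ambiguity discussed in this issue.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a terms-style count over one subfield, driven by valueAndPath terms.
class SubfieldTermsSketch {
    // Counts the values of 'subfieldPath' among terms of the form "path=value".
    static Map<String, Integer> countValues(List<String> valueAndPathTerms, String subfieldPath) {
        String prefix = subfieldPath + "="; // assumed '=' delimiter
        Map<String, Integer> counts = new TreeMap<>();
        for (String term : valueAndPathTerms) {
            if (term.startsWith(prefix)) {
                // Strip the "path=" prefix so only the subfield value is counted.
                counts.merge(term.substring(prefix.length()), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("field1.field2.a=1", "field1.field2.a=1", "field1.field2.b=2");
        System.out.println(countValues(terms, "field1.field2.a"));
    }
}
```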

In addition, I have no good idea how to fix issue 2 (the '=' delimiter); any suggestions about this or the overall issue are welcome.

Related component

Search

Describe alternatives you've considered

No response

Additional context

No response


bugmakerrrrrr · 2024-09-24T12:43:30Z

@msfroh @kkewwei you might be interested in this :)

kkewwei · 2024-09-27T06:25:00Z

@msfroh @kkewwei you might be interested in this :)

@bugmakerrrrrr Most of the optimizations make sense to me. @msfroh, what do you think?

I also agree with the fifth point, which I mentioned in PR #14383 (comment).

msfroh · 2024-10-02T20:09:03Z

We use '=' to concat subfield key and value, if a subfield key contains '=', the prefix query may return wrong results.

Note that if we want to change the delimiter, we need to be careful about backward compatibility.

I believe we can tell what version of OpenSearch was used to create an index. Maybe we can continue to use the old behavior on indices created before the change, while supporting the new behavior only on indices created on newer versions.
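A minimal sketch of that version gate, under stated assumptions: the class, the cutover version passed in by the caller, and the '\u0001' replacement delimiter are all placeholders chosen for illustration, and versions are reduced to major/minor integers rather than OpenSearch's own version type.

```java
// Hypothetical sketch of choosing behavior by the version that created the index.
class DelimiterChooser {
    static final char LEGACY_DELIMITER = '=';   // behavior described in this issue
    static final char NEW_DELIMITER = '\u0001'; // placeholder only, not a proposal

    // True if version major.minor is on or after cutoverMajor.cutoverMinor.
    static boolean onOrAfter(int major, int minor, int cutoverMajor, int cutoverMinor) {
        return major > cutoverMajor || (major == cutoverMajor && minor >= cutoverMinor);
    }

    // Old indices keep the legacy delimiter; only newly created indices get the new one.
    static char delimiterFor(int createdMajor, int createdMinor, int cutoverMajor, int cutoverMinor) {
        return onOrAfter(createdMajor, createdMinor, cutoverMajor, cutoverMinor)
            ? NEW_DELIMITER
            : LEGACY_DELIMITER;
    }

    public static void main(String[] args) {
        // An index created on 2.17 with a (hypothetical) cutover at 2.18 keeps '='.
        System.out.println(delimiterFor(2, 17, 2, 18) == LEGACY_DELIMITER);
        // An index created on 3.0 uses the new delimiter.
        System.out.println(delimiterFor(3, 0, 2, 18) == NEW_DELIMITER);
    }
}
```

Keying the decision on the index's creation version, rather than the node's running version, is what keeps existing indices readable after an upgrade.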

msfroh · 2024-10-02T20:51:07Z

@bugmakerrrrrr Having read your PR, I really like your improvements and suggestions. Combined with @kkewwei's work in #14383, I think it might be worth it to "fork" the existing flat object code to allow us to make the backward-incompatible changes only on new indices.

Some of your improvements (like getting rid of JsonToStringXContentParser) can be done in a backward-compatible way, so maybe we isolate those changes. Other changes, like removing unnecessary prefixes, using the field names field, not using = as a delimiter, etc., could be moved into a new class, and we could mark the old class as deprecated. I believe that we only guarantee backward compatibility from the last 2.x release to 3.0, so the deprecated implementation could be removed in 3.0.

bugmakerrrrrr · 2024-10-08T11:07:52Z

@msfroh Makes sense to me. I'll first create a new PR with the backward-compatible (BWC) improvements.

bugmakerrrrrr added the enhancement and untriaged labels Sep 24, 2024

github-actions bot added the Search label Sep 24, 2024

bugmakerrrrrr linked a pull request Sep 25, 2024 that will close this issue

Optimize flat object mapper by using one-stage processing #16081

Open

3 tasks

sandeshkr419 removed the untriaged label Oct 2, 2024

sandeshkr419 assigned bugmakerrrrrr Oct 2, 2024

msfroh mentioned this issue Oct 2, 2024

Flat object field use IndexOrDocValuesQuery to optimize query #14383

Merged

3 tasks

bugmakerrrrrr mentioned this issue Oct 12, 2024

Optimize flat_object type in a BWC way with one phase processing #16297

Open

3 tasks
