Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sub-Aggregate Not Deserializing #8359

Closed
deinman opened this issue Sep 26, 2024 · 7 comments
Closed

Sub-Aggregate Not Deserializing #8359

deinman opened this issue Sep 26, 2024 · 7 comments

Comments

@deinman
Copy link

deinman commented Sep 26, 2024

Elastic.Clients.Elasticsearch version:
8.15.6

Elasticsearch version:
8.13.0

.NET runtime version:
8.0.8

Operating system version:
Win 11

Description of the problem including expected versus actual behavior:
Passing a sub aggregate causes the resultant aggregate hits to not deserialize.

Steps to reproduce:

var terms = Aggregation.Terms(new TermsAggregation
{
    Field = Infer.Field<Model>(x => x.GroupId),
});

terms.Aggregations = new Dictionary<string, Aggregation>
{
    {
        "earliest_event", Aggregation.TopHits(new TopHitsAggregation
        {
            Sort = new List<SortOptions>
            {
                SortOptions.Field(Infer.Field<Model>(x => x.StartTime),
                    new FieldSort { Order = SortOrder.Asc })
            },
            Size = 1
        })
    }
};

var aggregations = new Dictionary<string, Aggregation> { { "events", terms } };

var request = new SearchRequest<Model>
{
    Aggregations = aggregations,
    Query = Query.MatchAll(new MatchAllQuery())
};

var result = await client.SearchAsync<Model>(request);
public class Model
{
    public Guid Id { get; set; }
    public DateTimeOffset StartTime { get; set; }
    public Guid? RecurrenceId { get; set; }
    public Guid GroupId => RecurrenceId ?? Id;
}

Image

Expected behavior
I'd expect the resultant Hits inside the TopHitsAggregate to be typed as Model. If there's somewhere else I need to define the TDocument other than the SearchRequest - I can't find it.

I've confirmed the data in the nested Hits has the expected information, it's just coming back as an object.

As a work-around, I'm able to just deserialize it in some code after it comes back, which is workable, but it seems like there should be a way to get that TDocument to apply so users don't have to manually deserialize the result.

Provide ConnectionSettings (if relevant):
N/A

Provide DebugInformation (if relevant):
N/A

@deinman deinman added 8.x Relates to 8.x client version Category: Bug labels Sep 26, 2024
@christiaanderidder
Copy link

christiaanderidder commented Sep 27, 2024

We have exactly the same issue, but did manage to find a temporary workaround. I hope this helps you too:

var firstHitForBucket = topHitsAggregate.Hits.Hits.First();
                
// There is an issue where Hit<T> is not typed correctly and returned as a Hit<object?> of type Hit<JsonElement>.
// For now, we parse the json ourselves.
// See: https:/elastic/elasticsearch-net/issues/8359
if (firstHitForBucket.Source is not JsonElement json) continue;

// Make sure to deserialize the JsonElement with the same options as the regular elastic document deserializer.
var document = json.Deserialize<TDocument>(serializer.SerializerOptions);

We get the serializer from ElasticsearchClient.ElasticsearchClientSettings.SourceSerializer. By default the options are not exposed.
If you don't use a custom serializer you can get them from DefaultSourceSerializer.CreateDefaultJsonSerializerOptions().
In our case we use a custom serializer in which we expose the settings ourselves:

public class CustomJsonSerializer : SystemTextJsonSerializer 
{
    private readonly JsonSerializerOptions _options;

    public CustomJsonSerializer(IElasticsearchClientSettings settings) : base(settings)
    {
        // This is necessary for backwards compat
        _options = DefaultSourceSerializer.CreateDefaultJsonSerializerOptions();
        var jsonStringEnumConverter = _options.Converters.OfType<JsonStringEnumConverter>().First();
        _options.Converters.Remove(jsonStringEnumConverter);
    }

    protected override JsonSerializerOptions CreateJsonSerializerOptions() => _options;
    
    public JsonSerializerOptions SerializerOptions => _options;
}
// We use our own custom serializer, which we can abuse to expose and reuse internal serializer settings
var serializer = _elasticClient.ElasticsearchClientSettings.SourceSerializer as CustomJsonSerializer;

@flobernd
Copy link
Member

Hi @deinman, this behavior is not specific to sub-aggregations, but always the case for TopHitsAggregation.

In our specification, this type is currently modelled like this:

/** @variant name=top_hits */
export class TopHitsAggregate extends AggregateBase {
  hits: HitsMetadata<UserDefinedValue>
}

UserDefinedValue is equivalent to untyped/object.

In order to deserialize to the correct type, we would have to push down the generic type argument TDocument to the TopHitsAggregation type. I will check the implications with the rest of the team and come back to you when we came to a conclusion.

Thanks for providing a workaround @christiaanderidder !

@flobernd
Copy link
Member

It was just mentioned to me that top hits can as well be used to get top nested documents, in which case TDocument is not correct:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html#_top_hits_support_in_a_nested_or_reverse_nested_aggregator

As this behavior is dependant on a runtime-value (mappings), we have to return objects.

@deinman
Copy link
Author

deinman commented Sep 30, 2024

Ah, that's a bummer - but understandable given the different uses of TopHitsAggregation. Thanks for looking into it @flobernd and thanks for the workaround @christiaanderidder, I'd come up with basically the same thing - but it was great to know that someone else thought it up and implemented it as well!

@deinman deinman closed this as completed Sep 30, 2024
@flobernd
Copy link
Member

I'll see if I can provide an easier way of accessing the SourceSerializer 🙂

@flobernd
Copy link
Member

@deinman @christiaanderidder I just created a PR in the transport library that adds a few extension methods which can be used to further (de-)serialize arbitrary data (e.g. JsonElement) in an efficient way.

@flobernd
Copy link
Member

Ho @deinman @christiaanderidder, I have now created this follow up PR to include these changes in the actual client.

This will potentially be a breaking change for you as I had to change the inheritance hierarchy for the DefaultSourceSerializer as well as the methods of the base class itself.

The custom serializer implementation is no longer required. Instead, you can just use:

if (firstHitForBucket.Source is not JsonElement json) continue;

// Make sure to deserialize the JsonElement with the same options as the regular elastic document deserializer.
var document = _elasticClient.ElasticsearchClientSettings.SourceSerializer.Deserialize<TDocument>(json);

Make sure to add:

using Elastic.Transport.Extensions;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants