Change @timestamp field type to date_nanos in Logs data streams #102548

Draft
wants to merge 2 commits into base: main

Conversation

eyalkoren
Contributor

@eyalkoren eyalkoren commented Nov 23, 2023

Changing the @timestamp field type for Logs data streams in order to support nanosecond precision.

Implementation

Logs data streams have two sources defining the @timestamp field - ecs@mappings and logs@mappings.
As long as there is no formal support for date_nanos timestamps in ECS, we can only change the type for logs-*-* data streams through logs@mappings, which takes precedence because it is an explicit mapping, as opposed to the ECS dynamic mappings.
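To illustrate what this amounts to (the actual change lives in the built-in logs@mappings component template that ships with Elasticsearch, so this is a sketch rather than something users need to create themselves), the resulting explicit mapping is roughly equivalent to a component template of the form:

{
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date_nanos"
        }
      }
    }
  }
}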

Risks

As a consequence, the change proposed here introduces an incompatibility between the default Logs index template and ECS.
Although the date and date_nanos types are mostly compatible and interchangeable, there is a risk of breaking specific client behaviors with this change.
Even though the value returned through searches is always a string, switching from date to date_nanos may still create inconsistencies, for example:

  • By default, the value returned from date fields is formatted according to the format associated with the field. Since we don't define an explicit format for @timestamp (before or within this change), the returned string is formatted according to the default format. Because the default format for date is strict_date_optional_time and the default format for date_nanos is strict_date_optional_time_nanos, the returned value will be different, for example - 2023-11-23T12:51:54.048204Z instead of 2023-11-23T12:51:54.048Z. While not a huge risk, this may break clients that run verifications on the returned value format. Even only breaking tests that assert on specific values is unpleasant.
  • A worse issue may occur when a numeric format (like epoch_millis) is specified in the search request and the client fails to parse it due to its different length, like the issue that was recently fixed in Kibana, where the TypeScript library failed to parse the nanosecond date representation because it was too long (see the sketch right after this list).
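A hedged sketch of that second risk (the index pattern and values here are illustrative, not taken from the Kibana issue): a client that requests epoch_millis formatting through the fields API, e.g.

GET logs-*-*/_search
{
  "_source": false,
  "fields": [
    { "field": "@timestamp", "format": "epoch_millis" }
  ]
}

previously always received a whole number of milliseconds as a string (e.g. "1700758463099"), but with date_nanos it may receive a value with fractional digits (e.g. "1700758463099.1235") for sub-millisecond timestamps, which a strict numeric parser may reject. This point is discussed further down in the thread.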

Still missing

  • documentation of this change, its risks, and how to restore the previous behavior
  • testing a data stream with mixed indices - some with a date timestamp and some with a date_nanos timestamp
  • testing restoration of the date type through logs@custom

@eyalkoren eyalkoren self-assigned this Nov 23, 2023
@elasticsearchmachine elasticsearchmachine added the external-contributor label Nov 23, 2023
@elasticsearchmachine
Collaborator

Hi @eyalkoren, I've created a changelog YAML for you.

@eyalkoren eyalkoren changed the title Change @timestamp field type to date_nanos in Logs data streams Nov 23, 2023
@ruflin
Member

ruflin commented Nov 24, 2023

It would be great to get feedback from the following groups:

Eventually we have to make this change because of OTel and, as I would argue, it is the better default. The main question for me is how we can roll it out in the best way, with minimal risk for our users. Some more ideas:

  • Make it the default first in some of the integrations (or all of them?)
  • Make it available first in Serverless only -> possible?
  • Nudge users to change the default in the UI first. In Logs Explorer, ask users whether we should change the default so they can opt in at first, and later we make it the default?

@AlexanderWert
Member

I can only comment on the need from the OTel perspective: with OTLP using nano-precision timestamps and our vision to store OTLP data with as little mapping as possible, we will need this for "Going big on OTel" sooner rather than later.

I'm not deep enough in the details to comment on the risks, though.

@felixbarny
Member

the returned value will be different, for example - 2023-11-23T12:51:54.048204Z instead of 2023-11-23T12:51:54.048Z

While true, this risk only affects users that actually ingest logs with timestamps in a resolution that's more granular than milliseconds. If the input timestamps are in milliseconds (such as 2023-11-23T12:51:54.048Z), the output will be the same (2023-11-23T12:51:54.048Z), without added zeros (2023-11-23T12:51:54.048000000Z).

I suppose this will not be the case for most typical logging use cases but it does have an effect for traces (which are out of scope for this PR, though).

A worse issue may occur when a numeric format (like epoch_millis) is specified in the search request and the client fails to parse it due to its different length

While there's a risk, the semantic of epoch_millis doesn't change - it doesn't become epoch_nanos. What changes is that the number may have fractional digits ("@timestamp": ["1700758463099"] vs "@timestamp": ["1700758463099.1235"]). But as this is not the default format, I don't think there's risk coming from that. If someone has changed the format, it means they've added a custom date or date_nanos mapping, thus overriding the default timestamp mapping.

@weltenwort I'd also be interested in your opinion on the risks of changing the field type for @timestamp from date to date_nanos from a client's perspective.

@jpountz
Contributor

jpountz commented Nov 24, 2023

It would be great to get feedback from the following groups:
@jpountz : Elasticsearch, storage, ...

You should expect higher storage, but I don't think that this should drive the decision. If nano resolution is the standard, let's adopt it instead of fighting against it.

We should just double check that everything we need works with nanosecond resolution; historically we had a couple of incompatible features, but hopefully they have all been fixed since then.

@axw
Member

axw commented Nov 24, 2023

Seconding @AlexanderWert's and @jpountz's comments. I don't have anything more to add :)

@weltenwort
Member

It's hard to judge the risk of breaking clients in general since the parsing behavior is very dependent on the specific language and libraries used. Time and date representation is always a tricky topic. So in that sense it sounds like a breaking change to me.

Looking at Kibana specifically, the language mostly reduces the precision of numbers that are too large instead of failing to parse them. But as far as I'm aware the best practices are to use the string representations anyway. @felixbarny not sure if that is the information you were looking for.

@StephanErb

StephanErb commented Nov 24, 2023

From an end-user perspective the biggest reason why I want nanoseconds is to reduce the likelihood of log reordering in Kibana Discover and the Log Stream UI. There is currently no reliable tiebreaker that would prevent that (e.g., filebeat sending logs into one index with 4 shards, sometimes 10s of log lines per microsecond). If that could be fixed I would be happy - with or without nanoseconds.

Member

@felixbarny felixbarny left a comment


I think we should adapt the logs@default-pipeline a bit here. Currently, if @timestamp has not been set by the user, the pipeline assigns a default that is equal to the document's arrival time. This change implicitly also increases the precision of that default timestamp from milliseconds to microseconds. I don't think this level of precision is required for the default timestamp. Therefore, I propose to truncate the timestamp in a script processor. For example like so:

{
  "script": {
    "if": "ctx['@timestamp'] == null",
    "source": "ctx['@timestamp'] = System.currentTimeMillis()"
  }
}
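For anyone who wants to see the effect of such a processor without touching the real pipeline, a sketch along these lines using the simulate API should help (the sample document is made up and not part of this PR):

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "if": "ctx['@timestamp'] == null",
          "source": "ctx['@timestamp'] = System.currentTimeMillis()"
        }
      }
    ]
  },
  "docs": [
    { "_source": { "message": "a log line without a timestamp" } }
  ]
}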

@felixbarny
Member

You should expect higher storage

That's only the case when using timestamps that have a higher precision than milliseconds, right? If the precision of the stored timestamps doesn't increase, Lucene will optimize storage using gcd encoding, is that correct?

But as far as I'm aware the best practices are to use the string representations anyway.

Using the default format for date_nanos (strict_date_optional_time_nanos||epoch_millis), the fields API will always return the date in a string/ISO8601 representation. I suppose this should guard against most issues. Even if there may be a precision loss, I'd expect date parsing implementations not to fail if the ISO8601 representation has more precision than what the date library supports. But there's no guarantee that all date parsing libraries actually degrade gracefully here.

@ruflin
Member

ruflin commented Nov 24, 2023

I don't think this level of precision is required for the default timestamp.

Playing devil's advocate here and taking the comment from @StephanErb into account: let's assume that for some reason the timestamp was not set on the edge. The bulk request that arrives still has the docs in the order in which each line was read. If we use more precision, will each entry get a slightly different timestamp and with it the right sort order? Or will they all get the same timestamp anyway?

@felixbarny
Member

The default timestamp is just a guard against data loss rather than a full replacement for properly setting and parsing the timestamp. Documents that are processed in the same microsecond would still get the same timestamp. So an actual nano-precision timestamp set by the application that produces the logs, or log.file.offset, may be a better tiebreaker in these cases.
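To make the tiebreaker idea concrete, a hedged sketch of a search that sorts on @timestamp and uses log.file.offset as a secondary sort (field names as discussed in this thread; numeric_type keeps the sort comparable across indices that mix date and date_nanos mappings):

GET logs-*-*/_search
{
  "sort": [
    { "@timestamp": { "order": "desc", "numeric_type": "date_nanos" } },
    { "log.file.offset": { "order": "desc", "unmapped_type": "long" } }
  ]
}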

@jpountz
Contributor

jpountz commented Nov 24, 2023

That's only the case when using timestamps that have a higher precision than milliseconds, right? If the precision of the stored timestamps doesn't increase, Lucene will optimize storage using gcd encoding, is that correct?

This is correct for doc values. The KD index is a bit less smart about these things, so a date_nanos timestamp with milli resolution would still use a bit more space than a date field. We could look into improving this.

@eyalkoren
Contributor Author

Thanks for all the great feedback everyone 🙏

While true, this risk only affects users that actually ingest logs with timestamps in a resolution that's more granular than milliseconds. I suppose this will not be the case for most typical logging use cases...

@felixbarny True. As you noted, we now automatically set @timestamp to _ingest.timestamp, which has microsecond resolution. But why do you think we should change the pipeline? It's a recent addition that I think would affect mostly new use cases. Existing use cases probably had to manually add timestamps, otherwise their documents got dropped. So if we agree this specific issue is a low risk and low impact anyway, I also think we should stick with the higher resolution timestamp for our automatic default (meaning - keep the pipeline as it is).

While there's a risk, the semantic of epoch_millis doesn't change - it doesn't become epoch_nanos. What changes is that the number may have fractional digits ("@timestamp": ["1700758463099"] vs "@timestamp": ["1700758463099.1235"]). But as this is not the default format, I don't think there's risk coming from that. If someone has changed the format, it means they've added a custom date or date_nanos mapping, thus overriding the default timestamp mapping.

@felixbarny This is not a theoretical thing, it is a real issue that broke the Logs UI. You are right, the issue is not that epoch_millis becomes epoch_nanos; it's that the client library expected a number in one format and got another. If this happened to us, I wouldn't rush to dismiss it happening elsewhere.

I'd need a bottom line here to decide on next steps. To make this as easy as possible, let's start with a go/no-go decision on this change as it is proposed now. Please add your votes with 👍 (go) or 👎 (no-go) and feel free to add a comment below either way if required, for example: "I think we should proceed with this change but it also requires ..." or "I think we should block this PR until the following dependencies are done: ..."

@felixbarny
Member

But why do you think we should change the pipeline? It's a recent addition that I think would affect mostly new use cases. Existing use cases probably had to manually add timestamps, otherwise their documents got dropped. So if we agree this specific issue is a low risk and low impact anyway, I also think we should stick with the higher resolution timestamp for our automatic default (meaning - keep the pipeline as it is).

It has more to do with cost than with risk. High-resolution timestamps have a real cost for users. We already know that the default timestamp (arrival timestamp) isn't the actual timestamp, so I don't think paying for the additional resolution is really worth it, given that the precision in terms of getting closer to the real timestamp doesn't increase. The counter to that is that it gives more precision for when documents have arrived which could help with ordering. I think it's a valid argument and there's some value added by going to a higher granularity. But I still doubt that the additional costs are worth it in the specific case of the default timestamp.

If this happened to us, I wouldn't rush to dismiss it happening elsewhere.

I'm not trying to dismiss the risks here but to get a clearer picture of what the exact risks are. I do acknowledge that there's some level of risk associated with making the change. At the same time, I think it's less of a question of whether or not we should make the change. I think it's pretty clear that we'll do it. The question is how we roll out the change, how we minimize the risk, and how we communicate the change to users.

Given that there's a relatively easy workaround for users that do experience issues - using the format parameter in fields to get the date formatted as strict_date_optional_time - I feel like the risk of this change is low enough to just do it. But we should definitely add something to the release notes, potentially add a troubleshooting guide, and inform the support team about this change.
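A sketch of that workaround (the index pattern is illustrative):

GET logs-*-*/_search
{
  "_source": false,
  "fields": [
    { "field": "@timestamp", "format": "strict_date_optional_time" }
  ]
}

This returns @timestamp in the millisecond-precision string format that clients of a date field received before the change.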

I'm also wondering whether we should engage the breaking changes committee for this. @jpountz as you're part of the committee, what's your recommendation here?

@eyalkoren
Contributor Author

The counter to that is that it gives more precision for when documents have arrived which could help with ordering. I think it's a valid argument and there's some value added by going to a higher granularity. But I still doubt that the additional costs are worth it in the specific case of the default timestamp.

Yes, that's the point, it relates to the tie-breaking discussion above. I don't think the storage cost is worth consideration in this case. Our automatic timestamp setting should apply to a negligible fraction of the documents (in general, not necessarily within a specific index). When it is applied, I think it's fine that it consumes the same storage. I don't mind either way, only raising the question.

I think it's less of a question of whether or not we should make the change. I think it's pretty clear that we'll do it.

OK, that's an important input. My understanding was that there may be an alternative, for example pushing a nanosecond-resolution timestamp into ECS and then relying on that. If we go down that path, it may be a different field, which would make this PR irrelevant.

If we agree to go with this change, let's discuss the rollout and whether we need to do anything other than documenting and notifying (for which I will need some more specific pointers as well).

@joshdover
Contributor

joshdover commented Nov 27, 2023

No significant additional comments to make here - I agree with the comments above. We should pay the cost now, since this is generally the direction we need to go in, especially with OTel data, where we may not always be able to depend on a log.file.offset being present and should still have a good experience.

Though not in scope of this PR, it's worth mentioning that I do think it makes sense to avoid making this change on metrics data streams for now, since @timestamp is already the most expensive field in time series data streams and there's not a strong use case for sampling metrics so frequently.

@eyalkoren
Contributor Author

eyalkoren commented Nov 27, 2023

Though not in scope of this PR, it's worth mentioning that I do think it makes sense to avoid making this change on metrics data streams for now, since @timestamp is already the most expensive field in time series data streams and there's not a strong use case for sampling metrics so frequently.

@joshdover maybe it is relevant to this PR.
IIANM, your comment implies that we will eventually need different timestamp fields in ECS according to the different precision requirements, is that correct?

@joshdover
Contributor

Actually - I got my fields mixed up. _id and _seq_no are the expensive fields. I can't comment on the impact of increasing precision of @timestamp on TSDB storage. @martijnvg have you looked into this?

@martijnvg
Member

@joshdover I've not experimented with changing the field type of the @timestamp field from date to date_nanos. Like Adrien suggests, it will likely increase disk usage, but it is difficult to reason about how much exactly. If an integration sends timestamps with millis precision (but in nanos format) and we change the @timestamp mapping to date_nanos, then I expect the storage increase to be minimal. Only if the integration really starts to use nanos-level precision do I expect the storage increase to be more noticeable.

@StephanErb

A significant part of this thread has been about storage costs. Would it be an option to bundle the date_nanos change with the new logs index mode #106462? As that comes with significant storage savings (>=50%?), paying a bit extra for the larger timestamp would no longer hurt from a user perspective.

@ruflin
Member

ruflin commented Mar 27, 2024

We could bundle it together with the logsdb change, but to be honest, I would prefer to make the change for everything; we just haven't gotten around to pushing it forward yet.

@felixbarny
Member

@StephanErb in the meantime, are you able to override the definition of the @timestamp field in the logs@custom component template?
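For reference, a hedged sketch of such an override - a user-managed logs@custom component template that maps @timestamp back to date (this would only take effect for newly created backing indices):

PUT _component_template/logs@custom
{
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"
        }
      }
    }
  }
}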

@ruflin
Member

ruflin commented Mar 27, 2024

To keep us moving forward on this one:

  • It seems we all agree this change has to happen eventually. More and more logs are produced, OTel is moving to it, and it is eventually becoming a standard. It is not a question of if, it is a question of when.
  • Storage: There is a storage impact, but it should be minimal. We should run some Rally tracks with and without nanoseconds on our logs data sets to see the difference.
  • Risk: There is a risk that in some edge cases this might break a user that consumes the timestamps in their library in a very specific way. We have not found an easy fix for this conundrum. The workaround is for these users to use logs@custom to restore the old behaviour.

Am I missing something in the above summary? If not, I suggest we move forward by running the storage test and find alignment on how we communicate the above change best to warn users ahead.

@tylerperk

I suggest we move forward by running the storage test and find alignment on how we communicate the above change best to warn users ahead.

@ruflin has someone determined the storage impact of date_nanos in logs?

@ruflin
Member

ruflin commented Jun 18, 2024

@tylerperk I'm not aware that this has been done so far. We should still do it, but also have a look at the conversation around storage in the previous comments.

Labels
:Data Management/Data streams >enhancement external-contributor v9.0.0