Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Context and DistributedContext specification #75

Closed
wants to merge 5 commits into from

Conversation

jmacd
Copy link
Contributor

@jmacd jmacd commented Jun 6, 2019

This would address issue 46, hoping to find agreement on this as a basic outline. @carlosalberto could you review this and say how far off it is from the Java prototype, specifically with regards to naming, and/or offer feedback of any sort? I put this together after investigating OpenCensus-Go more so than from reviewing the OTel Java prototype.

@jmacd
Copy link
Contributor Author

jmacd commented Jun 6, 2019

Note this addresses #46

FYI I realize this conflicts with #73

But on the surface of it, I'd like to debate why we're using the term "TTL" to describe a property that is definitely not dependent on time. @songy23

Copy link
Member

@songy23 songy23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tag has been renamed to DistributedContext (see #46) and will have its own specs.

@jmacd
Copy link
Contributor Author

jmacd commented Jun 6, 2019

@songy23 I'm somewhat confused as to who and how it was decided that "Tag has been renamed". I'm just not sure what I'm supposed to be following?

@songy23
Copy link
Member

songy23 commented Jun 6, 2019

More discussions on the naming can be found at #9 and open-telemetry/opentelemetry-java#11.

But on the surface of it, I'd like to debate why we're using the term "TTL" to describe a property that is definitely not dependent on time.

TTL doesn't describe time. Rather it's about the hops that tags/distributed context can propagate.

@jmacd
Copy link
Contributor Author

jmacd commented Jun 7, 2019

"TTL" is a common abbreviation for "Time to Live". How else did you arrive at the name TTL?
You're saying "TTL" is not time, it's about hops. Why not call it "Hops", then? I propose "Max Hops" is conceptually clearer.

@richardelling
Copy link

TTL as counting hops is used in IPv4 for almost 4 decades. But you are correct in that it is also sometimes used as a timestamp. I'll encourage disambiguation.

@songy23
Copy link
Member

songy23 commented Jun 7, 2019

In fact there's a hop field within TTL. Like @richardelling mentioned, we thought TTL can also represent the count of hops. From Wikipedia:

Time to live (TTL) or hop limit is a mechanism that limits the lifespan or lifetime of data in a computer or network. TTL may be implemented as a counter or timestamp attached to or embedded in the data.

In addition, the W3C correlation-context spec also uses TTL. For example w3c/baggage#4.

That being said personally I don't have a preference on the naming. Please feel free to file a separate issue to continue the discussion @jmacd. For this PR let's focus on context :)

@SergeyKanzhelev
Copy link
Member

@jmacd can you please iterate on it with @songy23. I like in your PR how you describe APIs specifics. I believe those documents can be nicely merged together.

As for TTL - we are at the milestone when we document what was implemented in Java. Proposing changes is fine. But we really need to put a stake in the ground on what API was proposed. And than accumulate and process the feedback.


The defined mutators are:
- `Insert(key, value)`: Enter a new key:value only if the key is not already defined
- `Upsert(key, value)`: Enter a new key:value or update the existing key:value unconditionally
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upsert ? ;)

Copy link
Contributor Author

@jmacd jmacd Jun 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what OpenCensus called it. I haven't personally seen a need to distinguish Insert from Upsert semantics, but neither do I doubt its usefulness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bogdandrutu Have we decided to keep Upsert, Insert, and Remove operations on DistributedContext?


### `Context` accessors

#### `Context.Active()`: Get the current Span
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has been discouraged in the past to expose the actual Span in the Context object, as we have been afraid of users passing Context around, as it were container.

Something along Span.FromContext() is valid though (which is currently exposed by Tracer.getCurrentSpan())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an application is using more than one Tracer, that seems problematic. Is this a static method?

I would like to see the conversation where this fear of passing context was discussed. Any pointers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm coming from Go, where Context is explicitly passed around. I wrote this specification with Go in mind, and it's difficult to generalize across explicit/non-explicit languages.

@jmacd
Copy link
Contributor Author

jmacd commented Jun 7, 2019

I could imagine a scenario where, let's say, a root node obtains a 1-minute lease on some resource, then places into the correlation context a 1-minute time-to-live on some contextual value. This seems like a useful semantic to have independently of a number-of-hops limit, and "time-to-live" is the perfect thing to name it.

If every time we explain this existing "TTL" concept, we need to use the word "hops", I think we're making things confusing. TTL makes more sense in an IPv4 setting because packets are not meant to live for a long time. Correlation contexts could live indefinitely.

`TagMap` is an immutable map of key:value assignments with metadata about how each key:value assignment (i.e., tag) should propagate. Keys are individually assigned a "Max Hops" value which determines the number of remote processes it will propagate into. Values for Max Hops:

- `0`: The tag will propagate to internal `Contexts` belonging to the same trace, within the local process, but will not propagate to a remote host
- `>=1`: The tag will propagate to a depth of (up to) 1 or more remote hosts from the current node in the trace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/remote hosts/remote processes/.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@bogdandrutu
Copy link
Member

@jmacd I disagree with the argument. When it comes to network transfer or propagation TTL (because of the ipv4) has a clear meaning for me. When you refer to storing some data TTL indeed has a meaning related to time, but here we are focused on a "packet" that is propagated, and TTL refers to the propagation aspect of the tag.

@jmacd
Copy link
Contributor Author

jmacd commented Jun 7, 2019

@bogdandrutu @rghetia From Wikipedia:

In theory, under IPv4, time to live is measured in seconds, although every host that passes the datagram must reduce the TTL by at least one unit. In practice, the TTL field is reduced by one on every hop. To reflect this practice, the field is renamed hop limit in IPv6.

So they renamed it "hop limit" to clarify things, too.

@jmacd jmacd requested a review from reyang as a code owner June 7, 2019 19:47

The `Context` API provides a way to obtain an empty `Context`, which has no associated `SpanContext` and an empty `TagMap`.

#### `Context.WithMap(replacement)`: support binding a new TagMap to a Context
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WithMap sounds very generic, please consider WithTagMap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what do with this, since the TagMap type has been named DistributedContext. I suppose WithDistributedContext (it's sooooooo long).

## Context and Tags

**Tags** are key:value attributes, which are associated with **Spans**,
**Events**, and **Stats** data, that propagate alongside **SpanContext**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still use term stats or we are moving to "metrics"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been working on writing down a position on this topic, and it seems we are using both terms. An open question, in my mind, is whether stats and metrics can be specified in a way so that pre-aggregated metrics layers on top of the stats. Then metrics can be seen as a technique for type-safety and an optimization, but semantically equivalent to recording a stats measurement.

@reyang
Copy link
Member

reyang commented Jun 11, 2019

I could imagine a scenario where, let's say, a root node obtains a 1-minute lease on some resource, then places into the correlation context a 1-minute time-to-live on some contextual value. This seems like a useful semantic to have independently of a number-of-hops limit, and "time-to-live" is the perfect thing to name it.

This "time-to-live" is hard to implement on a distributed environment where clock skew happens.
Given we have the key-value pair (EntryKey and EntryValue), people who need such lease mechanism could implement custom logic on top of it. Having the OpenTelemetry library providing a generic distributed clock seems to be out-of-scope for a telemetry/monitoring library.

If every time we explain this existing "TTL" concept, we need to use the word "hops", I think we're making things confusing. TTL makes more sense in an IPv4 setting because packets are not meant to live for a long time. Correlation contexts could live indefinitely.

TTL hops is a very common terminology. Would you elaborate which part is confusing?

@jmacd
Copy link
Contributor Author

jmacd commented Jun 11, 2019

I would trust that if a user wanted to use time in a distributed way, that they would synchronize clocks and/or use very long timeouts relative to clock skew. When you say "time to live" the intent is pretty clear.

(You could imagine wanting a dctx to live for one minute, as in the example, but where there was only one hop. Then, the intent is to propagate a value and then continue propagating it locally in the 1-hop-away process, but to invalidate it after one minute passes. This is just an imaginary use-case meant to avoid potential future confusion over the name TTL, not something I'm advocating for.)

On the other hand, every time we discuss what TTL means in this specification, the word "hops" comes out. Why not call a thing what it is? I don't think this is super important, but the more clarity the better.

@jmacd jmacd changed the title Context and Tags specification Context and DistributedContext specification Jun 11, 2019
@jmacd
Copy link
Contributor Author

jmacd commented Jun 11, 2019

Obsoleted by #97.

@SergeyKanzhelev
Copy link
Member

@jmacd I like wording in terminology section explaining the Context concept. If you have time to port some of it the current spec - it will be great. Closing this PR

@jmacd
Copy link
Contributor Author

jmacd commented Jun 11, 2019 via email

@jmacd jmacd deleted the jmacd/spec_context_tags branch September 17, 2019 15:11
rockb1017 pushed a commit to rockb1017/opentelemetry-specification that referenced this pull request Nov 18, 2021
Add requirements of where example code location
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants