Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support mapping time series collections #2687

Open
wants to merge 16 commits into
base: 2.10.x
Choose a base branch
from

Conversation

alcaeus
Copy link
Member

@alcaeus alcaeus commented Oct 10, 2024

Q A
Type feature
BC Break no
Fixed issues

Summary

Supersedes #2597. I was unhappy with the way we were mapping time series collections, so instead of adding more options to the Document attribute I decided to introduce a new attribute and XML mapping element to handle time series data. To make data handling easier, the ClassMetadataInstance will store the attribute instance entirely, as it serves as a value object for time series options. As a result, the XML driver is able to create the attribute instance from the data in the XML, needing no additional validation.

The only other change needed was to add support for time series options in the schema manager. Since time series collections need to be created as such, no handling is needed to update time series collections at all. Dropping time series collections works the same as dropping other collections, again requiring no other changes.

The tests in this PR cover the basic mapping and schema manager changes. No functional testing for time series collections was added, but I can add those tests if necessary.

Copy link
Member

@GromNaN GromNaN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM.

The functional test would ensure we pass the correct options to create the collection and this options works with all supported server versions (past and future).

);

if ($class->timeSeriesOptions->expireAfterSeconds) {
$options['expireAfterSeconds'] = $class->timeSeriesOptions->expireAfterSeconds;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted this option is specific to TimeSeries collections.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit odd that create receives this as a top-level option instead of within timeseries.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The option is also used for clustered collections, which can leverage a TTL to remove expired documents. I'm guessing the team decided to reuse the option since it was already there instead of adding an extra option in the timeseries options document.

* @ODM\TimeSeries(timeField="time", metadataField="metadata", granularity=Granularity::Seconds, expireAfterSeconds=86400)
*/
#[ODM\Document]
#[ODM\TimeSeries('time', 'metadata', Granularity::Seconds, 86400)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Named arguments are frequently used for attributes, you should use them to ensure there is not regression with argument name change.

<xs:complexType name="time-series">
<xs:attribute name="time-field" type="xs:NMTOKEN" use="required" />
<xs:attribute name="meta-field" type="xs:NMTOKEN" />
<xs:attribute name="granularity" type="odm:time-series-granularity" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MongoDB 6.3 added bucketMaxSpanSeconds and bucketRoundingSeconds as an alternative to granularity. Both options are expected to have the same value (between 1 and 31536000), although we can rely on the server to enforce that. See:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed those. I decided against adding validation, except for limiting the range in the XSD as we're able to type this better. I'm not sure why there are two different options that require having the same value, but something tells me that this requirement may change in the future (a great example why we have the Defy Augury mantra in the driver specifications).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why there are two different options that require having the same value

From the minds that brought us a required sparsity option for encrypt() methods, before it was recently made optional. I jest, but that's what this reminded me of.

lib/Doctrine/ODM/MongoDB/SchemaManager.php Outdated Show resolved Hide resolved
lib/Doctrine/ODM/MongoDB/Mapping/ClassMetadata.php Outdated Show resolved Hide resolved
lib/Doctrine/ODM/MongoDB/Mapping/ClassMetadata.php Outdated Show resolved Hide resolved
// Mark as time series only after mapping all fields
if (isset($classAttributes[TimeSeries::class])) {
assert($classAttributes[TimeSeries::class] instanceof TimeSeries);
$metadata->markAsTimeSeries($classAttributes[TimeSeries::class]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the first markAs method in ClassMetadata? Just curious why you went this route instead of a setTimeSeries method as we have for ShardKey.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the first time I used this naming. I could change it to setTimeSeriesOptions to be consistent with others, but I figured that technically we're marking this as a time series document. Other methods could be changed as well, e.g. setShardKey should be markAsSharded as it doesn't allow to unshard metadata at a later stage anyways.

I have started another branch with a more comprehensive metadata change (e.g. using objects instead of arrays for field mappings, along with other changes), so I can defer the naming change to then if you'd prefer consistency here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you intend to revise things anyway, I have no preference. Probably easier to adopt the new format earlier and migrate older stuff as needed at some later point.

}

foreach ($xmlRoot->{'also-load-methods'}->{'also-load-method'} as $alsoLoadMethod) {
$metadata->registerAlsoLoadMethod((string) $alsoLoadMethod['method'], (string) $alsoLoadMethod['field']);
if (! isset($xmlRoot->{'time-series'})) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method seems long enough to do without this guard statement. If another attribute is added down the line, you'll likely need to wrap this in a condition as you just did for also-load-methods, so why not just do that now and avoid ever having to do this again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is our coding standard requiring an early exit, which seems pointless at this point. For now I've disabled the rule around this method, but I may want to refactor this into multiple methods at some point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the rule makes little sense for this kind of method. These aren't traditional guard statements of returning early before some logical operation is performed, because the method is handling many different kinds of operations. If anything, it just makes the final operation inconsistent from those handled earlier.

lib/Doctrine/ODM/MongoDB/Mapping/Driver/XmlDriver.php Outdated Show resolved Hide resolved
lib/Doctrine/ODM/MongoDB/SchemaManager.php Show resolved Hide resolved
);

if ($class->timeSeriesOptions->expireAfterSeconds) {
$options['expireAfterSeconds'] = $class->timeSeriesOptions->expireAfterSeconds;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit odd that create receives this as a top-level option instead of within timeseries.

Copy link
Member

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API looks stable enough to add docs to this PR.

@@ -806,6 +807,9 @@
*/
public $isReadOnly;

/** READ ONLY: stores metadata about the time series collection */
public ?TimeSeries $timeSeriesOptions = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use readonly here? AFAIK, that would just mean that markAsTimeSeries(), which sets this, can only be called once. Perhaps that's something to defer until later, as I expect it could be done across the board for all "READ ONLY" metadata properties.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no. What we'd want here is asymetric visibility, where changing the property is only done through accessors (such as markAsTimeSeries) that can incorporate checks, while reading is always allowed. readonly in PHP actually means write-once, meaning that one wouldn't be able to override the time series configuration (e.g. to change granularity) when using inheritance mapping.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readonly = you can change the assigned object properties. You cannot change the reference to the object. And I think - I'm not sure though - that you can only make assignment in the constructor 🤔 ?

Copy link
Member Author

@alcaeus alcaeus Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think - I'm not sure though - that you can only make assignment in the constructor

I thought something like that as well, but then it seems we are wrong, and the property can be initialised after the constructor, but only from within the class scope: https://3v4l.org/cV5QL

For our case, this still wouldn't work however, as we can't assign a default value to a readonly property but have to leave it uninitialised, which means that accessing the property always needs to be guarded by an isset check :(

@alcaeus
Copy link
Member Author

alcaeus commented Oct 17, 2024

@jmikola good point about the docs. I added TimeSeries to the attribute reference and wrote a short cookbook article based on the MongoDB documentation. I'd appreciate a thorough review of the cookbook entry to ensure it works for users 🙇‍♂️

@alcaeus alcaeus requested a review from jmikola October 17, 2024 08:50
#[ODM\Field(type: 'float')]
public float $temperature,
) {
$this->id = (string) new ObjectId();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does ODM require a string cast here, or is it only done to satisfy the string type hint on the property? IIRC, ODM would have allowed an ObjectId on the initially created entity, and left is as-is for insertion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default id type converts an ObjectId instance to a string in the entity. It's bothered me for a while as working with the ObjectId instance directly would make more sense, but I've shied away from making the big BC break. We could use ObjectId|string as type for the identifier, but this would create ambiguity in the API as users would always have to check types when using the identifier, hence the string cast.

docs/en/cookbook/time-series-data.rst Outdated Show resolved Hide resolved
docs/en/cookbook/time-series-data.rst Outdated Show resolved Hide resolved
docs/en/cookbook/time-series-data.rst Outdated Show resolved Hide resolved
docs/en/cookbook/time-series-data.rst Outdated Show resolved Hide resolved

-
``metaField`` - The name of the field which contains metadata in each time
series document. The field can be of any data type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't immediately clear to me from reading the Time Series docs, but are documents in a time series collection restricted to only having _id, timeField, and metaField fields? The docs suggest that everything else gets stuffed into the metaField.

If so, this may be something to note in the cookbook article when we demonstrate model creation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I've added the following language to the cookbook entry:

A time series document always contains a time value, and one or more measurement
fields. Metadata is optional, but cannot be added to a time series collection
after creating it. When using an embedded document for metadata, fields can be
added to this document after creating the collection.

I've also extended the example to store temperature and humidity, showing that time series documents can contain multiple measurements per entry.

{
$metadata = $this->dm->getClassMetadata(TimeSeriesTestDocument::class);

self::assertNull($metadata->timeSeriesOptions);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a meaningful test? I'm also a bit confused that these tests reference a TimeSeriesDocument class defined in the same file, while you also have a separate TimeSeriesDocument class defined in its own file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it isn't. This was left over from refactoring out the isTimeSeries field, where I replaced all occurrences of checking the field with the corresponding check on the timeSeriesOptions property. I removed the test since it's redundant.

As for the classes being used, there are lots of classes defined in this test file, I believe to avoid issues where changing the mapping of one class to test new functionality affects related tests. I've followed this for consistency, but we'd probably do well to refactor this in the near future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aye. Ideally, I suppose these tests would be split into their own files or define anonymous classes within the test itself. I understand there's so much prior art that it'd be a chore to clean everything up at once, though.

@alcaeus alcaeus requested a review from jmikola October 18, 2024 07:23
@@ -20,6 +21,7 @@
use Documents\Sharded\ShardedOne;
use Documents\Sharded\ShardedOneWithDifferentKey;
use Documents\SimpleReferenceUser;
use Documents\TimeSeries\TimeSeriesDocument;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted that this is where the class is referenced. Missed that earlier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants