Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REST high-level client: add synced flush API #29189

Closed
wants to merge 7 commits into from

Conversation

sohaibiftikhar
Copy link
Contributor

@sohaibiftikhar sohaibiftikhar commented Mar 21, 2018

Relates to #27205
Moved to #30650

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

1 similar comment
@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@javanna javanna self-assigned this Mar 21, 2018
@sohaibiftikhar
Copy link
Contributor Author

sohaibiftikhar commented Mar 21, 2018

@javanna This turned out to be a bit more complex than I initially thought it to be. It seems a bit tricky to reconstruct the SyncedFlushResponse object back from its JSON/XContent. Could you please have a look at SyncedFlushResponse::fromXContent to see if this is at least going in the right direction?

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

— Fixed issues with parser for null values
— Cleaned up the response parsing for SyncedFlush
— TODO: Run complete test suite and check style cleanup
@javanna javanna self-requested a review March 22, 2018 16:52
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh boy!

I wonder if in this case we shouldn't parse the normal response object and make a new one that is more like what is returned over the REST API. This is just so different and you have to go through so many hoops to make so many thing parseable I wonder if it is worth breaking from tradition here.

@javanna, I know you were reviewing this too, what do you think?

Map<String, ShardCounts> shardsCountsPerIndex = new HashMap<>();
Map<String, List<ShardsSyncedFlushResult>> shardsResultPerIndex = new HashMap<>();
// If it is an object we try to parse it for Fields._SHARD or for an index entry
for (Token curToken = parser.currentToken(); curToken != Token.END_OBJECT; curToken = parser.nextToken()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we need to parse the top level object by hand because of the unexpected names. ObjectParser doesn't support that sort of thing.

Integer successfulShards = null;
Integer failedShards = null;
Map<ShardId, List<FailureContainer>> failures = new HashMap<>();
if (curToken == Token.START_OBJECT) { // Start parsing for _shard or for index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compared to our other parsing code this is a little weird because it doesn't know what field it is parsing up front. I get why you do this, but it is weird.

Also it is weird because we don't serialize all that much information. You get almost nothing if there isn't an error.

Copy link
Contributor Author

@sohaibiftikhar sohaibiftikhar Mar 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed true. Which is why I agree with your initial idea. It would be much better to have a separate response object altogether as this whole effort felt more like a workaround to me. Sorry, I misunderstood initially. We could probably have something like

{
  "_shards" : ...
  "indexes" : {
    "index1" : {
      ...
    },
    "index2" : {
      ...
    },
    ...
  }
}

And then parse the index information like a Map. But I am not sure about the repercussions that changing the JSON structure might have.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't change the format of the response at this time. We can at some point, but for now we just have to parse what we have, and figure out what we should do to make things better for the future, meaning potentially breaking changes etc.

.field("relocating_node", relocatingNodeId())
.field("shard", id())
.field("index", getIndexName());
.field(Fields.STATE, state())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've stopped making these Fields objects in a past year or so and just started to use string constants or even quoted strings.

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some initial comments, thanks a lot for your efforts @sohaibiftikhar I definitely didn't realize how complicated this API was going to be, I changed its difficulty in the meta issue to "medium". I definitely didn't see that it was printing out ShardRouting hence a lot of other stuff that needs to be parsed back. On alternative proposals, I am open to ideas, but if this is what we return through REST today and what we print out, I don't follow how using a different object would make things simpler. Maybe changing the format of the REST response would, but that is a different matter. @nik9000 could you clarify?

* See <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-synced-flush.html">
* Synced flush API on elastic.co</a>
*/
public void syncedFlushAsync(SyncedFlushRequest syncedFlushRequest, ActionListener<SyncedFlushResponse> listener, Header... headers) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as odd as this sounds, could you rename the methods to flushSynced as that's how this API is referred to in our SPEC ? request and response can and should stay the same.

@@ -233,6 +234,15 @@ static Request flush(FlushRequest flushRequest) {
return new Request(HttpPost.METHOD_NAME, endpoint, parameters.getParams(), null);
}

static Request syncedFlush(SyncedFlushRequest syncedFlushRequest) {
String[] indices = syncedFlushRequest.indices() == null ? Strings.EMPTY_ARRAY : syncedFlushRequest.indices();
String endpoint = endpoint(indices, "_flush", "synced");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_flush/synced can now be provided as a single argument, it also won't be encoded this way which is fine as we know that it doesn't need to be encoded.

}
Map<String, String> expectedParams = new HashMap<>();
setRandomIndicesOptions(syncedFlushRequest::indicesOptions, syncedFlushRequest::indicesOptions, expectedParams);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you remove one of these empty lines please?

endpoint.add(String.join(",", indices));
}
endpoint.add("_flush");
endpoint.add("synced");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

endpoint.add("_flush/synced") ?

String[] indices = syncedFlushRequest.indices() == null ? Strings.EMPTY_ARRAY : syncedFlushRequest.indices();
String endpoint = endpoint(indices, "_flush", "synced");
Params syncedFlushparameters = Params.builder();
// This request takes no other parameters other than the indices.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this comment

if (curToken == Token.FIELD_NAME) {
String level2FieldName = parser.currentName();
curToken = parser.nextToken();
switch (level2FieldName) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we usually have a switch based on the token found. For instance here we would have one for start_array, and we would do something if the current field name is failures otherwise ignore. And another if token.isValue() for total, successful and failed. You can find something like this for instance in SearchResponse. This makes it easier to reason about what we are parsing I think.

if (failureReason != null &&
shardId != null &&
totalCopies != null &&
successfulCopies != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that given our responses, this is always true?

Copy link
Contributor Author

@sohaibiftikhar sohaibiftikhar Mar 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Same reason as above. I have a general tendency of always expecting illegal JSONs I think. This would mean that if someone changed the toXContent (removing some fields) method tests would start to fail. I will remove it if you think it is not required.

new FailureContainer(shardId, failureReason, totalCopies, successfulCopies, routing)
);
} else {
throw new ParsingException(startLocation, "Unable to construct ShardsSyncedFlushResult");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we try not to throw exception when parsing responses as they may become a problem when it comes to forward compatibility. The client should be able to speak to future versions that have added fields, arrays or objects, by just reading what it knows and ignoring the rest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding fields will not throw this exception. Removing will. IMHO it should because otherwise the response object would get constructed with nulls.

} else {
parser.skipChildren();
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe for readability we could split this into a couple of methods for each section / object?

}
if (totalShards != null &&
successfulShards != null &&
failedShards != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when can it happen that some of these is null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reason as above. Just precaution against removing keys from the XContent string in the future. Will remove it if you want.

@sohaibiftikhar
Copy link
Contributor Author

sohaibiftikhar commented Mar 26, 2018

@javanna

On alternative proposals, I am open to ideas, but if this is what we return through REST today and what we print out, I don't follow how using a different object would make things simpler.

I am not sure about simpler. But what it would do is make it logically correct I think. For example consider the method SyncedFlushResponse::getShardsResultPerIndex. Currently this returns all responses successful or not. Since we discard successful responses upon serialization (conversion to XContent String) when we reconstruct this it would only contain responses for failed shards. Hence if you call ::toXContent on a SyncedFlushResponse object and then ::fromXContent on the result obtained it would not give back an equivalent object.

— Split the `toXContent` method into multiple methods
— fixed document:integTest
@sohaibiftikhar
Copy link
Contributor Author

sohaibiftikhar commented May 2, 2018

@javanna Sorry for the long delay as I was confused on how to proceed. I added the rest of the tests and merged with the latest master. gradle check completed successfully except some xpack stuff that seems unrelated. An issue was created for this already. Could you take a look now?

@sohaibiftikhar sohaibiftikhar changed the title [WIP] REST high-level client: add synced flush API REST high-level client: add synced flush API May 2, 2018
@nik9000
Copy link
Member

nik9000 commented May 8, 2018

OK! So, I vanished for a while because I was at the beach. And then a weekend. And then we had to get a few people together to talk about an approach.

So, there are a bunch of APIs like this: the Response objects just don't match the http response that we make. We use the Response objects elsewhere so we can't change them. The simplest way to actually integrate this into the high level rest client is to make a response object that parses the response from the server. Could you do that? Make a new response object that parses the response from the server and put it in the high level rest client.

I'm really sorry to throw away so much work that you've already done. I suppose it isn't a waste because your work was super important in making it clear that we can't share all of the responses even though we'd really like to.

@sohaibiftikhar
Copy link
Contributor Author

@nik9000 Thanks for the feedback. Since I always get confused with naming could you suggest a name for this new response object?

From my understanding, we would still need some part of what we are doing here (ShardRouting would still need to be parsed). The stuff that would change is that we would not be reconstructing the list of ShardsSyncedFlushResult thing. I will make the required changes once we can decide on a name.

@javanna
Copy link
Member

javanna commented May 8, 2018

@sohaibiftikhar the idea would be to have something that resembles more the json response, and get away from the current response and the heavy object that it holds. SyncedFlushResponse is still good I think, it is going to be in an another package anyway, and renaming it is always possible later if we wish to do so.

@nik9000
Copy link
Member

nik9000 commented May 8, 2018

I think I'd make an object to hold whatever we return instead of reusing ShardRouting. I see that we call toXContent on shard routing when we render the response but I'm don't think it is worth trying to rebuild the object in the client. It feels like it should be internal data that we should have never made part of the client in the first place.

@sohaibiftikhar
Copy link
Contributor Author

sohaibiftikhar commented May 8, 2018

I see that we call toXContent on shard routing when we render the response but I'm don't think it is worth trying to rebuild the object in the client

@nik9000 I get your point. But we pretty much serialize everything in ShardRouting. And it is not even flat. It serializes the AllocationId and RecoverySource and UnassignedInfo. The only part it leaves out is the SnapshotId::uuid when we have a SnapshotRecoverySource. So I don't understand how discarding deserialization of ShardRouting would help. We would need to replicate all the nested objects. With the only difference being the uuid that I mentioned above.

@nik9000
Copy link
Member

nik9000 commented May 8, 2018

I talked to @sohaibiftikhar over another channel and convinced him that we shouldn't reuse ShardRouting in the high level rest client. My argument was mostly that the class and the things that it touches feel internal and we don't really want to make guarantees that we won't change their shape in the future.

@sohaibiftikhar
Copy link
Contributor Author

Since the required changes are more or less entirely different is it okay if I move this to a separate PR?

@nik9000
Copy link
Member

nik9000 commented May 9, 2018

Since the required changes are more or less entirely different is it okay if I move this to a separate PR?

Either way is fine with me!

@sohaibiftikhar
Copy link
Contributor Author

Moved to #30650

@sohaibiftikhar sohaibiftikhar deleted the synced_flush branch June 1, 2018 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants