Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(observability): fix bugs found from product review + negative cases #2158

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

odeke-em
Copy link
Contributor

This change adds recording of retry span annotations, catching cases in which exceptions where thrown but spans were not ended while testing out and visually confirming the results.

@odeke-em odeke-em requested review from a team as code owners October 12, 2024 09:25
@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: spanner Issues related to the googleapis/nodejs-spanner API. labels Oct 12, 2024
@odeke-em odeke-em force-pushed the trace-fix-bugs-from-product-review+negative-cases branch from 62bf905 to 1b7d8b6 Compare October 12, 2024 09:26
@product-auto-label product-auto-label bot added size: l Pull request size is large. size: xl Pull request size is extra large. and removed size: m Pull request size is medium. size: l Pull request size is large. labels Oct 12, 2024
src/transaction.ts Outdated Show resolved Hide resolved
src/database.ts Outdated
if (err) {
setSpanError(span, err!);
}
await runner.run().then(release, async err => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are already doing .then() on runner.run(), why do we then need to await here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@odeke-em it will work even if we don't await on runner.run(). please check and remove the await

src/database.ts Outdated
this.pool_.release(session!);
};

const runner = new TransactionRunner(
session!,
transaction!,
(err, resp) => {
async (err, resp) => {
if (err) {
setSpanError(span, err!);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
setSpanError(span, err!);
setSpanError(span, err);

we already are using if condition

@@ -754,9 +755,11 @@ export class Snapshot extends EventEmitter {
(err as grpc.ServiceError).code === grpc.status.ABORTED
)
) {
span.addEvent('Transaction Aborted, retrying begin', {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code block is for the non Aborted errors. Change Transaction threw error, retrying with explicit begin

src/database.ts Outdated
@@ -3222,14 +3223,14 @@ class Database extends common.GrpcServiceObject {
span.addEvent('No session available', {
'session.id': session?.id,
});
this.runTransaction(options, runFn!);
await this.runTransaction(options, runFn!);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have we added await on this.runTransaction. This does not return a promise nor its a async method.

src/database.ts Outdated
span.end();
return;
}

if (err) {
await runFn!(err as grpc.ServiceError);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here ?

src/database.ts Outdated

if (isSessionNotFoundError(err)) {
span.addEvent('No session available', {
'session.id': session?.id,
});
release();
this.runTransaction(options, runFn!);
await this.runTransaction(options, runFn!);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again why async here?

src/database.ts Outdated

try {
const result = await runner.run();
span.end();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

span.end() should be done in finally block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, on an exception this code retries in the while loop above so that's not correct to end it in the finally block. It should be ended only on success.

const traceConfig = {
tableName: table,
opts: this._observabilityOptions,
dbName: this._dbName!,
};
return startTrace('Snapshot.createReadStream', traceConfig, span => {
let attempt = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the attempt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@surbhigarg92 why? The feature docs ask that we record the attempt number for streams.

@@ -754,9 +764,16 @@ export class Snapshot extends EventEmitter {
(err as grpc.ServiceError).code === grpc.status.ABORTED
)
) {
this.begin();
span.addEvent('Stream broken. Safe to retry', {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove these two spans for stream broken. We are not retrying broken stream here , that is happening in partial-result-stream class. https:/googleapis/nodejs-spanner/blob/main/src/partial-result-stream.ts#L558-L586
Maybe lets skip these two spans for now and move on.

Comment on lines +1303 to +1311
if (!resumeToken) {
if (attempt === 0) {
span.addEvent('Starting stream');
} else {
span.addEvent('Re-attempting start stream', {attempt: attempt});
}
} else {
span.addEvent('Resuming stream', {
resume_token: resumeToken!.toString(),
attempt: attempt,
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep a simple span like we have done in other method without attempt

if (resumeToken) {
          span.addEvent('Resuming stream', {
            resume_token: resumeToken!.toString(),\
          });
        }
        ```

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow what you mean here @surbhigarg92. When do we record the beginning attempt or the re-attempts?

@@ -1330,8 +1364,16 @@ export class Snapshot extends EventEmitter {
(err as grpc.ServiceError).code === grpc.status.ABORTED
)
) {
this.begin();
span.addEvent('Stream broken. Safe to retry', {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. remove these events

callback(err, resp);
});
}, callback);
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no change is required here. As we are doing ".then" on begin call.

@odeke-em odeke-em force-pushed the trace-fix-bugs-from-product-review+negative-cases branch from 91c5281 to 508243f Compare October 18, 2024 06:06
@alkatrivedi alkatrivedi added kokoro:force-run Add this label to force Kokoro to re-run the tests. owlbot:run Add this label to trigger the Owlbot post processor. labels Oct 18, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Oct 18, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 18, 2024
@alkatrivedi alkatrivedi added kokoro:force-run Add this label to force Kokoro to re-run the tests. owlbot:run Add this label to trigger the Owlbot post processor. labels Oct 18, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Oct 18, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 18, 2024
@odeke-em odeke-em force-pushed the trace-fix-bugs-from-product-review+negative-cases branch from 7bdc173 to a3589e5 Compare October 20, 2024 03:28
odeke-em added a commit to odeke-em/nodejs-spanner that referenced this pull request Oct 21, 2024
Extracted out of PR googleapis#2158, this change traces
Database.runTransactionAsync. However, testing isn't effective
because of bugs such as googleapis#2166.
odeke-em added a commit to odeke-em/nodejs-spanner that referenced this pull request Oct 21, 2024
Extracted out of PR googleapis#2158, this change traces
Database.runTransactionAsync. However, testing isn't effective
because of bugs such as googleapis#2166.
odeke-em added a commit to odeke-em/nodejs-spanner that referenced this pull request Oct 21, 2024
Extracted out of PR googleapis#2158, this change traces
Database.runTransactionAsync.

Updates googleapis#2079
This change adds recording of retry span annotations,
catching cases in which exceptions where thrown but spans
were not ended while testing out and visually confirming the results.
@odeke-em odeke-em force-pushed the trace-fix-bugs-from-product-review+negative-cases branch from e0e31f5 to 13bc9f5 Compare October 21, 2024 10:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the googleapis/nodejs-spanner API. size: xl Pull request size is extra large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants