Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problematic rollback semantics with nested transactions #2767

Closed
rgalanakis opened this issue Aug 15, 2023 · 9 comments
Closed

Problematic rollback semantics with nested transactions #2767

rgalanakis opened this issue Aug 15, 2023 · 9 comments

Comments

@rgalanakis
Copy link

Hi there, thanks for building this library!

I am trying to do something like this:

with db.transaction():
  Thing.create()
  try:
    with db.transaction():
      raise SomeException()
  except SomeException as ex:
    assert len(Thing) == 1
    raise ex

But what's happening is:

  • BEGIN
  • Noop because we're in a transaction
  • INSERT
  • Exception is raised
  • ROLLBACK
  • ROLLBACK

That is, the BEGIN and ROLLBACK are unbalanced.

I believe you are aware this is the case, as you have recommended folks use atomic, which handles this balancing. However I do not want to use atomic, since it must create a savepoint, and I don't want to use savepoints (I find it makes the semantics of the transaction far more difficult to reason about, Python ecosystem seems to disagree but I'm not here to argue).

What I'd humbly suggest/ask is either:

  1. Add support for atomic(savepoint=False), so we can use atomic without savepoints, to get the balanced transaction behavior.

  2. Modify _transaction.__exit__ so it only rolls back if we're dealing with the top transaction, something like:

    def __exit__(self, exc_type, exc_val, exc_tb):
        try:
            if self.db.transaction_depth() == 1:
                try:
                  if exc_type: # This logic was outside the transaction_depth() == 1 check
                    self.rollback(False)
                  else:
                    self.commit(False)
                except:
                    self.rollback(False)
                    raise
        finally:
            self.db.pop_transaction()

I don't believe the current transaction rollback behavior is desirable since it breaks the semantics of context managers by leaking actions across contexts (the inner context manager rolls back the transaction begun by the outer transaction manager).

@coleifer
Copy link
Owner

I am not going to change the semantics of the transaction() context-manager's rollback behavior in the presence of an unhandled exception. The explicit use of the transaction() context manager when nesting is, as you note, up to the user to manage -- if you want nested atomic blocks you ought to be using atomic() instead of transaction().

In other words, if you do not want the exception to cause a rollback, catch it inside the nested transaction() block.

Alternatively, you can implement your own transactional context manager if you prefer to only roll-back the outermost, but I think that's problematic since an unhandled exception in a transaction should imply that the whole transaction should be aborted (since there is, effectively, only one transaction).

I'm struggling to think of a real-world scenario where this problem would actually occur.

@rgalanakis
Copy link
Author

rgalanakis commented Aug 16, 2023

I'm struggling to think of a real-world scenario where this problem would actually occur.

Sure.

An API unit test is wrapped in a transaction/atomic, doesn't matter.

The test setup involves fixturing a user.

The endpoint, for resetting a password, has code like:

with db.transaction():
  try:
    code = UserResetCode.verify_code_with_token(token)
  except UserResetCode.Unusable:
    return error_response()
  code.user.verify_email().save()

In the model:

class UserResetCode:
  @classmethod
  def verify_code_with_token(cls, token):
    code = cls.find_usable(token=token)
    with db.transaction():
      if not code:
          raise Unusable()
      # Mark the code used, etc

The Unusable exception, when it hits the transaction() in the verify_code_with_token method, rolls back all the data created in the unit test method. This wasn't a database exception, the transactions in the endpoint didn't end up opening a transaction, so I can't see why we'd want it to roll back the transaction.

The explicit use of the transaction() context manager when nesting is, as you note, up to the user to manage

That's what I'm trying to do. However I can't actually manage internal nesting, because it leaks and rolls back transactions opened up by outer contexts.

if you want nested atomic blocks you ought to be using atomic() instead of transaction().

Right, and I would, but I don't want automatic savepoints. Why not make the savepoint behavior of atomic configurable? While I never point to Django as an example of good design, its atomic does support savepoint=False at least.

EDIT (8/16/2023T10:50:00 Pacific): Fixed the example, there was a nested transaction in the verify method I originally missed.

@coleifer
Copy link
Owner

coleifer commented Aug 16, 2023

I'm still struggling to see it. It doesn't quite make sense to me, e.g., here:

    with db.transaction():
      if not code:
          raise Unusable()
      # Mark the code used, etc

Why wouldn't that exception be raised outside the transaction block?

The current semantics are such that raising an unhandled exception within a transaction-block will rollback. Part of the issue seems to me to be compounded by the decision to raise exceptions for flow-control.

So, with all this being said, the peewee docs are quite clear:

im-1692210111619

I don't intend to change it, as other people may rely on the current functionality -- and it should be quite easy to write your own context-manager subclass if you prefer the semantics you described in the original comment. Nor am I aware of any downsides to permitting savepoints to be used via the atomic() decorator in cases where nesting is variable depending on the way the code is called.

@rgalanakis
Copy link
Author

Why wouldn't that exception be raised outside the transaction block?

It gets handled at the endpoint level, and an appropriate error is returned.

Part of the issue seems to me to be compounded by the decision to raise exceptions for flow-control.

Sure, and when I'm programming in Go I don't use exceptions for flow control, but we're dealing with a language featuring StopIteration- exceptions for flow control is part of standard Python design. It's also done frequently in Peewee, like XDoesNotExist errors.

as other people may rely on the current functionality

This feels like depending on undefined C compiler behavior but ok. I think you can easily define the behavior, though- inner transactions are no-ops, only the outermost transaction does does anything. This is how all the other similar mechanisms in other languages I've used work (I believe it's also this way in other Python frameworks but can't say for sure). Python, maybe because Django did it, seems to be the only one with savepoint being the default- 'nested transactions' being noops is standard/reasonable behavior.

Nor am I aware of any downsides to permitting savepoints to be used via the atomic() decorator in cases where nesting is variable depending on the way the code is called.

Are you saying adding a savepoint=False option to atomic is on the table?

@coleifer
Copy link
Owner

I think you misunderstood the main issue. Consider this example.

If this code is run with the changes you proposed, then any queries executed after the except block will not be executable because Postgres reports that the transaction state is aborted (due to the integrity error):

db = PostgresqlDatabase('peewee_test')

class Reg(db.Model):
    key = CharField(primary_key=True)

db.create_tables([Reg])

with db.transaction() as tx:
    Reg.create(key='k1')
    try:
        with db.transaction() as tx2:
            Reg.create(key='k1')
    except IntegrityError:
        pass

    print(len(Reg))  # Cannot do this, transaction state is messed-up.

With the current Peewee code, the above code runs and produces an output of 0.

Neither of these seem quite correct. That is why it is strongly recommended to use atomic() if you have varying levels of nesting.

@rgalanakis
Copy link
Author

rgalanakis commented Aug 17, 2023

Maybe we should take a step back and look at the what I believe we can agree is a valid use case, and that there's no way to achieve it with peewee's transaction semantics (at least that I can find).

As a user of this library, I would like to be able to reason about a single transaction boundary, and any errors that reach said boundary will result in a rollback. This boundary respects both internal error handling (raising and catching of Python exceptions not having to do with the database) and database error handling (errors result in an aborted transaction).

With the current behavior, there is no way to achieve this- Python exceptions can result in database rollbacks, and database errors can be recovered from.

Put differently, as a user of this library, this library would be easier to reason about if the transaction semantics allowed this:

BEGIN;
INSERT;
INSERT;
INSERT;
-- ROLLBACK or COMMIT;

However, the current behavior requires I reason about this:

BEGIN;
INSERT;
-- Potential ROLLBACK
INSERT;
-- Potential ROLLBACK;
INSERT;
-- Potential ROLLBACK;
-- ROLLBACK or COMMIT

Do you disagree that users of this library should be able to reason about a single transaction boundary? Especially when it can be achieved in a relatively straightforward manner (making nested transaction noop, which is preferable, or adding a savepoint=False argument to atomic?

coleifer added a commit that referenced this issue Aug 17, 2023
@coleifer
Copy link
Owner

Yeah I'm starting to agree that your point-of-view on this one would be an improvement. I've made a small change to the _transaction.__exit__ handler to only initiate the rollback if it is the outermost one, which I think satisfies the original request. See 278c24d

I will note this in the changelog -- any users affected by the change I will deal with if/when they come up.

I am curious why the aversion to using savepoints, but nonetheless I do appreciate your persistence and agree that this is an improvement.

@rgalanakis
Copy link
Author

Perfect, thanks Charles! I ended up subclassing _transaction to make that change, and monkeypatching .transaction, so I'll be glad to remove my patch.

The aversion to using savepoints is that 1) they add database calls that may not even be necessary. I have looked at DB logs of applications where most of the database traffic are SAVEPOINT and RELEASE, and the latency from the calls is not insignificant at that level of chatter, and especially 2) I find them hard to reason about. I try not to hide the fact that we're working with a database, and the semantics of the single transaction boundary are easier to reason about (and I use savepoints in specific instances, like recovering from your example of an integrity error)

By the way if you ever find yourself writing Ruby I'd highly recommend Sequel, it's a really great ORM that I think you'd find a lot of sympathetic design and ideas with the author/community.

Thanks again,
Rob

gshmu pushed a commit to gshmu/peewee that referenced this issue Aug 22, 2023
@coleifer
Copy link
Owner

3.17.0 has been released, which contains this fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants