
Updates are not seen inside transaction created using beginTransaction() (until commit) #3295

Closed
Incanus3 opened this issue Dec 16, 2023 · 13 comments · Fixed by #3301

@Incanus3
Contributor

Incanus3 commented Dec 16, 2023

Expected behavior

When I run an update query (using Database.update() or Query.update()) and then read the value from the db inside the same transaction created using beginTransaction(), before calling Transaction.commit(), I would like to see the updated value.

Actual behavior

I get the old value.

Steps to reproduce

@Entity
@DbName("ea")
@Table(name = "T_OBJECT")
data class EaObject(
    @Id
    @Column(name = "OBJECT_ID", nullable = false)
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    var id: Long = 0,

    @Column(name = "NAME")
    var name: String? = null,
) {
    override fun equals(other: Any?): Boolean = other is EaObject && other.id == this.id
    override fun hashCode(): Int = id.hashCode()
}

class DummyTestApplication

// this is needed for TestPropertySource to work
@SpringBootTest(classes = [DummyTestApplication::class])
@TestPropertySource(
    properties = [
        "logging.level.io.ebean.SQL=DEBUG",
        "logging.level.io.ebean.TXN=DEBUG",
    ],
)
class EbeanUpdateThenReadInTransactionTest {
    private val dsConfig = DataSourceConfig().also {
        it.url = "jdbc:h2:mem:eadb"
        it.driver = "org.h2.Driver"
        it.username = "user"
        it.password = "pass"

        it.addProperty("NON_KEYWORDS", "VALUE")
        it.addProperty("quoteReturningIdentifiers", false)
    }
    private val dbConfig = DatabaseConfig().also {
        it.name = "ea"
        it.isDdlRun = true
        it.isDdlGenerate = true
        it.isDefaultServer = false
        it.currentUserProvider = CurrentUserProvider { "test" }

        it.setDataSourceConfig(dsConfig)

        it.addClass(EaObject::class.java)
    }
    private val database = DatabaseFactory.create(dbConfig)

    @AfterEach
    fun tearDown() {
        // probably not supported by H2 - throws
        // org.h2.jdbc.JdbcSQLSyntaxErrorException: Cannot truncate "PUBLIC.T_OBJECT"; SQL statement:
        // truncate table T_OBJECT [90106-214]
        // database.truncate(EaObject::class.java)

        QEaObject(database).delete()
    }

    // this works
    @Test
    fun `using createTransaction()`() {
        val object1 = EaObject(id = 111L, name = "object1").also { database.save(it) }

        database.createTransaction().use {
            printVisibly("before update", objectName(111L)) // "object1" as expected

            object1.name = "updated"
            database.update(object1)

            printVisibly("before commit", objectName(111L)) // "updated" as expected
        }
    }

    // this doesn't
    @Test
    fun `using beginTransaction()`() {
        val object1 = EaObject(id = 111L, name = "object1").also { database.save(it) }

        database.beginTransaction().use {
            printVisibly("before update", objectName(111L)) // "object1" as expected

            object1.name = "updated"
            database.update(object1)

            // this doesn't work either
            // QEaObject(database).where().id.eq(111L)
            //     .asUpdate().set(QEaObject._alias.name, "updated").update()

            it.commitAndContinue() // this doesn't help

            printVisibly("before commit", objectName(111L)) // "object1", but should be "updated"

            it.commit() // this does, but it shouldn't be needed

            printVisibly("after commit", objectName(111L)) // "updated" as expected
        }
    }

    @Suppress("SameParameterValue")
    private fun objectName(objectId: Long) =
        database.find(EaObject::class.java, objectId)!!.name

    private fun printVisibly(label: String, value: Any?) {
        println("=".repeat(40))
        println(">>> $label: $value <<<")
        println("=".repeat(40))
    }
}
// output of the failing test

12:13:48.682 [Test worker] DEBUG io.ebean.SQL - txn[] insert into T_OBJECT (OBJECT_ID, NAME, ALIAS, STEREOTYPE, NOTE, TPOS, EA_GUID, OBJECT_TYPE, PARENTID, PACKAGE_ID, BACKCOLOR, PDATA1, CLASSIFIER, STATUS, Version, CREATEDDATE, MODIFIEDDATE, AUTHOR) values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?); -- bind(111,object1,null,null,null,null,{ea8664db-6be5-4de6-879d-05db3894c301},null,null,0,-1,null,0,null,1.0,Sat Dec 16 12:13:48 CET 2023,Sat Dec 16 12:13:48 CET 2023,test)
12:13:48.688 [Test worker] DEBUG io.ebean.TXN - txn[] Commit
12:13:48.740 [Test worker] DEBUG io.ebean.SQL - txn[] select t0.OBJECT_ID, t0.NAME, t0.ALIAS, t0.STEREOTYPE, t0.NOTE, t0.TPOS, t0.EA_GUID, t0.OBJECT_TYPE, t0.PARENTID, t0.PACKAGE_ID, t0.BACKCOLOR, t0.PDATA1, t0.CLASSIFIER, t0.STATUS, t0.Version, (case when t0.object_type = 'Package' then 1 else 2 end), t0.CREATEDDATE, t0.MODIFIEDDATE, t0.AUTHOR from T_OBJECT t0 where t0.OBJECT_ID = ?; --bind(111, ) --micros(10,038)
========================================
>>> before update: object1 <<<
========================================
12:13:48.749 [Test worker] DEBUG io.ebean.SQL - txn[] update T_OBJECT set NAME=?, MODIFIEDDATE=?, AUTHOR=? where OBJECT_ID=?; -- bind(updated,Sat Dec 16 12:13:48 CET 2023,test,111)
12:13:48.750 [Test worker] DEBUG io.ebean.TXN - txn[] Commit
========================================
>>> before commit: object1 <<<
========================================
12:13:48.751 [Test worker] DEBUG io.ebean.TXN - txn[] Commit
12:13:48.753 [Test worker] DEBUG io.ebean.SQL - select t0.OBJECT_ID, t0.NAME, t0.ALIAS, t0.STEREOTYPE, t0.NOTE, t0.TPOS, t0.EA_GUID, t0.OBJECT_TYPE, t0.PARENTID, t0.PACKAGE_ID, t0.BACKCOLOR, t0.PDATA1, t0.CLASSIFIER, t0.STATUS, t0.Version, (case when t0.object_type = 'Package' then 1 else 2 end), t0.CREATEDDATE, t0.MODIFIEDDATE, t0.AUTHOR from T_OBJECT t0 where t0.OBJECT_ID = ?; --bind(111, ) --micros(308)
========================================
>>> after commit: updated <<<
========================================
12:13:48.769 [Test worker] DEBUG io.ebean.SQL - txn[] select t0.OBJECT_ID from T_OBJECT t0; --bind() --micros(470)
12:13:48.776 [Test worker] DEBUG io.ebean.SQL - txn[] delete from T_OBJECTPROPERTIES where (OBJECT_ID) in (?); -- bind(Array[1]={111}) rows(0)
12:13:48.783 [Test worker] DEBUG io.ebean.SQL - txn[] select t0.CONNECTOR_ID from T_CONNECTOR t0 where (START_OBJECT_ID) in (?); --bind(Array[1]={111}) --micros(409)
12:13:48.784 [Test worker] DEBUG io.ebean.SQL - txn[] select t0.CONNECTOR_ID from T_CONNECTOR t0 where (END_OBJECT_ID) in (?); --bind(Array[1]={111}) --micros(340)
12:13:48.786 [Test worker] DEBUG io.ebean.SQL - txn[] delete from T_OBJECT where OBJECT_ID  in (?); -- bind(Array[1]={111}) rows(1)
12:13:48.787 [Test worker] DEBUG io.ebean.TXN - txn[] Commit

Context

I just updated to ebean version 13.25.1, but the behavior persists. Is this a bug, or am I doing something wrong? Should I reproduce this in the minimal ebean test repository?

@jnehlmeier

I am not 100% sure, but here is my thinking (see the comments in the code below):

fun `using createTransaction()`() {

  // Here you create a new entity and save it to db using auto-commit mode
  val object1 = EaObject(id = 111L, name = "object1").also { database.save(it) }

  // Here you create a transaction that is not managed by ebean. It is meant to be 
  // passed around manually and ebean has many methods that have a transaction parameter.
  // The code in your use block does not use these methods and thus everything you do 
  // below is still done in auto-commit mode. 
  // So you get a SELECT, an UPDATE and a SELECT again.
  // I think if you would have used the Ebean methods that take a transaction the test
  // would fail as well, since object1 has never been loaded within the transaction and 
  // the update will become a stateless update.
  // See comments in your other test below.
  database.createTransaction().use {
    printVisibly("before update", objectName(111L)) // "object1" as expected

    object1.name = "updated"
    database.update(object1)

    printVisibly("before commit", objectName(111L)) // "updated" as expected
  }
}




fun `using beginTransaction()`() {
  // this creates a new entity and saves it using auto-commit mode
  val object1 = EaObject(id = 111L, name = "object1").also { database.save(it) }

  // here you start a transaction and each transaction has its own persistence context / cache
  database.beginTransaction().use {

    // you load an entity from database by id and output its name.
    // The entity you load here is not the same instance as object1. It is a new instance 
    // and that new instance is stored in the persistence context / cache of this 
    // transaction.
    printVisibly("before update", objectName(111L)) // "object1" as expected

    // Here you modify the entity created and saved before the transaction has started.
    // This entity instance has never been loaded during the existence of the transaction 
    // and thus is not known to persistence context / cache.
    // Because it has an ID, ebean will do a stateless update in the DB, which means 
    // it will just generate an update query by using object1 as a template
    object1.name = "updated"
    database.update(object1)

    // I think this causes the stateless update query to commit to db
    it.commitAndContinue() // this doesn't help

    // Here you load again an entity by the same id as above. The persistence context 
    // already knows this entity because you have already loaded it once. So no select 
    // query is done at all as seen in your console output.
    // In the code above you changed object1 which was never loaded by this transaction.
    printVisibly("before commit", objectName(111L)) // "object1", but should be "updated"

    // This commits the tx created by beginTransaction which basically also 
    // terminates/clears the persistence context / cache
    it.commit() // this does, but it shouldn't be needed

    // Now you are in auto commit mode again and a new select is done.
    printVisibly("after commit", objectName(111L)) // "updated" as expected
  }
}

@Incanus3
Contributor Author

Incanus3 commented Dec 17, 2023

hi @jnehlmeier and thanks for your reply. This all makes perfect sense (and I didn't know that createTransaction() works like that, so that's really useful info), but sadly this doesn't seem to be the cause of the problem. Even if I rewrite the second test as

database.beginTransaction().use { transaction ->
    val object1 = EaObject(id = 111L, name = "object1").also { database.save(it) }

    printVisibly("before update", objectName(111L)) // "object1" as expected

    object1.name = "updated"
    database.update(object1)

    transaction.commitAndContinue() // this doesn't help

    printVisibly("before commit", objectName(111L)) // "object1", but should be "updated"

    transaction.commit() // this does, but it shouldn't be needed

    printVisibly("after commit", objectName(111L)) // "updated" as expected
}

the problem still persists. I actually first noticed this problem in a test that runs completely inside a transaction: I have a helper base class for this which does beginTransaction() in @BeforeEach and then a rollback in @AfterEach.

@Incanus3
Contributor Author

By the way, as per your comment

// I think if you would have used the Ebean methods that take a transaction the test
// would fail as well, since object1 has never been loaded within the transaction and

I tried to rewrite the first test using explicit transaction passing:

    @Test
    fun `using createTransaction()`() {
        database.createTransaction().use { transaction ->
            val object1 = EaObject(id = 111L, name = "object1").also { database.save(it, transaction) }

            printVisibly("before update", objectName(111L, transaction)) // "object1" as expected

            object1.name = "updated"
            database.update(object1, transaction)

            transaction.commitAndContinue() // this doesn't help

            printVisibly("before commit", objectName(111L, transaction)) // "object1", but should be "updated"

            transaction.commit() // this does, but it shouldn't be needed

            // BEWARE: this shouldn't use `transaction` as it is already committed and deactivated
            printVisibly("after commit", objectName(111L, transaction)) // "updated" as expected
        }
    }

    @Suppress("SameParameterValue")
    private fun objectName(objectId: Long, transaction: Transaction? = null) =
        if (transaction == null) database.find(EaObject::class.java, objectId)!!.name
        else database.find(EaObject::class.java, objectId, transaction)!!.name

and made two observations:

  1. you're absolutely right that in this case the second printVisibly call also prints the old value as in the case with beginTransaction()
  2. as you can notice, I made a mistake at first and also passed the transaction to the last printVisibly call, which is wrong, because the transaction is already inactive and shouldn't be used. in this case I would expect one of two things to happen:
  • the call will be run outside of any transaction (or in a new one, seeing the already committed changes) and will thus print the new value
  • the call will throw an exception because the used transaction is inactive

but what actually happens is that the call doesn't make any new queries (see output below) and still prints the old value, which is actually worse than the case with beginTransaction(). Although I understand the difference (there it did a new select outside of the transaction, here it reads from the cache), this behavior seems really surprising to me: I see old values even though the data has not only been updated, but also committed:

11:08:52.117 [Test worker] DEBUG io.ebean.SQL - txn[1001] insert into T_OBJECT (OBJECT_ID, NAME, ALIAS, STEREOTYPE, NOTE, TPOS, EA_GUID, OBJECT_TYPE, PARENTID, PACKAGE_ID, BACKCOLOR, PDATA1, CLASSIFIER, STATUS, Version, CREATEDDATE, MODIFIEDDATE, AUTHOR) values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?); -- bind(111,object1,null,null,null,null,{278ebe3c-5512-4abe-b7c7-eb1147c5dd89},null,null,0,-1,null,0,null,1.0,Sun Dec 17 11:08:52 CET 2023,Sun Dec 17 11:08:52 CET 2023,test)
11:08:52.118 [Test worker] DEBUG io.ebean.SQL - txn[1001] select t0.OBJECT_ID, t0.NAME, t0.ALIAS, t0.STEREOTYPE, t0.NOTE, t0.TPOS, t0.EA_GUID, t0.OBJECT_TYPE, t0.PARENTID, t0.PACKAGE_ID, t0.BACKCOLOR, t0.PDATA1, t0.CLASSIFIER, t0.STATUS, t0.Version, (case when t0.object_type = 'Package' then 1 else 2 end), t0.CREATEDDATE, t0.MODIFIEDDATE, t0.AUTHOR from T_OBJECT t0 where t0.OBJECT_ID = ?; --bind(111, ) --micros(530)
========================================
>>> before update: object1 <<<
========================================
11:08:52.119 [Test worker] DEBUG io.ebean.SQL - txn[1001] update T_OBJECT set NAME=?, MODIFIEDDATE=?, AUTHOR=? where OBJECT_ID=?; -- bind(updated,Sun Dec 17 11:08:52 CET 2023,test,111)
11:08:52.121 [Test worker] DEBUG io.ebean.TXN - txn[1001] Commit
========================================
>>> before commit: object1 <<<
========================================
11:08:52.122 [Test worker] DEBUG io.ebean.TXN - txn[1001] Commit
========================================
>>> after commit: object1 <<<
========================================
11:08:52.123 [Test worker] DEBUG io.ebean.SQL - txn[1002] select t0.OBJECT_ID from T_OBJECT t0; --bind() --micros(347)
11:08:52.124 [Test worker] DEBUG io.ebean.SQL - txn[1002] delete from T_OBJECTPROPERTIES where (OBJECT_ID) in (?); -- bind(Array[1]={111}) rows(0)
11:08:52.125 [Test worker] DEBUG io.ebean.SQL - txn[1002] select t0.CONNECTOR_ID from T_CONNECTOR t0 where (START_OBJECT_ID) in (?); --bind(Array[1]={111}) --micros(347)
11:08:52.126 [Test worker] DEBUG io.ebean.SQL - txn[1002] select t0.CONNECTOR_ID from T_CONNECTOR t0 where (END_OBJECT_ID) in (?); --bind(Array[1]={111}) --micros(368)
11:08:52.127 [Test worker] DEBUG io.ebean.SQL - txn[1002] delete from T_OBJECT where OBJECT_ID  in (?); -- bind(Array[1]={111}) rows(1)
11:08:52.127 [Test worker] DEBUG io.ebean.TXN - txn[1002] Commit

@rbygrave
Member

rbygrave commented Dec 18, 2023

Refer to https://ebean.io/docs/query/option#persistenceContextScope ... use query.setPersistenceContextScope(QUERY) so that the bean is read from the database (rather than found and returned from the persistence context, which is transaction scoped).

Use:

  EaObject bean =
    new QEaObject()
      .id.equalTo(111)
      // ignore L2 cache
      .setUseCache(false)

      // use query scoped persistence context rather than transaction scoped persistence context
      // ... some would describe this as "ignore L1 cache" but "query scoped persistence context" is more accurate
      .setPersistenceContextScope(QUERY)
      .findOne();

@Incanus3
Contributor Author

hi @rbygrave and thanks, this is nice as a work-around, but from this reply, should I understand that this is expected behavior? I would think that this shouldn't be needed by default and that the results of updates run inside the same transaction should just be visible as usual. Or is there something special about this case that makes it needed?

@rbygrave
Member

Probably the first thing to understand is that the persistence context is there for "consistent object graph building", which means that when a query is executed it uses the persistence context to "de-duplicate" instances that have already been loaded / "de-duplicate on ToMany and ToOne relationships".

By default the persistence context is scoped to the transaction, just like JPA (known as a "transaction scoped persistence context"). The "persistence context" is also, and perhaps more commonly, known as the L1 cache, but note that its primary job is "consistent object graph building"; imo it's more that this also works like a cache, hence L1 cache / Level 1 cache.

With Ebean we have the option of changing this on a per-query basis or indeed globally.

On a per-query basis we use query.setPersistenceContextScope(QUERY). We'd look to use this in a use case where we know/desire a "fresh" version of the row/bean from the database.

There are folks that also change this globally for an application via DatabaseConfig.setPersistenceContextScope(). This then makes it different from JPA per se: each query gets its own persistence context (rather than the transaction scoped one).
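The difference between the two scopes can be illustrated with a plain Kotlin sketch (all names here are made up; this is a simplified model, not Ebean's implementation): a transaction scoped lookup reuses the instance already held by the transaction's context, while a query scoped lookup reads fresh from the row state:

```kotlin
// Hypothetical sketch (not Ebean internals): a TRANSACTION-scoped find reuses
// the instance held by the transaction's persistence context, while a
// QUERY-scoped find builds a fresh instance from the current row state.
enum class Scope { TRANSACTION, QUERY }

data class Row(val id: Long, var name: String)

class ScopedFinder {
    val rows = mutableMapOf<Long, String>()           // current row state ("the database")
    private val txContext = mutableMapOf<Long, Row>() // transaction scoped context

    fun find(id: Long, scope: Scope): Row = when (scope) {
        Scope.TRANSACTION -> txContext.getOrPut(id) { Row(id, rows.getValue(id)) }
        Scope.QUERY -> Row(id, rows.getValue(id))     // fresh read, context bypassed
    }
}

fun main() {
    val db = ScopedFinder()
    db.rows[111L] = "object1"
    db.find(111L, Scope.TRANSACTION)               // instance now cached in the context
    db.rows[111L] = "updated"                      // row changed behind the context's back
    println(db.find(111L, Scope.TRANSACTION).name) // "object1" - served from the context
    println(db.find(111L, Scope.QUERY).name)       // "updated" - fresh read
}
```

The query scoped read sees the new row state, but it does not replace the instance held by the transaction scoped context.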

@rbygrave
Member

that results of updates ran inside the same transaction should just be seen as usual

If I read it correctly, there are 2 instances used here for the logically same row (the first printVisibly loads the second instance and puts it into the persistence context, but then the code does not use THAT instance to do the update and instead uses the first instance, the one used for the insert).

So the instance that is first queried [and loaded into the transaction scoped persistence context] is not subsequently used for the update etc. [so it is held in memory with id 111L, always with that old original value].
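To make the two-instances point concrete, here is a plain Kotlin simulation (hypothetical names, no Ebean involved): the context caches the instance loaded by the first find, and an update made through a *different* instance writes the row but never touches that cached copy:

```kotlin
// Hypothetical model of the behaviour described above: the persistence context
// caches the instance loaded by the first find, and an update made through a
// different instance for the same id never touches that cached copy.
data class Bean(val id: Long, var name: String)

class FakeTransaction {
    private val db = mutableMapOf<Long, String>()               // committed row state
    private val persistenceContext = mutableMapOf<Long, Bean>() // "L1 cache"

    fun insert(bean: Bean) { db[bean.id] = bean.name }          // inserted beans skip the context

    fun find(id: Long): Bean =
        persistenceContext.getOrPut(id) { Bean(id, db.getValue(id)) }

    fun update(bean: Bean) { db[bean.id] = bean.name }          // row updated, cache untouched
}

fun main() {
    val tx = FakeTransaction()
    val object1 = Bean(111L, "object1")
    tx.insert(object1)

    println(tx.find(111L).name) // "object1" - a second instance, now cached

    object1.name = "updated"
    tx.update(object1)          // updates the row via the first instance

    println(tx.find(111L).name) // still "object1" - served from the cached second instance
}
```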

@Incanus3
Contributor Author

Incanus3 commented Dec 18, 2023

Ooh, that's pretty subtle. Okay, I hope I understand now, but please just let me confirm, so that we're on the same page.

  • first object1 is saved into the database and presumably put into the persistence context
  • then database.find() is called with the same id (through objectName()), which loads a new instance of the same record and overwrites object1 in the context, basically invalidating it
  • when database.update() is then called with object1, the update query is correctly triggered, but since this object is no longer present in the context (I presume the L1 cache lookup is not done by the pkey, but using some internal object id), the instance that is there now (put there by the .find() call) doesn't get updated, because ebean doesn't know it represents the same record
  • unless I do some write operation using the instance that's currently in the context, all further .find() calls will return the cached (and stale) object loaded by the first .find() call

Did I get this right? What I still don't understand is how stateless updates fit into this. I also tried to replace the name set and the database.update() call with

QEaObject(database).where().id.eq(111L).asUpdate().set(QEaObject._alias.name, "updated").update()

which didn't work either. But does that mean that all cached objects changed by stateless updates (or any updates that don't use the currently-cached object) should be considered stale afterwards and have to be reloaded with this .setPersistenceContextScope(QUERY) modifier? Also, if I understand correctly, with this modifier the find query will return a fresh object, but the stale one will still remain in the transaction-scoped persistence context, so anyone doing another .find() call without this modifier will get the stale object again.

In the bigger picture, although this still seems to me like a considerable overhead to keep in mind all the context interactions and their invalidation consequences, I understand there are advantages for the persistence context behaving this way and I guess with a bit of practice one can get used to this and not make too many mistakes like this.

What seems most scary about this to me is the case (which is definitely not rare, at least in our project) in which the code that does the update (or another action that invalidates some previously loaded instances) and the code affected by it may be in completely different parts of the codebase and not know about each other's internal behavior. I can easily imagine a piece of code which 1) starts a transaction, loads some records and does something with them, then 2) calls some function which may do a stateless update or find the same records again and update them (the calling side doesn't and shouldn't know about that, because of separation of concerns, boundaries, etc.), thus invalidating the cache, and then returns, after which 3) the calling side wants to do some more actions but will get stale objects, similarly to my example.

What would be your suggestion to avoid problems like this? I don't think it's practical to add the setPersistenceContextScope(QUERY) modifier to all queries subsequent to this "external" function call, and I also don't really want to disable the "caching" altogether. Is there some way to just "flush" the L1 cache? I noticed there is actually a flush() method on Transaction, but it isn't documented; is this the correct way to protect against such danger?

Lastly, I found some documentation on the L2 cache on ebean website, but except for this I can't find mentions of the L1 cache (or the "caching" behavior of the persistence context in general, only the graph-building one concerning related object fetching) and its differences from L2. Is there some page that I missed?

As always, thank you so much for both your replies and the great work you guys do maintaining this library, I do get confused by some of these internals from time to time, but barring these moments, I'm so glad we decided to make this switch and it was one of the best decisions we've made.

@Incanus3
Contributor Author

Oh, I just got to this again and found out that the definition of flush() I was looking at was decompiled; that's why there was no documentation. After I downloaded the sources and reindexed, I'm seeing those comments. Still not 100% sure this is the right method to call to ensure a "clean cache" though.

@rob-bygrave
Contributor

rob-bygrave commented Dec 20, 2023

Hmm, I'm going to try another angle / explanation, so let's see how this goes.

ORM queries are "graph building": building a graph of connected objects [from SQL result sets].

Example:

  • Start a transaction
  • fetch Customer 42 into memory
  • fetch some Orders, some of which are related to Customer 42

Q: How many instances of Customer 42 are held in-memory by the application?
A: By default 1 instance of Customer 42 is held in-memory with "transaction scoped persistence context"

It is the job of the persistence context to de-dup [when reading the resultSet and building objects into the graph] and ensure there is only 1 instance of Customer 42 [and by default this is scoped to the transaction].

Now, conversely, if the answer is that there are multiple instances representing Customer 42 ... it can get interesting if the application looks to mutate Customer 42 and save it etc.

So generally we prefer to build consistent object graphs where we know there is only 1 instance of Customer 42 held in-memory [by the transaction]. To a large extent, imo, this is the main definition and requirement of an ORM: to build consistent graphs.

The "scope" of this is also known as "Unit of Work" or "Session" - we work with a graph of connected entity beans on a "unit of work" basis which by default is exactly the same scope of a transaction. "Transaction scoped Unit of Work" ... and in that unit of work we expect 1 instance of Customer 42.
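As a rough sketch of that de-duplication (plain Kotlin, hypothetical names, not the actual Ebean internals), the context guarantees one instance per id while the graph is built, so every Order that references Customer 42 points at the same object:

```kotlin
// Hypothetical de-duplication: while building a graph from result-set rows,
// the persistence context guarantees a single in-memory instance per id, so
// the Customer referenced by several Orders is the same object.
class Customer(val id: Long)
class Order(val id: Long, val customer: Customer)

class PersistenceContext {
    private val customers = mutableMapOf<Long, Customer>()
    fun customer(id: Long): Customer = customers.getOrPut(id) { Customer(id) }
}

// Each row is (orderId, customerId), as if read from a SQL result set.
fun buildOrders(rows: List<Pair<Long, Long>>, pc: PersistenceContext): List<Order> =
    rows.map { (orderId, customerId) -> Order(orderId, pc.customer(customerId)) }

fun main() {
    val pc = PersistenceContext()
    val c42 = pc.customer(42L)                  // "fetch Customer 42"
    val orders = buildOrders(listOf(1L to 42L, 2L to 42L), pc)
    println(orders.all { it.customer === c42 }) // true: one shared instance
}
```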


setPersistenceContextScope(QUERY)

This is not "refreshing" the persistence context but instead creating a wholly independent graph of objects. It is a wholly independent persistence context that contains all the objects built by the query PLUS any objects built subsequently via lazy loading on any of those objects.


stateless update

This is a feature that is expected to be used to avoid loading the bean at all. That is, if we are loading the bean then we really expect code to mutate the bean and save() or update() the bean and NOT use stateless update.

Stateless update is about performing an update without loading the bean / avoid the select to fetch the bean in the first place.

It is not expected that we load the bean but then use stateless update to update the row that the bean represents.


"bulk updates"

Bulk updates are good and we should use them, BUT ebean doesn't know which rows are updated: there is no mechanism for ebean to determine which rows were modified and then remove the associated entries from the persistence context.

If we build entity beans before a bulk update, yes, those beans could be "stale" [because the related rows were modified by the bulk sql update statement]. Currently, the expectation would be to refresh() them, or not use them, or use setPersistenceContextScope(QUERY).

My gut is telling me that the confusion around this comes from mixing entity beans with bulk updates and/or stateless updates.


flush()

This does not clear() the persistence context. What this does is flush any batched insert, update, delete statements [that have been batched up in order to use JDBC batch].
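A minimal model of that batching behaviour (made-up names, not Ebean code): statements are buffered for JDBC batching and only sent to the database on flush(), and flushing clears nothing but that buffer:

```kotlin
// Hypothetical sketch of flush(): pending statements are buffered for JDBC
// batching and only executed against the database when flush() is called
// (commit implies a flush). Note that flush() does NOT clear any cache.
class BatchingTransaction {
    private val pendingBatch = mutableListOf<String>()
    val executed = mutableListOf<String>()

    fun addToBatch(sql: String) { pendingBatch += sql }

    fun flush() {               // send the batched statements, keep the tx open
        executed += pendingBatch
        pendingBatch.clear()
    }

    fun commit() { flush() /* ...then commit the underlying JDBC connection */ }
}

fun main() {
    val tx = BatchingTransaction()
    tx.addToBatch("update T_OBJECT set NAME = ?")
    println(tx.executed.size)   // 0 - still buffered
    tx.flush()
    println(tx.executed.size)   // 1 - sent to the database
}
```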


Questions you probably might have asked ...

Q: When a stateless update is executed, does it clear that entry from the persistence context?
A: Yes it does.

Q: But in the last example ... huh?
A: The bean that does the insert isn't put into the persistence context; it's then updated, but that isn't a stateless update [it's a normal update on a bean that was just inserted]

Q: So a bean that is inserted doesn't go into the persistence context?
A: No it doesn't ... [and this might get confusing, but ... we don't expect to insert a bean and then fetch that same instance again, and with ebean the persistence context is only needed for "graph building" ... so no, inserted beans are not added to the persistence context]

Q: So there is no PersistenceContext.clear() method ... that I might want to use after say a bulk update is executed?
A: There is such a method, but it isn't public API. If we look at JdbcTransaction, there is persistenceContext() which returns the PersistenceContext [but this isn't public API currently], and there is PersistenceContext.clear().

@Incanus3
Contributor Author

Wow, thank you for such a detailed explanation.

setPersistenceContextScope(QUERY)

This is not "refreshing" the persistence context but instead creating a wholly independent graph of objects. It is a wholly independent persistence context that contains all the objects built by the query PLUS any objects built subsequently via lazy loading on any of those objects.

Yeah, this I understood

stateless updates & bulk updates

Ok, here I got my terminology completely wrong; I basically thought they were the same thing. What I meant here was what you call "bulk" updates. Now that I actually know what stateless updates are, I'll try to use them more, because they should fix at least some of my problems with stale objects in the persistence context.

flush()

This does not clear() the persistence context. What this does is flush any batched insert, update, delete statements [that have been batched up in order to use JDBC batch].

Yeah, I just tried a few minutes ago and saw that this does not help. I also tried setting persistenceContextScope to QUERY as you mentioned, and that does indeed help.

Q: But in the last example ... huh?
A: The bean that does the insert isn't put into the persistence context, its then updated but that isn't a stateless update [its a normal update on a bean that was just inserted ]

What I meant here was the alternative (commented out) variant of the test that used "bulk" update.

Q: So there is no PersistenceContext.clear() method ... that I might want to use after say a bulk update is executed?
A: There is such a method but it isn't public API. If we look on JdbcTransaction there is persistenceContext() which returns the PersistenceContext [but this isn't public API currently] and there is PersistenceContext.clear().

:( I still kinda think we do need something like this in our case. The problem, as I tried to explain with completely wrong terminology, is like this:

class ClassOne(
  private val database: Database,
  private val classTwo: ClassTwo,
) {
  fun doSomeBigChange() {
    database.beginTransaction().use { tr ->
      // load a bunch of objects
      // do some stuff with them

      // - ClassTwo is a complete black box from ClassOne's perspective, the only thing ClassOne
      //   knows about it is that it needs to be called at this point, inside the same transaction
      // - this might do bulk updates that might affect some of the previously loaded objects
      //   or it might find them again (and then do "normal" updates or maybe deletes),
      //   which will replace "our" previously loaded objects in the context
      //   (if I understood correctly what happens in my case of printVisibly())
      classTwo.doSomeMoreStuff()
      // - so to be sure, I'd like to clear the context here

      // do some more stuff with the previously loaded objects

      tr.commit()
    }
  }
}

Actually, another option that I think would solve our problem would be the ability to specify the PersistenceContextScope when creating (or beginning) the transaction. Then if the calling code knows that something like this might happen, it could just use the QUERY pc scope for this transaction and all should be safe (even though it might make more queries than necessary). Of course, I could use setPersistenceContextScope(QUERY) for every query run after the "external" call, but that just doesn't seem very practical.

@rbygrave
Member

be able to specify the PersistenceContextScope when creating (or beginning) the transaction

Yes.

Another alternative that I need to think through is to have the behaviour that the persistence context is cleared [automatically by ebean] after any bulk update or bulk delete statement is executed.

e.g when there is

QEaObject(database).where().<some predicates>.asUpdate().set(QEaObject._alias.name, "updated").update()

Then the persistence context could be automatically cleared for EaObject.class.

note: the persistence context is a map of maps first keyed by entity class type and then keyed by @Id value. It would not need to clear the entire persistence context but instead just clear it for EaObject.class.
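That map-of-maps shape, with the per-class clear a bulk update would trigger, could be sketched roughly like this (assumed names; not the actual Ebean internals):

```kotlin
// Sketch of a persistence context keyed first by entity class type and then
// by @Id value, with the per-class clear a bulk update on that entity's table
// would trigger, plus a full clear for when the updated table is unknown.
class TypedPersistenceContext {
    private val byType = mutableMapOf<Class<*>, MutableMap<Any, Any>>()

    fun put(type: Class<*>, id: Any, bean: Any) {
        byType.getOrPut(type) { mutableMapOf() }[id] = bean
    }

    fun get(type: Class<*>, id: Any): Any? = byType[type]?.get(id)

    // after e.g. a bulk update on EaObject's table: clear only that type
    fun clear(type: Class<*>) { byType.remove(type) }

    // after e.g. a SqlUpdate where the table can't be determined
    fun clearAll() { byType.clear() }
}
```

With this shape, a bulk update on one entity type invalidates only that type's entries; beans of other types stay cached.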

When a SqlUpdate is executed it might need to clear the entire persistence context if the table being updated can not be determined.

@Incanus3
Contributor Author

Incanus3 commented Dec 22, 2023

Another alternative that I need to think through is to have the behaviour that the persistence context is cleared [automatically by ebean] after any bulk update or bulk delete statement is executed.

Well, that seems ideal to me; it would definitely fix our problems and should prevent other people from getting confused when making similar kinds of mistakes. But if that's hard to do, being able to specify the scope for each transaction seems like an easy win. Thank you again, you've been extremely helpful as always.

rbygrave added a commit that referenced this issue Jan 8, 2024
Using SqlUpdate or an ORM Update query clear the appropriate part of
the PersistenceContext. The effect is that ORM queries executed after
a bulk update will effectively load a fresh copy of the data from the
database and will not reuse an instance from the persistence context
if the bean in question had already been loaded.
@rbygrave rbygrave self-assigned this Jan 8, 2024
rbygrave added a commit that referenced this issue Jan 10, 2024
…ontext

Also make improvements to the test asserts in TestPersistenceContextQueryScope
rbygrave added a commit that referenced this issue Jan 18, 2024
@rbygrave rbygrave added this to the 14.0.0 milestone Jan 19, 2024
rbygrave added a commit that referenced this issue Feb 15, 2024
…-update

#3295 Clear PersistenceContext on execution of bulk updates or deletes
rbygrave added a commit that referenced this issue Apr 4, 2024
…d mixed queries

14.0.0 via #3295 introduced this bug.

 #3295 introduced a behaviour where bulk updates clear the persistence context. Now the lazy loading of a BeanCollection works with an assumption that the "parent bean" is in the persistence context (and this assumption is broken with that change for this test case). That is, the bulk update is clearing the persistence context - removing the parent bean BEFORE the lazy loading is invoked ... and that doesn't then work because the parent bean isn't in the persistence context.

 This fix means that when lazy loading many's, the parent beans are putIfAbsent into the persistence context.