monstache - problem reading from MongoDB secondaries. #724

Open
arekborucki opened this issue Jun 6, 2024 · 1 comment

arekborucki commented Jun 6, 2024

We have observed a problem when our Monstache instance uses the MongoDB read preference "secondary". When we insert a large number of documents into MongoDB in a short time (for example, 5,000 documents), not all of them are replicated to Elasticsearch: only about 4,950 to 4,970 arrive. When we switch Monstache back to read preference "primary", everything is replicated correctly.
All documents are also replicated correctly if the MongoDB connection string names only a single secondary node and sets directConnection=true. In that case, however, Monstache cannot write its metadata to MongoDB, because a secondary does not accept writes.

Monstache uses a MongoDB view as the replication source. Here is our configuration:

elasticsearch-urls = ["${elastic_url}"]
relate-threads = 6000
relate-buffer = 15000
elasticsearch-max-seconds = 10
elasticsearch-max-bytes = 16777216
resume = true
resume-name = "bc2-${resume_id}-contacts"
change-stream-namespaces = [ "${mongodb_database}.contacts" ]
gzip = true
stats = true
elasticsearch-retry = true
prune-invalid-json = true
dropped-databases = false
dropped-collections = false
elasticsearch-client-timeout = 30
enable-http-server = true

[[mapping]]
namespace = "${mongodb_database}.contacts"
index = "contacts${es_suffix}"

[[mapping]]
namespace = "${mongodb_database}.contacts-view"
index = "contacts${es_suffix}"

[[relate]]
namespace = "${mongodb_database}.contacts"
with-namespace = "${mongodb_database}.contacts-view"
keep-src = false

Could the problem be the following? Monstache sees the document ID in the oplog, takes that ID, and sends a query to one of the secondaries, e.g., db.getCollection("contacts-view").find({"_id": "xyz"}). If that secondary has not yet applied the write, even though the entry is already in the oplog, the query returns zero documents.
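
To make the suspected race concrete, here is a minimal Python sketch. It is a toy model, not Monstache or MongoDB code: the function name run_sync, the stall_every parameter, and the 1-in-100 stall rate are all hypothetical values chosen for illustration, not measurements. It assumes the sync tool looks up each document by _id as soon as it sees the change, and that the secondary occasionally applies a write one step late.

```python
def run_sync(total, stall_every, read_from_secondary):
    """Toy model: return (documents found by lookup, documents on the secondary at the end)."""
    primary = {}    # _id -> document on the primary
    secondary = {}  # _id -> document on a lagging secondary
    pending = []    # ids written to the primary but not yet applied on the secondary
    found = 0

    for doc_id in range(total):
        primary[doc_id] = {"_id": doc_id}
        pending.append(doc_id)

        # Assumption: the secondary keeps up, except on every `stall_every`-th write.
        stalled = (doc_id + 1) % stall_every == 0
        if not stalled:
            for applied_id in pending:
                secondary[applied_id] = primary[applied_id]
            pending.clear()

        # The sync tool sees the change and immediately looks the document up by _id.
        source = secondary if read_from_secondary else primary
        if doc_id in source:
            found += 1

    # Replication eventually catches up, so nothing is missing in MongoDB itself.
    for applied_id in pending:
        secondary[applied_id] = primary[applied_id]
    return found, len(secondary)


if __name__ == "__main__":
    print(run_sync(5000, 100, read_from_secondary=True))   # (4950, 5000)
    print(run_sync(5000, 100, read_from_secondary=False))  # (5000, 5000)
```

In this model the secondary ends up with every document, yet the lookups that happened during a stall returned nothing. That would match the symptom of documents being present in MongoDB but absent from Elasticsearch, if the hypothesis is right.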

We use MongoDB v6.0 and Monstache v6.7.10.

db.adminCommand({ getDefaultRWConcern: 1 })
{
  defaultReadConcern: { level: 'local' },
  defaultWriteConcern: { w: 'majority', wtimeout: 0 },
  updateOpTime: Timestamp({ t: 1717599777, i: 6 }),
  updateWallClockTime: ISODate("2024-06-05T15:02:57.764Z"),
  defaultWriteConcernSource: 'global',
  defaultReadConcernSource: 'implicit',
  localUpdateWallClockTime: ISODate("2024-06-05T15:02:57.765Z"),
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1717673598, i: 4 }),
    signature: {
      hash: Binary(Buffer.from("0000000000000000000000000000000000000000", "hex"), 0),
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1717673598, i: 4 })
}

hmsta commented Aug 4, 2024

Interesting finding. I was just debugging a similar issue with missing documents, as I had been using readPreference=nearest so far.

I have switched to primary now, since your explanation makes total sense, especially as my write concern is set to acknowledge on the primary only (w: 1).

        "defaultWriteConcern" : {
                "w" : 1,
                "wtimeout" : 0
        },

Thanks for pointing that out, probably saved me a few hours of debugging :-)
