
[Segment Replication] [BUG] Missing documents post ingestion during primary failover and relocation #5946

Closed
dreamer-89 opened this issue Jan 19, 2023 · 5 comments

@dreamer-89

dreamer-89 commented Jan 19, 2023

Describe the bug
While testing #5898 on an actual 3-node cluster, I found that the searchable doc count does not match the ingested doc count. I observed the same with primary relocation (I backported #5344 & #5898 to test relocation behavior). Cutting a single issue as I believe the underlying cause is the same for both.

Failover Case

To Reproduce

  1. Create an index with segrep enabled
curl -X PUT "http://localhost:9200/test-index" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "replication.type": "SEGMENT"
    }
  }
}'
  2. Ingest documents via the script below. Run it in multiple terminal tabs; I used 3.
for i in {1..20000}
do
   curl --location --request POST "localhost:9200/test-index/_doc" \
    --header 'Content-Type: application/json' \
    --data-raw "{
      \"name\":\"abc${i}\"
    }"
    echo "\n"
done
  3. Verify that 1 primary and 1 replica were created
[root@ip-10-0-4-169 opensearch]# curl -X GET "localhost:9200/_cat/shards?v"
index      shard prirep state   docs store ip         node
test-index 0     p      STARTED    0  208b 10.0.5.122 ip-10-0-5-122.us-west-2.compute.internal
test-index 0     r      STARTED    0  208b 10.0.3.198 ip-10-0-3-198.us-west-2.compute.internal
  4. Kill the node containing the primary, which instantly promotes the replica to primary
[root@ip-10-0-5-122 ~]# ps -aux | grep opensearch
[root@ip-10-0-5-122 ~]# kill 2945

[root@ip-10-0-4-169 opensearch]# curl -X GET "localhost:9200/_cat/shards?v"
index      shard prirep state      docs   store ip         node
test-index 0     p      STARTED    2980 174.7kb 10.0.3.198 ip-10-0-3-198.us-west-2.compute.internal
test-index 0     r      UNASSIGNED
  5. Wait for ingestion to complete. The searchable doc count is less than the ingested count; index refresh and flush have no impact on the count.
[root@ip-10-0-4-169 opensearch]# curl -X GET "localhost:9200/_cat/shards?v"
index      shard prirep state    docs store ip         node
test-index 0     p      STARTED 59997 2.3mb 10.0.3.198 ip-10-0-3-198.us-west-2.compute.internal
test-index 0     r      STARTED 59997 3.1mb 10.0.4.82  ip-10-0-4-82.us-west-2.compute.internal

[root@ip-10-0-4-169 opensearch]# curl -X POST localhost:9200/test-index/_refresh
{"_shards":{"total":2,"successful":2,"failed":0}}[root@ip-10-0-4-169 opensearch]#
[root@ip-10-0-4-169 opensearch]# curl -X GET "localhost:9200/_cat/shards?v"
index      shard prirep state    docs store ip         node
test-index 0     p      STARTED 59997 2.2mb 10.0.3.198 ip-10-0-3-198.us-west-2.compute.internal
test-index 0     r      STARTED 59997 2.2mb 10.0.4.82  ip-10-0-4-82.us-west-2.compute.internal

Expected behavior
All ingested documents should be searchable

Host/Environment (please complete the following information):

Additional context

  • Setup: 3 data nodes (c5.xlarge), 1 master node (c5.xlarge)
dreamer-89 added the bug and untriaged labels on Jan 19, 2023
dreamer-89 changed the title from "[Segment Replication] [BUG] Missing documents post ingestion with primary failover and relocation" to "[Segment Replication] [BUG] Missing documents post ingestion during primary failover and relocation" on Jan 19, 2023
@dreamer-89

Tried this with the latest changes on main and it is reproducible.

@dreamer-89

Had an internal discussion with @mch2 and others where we decided to first rule out a data loss issue. Performed another round of testing and indexed one more document after the repro steps above, but that did not resolve the mismatch. Investigating further.

@dreamer-89

dreamer-89 commented Jan 22, 2023

This issue reproduces fairly consistently with the integration test below.

    public void testConcurrentIngestion() throws Exception {
        final String primary = internalCluster().startNode();
        createIndex(INDEX_NAME);
        final String replica = internalCluster().startNode();
        ensureGreen(INDEX_NAME);

        final int ingestionThreadCount = 3;
        final int docCount = 2000;
        final ConcurrentLinkedDeque<ActionFuture<IndexResponse>> pendingIndexResponses = new ConcurrentLinkedDeque<>();
        AtomicInteger integer = new AtomicInteger();
        Thread[] ingestionThreads = new Thread[ingestionThreadCount];
        for(int i=0;i<ingestionThreadCount;i++) {
            ingestionThreads[i] = new Thread(() -> {
                for (int j = 0; j < docCount; j++) {
                    pendingIndexResponses.add(
                        client().prepareIndex(INDEX_NAME)
                            .setId(Integer.toString(integer.incrementAndGet()))
                            .setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL)
                            .setSource("field", "value" + j)
                            .execute()
                    );
                }
            });
        }
        for(int i=0;i<ingestionThreadCount;i++)  {
            ingestionThreads[i].start();
        }
        internalCluster().restartNode(primary);
        ensureGreen(INDEX_NAME);
        assertEquals(getNodeContainingPrimaryShard().getName(), replica);

        for(int i=0;i<ingestionThreadCount;i++) {
            ingestionThreads[i].join();
        }
        assertBusy(() -> {
            client().admin().indices().prepareRefresh().execute().actionGet();
            assertTrue(pendingIndexResponses.stream().allMatch(ActionFuture::isDone));
        }, 1, TimeUnit.MINUTES);

        flushAndRefresh(INDEX_NAME);
        waitForReplicaUpdate();
        assertDocCounts(ingestionThreadCount * docCount, primary, replica);
    }
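
assertDocCounts and getNodeContainingPrimaryShard are helpers from the segment replication integration test suite. As a rough sketch of what the doc-count assertion amounts to (assumed, not copied from the suite; the real helper may differ), it queries each named node locally and checks the hit count:

    // Assumed sketch only -- the actual helper in SegmentReplicationIT may differ.
    // Relies on OpenSearchIntegTestCase.client(String nodeName) and a static import of
    // org.opensearch.test.hamcrest.OpenSearchAssertions.assertHitCount.
    private void assertDocCounts(int expectedDocCount, String... nodeNames) {
        for (String node : nodeNames) {
            // "_only_local" restricts the search to shard copies held by that node.
            assertHitCount(
                client(node).prepareSearch(INDEX_NAME).setSize(0).setPreference("_only_local").get(),
                expectedDocCount
            );
        }
    }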

@dreamer-89

dreamer-89 commented Feb 8, 2023

Changing the assertion on Future::isDone to a Future::get call resolves the doc count mismatch in the integration test above. The exceptions from the failover (primary change) during ingestion might be completing the futures exceptionally, so isDone reports them as done even though the indexing requests never actually succeeded.
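
For illustration, here is a minimal JDK-only sketch (plain CompletableFuture rather than OpenSearch's ActionFuture) of why asserting isDone() is weaker than calling get(): a future that completed exceptionally still reports isDone() == true, so failed indexing requests pass an isDone assertion.

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutionException;

    public class FutureDoneVsGet {
        public static void main(String[] args) throws InterruptedException {
            CompletableFuture<String> indexResponse = new CompletableFuture<>();
            // Simulate an indexing request that failed while the primary was failing over.
            indexResponse.completeExceptionally(new RuntimeException("primary shard failed over"));

            // "Done" includes exceptional completion, so this prints true.
            System.out.println("isDone = " + indexResponse.isDone());
            try {
                // get() surfaces the failure instead of letting it pass silently.
                indexResponse.get();
            } catch (ExecutionException e) {
                System.out.println("get() threw: " + e.getCause());
            }
        }
    }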

However, there are occasional failures due to NodeClosedException, as shown below, which can be retried on the client side (a hypothetical retry sketch follows the log below). That issue can be tracked separately.

[2023-02-09T09:10:30,549][INFO ][o.o.i.r.SegmentReplicationIT] [testConcurrentIngestion] [seed=[A13AFB44AA8794E0:6A4A27234FBA3344]] after test
REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.indices.replication.SegmentReplicationIT.testConcurrentIngestion {seed=[A13AFB44AA8794E0:6A4A27234FBA3344]}" -Dtests.seed=A13AFB44AA8794E0 -Dtests.opensearch.logger.level=INFO -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=fr -Dtests.timezone=Pacific/Kiritimati -Druntime.java=19

NodeClosedException[node closed {node_t1}{otNskrEbSMaTb9p3UH0SrA}{4Ugw13EpRPi0lEcKN59mmA}{127.0.0.1}{127.0.0.1:49840}{dimr}{shard_indexing_pressure_enabled=true}]
java.util.concurrent.ExecutionException: NodeClosedException[node closed {node_t1}{otNskrEbSMaTb9p3UH0SrA}{4Ugw13EpRPi0lEcKN59mmA}{127.0.0.1}{127.0.0.1:49840}{dimr}{shard_indexing_pressure_enabled=true}]
	at __randomizedtesting.SeedInfo.seed([A13AFB44AA8794E0:6A4A27234FBA3344]:0)
	at org.opensearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:286)
	at org.opensearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:273)
	at org.opensearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:104)
	at org.opensearch.indices.replication.SegmentReplicationIT.testConcurrentIngestion(SegmentReplicationIT.java:223)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
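
As a purely hypothetical illustration of the client-side retry mentioned above (the class, method name, and retry policy are made up and not part of any fix), a caller could catch NodeClosedException around a blocking index call and simply re-issue it:

    import org.opensearch.action.index.IndexResponse;
    import org.opensearch.node.NodeClosedException;

    import java.util.function.Supplier;

    public final class IndexRetryHelper {
        // Hypothetical helper: retry a blocking index call (e.g.
        // client().prepareIndex(...).execute().actionGet()) when the node handling it
        // closed mid-request. Assumes maxAttempts >= 1.
        public static IndexResponse indexWithRetry(Supplier<IndexResponse> indexCall, int maxAttempts) {
            NodeClosedException last = null;
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                try {
                    return indexCall.get();
                } catch (NodeClosedException e) {
                    last = e; // the node went away mid-request; retry, now routed to the new primary
                }
            }
            throw last;
        }
    }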

Changed integration test

   public void testConcurrentIngestion() throws Exception {
        internalCluster().startClusterManagerOnlyNode();
        final String primary = internalCluster().startNode();
        createIndex(INDEX_NAME);
        final String replica = internalCluster().startNode();
        ensureGreen(INDEX_NAME);

        logger.info("--> Wait for green completed with state {}", client().admin().cluster().prepareState().execute().actionGet().getState());

        final int ingestionThreadCount = 2;
        final int docCount = 2;
        final ConcurrentLinkedDeque<ActionFuture<IndexResponse>> pendingIndexResponses = new ConcurrentLinkedDeque<>();
        AtomicInteger integer = new AtomicInteger();
        AtomicInteger valueInte = new AtomicInteger();
        Thread[] ingestionThreads = new Thread[ingestionThreadCount];
        for(int i=0;i<ingestionThreadCount;i++) {
            logger.info("--> Started ingestion");
            ingestionThreads[i] = new Thread(() -> {
                for (int j = 1; j <= docCount; j++) {
                    synchronized (this) {
                        pendingIndexResponses.add(
                            client().prepareIndex(INDEX_NAME)
                                .setSource("field", "value" + valueInte.getAndIncrement())
                                .setId(Integer.toString(integer.incrementAndGet()))
                                .setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL)
                                .execute()
                        );
                    }
                }
            });
        }
        for(int i=0;i<ingestionThreadCount;i++)  {
            ingestionThreads[i].start();
        }
        logger.info("--> stop the primary");
        internalCluster().stopRandomNode(InternalTestCluster.nameFilter(primary));
        ensureYellow(INDEX_NAME);
        assertEquals(getNodeContainingPrimaryShard().getName(), replica);

        for(int i=0;i<ingestionThreadCount;i++) {
            ingestionThreads[i].join();
        }
        client().admin().indices().prepareRefresh().execute().actionGet();
        for(ActionFuture<IndexResponse> response: pendingIndexResponses) {
            response.actionGet();
        }
        flushAndRefresh(INDEX_NAME);
        assertDocCounts(ingestionThreadCount * docCount, replica);
    }

@dreamer-89

dreamer-89 commented Feb 8, 2023

Generated a new min distribution from main and retried the repro steps from the issue description above; the issue no longer reproduces. This may have been resolved by #6122, which builds the seqNo from the SegmentInfos snapshot (accurate) rather than the live version (more recent) on the primary.
Closing this issue.
