Update docs on asymmetric network partitions (#18953)
* Update docs on asymmetric network partitions

Fixes DOC-11224

Summary of changes:

- Provide more context on asymmetric partitions and how they're no
  longer a thing in CockroachDB v23.1+.  (Since that's quite an old
  version, we don't bother mentioning the version numbers.)
rmloveland authored Oct 10, 2024
1 parent 05a3160 commit afc72a9
Showing 11 changed files with 22 additions and 27 deletions.
7 changes: 7 additions & 0 deletions src/current/_includes/common/network-partitions.md
@@ -0,0 +1,7 @@
A network partition occurs when two or more nodes are prevented from communicating with each other in one or both directions. A network partition can be caused by a network outage or a configuration problem with the network, such as when allowlisted IP addresses or hostnames change after a node is [drained and restarted]({% link {{ page.version.version }}/node-shutdown.md %}).

In a **symmetric** partition, node communication is disrupted in both directions. In an **asymmetric** partition, nodes can communicate in one direction but not the other.

CockroachDB protects against asymmetric partitions by converting all asymmetric (uni-directional) network partitions into symmetric (bi-directional) network partitions. This increases cluster resiliency by reducing the number of partition-related failures that can occur. Many temporary symmetric partitions can be recovered from automatically without operator intervention.

The effect of a network partition depends on which nodes are partitioned, where the ranges are located, and to a large extent, whether [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) or [zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}) are defined. A partition that cuts off at least `(n-1)/2` replicas from a [range]({% link {{ page.version.version }}/architecture/overview.md %}#architecture-range) will cause range unavailability which will cause some data unavailability. If there are no localities or other constraints on where replicas are placed, then a partition of any ([`num_replicas`]({% link {{ page.version.version }}/configure-replication-zones.md %}#num_replicas) / 2) nodes will likely cause unavailability.
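
As an illustration of the replica arithmetic in the paragraph above (a sketch for the reader, not content of the included file), the following Python assumes the usual majority-quorum rule: a range stays available while more than half of its replicas can still reach each other. Under that assumption, a range tolerates the loss of up to `(n-1)/2` replicas (rounded down) and becomes unavailable once more than that are cut off. The function names `quorum_size` and `range_available` are hypothetical and are not part of CockroachDB.

```python
def quorum_size(num_replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return num_replicas // 2 + 1


def range_available(num_replicas: int, cut_off: int) -> bool:
    """Return True if the replicas that remain reachable still form a majority.

    Assumption: availability is decided purely by majority quorum;
    localities and zone configurations are ignored in this sketch.
    """
    if not 0 <= cut_off <= num_replicas:
        raise ValueError("cut_off must be between 0 and num_replicas")
    reachable = num_replicas - cut_off
    return reachable >= quorum_size(num_replicas)


# Print an availability table for two common replication factors.
for n in (3, 5):
    for lost in range(n + 1):
        status = "available" if range_available(n, lost) else "unavailable"
        print(f"num_replicas={n} cut_off={lost}: {status}")
```

With 3 replicas the range survives a partition that cuts off 1 replica and becomes unavailable at 2; with 5 replicas it survives 2 and becomes unavailable at 3.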
4 changes: 2 additions & 2 deletions src/current/v23.1/cluster-setup-troubleshooting.md
@@ -232,9 +232,9 @@ Again, firewalls or hostname issues can cause any of these steps to fail.
If the DB Console lists any dead nodes on the [**Cluster Overview** page]({% link {{ page.version.version }}/ui-cluster-overview-page.md %}), then you might have a network partition.
-**Explanation:** A network partition prevents nodes from communicating with each other in one or both directions. This can be due to a configuration problem with the network, such as when allowlisted IP addresses or hostnames change after a node is torn down and rebuilt. In a symmetric partition, node communication is broken in both directions. In an asymmetric partition, node communication works in one direction but not the other.
+**Explanation:**
-The effect of a network partition depends on which nodes are partitioned, where the ranges are located, and to a large extent, whether [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) are defined. If localities are not defined, a partition that cuts off at least (n-1)/2 nodes will cause data unavailability.
+{% include common/network-partitions.md %}
**Solution:**
6 changes: 1 addition & 5 deletions src/current/v23.1/ui-network-latency-page.md
@@ -35,11 +35,7 @@ For instance, the cluster shown above has nodes in `us-west1`, `us-east1`, and `

Nodes that have lost a connection are displayed in a separate color. This can help you locate a network partition in your cluster.

-{{site.data.alerts.callout_info}}
-A network partition prevents nodes from communicating with each other in one or both directions. This can be due to a configuration problem with the network, such as when allowlisted IP addresses or hostnames change after a node is torn down and rebuilt. In a symmetric partition, node communication is broken in both directions. In an asymmetric partition, node communication works in one direction but not the other.
-
-The effect of a network partition depends on which nodes are partitioned, where the ranges are located, and to a large extent, whether [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) are defined. If localities are not defined, a partition that cuts off at least (n-1)/2 nodes will cause data unavailability.
-{{site.data.alerts.end}}
+{% include common/network-partitions.md %}

Click the **NO CONNECTIONS** link to see lost connections between nodes or [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality), if any are defined.

4 changes: 2 additions & 2 deletions src/current/v23.2/cluster-setup-troubleshooting.md
@@ -232,9 +232,9 @@ If the DB Console
then you might have a network partition.
-**Explanation:** A network partition occurs when two or more nodes are prevented from communicating with each other in one or both directions. A network partition can be caused by a network outage or a configuration problem with the network, such as when allowlisted IP addresses or hostnames change after a node is torn down and rebuilt. In a symmetric partition, node communication is disrupted in both directions. In an asymmetric partition, nodes can communicate in one direction but not the other.
+**Explanation:**
-The effect of a network partition depends on which nodes are partitioned, where the ranges are located, and to a large extent, whether [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) are defined. If localities are not defined, a partition that cuts off at least (n-1)/2 nodes will cause data unavailability.
+{% include common/network-partitions.md %}
**Solution:**
4 changes: 1 addition & 3 deletions src/current/v23.2/ui-network-latency-page.md
@@ -57,9 +57,7 @@ Hover over a cell to display more details:
This specific information can help you understand the root cause of the connectivity issue.

{{site.data.alerts.callout_info}}
-A network partition occurs when two or more nodes are prevented from communicating with each other in one or both directions. A network partition can be caused by a network outage or a configuration problem with the network, such as when allowlisted IP addresses or hostnames change after a node is torn down and rebuilt. In a symmetric partition, node communication is disrupted in both directions. In an asymmetric partition, nodes can communicate in one direction but not the other.
-
-The effect of a network partition depends on which nodes are partitioned, where the ranges are located, and to a large extent, whether [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) are defined. If localities are not defined, a partition that cuts off at least (n-1)/2 nodes will cause data unavailability.
+{% include common/network-partitions.md %}
{{site.data.alerts.end}}

### Node liveness status
4 changes: 2 additions & 2 deletions src/current/v24.1/cluster-setup-troubleshooting.md
@@ -232,9 +232,9 @@ If the DB Console
then you might have a network partition.
-**Explanation:** A network partition occurs when two or more nodes are prevented from communicating with each other in one or both directions. A network partition can be caused by a network outage or a configuration problem with the network, such as when allowlisted IP addresses or hostnames change after a node is torn down and rebuilt. In a symmetric partition, node communication is disrupted in both directions. In an asymmetric partition, nodes can communicate in one direction but not the other.
+**Explanation:**
-The effect of a network partition depends on which nodes are partitioned, where the ranges are located, and to a large extent, whether [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) are defined. If localities are not defined, a partition that cuts off at least (n-1)/2 nodes will cause data unavailability.
+{% include common/network-partitions.md %}
**Solution:**
4 changes: 1 addition & 3 deletions src/current/v24.1/ui-network-latency-page.md
@@ -57,9 +57,7 @@ Hover over a cell to display more details:
This specific information can help you understand the root cause of the connectivity issue.

{{site.data.alerts.callout_info}}
-A network partition occurs when two or more nodes are prevented from communicating with each other in one or both directions. A network partition can be caused by a network outage or a configuration problem with the network, such as when allowlisted IP addresses or hostnames change after a node is torn down and rebuilt. In a symmetric partition, node communication is disrupted in both directions. In an asymmetric partition, nodes can communicate in one direction but not the other.
-
-The effect of a network partition depends on which nodes are partitioned, where the ranges are located, and to a large extent, whether [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) are defined. If localities are not defined, a partition that cuts off at least (n-1)/2 nodes will cause data unavailability.
+{% include common/network-partitions.md %}
{{site.data.alerts.end}}

### Node liveness status
4 changes: 2 additions & 2 deletions src/current/v24.2/cluster-setup-troubleshooting.md
@@ -232,9 +232,9 @@ If the DB Console
then you might have a network partition.
-**Explanation:** A network partition occurs when two or more nodes are prevented from communicating with each other in one or both directions. A network partition can be caused by a network outage or a configuration problem with the network, such as when allowlisted IP addresses or hostnames change after a node is torn down and rebuilt. In a symmetric partition, node communication is disrupted in both directions. In an asymmetric partition, nodes can communicate in one direction but not the other.
+**Explanation:**
-The effect of a network partition depends on which nodes are partitioned, where the ranges are located, and to a large extent, whether [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) are defined. If localities are not defined, a partition that cuts off at least (n-1)/2 nodes will cause data unavailability.
+{% include common/network-partitions.md %}
**Solution:**
4 changes: 1 addition & 3 deletions src/current/v24.2/ui-network-latency-page.md
@@ -57,9 +57,7 @@ Hover over a cell to display more details:
This specific information can help you understand the root cause of the connectivity issue.

{{site.data.alerts.callout_info}}
-A network partition occurs when two or more nodes are prevented from communicating with each other in one or both directions. A network partition can be caused by a network outage or a configuration problem with the network, such as when allowlisted IP addresses or hostnames change after a node is torn down and rebuilt. In a symmetric partition, node communication is disrupted in both directions. In an asymmetric partition, nodes can communicate in one direction but not the other.
-
-The effect of a network partition depends on which nodes are partitioned, where the ranges are located, and to a large extent, whether [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) are defined. If localities are not defined, a partition that cuts off at least (n-1)/2 nodes will cause data unavailability.
+{% include common/network-partitions.md %}
{{site.data.alerts.end}}

### Node liveness status
4 changes: 2 additions & 2 deletions src/current/v24.3/cluster-setup-troubleshooting.md
@@ -232,9 +232,9 @@ If the DB Console
then you might have a network partition.
-**Explanation:** A network partition occurs when two or more nodes are prevented from communicating with each other in one or both directions. A network partition can be caused by a network outage or a configuration problem with the network, such as when allowlisted IP addresses or hostnames change after a node is torn down and rebuilt. In a symmetric partition, node communication is disrupted in both directions. In an asymmetric partition, nodes can communicate in one direction but not the other.
+**Explanation:**
-The effect of a network partition depends on which nodes are partitioned, where the ranges are located, and to a large extent, whether [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) are defined. If localities are not defined, a partition that cuts off at least (n-1)/2 nodes will cause data unavailability.
+{% include common/network-partitions.md %}
**Solution:**
4 changes: 1 addition & 3 deletions src/current/v24.3/ui-network-latency-page.md
@@ -57,9 +57,7 @@ Hover over a cell to display more details:
This specific information can help you understand the root cause of the connectivity issue.

{{site.data.alerts.callout_info}}
-A network partition occurs when two or more nodes are prevented from communicating with each other in one or both directions. A network partition can be caused by a network outage or a configuration problem with the network, such as when allowlisted IP addresses or hostnames change after a node is torn down and rebuilt. In a symmetric partition, node communication is disrupted in both directions. In an asymmetric partition, nodes can communicate in one direction but not the other.
-
-The effect of a network partition depends on which nodes are partitioned, where the ranges are located, and to a large extent, whether [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) are defined. If localities are not defined, a partition that cuts off at least (n-1)/2 nodes will cause data unavailability.
+{% include common/network-partitions.md %}
{{site.data.alerts.end}}

### Node liveness status