Add replicated MySQL tutorial #1722

Merged
merged 17 commits into kubernetes:release-1.5 from mysql-stateful-set
Dec 2, 2016

Conversation

enisoc
Member

@enisoc enisoc commented Nov 18, 2016

cc @erictune @foxish @kow3ns @janetkuo @kubernetes/sig-apps

This is a replicated MySQL StatefulSet tutorial. It was inspired by #1599, but uses StatefulSet to achieve the following properties:

  1. One StatefulSet can run a master and any number of slaves.
  2. The StatefulSet can be scaled up and down.
  3. PersistentVolumes are auto-provisioned.
  4. New slaves perform a clone of existing data, and then start replicating from the master.
  5. Slave pods that get rescheduled (e.g. due to Node failure) get re-linked to the stable PersistentVolumeClaim, start back up, and reconnect to replication.
  6. If the master is restarted or rescheduled, slaves will keep retrying to connect back to it via the stable DNS name.

This example has the following (known) caveats:

  1. There's no authentication anywhere.
  2. Ordinal index 0 is always assumed to be the master. You must recover the master rather than fail over to a slave.
  3. It requires an image (Dockerfile included) with some extra MySQL tools installed.
  4. If using only transactional tables (InnoDB), instance N will slow down while instance N+1 is taking a clone from it upon scaling the StatefulSet up. If using non-transactional tables (MyISAM), the whole table will be locked while being cloned.

@janetkuo janetkuo added this to the 1.5 milestone Nov 18, 2016
@foxish foxish assigned kow3ns and unassigned lavalamp Nov 18, 2016
@bparees

bparees commented Nov 18, 2016

you might also be interested in the mongodb stateful set example we did for openshift (when they were still called petsets):
https://github.com/sclorg/mongodb-container/tree/master/examples/petset

@@ -0,0 +1,144 @@
apiVersion: apps/v1beta1
Contributor

Perhaps by convention, we should have the headless service be in the same file as the statefulset itself? I think most other examples do that.

Member Author

In my Vitess chart, I found it useful to have multiple StatefulSets share a single headless service. Is that a reasonable pattern? If so, the convention of putting the headless service together with the StatefulSet wouldn't make sense. For the tutorial, I also like the idea of keeping the service separate to reduce noise, and so I can separately explain the parts. If we still want to stick to the convention though, I'll do that. Let me know what you think.

Contributor

+1 for having both in the same file, I like the convention and it's a bit easier to create from a single file rather than creating from two separate files.

Member Author

Since this initial comment thread, I've also added a client service in addition to the headless one. So now I have two services in mysql-services.yaml. I also have mysql-configmap.yaml, so 3 files total. I think if it was just a StatefulSet and one headless service, I would agree with putting them together to get to the one-file ideal. However, with 2 services and a ConfigMap, I feel like the separation adds enough clarity to be worth the extra files.

# Generate mysql server-id from pod ordinal index.\n
[[ `hostname` =~ -([0-9]+)$ ]] || exit 1\n
echo [mysqld] > /mnt/conf.d/server-id.cnf\n
echo server-id=$((100 + ${BASH_REMATCH[1]})) >> /mnt/conf.d/server-id.cnf\n
Contributor

Why do we do (100 + ordinal_idx)?

Member Author

MySQL server id 0 is reserved, so we need some offset on the ordinal index. I didn't want to use something like ordinal+1 because it would be easy to see server id 1 and mistakenly assume that corresponds to ordinal 1. I'll add a comment somewhere nearby to make this less mysterious.
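
The offset described above can be sketched as a small shell helper. This is only an illustration, not the script from the PR: the function name `server_id_for` is hypothetical, and it uses POSIX parameter expansion instead of the `BASH_REMATCH` match shown in the diff. It assumes ordinals carry no leading zeros.

```shell
# Hypothetical helper (not part of the PR): print the MySQL server-id
# for a StatefulSet Pod hostname such as "mysql-2".
server_id_for() {
  ordinal="${1##*-}"                  # text after the last hyphen, e.g. "2"
  case "$ordinal" in
    ''|*[!0-9]*) return 1 ;;          # suffix is not a number: refuse to guess
  esac
  [ "$ordinal" != "$1" ] || return 1  # hostname contained no hyphen at all
  # Offset by 100 so the id is never 0 (reserved) and is not
  # easily mistaken for the ordinal itself.
  echo $((100 + ordinal))
}

server_id_for mysql-0   # prints 100
server_id_for mysql-2   # prints 102
```

With an offset of 100, a server-id of 101 is unlikely to be misread as "ordinal 101", which is the confusion the reply is trying to avoid.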


echo "Initializing replication from clone position"
[[ `cat xtrabackup_binlog_info` =~ ^(.*?)[[:space:]]+(.*?)$ ]] || exit 1
mv xtrabackup_binlog_info xtrabackup_binlog_info.orig
Contributor

What happens if the container fails after this step?

Member Author

I wanted the CHANGE MASTER TO to be executed at-most-once, because it's dangerous to re-point the replication position if replication has already started making progress. So, if the container fails after moving xtrabackup_binlog_info but before finishing CHANGE MASTER TO, it will leave the slave without any replication and the operator will need to resolve it.
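
The at-most-once behavior described above comes from consuming the trigger file before running the step. A minimal sketch of that pattern follows; the helper name `run_at_most_once` and the `echo` stand-in are hypothetical, and the real script would run the actual `CHANGE MASTER TO` statement instead.

```shell
# Hypothetical sketch: run a one-time step at most once by renaming its
# trigger file BEFORE the step runs.
run_at_most_once() {
  trigger="$1"; shift
  [ -f "$trigger" ] || return 0              # already consumed: do nothing
  mv "$trigger" "$trigger.orig" || return 1  # consume the trigger first
  "$@"                                       # a failure here is NOT retried on restart
}

workdir=$(mktemp -d)
touch "$workdir/xtrabackup_binlog_info"
# First call prints the message; second call prints nothing.
run_at_most_once "$workdir/xtrabackup_binlog_info" echo "CHANGE MASTER TO would run here"
run_at_most_once "$workdir/xtrabackup_binlog_info" echo "CHANGE MASTER TO would run here"
rm -rf "$workdir"
```

The trade-off is the one stated in the reply: if the container dies between the `mv` and the completed step, nothing re-runs it, so an operator has to step in. Renaming after the step instead would give at-least-once behavior, which risks re-pointing a replication position that has already advanced.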

@foxish
Contributor

foxish commented Nov 18, 2016

  1. Are MyISAM tables supported here? Are they just going to be locked completely while the backup is being created?
  2. Would the constructed backups need encryption?
  3. When this adds authentication, it should have a separate backup user-role in addition to the mysql user I assume. Is that right?
  4. What are the limitations of this binlog based replication in comparison with that using transactions/GTID?

@enisoc
Member Author

enisoc commented Nov 18, 2016

Regarding the above questions:

  1. They will probably be locked completely. To be honest I hadn't thought about MyISAM since it's essentially deprecated now.
  2. The backups are streamed directly to the destination data-dir, so they don't need at-rest encryption. But yes, you would want to use encryption and authentication on the "stream backup" port.
  3. Yes, there should be non-root users for replication and app access with all the grants and such. I was thinking these features would make sense in a Chart, but just add noise in a tutorial.
  4. GTID mode would make it safer to re-point replication positions and easier to handle failover. For this tutorial, I just wanted to use default MySQL settings as much as possible to focus on explaining patterns for using StatefulSet in general. I can make sure to mention this as a caveat.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 19, 2016
@kow3ns
Member

kow3ns commented Nov 21, 2016

Reviewed 1 of 4 files at r1, 3 of 5 files at r2.
Review status: all files reviewed at latest revision, 4 unresolved discussions.


docs/tutorials/replicated-stateful-application/mysql-statefulset.yaml, line 1 at r1 (raw file):

Previously, enisoc (Anthony Yeh) wrote…

In my Vitess chart, I found it useful to have multiple StatefulSets share a single headless service. Is that a reasonable pattern? If so, the convention of putting the headless service together with the StatefulSet wouldn't make sense. For the tutorial, I also like the idea of keeping the service separate to reduce noise, and so I can separately explain the parts. If we still want to stick to the convention though, I'll do that. Let me know what you think.

As long as both files are referenced in the tutorial, it probably doesn't hurt too much to keep the service in a separate file. There are other examples in contrib that do this.

docs/tutorials/replicated-stateful-application/mysql-statefulset.yaml, line 2 at r2 (raw file):

apiVersion: apps/v1beta1
kind: StatefulSet

We discussed this offline already, but given the lack of security, we want the tutorial to stress that this is not a production-ready example.


@enisoc
Member Author

enisoc commented Nov 23, 2016

The tutorial itself is now ready for review.

@enisoc enisoc force-pushed the mysql-stateful-set branch 2 times, most recently from 97d5f8f to 8ec6913 Compare November 23, 2016 22:35
@devin-donnelly devin-donnelly assigned kow3ns and steveperry-53 and unassigned kow3ns Nov 23, 2016
@devin-donnelly
Contributor

devin-donnelly commented Nov 23, 2016

@steveperry-53, can you handle the doc review on this one, as it's a Tutorial?

@kow3ns
Member

kow3ns commented Nov 28, 2016

Reviewed 6 of 6 files at r4, 1 of 1 files at r5.
Review status: all files reviewed at latest revision, 11 unresolved discussions.


docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 23 at r5 (raw file):

In particular, MySQL settings remain on insecure defaults to keep the focus
on general patterns for running stateful applications in Kubernetes.

I like the phrasing of the above, and this addresses my earlier concern.


docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 33 at r5 (raw file):

[Persistent Volumes](/docs/user-guide/persistent-volumes/)
and [Stateful Sets](/docs/concepts/controllers/statefulsets/),
as well as other core concepts like Pods, Services and Config Maps.

Might want to just add the hyperlinks for Pods, Services, and Config Maps.


docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 99 at r5 (raw file):

Note that only read queries can use the load-balanced Client Service.
Since there is only one master, clients should connect directly to the master
Pod (through its DNS entry within the Headless Service) to execute writes.

Do you mean the SRV record associated with the master? Might want to be explicit.


docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 137 at r5 (raw file):

### Understanding stateful Pod initialization

The Stateful Set controller starts Pods one at a time, in order by their

We might want to link this to the Stateful Set concept, or the Stateful Set Basics tutorial when all three PRs get merged.


docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 152 at r5 (raw file):

Before starting any of the containers in the Pod spec, the Pod first runs any
[Init Containers](/docs/user-guide/production-pods/#handling-initialization)
in the order defined.

Maybe "in the order in which they are defined"


docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 171 at r5 (raw file):

Since the example topology consists of a single master and any number of slaves,
the script simply assigns ordinal `0` to be the master, and everyone else to be
slaves.

You might want to expand on why it's important that 0 is the canonical element of the set that is assigned to be the master. If the user fully understands how ordered Pod creation works and how MySQL master-slave replication works, they should be able to derive why this is so, but it might not hurt to be explicit.


docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 211 at r5 (raw file):

Also, since slaves look for the master at its stable DNS name (`mysql-0.mysql`),
they will automatically find the master even if it gets a new Pod IP due to
being rescheduled.

Do you know what the behavior is when the A record pointed to by the SRV record is removed and re-added, resulting in an IP address change (e.g., the node hosting the master fails, and it is rescheduled to a new node)? Are we sure that, after the connection between the master and slaves fails, the slaves will call gethostbyname to re-resolve the IP address associated with the SRV record, and that they do not cache the previously resolved address?


docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 480 at r5 (raw file):

{% capture cleanup %}

You should indicate to the user that she needs to manually delete any provisioned storage (i.e. Persistent Volumes).


@enisoc
Member Author

enisoc commented Nov 29, 2016

Review status: 7 of 10 files reviewed at latest revision, 10 unresolved discussions.


docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 33 at r5 (raw file):

Previously, kow3ns (Kenneth Owens) wrote…

Might want to just add the hyperlinks for Pods, Services, and Config Maps.

Done.

docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 99 at r5 (raw file):

Previously, kow3ns (Kenneth Owens) wrote…

Do you mean the SRV record associated with the master? Might want to be explicit.

You can query the DNS name and get either an A record or a SRV record. Most of the time you'll just use the A record to get the IP address. You would only need the SRV record if you want to find out the port number as well, but that's rarely used these days.
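
To make the two lookups concrete, here is a small sketch. The helper `pod_fqdn` is hypothetical, and the `default` namespace, the `cluster.local` cluster domain, and the availability of `dig` in the image are all assumptions rather than details from the PR. Only the short form `mysql-0.mysql` appears in the tutorial text.

```shell
# Hypothetical helper: build the stable DNS name of a StatefulSet Pod.
# $1 = StatefulSet name, $2 = ordinal, $3 = headless Service name, $4 = namespace
pod_fqdn() {
  echo "$1-$2.$3.$4.svc.cluster.local"   # cluster.local is an assumed cluster domain
}

master=$(pod_fqdn mysql 0 mysql default)
echo "$master"   # prints mysql-0.mysql.default.svc.cluster.local

# From a Pod inside the cluster, and assuming dig is installed, one could run:
#   dig +short "$master" A                            # just the Pod IP
#   dig +short mysql.default.svc.cluster.local SRV    # priority, weight, port, target
```

Since the MySQL port is fixed and well known in this tutorial, the A record lookup is all a client needs.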

docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 137 at r5 (raw file):

Previously, kow3ns (Kenneth Owens) wrote…

We might want to link this to the Stateful Set concept, or the Stateful Set Basics tutorial when all three PRs get merged.

I linked it at the top, the first time I mentioned StatefulSet. Other than "first mention", it will be hard to come up with a consistent rule for when to link and when not to.

docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 152 at r5 (raw file):

Previously, kow3ns (Kenneth Owens) wrote…

Maybe "in the order in which they are defined"

Do you think "in the order defined" is unclear? I feel like both are clear, so I prefer the shorter one.

docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 171 at r5 (raw file):

Previously, kow3ns (Kenneth Owens) wrote…

You might want to expand on why it's important that `0` is the canonical element of the set that is assigned to be the master. If the user fully understands how ordered Pod creation works and how MySQL master-slave replication works, they should be able to derive why this is so, but it might not hurt to be explicit.

Done.

docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 211 at r5 (raw file):

Previously, kow3ns (Kenneth Owens) wrote…

Do you know what the behavior is when the A record pointed to by the SRV record is removed and re-added, resulting in an IP address change (e.g., the node hosting the master fails, and it is rescheduled to a new node)? Are we sure that, after the connection between the master and slaves fails, the slaves will call gethostbyname to re-resolve the IP address associated with the SRV record, and that they do not cache the previously resolved address?

The MySQL slaves will re-resolve, although the answer may be cached by the OS or DNS resolver lib. After the TTL, they should eventually converge though.

docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 480 at r5 (raw file):

Previously, kow3ns (Kenneth Owens) wrote…

You should indicate to the user that she needs to manually delete any provisioned storage (i.e. Persistent Volumes).

Done.

@kow3ns
Member

kow3ns commented Dec 1, 2016

:lgtm:


Reviewed 6 of 6 files at r6.
Review status: all files reviewed at latest revision, 7 unresolved discussions.


docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 99 at r5 (raw file):

Previously, enisoc (Anthony Yeh) wrote…

You can query the DNS name and get either an A record or a SRV record. Most of the time you'll just use the A record to get the IP address. You would only need the SRV record if you want to find out the port number as well, but that's rarely used these days.

Yes, and the CNAME of the Headless Service fronts the SRV records which point to the A Records. I was indicating that "(through its DNS entry within the Headless Service)" might be a bit confusing. Not a blocker.


docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 137 at r5 (raw file):

Previously, enisoc (Anthony Yeh) wrote…

I linked it at the top, the first time I mentioned StatefulSet. Other than "first mention", it will be hard to come up with a consistent rule for when to link and when not to.

First use sounds like a good start.


docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 152 at r5 (raw file):

Previously, enisoc (Anthony Yeh) wrote…

Do you think "in the order defined" is unclear? I feel like both are clear, so I prefer the shorter one.

I think what is lacking is the where. "In the order defined in the manifest" makes it more clear, but plugging that in makes the next sentence sound clunky. Since you have a link to Init Containers, users that want to explore that concept have a quick link. I'll bite.


docs/tutorials/replicated-stateful-application/run-replicated-stateful-application.md, line 211 at r5 (raw file):

Previously, enisoc (Anthony Yeh) wrote…

The MySQL slaves will re-resolve, although the answer may be cached by the OS or DNS resolver lib. After the TTL, they should eventually converge though.

Sounds good, as long as the application isn't doing explicit address caching, it should converge as you say.


@steveperry-53
Contributor

I'm starting a writing review now.

@steveperry-53
Contributor

Anthony, this tutorial looks great. Just a few comments:

I don't think we need a new directory for this tutorial. I think it should go in the run-stateful-application directory.

There are several places where I think you could use present tense instead of future tense.
Example: "Once a slave begins replication, by default it will remember its master" could be "... it remembers its master."

@jeffmendoza, @pwittrock, and I have discussed how to get config files into the hands of the reader. I haven't published this in our contributor guidelines yet, but here's our current thinking: Don't have the reader copy and paste the config file. Instead, create a $REPO variable, and then create the API object directly from the config file in the repo. Here's an example.

Under Cleaning Up, do the steps need to be done in order? If so, let's use a numbered list instead of a bulleted list.

In a few places, we might want to say "MySQL master" instead of "master."
Someone new to Kubernetes might confuse the Kubernetes master with the MySQL master.
Example: Connect directly to the master Pod.

Some links are broken. I assume that's because the target topics aren't submitted yet.

@enisoc
Member Author

enisoc commented Dec 2, 2016

@steveperry-53 I think I've addressed all your comments. The only remaining broken links should be those to the StatefulSet concept page, being added in #1719.

@jeffmendoza
Contributor

The $REPO pattern was more useful for the main kubernetes repo, which was not a website. For the tutorials on the website, the URL is much shorter and can be used directly:

 export REPO=https://raw.githubusercontent.com/kubernetes/kubernetes.github.io/master
 kubectl create -f $REPO/docs/tasks/configure-pod-container/envars.yaml

Can be:

 kubectl create -f http://k8s.io/docs/tasks/configure-pod-container/envars.yaml

@devin-donnelly
Contributor

Added Tech Review LGTM from @kow3ns.

Do we want to resolve @jeffmendoza's comments before merging?

@steveperry-53
Contributor

The comment by @jeffmendoza is resolved in the final commit.

@devin-donnelly
Contributor

Great. Merging now.

@devin-donnelly devin-donnelly merged commit 3f2e858 into kubernetes:release-1.5 Dec 2, 2016
@enisoc enisoc deleted the mysql-stateful-set branch December 2, 2016 23:33