Use Fleet machine metadata for "environments" #41

lukebond · 2015-03-23T17:31:38Z

Let's say you want dev, QA, staging and production environments. Rather than running multiple Paz clusters, they could all share one cluster and use Fleet machine metadata to schedule units only on hosts tagged with their environment.

e.g. with 4 environments of 3 nodes each, you might have the following metadata:

Host	Name	Metadata
host1	dev1	environment=dev
host2	dev2	environment=dev
host3	dev3	environment=dev
host4	qa1	environment=qa
host5	qa2	environment=qa
host6	qa3	environment=qa
host7	staging1	environment=staging
host8	staging2	environment=staging
host9	staging3	environment=staging
host10	prod1	environment=prod
host11	prod2	environment=prod
host12	prod3	environment=prod

More can be read about Fleet scheduling with metadata here: https://coreos.com/docs/launching-containers/launching/launching-containers-fleet/#schedule-based-on-machine-metadata
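
As a rough illustration of the matching rule (a sketch, not fleet's actual scheduler code): a unit that requires `environment=qa` is only eligible for machines whose metadata contains that pair. The host list mirrors the table above; `build_machines` and `eligible_hosts` are made-up helper names.

```python
# Hypothetical sketch of metadata-based eligibility; not fleet's real scheduler.

def build_machines():
    """Recreate the 12 hosts from the table: 3 hosts per environment."""
    envs = ["dev", "qa", "staging", "prod"]
    machines = {}
    for i in range(12):
        machines[f"host{i + 1}"] = {"environment": envs[i // 3]}
    return machines


def eligible_hosts(machines, required):
    """Return hosts whose metadata contains every required key=value pair."""
    matches = (
        host
        for host, meta in machines.items()
        if all(meta.get(key) == value for key, value in required.items())
    )
    # Sort numerically so host10 comes after host9.
    return sorted(matches, key=lambda host: int(host[4:]))


MACHINES = build_machines()
print(eligible_hosts(MACHINES, {"environment": "qa"}))
# → ['host4', 'host5', 'host6']
```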

Credit to @rimusz for the idea.

rimusz · 2015-03-23T17:36:11Z

@lukebond
it is better not to mix the production cluster with the rest.
Production is production; it needs to be kept away from the development cluster.

e.g. so that they can use different CoreOS release channels.

sublimino · 2015-03-24T10:26:53Z

@rimusz +1 for separating production environment - total isolation, nothing shared if possible

rimusz · 2015-03-24T10:33:44Z

@sublimino @lukebond
the only thing that can be used for both is a private Docker registry, so Docker images can be shared between clusters

lukebond · 2015-03-24T11:04:36Z

although i agree with this, there shouldn't be anything in Paz that cares whether you separate them or not. Paz just needs to be aware of environments (ie. a parameter to most REST calls) and translate them down to Fleet machine metadata at deployment time.
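
One way that translation could look, assuming Paz receives the environment as a plain string: append an `[X-Fleet]` section with a `MachineMetadata` line to the unit before it is submitted. `MachineMetadata` is fleet's unit option for this; the function name, the example unit and the image name are illustrative only, not Paz's actual code.

```python
# Hypothetical sketch: turn an "environment" parameter into a fleet constraint.

def with_environment(unit_text, environment):
    """Append an [X-Fleet] section pinning the unit to one environment."""
    # Keep the value simple so it cannot break the unit file syntax.
    if not environment or not environment.replace("-", "").isalnum():
        raise ValueError(f"invalid environment name: {environment!r}")
    return (
        unit_text.rstrip("\n")
        + "\n\n[X-Fleet]\n"
        + f"MachineMetadata=environment={environment}\n"
    )


# Example unit; the description and image name are placeholders.
UNIT = """[Unit]
Description=Example service

[Service]
ExecStart=/usr/bin/docker run --rm example/image
"""

print(with_environment(UNIT, "staging"))
```

The same unit definition can then be deployed to any environment by changing only the parameter.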

although Paz avoids doing infra stuff, i'm beginning to think it would be good to have a cluster provisioning tool (separate from Paz) that allows you to choose Etcd cluster topology, group machines and add metadata, etc.

rimusz · 2015-03-24T11:18:45Z

@lukebond
Yes, Paz should not care how your dev/production is set up. It is just good practice to keep them separate.

I think a separate cluster provisioning tool makes sense, one that, as you said, allows you to choose Etcd cluster topology, group machines and add metadata, etc.

sublimino · 2015-03-24T12:01:21Z

That cluster provisioning tool(/GUI?) sounds like it could be a cloud-config generator via https://terraform.io/ - in support of immutable infrastructure we should deploy a new host with new config, health check, rebalance containers, and decommission old host? Servers should automatically be distributed between AZs where applicable.
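
The AZ distribution part could be as simple as round-robin assignment. A minimal sketch, assuming the tool knows the host names and the available zones up front; the function name and the zone names are placeholders, not anything Terraform or Paz provides.

```python
# Hypothetical sketch: spread hosts across availability zones round-robin.
from itertools import cycle


def spread_across_zones(hosts, zones):
    """Assign each host to a zone, cycling through the zones in order."""
    if not zones:
        raise ValueError("at least one zone is required")
    return dict(zip(hosts, cycle(zones)))


placement = spread_across_zones(
    ["prod1", "prod2", "prod3", "prod4"],
    ["zone-a", "zone-b", "zone-c"],
)
print(placement)
# → {'prod1': 'zone-a', 'prod2': 'zone-b', 'prod3': 'zone-c', 'prod4': 'zone-a'}
```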

etcd topology - for large deployments, however large, CoreOS recommends running a separate 5-node etcd cluster; otherwise etcd should run on each host.

rimusz · 2015-03-24T12:31:39Z

@sublimino https://terraform.io/ is a good choice for cloud setups. What about bare metal?

Regarding the etcd:
we should never, ever run etcd on each host. It is a very bad idea, and CoreOS does not recommend it.
I got bitten by that setup very badly.
I would recommend this setup:

Up to 9 workers: one etcd node.
From 10 up to 50 worker machines: start with 3 etcd nodes, then increase to 5 and so on.

Also, etcd machines do not have to be very powerful as they only run the etcd cluster, e.g. on GCE g1-small instances work just fine.
I had a long chat about it with Kelsey when he was at the London Kubernetes meetup.
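
The sizing rule above can be sketched as a small lookup. The thresholds are the ones from this comment; what happens beyond 5 etcd nodes is left open here, so this sketch simply returns 5 for anything over 50 workers. The function name is made up.

```python
# Sketch of the suggested sizing rule; thresholds taken from the comment above.

def etcd_nodes_for(workers):
    """Suggest a dedicated etcd cluster size for a given number of workers."""
    if workers < 1:
        raise ValueError("a cluster needs at least one worker")
    if workers <= 9:
        return 1
    if workers <= 50:
        return 3
    # Assumption: larger tiers are not spelled out, so stop at 5.
    return 5


for count in (3, 9, 10, 50, 51):
    print(count, "workers ->", etcd_nodes_for(count), "etcd node(s)")
```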

lukebond · 2015-03-24T12:40:34Z

Agree with all of this and aware of the Etcd-on-every-machine anti-pattern from previous experience (and was also at that meet-up). But since Paz doesn't do infra then that's down to whoever sets up the cluster.

rimusz · 2015-03-24T12:52:16Z

@lukebond yep, it is more for the cluster provisioning tool, which makes sense to have for sure, to prepare a cluster for Paz.

sublimino · 2015-03-24T17:02:53Z

@rimusz if those bare-metal machines are accessible via ssh already we could conceivably rewrite the cloud-config file and reboot the server? Would have to ensure they're all on the same release channel.

Also been bitten by etcd 0.4 - hopefully that's fixed in v2, although I've not stress-tested it myself yet.

Read "on each host" above as "on three or fewer node clusters" - my concern with running fewer than three nodes is loss of resilience and the smallest machine breaking the cluster (AWS micro/small is not sufficient for etcd nodes). How much hand-holding should a provisioning tool do, @lukebond? And possibly it's another issue as I've hijacked this one! :)

As a footnote, the upper bound of etcd nodes required for stability across any cluster size is 5 according to a chat with Alex Polvi via some Chubby engineers. Further nodes add no meaningful resilience.
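
For context on the resilience concern: etcd's consensus needs a majority of members to be up, so a cluster of n members can lose n minus (n // 2 + 1) of them. A quick sketch of that arithmetic (helper names are made up):

```python
# Majority-quorum arithmetic for a consensus cluster such as etcd.

def quorum(members):
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can be lost while a majority remains."""
    return members - quorum(members)


for n in (1, 2, 3, 5):
    print(f"{n} member(s): quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
# 1 member(s): quorum 1, tolerates 0 failure(s)
# 2 member(s): quorum 2, tolerates 0 failure(s)
# 3 member(s): quorum 2, tolerates 1 failure(s)
# 5 member(s): quorum 3, tolerates 2 failure(s)
```

So one or two etcd nodes survive no failures at all, which is why three is the smallest resilient size.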

rimusz · 2015-03-24T17:38:42Z

@sublimino We can offer a choice, e.g. if somebody wants a very small cluster of 3-5 nodes, they can have just one etcd node if they want, then 3 or 5 etcd nodes depending on cluster size :-)
Yes, the AWS micro/small ones are very bad, but a Google g1-small (kind of an AWS small) runs my etcd clusters just fine. This is why I ran away from AWS to GC.

@lukebond regarding this cluster provisioning tool, we need a separate repository under paz-sh.
I was looking forward to start messing with https://terraform.io/ for my small projects too, so there we can put our heads together to make a nice cluster provisioning tool for different clouds, multi-cloud, etc.

lukebond · 2015-03-24T17:46:23Z

@rimusz good idea. i took the liberty of choosing a name: https:/paz-sh/clusterform

rimusz · 2015-03-24T17:47:07Z

👍

sublimino · 2015-03-24T21:23:35Z

Splendid!


rimusz · 2015-03-25T10:14:30Z

Will Paz support already provisioned clusters?
Maybe clusterform can be used there?

lukebond · 2015-03-25T10:24:33Z

@rimusz currently that's all it supports. there are some helper scripts for bringing up a cluster (for testing/playing only really) but the idea is that you've already got your cluster and then you put Paz on it.

rimusz · 2015-03-25T10:27:28Z

If Paz is going to use all that metadata stuff, some instructions need to be provided on what metadata settings need to be set on the current cluster to make Paz function properly.

lukebond · 2015-03-25T10:50:36Z

Yes, when we start using it. Currently there are no such requirements but there soon will be, e.g. for tying scheduler and service directory to a particular host (they're the ones that have a DB and therefore need a volume mount and to not move hosts). I've been doing that manually so far.

There will also be some metadata for environments and as you say that needs to be defined and documented.
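
Once the requirements are defined, checking an existing cluster could be a matter of parsing each machine's metadata string (fleet uses comma-separated `key=value` pairs) and reporting missing keys. A minimal sketch; the required key list and the `role` key are placeholders, since the actual requirements are not decided yet.

```python
# Hypothetical sketch: check a fleet metadata string for required keys.

def parse_metadata(raw):
    """Parse 'key=value,key2=value2' into a dict."""
    pairs = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed metadata item: {item!r}")
        pairs[key] = value
    return pairs


def missing_keys(raw, required=("environment",)):
    """Return the required keys that the metadata string does not set."""
    meta = parse_metadata(raw)
    return [key for key in required if key not in meta]


print(parse_metadata("environment=dev,role=scheduler"))
# → {'environment': 'dev', 'role': 'scheduler'}
print(missing_keys("role=scheduler"))
# → ['environment']
```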

rimusz · 2015-03-25T10:51:43Z

Cool

lukebond added enhancement environments labels Mar 23, 2015
