Cluster/HA Install of AWX #26
Hello, we have a successful HA deployment thanks to your rpm. Here is what we have done: we have run a lot of tests on it and everything seems fine |
That is great to hear... thanks for your feedback... If you have a more detailed installation description, I would love to add it to the documentation... |
OK, so here is the process: when all nodes are installed, we can build the rabbitmq cluster. Connect to nodes 2 and 3: rabbitmq is now in cluster. Second step is celery: we also saw that at this step it can be better to reboot all 3 nodes, but one by one, to keep the rabbitmq cluster in good shape. Hope that can help |
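The commands stripped from the comment above were presumably the usual RabbitMQ join sequence. A minimal sketch, assuming the first node is named `node1` (a placeholder) and that the Erlang cookie in `/var/lib/rabbitmq/.erlang.cookie` has already been copied to all nodes:

```shell
# Run on nodes 2 and 3 to join them to the cluster formed by node1.
# Node names are placeholders; the Erlang cookie must match on all nodes.
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@node1
rabbitmqctl start_app

# Verify that all three nodes show up.
rabbitmqctl cluster_status
```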
I forgot, but of course the final step is to go to the web interface and create the instance with all 3 nodes |
@MrMEEE , this is a bit off-topic, but those who wish to explore and automate the HA / Instance group setup using the official docker standalone method can access it from my repository. Could you add this piece of info to your wiki? It may be helpful to some people out there. Thanks |
So, basically, everything that is needed for a HA setup is a standalone postgresql (cluster) and a rabbitmq (cluster), and then frontends that connect to these? Should be pretty simple to implement.. I have added a links section |
@MrMEEE , thank you for adding this piece of info to your wiki. The only requirement is to set up a standalone postgresql; the rest is taken care of by the playbook, such as building and configuring the rabbitmq cluster and enabling the docker version of HA on all nodes. And yes, it is pretty simple to implement now through my playbook |
Hi guys, As we worked on it with @Aglidic to build the first HA implementation of the RPM, we have a playbook which does the full setup automatically. It's just corporate currently, so I need to find some time to generalize it if you want to add it somewhere. Best, Tim. |
I would love to include playbooks for installing in the RPM... |
OK so I'll add that to my TODO for the next days... |
Did you have a chance to do it? I would love to try them. |
Hi all, very much interested in the playbook. If the playbook is not available now, can someone highlight what's needed for pointing to an external Postgres server using the RPM installation method, please? "Set pg_hostname if you have an external postgres server, otherwise a new postgres service will be created: pg_hostname=hostname" Thanks everyone for the great efforts! |
In regards to the external postgres, you basically only need to set up an external postgres (cluster?) and change the configuration in /etc/tower/settings.py to point to that server, before running the database initialization.. |
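For illustration, the relevant part of /etc/tower/settings.py is a standard Django-style DATABASES block. A sketch; the database name, user, password, host, and port below are all placeholders for your own environment:

```python
# Sketch of the database section in /etc/tower/settings.py.
# All values are placeholders for your external postgres (cluster).
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'awx',
        'USER': 'awx',
        'PASSWORD': 'changeme',
        'HOST': 'pg.example.com',  # external postgres server or cluster VIP
        'PORT': '5432',
    }
}
```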
Thanks much for the quick response. Yes, it will be a 2-node Postgres cluster with streaming replication. Yes, I see there's a section for configuring USER/PASSWORD/HOST/PORT in settings.py. So initializing the DB / all the steps listed in awx.wiki/multi-section-page/configuration are still required? |
No issues setting up the external Postgres db. Issues with setting up the cluster. I followed the previous comments on setting up clustering and got to the point of enabling the rabbitmq cluster between 2 nodes, but awx didn't detect the additional node. The endpoint api/v2/ping only displays one active node. Also there's no awx-celery-worker - has this service been deprecated? Thanks. |
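As a quick check, the same ping endpoint mentioned above can be queried from the command line to see which instances the cluster currently reports (the hostname is a placeholder):

```shell
# Query the AWX ping endpoint; it lists the active instances the cluster sees.
# Hostname is a placeholder; -k only if you use a self-signed certificate.
curl -sk https://awx.example.com/api/v2/ping/ | python -m json.tool
```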
Hi.. I think you have to enable each of the awx nodes with the command:
and yes, the celery worker is deprecated... |
Thanks much, it worked! I had to run this command first - sudo -u awx scl enable rh-python36 rh-postgresql10 "awx-manage provision_instance --hostname=$(hostname)" - before running your command - |
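For reference, the full per-node sequence is probably something like the following. This is a sketch assuming the default "tower" queue name; `awx-manage register_queue` is the upstream AWX command for adding an instance to an instance group:

```shell
# Run on each node: register the instance, then add it to the queue.
# Assumes the default "tower" queue name.
sudo -u awx scl enable rh-python36 rh-postgresql10 \
  "awx-manage provision_instance --hostname=$(hostname)"
sudo -u awx scl enable rh-python36 rh-postgresql10 \
  "awx-manage register_queue --queuename=tower --hostnames=$(hostname)"
```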
Also, as far as upgrading to the latest AWX version, I presume it will still work as long as we upgrade all nodes within the cluster. Thanks again! |
Ah, yes.. of course you have to do the provision_instance first :).. I will do a write-up on this and put it on awx.wiki as soon as possible.. I'm also planning a setup tool for simpler installation and configuration, which will also cover the HA... Could you share the exact changes you have made to the systemd files? Remember not to change the files themselves, but to override them with copies in /etc/systemd/system, else they will get reverted to default on the next update... In regards to updating, I think you should update the ansible-awx package (and dependencies) on all nodes before running the database migrations... |
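The override workflow described above can be sketched with awx-daphne.service as the example unit (the packaged unit path is an assumption; adjust to wherever the RPM installs its units):

```shell
# Copy the packaged unit so package updates don't overwrite local changes.
cp /usr/lib/systemd/system/awx-daphne.service /etc/systemd/system/awx-daphne.service

# Edit the copy in /etc/systemd/system/, then reload and restart:
systemctl daemon-reload
systemctl restart awx-daphne.service
```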
I spoke too soon :) Tried rabbitmqctl stop_app / rabbitmqctl start_app and systemctl restart rabbitmq-server on the server. Also bounced both nodes. In the web GUI I switched the node from OFF to ON, but USED CAPACITY eventually becomes "UNAVAILABLE." |
Ignore that - the issue was AWX not running during startup on the new node :) |
So far so good. I didn't make any systemd changes since celery has been deprecated. |
One issue that has come up so far: when a job finishes running on the new node, the node's USED CAPACITY goes into "UNAVAILABLE." It is as though the node lost its heartbeat to the rabbitmq cluster. Need to troubleshoot further. |
I'm in Prague for the week for a Red Hat event.. I will try to set up a HA environment when I get home, then we can debug together |
This is the error msg I'm getting: 2019-06-27 14:01:34.390 [info] <0.1498.0> connection <0.1498.0> (127.0.0.1:42950 -> 127.0.0.1:5672): user 'guest' authenticated and granted access to vhost '/' Thanks |
Looks like the issue has to do with the fact that awx requires a 'tower' vhost. Currently we're using the default vhost '/', so we're getting a bunch of closed AMQP connections. 2019-06-27 15:30:29.289 [info] <0.2130.0> connection <0.2130.0> (127.0.0.1:47012 -> 127.0.0.1:5672): user 'guest' authenticated and granted access to vhost '/' |
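If the missing vhost really is the problem, creating it and granting the broker user access would look roughly like this (the user 'guest' is taken from the log above; substitute whatever user your AWX broker settings actually use):

```shell
# Create the 'tower' vhost and give the broker user full permissions on it.
rabbitmqctl add_vhost tower
rabbitmqctl set_permissions -p tower guest ".*" ".*" ".*"

# Confirm the vhost exists.
rabbitmqctl list_vhosts
```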
Hi guys, Finally the playbook is here: https:/powertim/deploy_awx-rpm It is currently designed for RHEL7 x86_64 with Satellite repos. I will try to update it with manual repos as described on https://awx.wiki/installation/repositories/rhel7-x86_64, and why not, in the future, for the different OSes supported on awx.wiki... Please try first to adapt the playbook before opening an issue. I'll fill up the README soon. Best, Tim |
Hi Tim,
Good to see the repo. Waiting for the README. Wondering, will it work on CentOS 7 as well?
Best regards,
Gowtham
|
Hi @gowthamakanthan , It should work on CentOS 7 with a few changes:
But I'll try to add this content when I find the time (hopefully soon). |
@powertim - Thanks for the efforts! I've tested it and it works as expected. However, the previously reported issue still exists, where the 2nd node (I have a 2-node rabbitmq cluster) goes into the "UNAVAILABLE" state as soon as a job finishes running. hostnameB is the 2nd node, which has a capacity of 0 because it's NOT available. The primary node I've DISABLED intentionally. [root@hostnameA deploy_awx-rpm]# sudo -u awx scl enable rh-python36 rh-postgresql10 "awx-manage list_instances" [tower capacity=0] |
This is installed using the latest AWX, 6.10. This is an example of a run where the node becomes "unavailable" and the job no longer exists in the queue - with the below explanation. EXPLANATION |
Hi all, it looks like the previous problem went away after setting up a new server. However, I'm hitting the following issue when starting up awx. Issue with scl: RuntimeError: Django version other than 2.2.2 detected: 2.2.4. Overriding names_digest is known to work for Django 2.2.2 and may not work in other Django versions. (awx-daphne.service then exits with status=1/FAILURE.) The Django that comes by default is rh-python36-Django-2.2.4-1.noarch. Thanks. Aug 6 18:40:19 hostnameA scl: Traceback (most recent call last): |
Did you install your cluster using the playbook?
I had this issue when relaunching the playbook.
Should be ok with a full clean install.
Cheers,
Tim.
|
@dnc92301 Please create new issues instead of reusing old ones... Have you remembered to update the ansible-awx package? |
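The update path suggested earlier in the thread (update the package on every node first, then run the database migrations once) can be sketched as:

```shell
# On every node in the cluster:
yum update ansible-awx

# Then, on one node, run the Django database migrations:
sudo -u awx scl enable rh-python36 rh-postgresql10 "awx-manage migrate"
```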
@powertim Maybe the playbook doesn't update the ansible-awx package?? |
@tim - yes, this happens after rerunning the playbook. After upgrading to the latest ansible-awx version it worked! |
It's updated now ! |
Yeah, unfortunately re-running the playbook causes failures. |
Hello, I have offline VMs where I need to build AWX. As listed above, I saw about 160 rh-python36-* dependencies. Where can I find a tarball or URL for all the RPMs I need for AWX? |
@VJoshi0: yum install --downloadonly --downloaddir=/to/here/ 'rh-python36-*' Further examples at https://unix.stackexchange.com/questions/259640/how-to-use-yum-to-get-all-rpms-required-for-offline-use |
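To make the downloaded RPMs usable on the offline VMs, one approach is to build a local repo over the download directory and point yum at it. A sketch, assuming createrepo is installed and using /to/here/ and the repo name as placeholders:

```shell
# Build repo metadata over the downloaded RPMs, then copy the directory
# to the offline VM.
createrepo /to/here/

# On the offline VM, point yum at the copied directory (path is a placeholder):
cat > /etc/yum.repos.d/awx-offline.repo <<'EOF'
[awx-offline]
name=AWX offline RPMs
baseurl=file:///to/here/
enabled=1
gpgcheck=0
EOF
```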
So after having the 3 instances clustered, is a load balancer used at all? What about manual projects that are on the local filesystem - rsync them? |
Hi all, thanks for the great work you are doing. I was wondering if there is a step-by-step guide for the HA setup, similar to the standalone setup in this wiki guide: https://awx.wiki/installation/installation |
Hi @elstoncawley, Unfortunately not, and I haven't worked on the HA setup for a long time, but you'll find the steps in the playbook here: https:/powertim/deploy_awx-rpm. Cheers, Tim. |
Thanks @powertim |
Yes, in theory you can use all the repos you want, but you need to change the way you enable and call the repos, because I only provided a RHEL conf with Satellite, so the subscription-manager command won't be available for you. |
Hi everybody! Did anyone get HA/clustering running with AWX 11.x.x and redis? |
Responding to myself, and leaving reference material for those who need it: the issue at https:/sujiar37/AWX-HA-InstanceGroup/issues/26 seems to shed some light. I will test ASAP. |
https:/fitbeard/awx-ha-cluster - this playbook works well; I've been using it for a while. |
Moved from here:
subuk/awx-rpm#11