Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouple notifiers into a modular design #1400

Open
calston opened this issue May 31, 2018 · 23 comments
Open

Decouple notifiers into a modular design #1400

calston opened this issue May 31, 2018 · 23 comments

Comments

@calston
Copy link

calston commented May 31, 2018

Notifiers should be decoupled from the main application logic more to make it easier to test and integrate support for new endpoints.

Currently it's far too intimidating to contribute new notification destinations quickly as it's not at all clear where these should be added in various places without having to know the entire source tree.

@stuartnelson3
Copy link
Contributor

Our recommendation currently is to integrate via webhook, as we are not adding new notification types right now.

@calston
Copy link
Author

calston commented May 31, 2018

Oh okay, so you're going to stick with a terrible architecture and close my suggestion without really reading the point.

Maybe I should stick to Elasticsearch ecosystem.

@stuartnelson3
Copy link
Contributor

Snide remarks rarely achieve anything positive. Pointing out to me in a friendly way that I mis-read your issue, however, would have been constructive.

Alertmanager is supported by free labor. If you're unhappy with the architecture, please submit a suggestion on how to fix it, and a pull request implementing it, instead of simply insulting what is currently there.

@stuartnelson3 stuartnelson3 reopened this May 31, 2018
@calston
Copy link
Author

calston commented May 31, 2018

Yeah I was first opening an issue as a point of tracking a feature branch.

This is not the first time someone on the Prometheus project has just immediately closed my issue without giving it any thought, and it reflects very poorly on the project.

My original issue WAS a suggestion on how to fix it.

@juliusv
Copy link
Member

juliusv commented May 31, 2018

Hey people. As far as I read the above, @stuartnelson3 mis-read the initial issue and maybe closed it a bit too hastily. That's not ideal, but responding with insults ("terrible architecture") is not helping either. Please try to remain respectful and assume the best in others.

I think the initial proposal was a good one at face value, except if you factor in the current policy of not adding new notification mechanisms, it limits its value a bit (because if no new mechanisms are added in AM directly, it will only help people who want to add a new mechanism in a fork). Independently of that, it could still make sense to modularize this if it leads to cleaner code in general.

To go a bit into why (as far from what I have been following) new notification mechanisms are currently on hold: they are not easily testable unless you have access to the necessary receiver infrastructure, and people who have contributed them have not always put in the effort to maintain them over time, even when they promised to do so. Thus currently the recommendation is to integrate new ones via the webhook, for better or worse.

@ant31
Copy link

ant31 commented May 31, 2018

@calston could you explain why using the webhooks isn't satisfying enough for your use case?

@calston
Copy link
Author

calston commented May 31, 2018

@ant31 Webhooks has nothing to do with this case. But if you insist on the generic, not everything is a webservice, and not all webservices have the same API. Having a proxy web service to take the alertmanager web hook call and turn it into something that isn't one is just ridiculous.

@juliusv Thanks. This is exactly why I made the suggestion, because modularising the notification mechanisms would make testing them and building mocks significantly easier as I've seen in other Go projects.

It's difficult to accept the notion of "No more new mechanisms" if the existing ones aren't going to be removed because market fairness... Many of the current ones are commercial services.

@juliusv
Copy link
Member

juliusv commented May 31, 2018

Having a proxy web service to take the alertmanager web hook call and turn it into something that isn't one is just ridiculous.

This is what we currently suggest though. You might find it ridiculous, but on the other hand you also have to respect that the maintainers have decided (for now!) that they are not able to support more baked-in notifiers in the project. Of course it's not as nice as having a built-in integration.

modularising the notification mechanisms would make testing them and building mocks significantly easier as I've seen in other Go projects.

Yes, it would help, and I think it can still be a good idea to modularise it. But be aware that you still can't verify that the mock actually matches the real receiver, and that the result really looks right in the end. We have a similar problem and thus policy with built-in service discovery mechanisms in Prometheus itself.

It's difficult to accept the notion of "No more new mechanisms" if the existing ones aren't going to be removed because market fairness... Many of the current ones are commercial services.

I don't think we have a duty to proactively ensure absolute market fairness, and removing existing mechanisms would IMO be too painful for existing users.

@calston
Copy link
Author

calston commented May 31, 2018

This is what we currently suggest though

Since this issue is so constantly directed at webhooks, it's worth pointing out that the documentation describes an incredibly simple interface that doesn't seem to provide support for anything more nuanced like adding headers for HMAC.

And according to this other thread here #684 you seem to have no interest in that either. So basically you're telling people "No more notifiers" and then not giving them any other mechanism to extend it other than effectively writing their own alertmanager.

PS. On the subject of "respect", that issue got the same "just close it straight away, no need to listen to ideas or suggestions" treatment as this did.

I don't think we have a duty to proactively ensure absolute market fairness

I think if you want to stay part of CNCF you might want to double check on that.

@RichiH
Copy link
Member

RichiH commented May 31, 2018

Generally speaking, I think we all prefer communication without insults or threats.

Security is outside of our scope, on purpose. Proxy implementors are better suited to get security right so we rely on those throughout the ecosystem. Also see https://prometheus.io/docs/operating/security/

As far as I understood, you're offering to modularize this part of the codebase and to increase test coverage. No matter what happens after that, I think it's a great offer and would be looking forward to see this. To make sure effort is not wasted, I would suggest we start with a design doc.

@ripienaar
Copy link

Generally speaking, I think we all prefer communication without insults or threats.

As a keen observer of this project who have spent a bit of time on its IRC channel - and opted to leave it - I would say the general tone of this project is incredibly unfriendly.

Any suggestion for improvement, deviation, any question for use outside of the very narrow scope you plan is instantly met with a hugely dismissive "That is not how you should use Prometheus". This is dismissive, insulting to community who wish to partake in this community and generally set the tone all round. It's a huge shame because a big community of contributors, side projects and larger eco system is eager to exist but we are constantly shot down in every effort.

The handling this issue received - the outright close without due consideration or discussion is further evidence of the insulting, dismissing and exclusionary approach this project takes to contributors.

Might I suggest you lead by example?

@ripienaar
Copy link

Generally speaking, I think we all prefer communication without insults or threats.

And to be clear, this is not to condone the insults etc, but the continuous relentless dismissive attitude the prometheus team projects to the outside world is a giant slap in the face to the community, you can't exactly be unhappy that eventually this boils over and emotions get stirred.

@RichiH
Copy link
Member

RichiH commented May 31, 2018

I agree that we need to improve our communication on issues in several regards, and we are actually currently working actively on this, e.g. see prometheus/prometheus#4196

Even within Prometheus team, we regularly disagree about fundamental topics, which is, in part, due to the fact that a lot more thought and experience has gone into many design decisions than are readily apparent. That's not to say that everyone else is wrong, far from it. But it does mean that sometimes, the end result of a discussion is not what anyone proposing something would like to see. It also means we must become better at exposing underlying rationale, both by ways of documentation and within discussions. While it will take some time until we have public results, we are actively working on improving docs and rationales, as well.

As an aside, I agree with you that insults and threats are never an appropriate response while firmly rejecting the implication that they become appropriate within the context of stirred emotions.

On the plus side, there's a clear and constructive path forward within the scope of this issue: Start a design doc to achieve mutual understanding and agreement on refactoring this part of our codebase and then implement against this design doc.

@ant31
Copy link

ant31 commented May 31, 2018

Since this issue is so constantly directed at webhooks, it's worth pointing out that the documentation describes an incredibly simple interface

Most services allow 3rd party integration via webhook only. It is an extremely common practice. Yes, it adds load on the vendor side to integrate with the hook. But on the other side, it would not scale (limited number of contributors/maintenance/bugs support) to integrate every vendor on an opensource project

That said, the current interface is maybe insufficient to cover all use-cases, I don't know.

@jamtur01
Copy link
Contributor

jamtur01 commented May 31, 2018

@RichiH They may not become acceptable but @ripienaar was merely pointing out how he can see how they might happen. The tone of some of the Prometheus team in the past to polite suggestions has been upsetting. This has definitely engendered some ruffled feathers.

Transparency matters. It took me months of reading every issue - and I still subscribe to every update in the Prometheus and Alertmanager repos - and watching a dozen presentations to pull together a picture of the intent or history behind a lot of the core team's decisions. But a normal user or contributor to the project shouldn't have to do this to understand the "why" of an answer or an abruptly closed issue.

I'd also say that many of us, including a number of people who have been shut down when raising issues, have deep subject matter expertise in this domain. "We've already thought about this and you're wrong" is not a respectful response to that input. It's even more frustrating for a newcomer to the domain who is not only lost as to the why but is not going to learn anything from that answer.

I'd love to see the team invest a lot more time in documenting the architecture and especially the underlying rationale for design decisions. There's a few scattered "This is what Prometheus et al is not" or "We're not going to work on x" sections in various places but it's often hard to reason about a negative. There are numerous areas, the recent rate/xrate discussion for example, where a similar approach would be beneficial.

Additionally, a strong focus on plugins, interfaces and integration points made easier or at least better documentated. Excellent examples of this are notification mechanisms and service discovery plugins. For example, if the core team is not accepting new submissions here then investing in making it easier for people to contribute/plugin their own with detailed documentation on standards and integration points and improved "plugability" would resolve a lot of folks frustration.

I think Prometheus is a great project and I enjoy working with it. But I see a lot of people driven away because they feel they are not welcomed. And that's a great shame.

@sgissi
Copy link
Contributor

sgissi commented May 31, 2018

Just my 2 cents as an sporadic contributor: I would not call it unfriendly but certainly there is strong resistance on any proposed change. It can be a bit of a let down and annoying at times but I understand the reasoning: the small set of core devs will be the ones spending their free time supporting the feature after the original requester moved on. When proposing changes, I expect that I have to defend my case, support the rationale with real-world scenarios and propose a PR. Every time I proposed something, I always got resistance but at the same time, received guidance and feedback very quickly on every iteration to get it merged. I'm sure a dev could have implemented the requests in far less time than it took to help me.

As for the issue at hand, surprisingly there isn't much touch points for a new notifier, just the notify and config packages. The lack of modularity does shows up there with a lot of code repeated over and over. AM would benefit from a more decoupled notifier structure, bonus points for being implemented in one self-contained file per notifier.

I have been working on a package (in my limited spare time) to receive webhook events and call a notifier via the Notifier interface. The main idea being to serve as an incubator of sorts so that mature and well supported notifiers could make its way into AlertManager and, similarly, notifiers that are not actively being used or not being maintained as promised would have a clear way to be spun out. A more self-contained notifier structure would make this effort a lot easier.

Just a nit, a "webhook" Prometheus SD was proposed at #3602 but the agreed generic integration way for Prometheus SD is file_sd. Would be great to make this more consistent between the two.

@RichiH
Copy link
Member

RichiH commented May 31, 2018

@jamtur01 if you feel you gained "outside" understanding of some whys, please create PRs for docs. There's a short, magical time frame in which people are enough of an outsider to see some things with some distance. It would be great if you used that.

@sgissi the resistance will not go away in all cases, as you said yourself, there can be a lot of reasons for some decisions in a certain context. But it should be underpinned with a way to realize why something is the way it is. This also makes it easier for people to engage with the underlying facts and assumptions, resulting in a better discussion.

@jamtur01
Copy link
Contributor

@RichiH I have. Numerous ones. And am writing a book about Prometheus. Doesn't address any of the other issues I raised though.

@calston
Copy link
Author

calston commented May 31, 2018

@RichiH Then let me make one very simple suggestion here to not making everyone feel like they're having a door slammed in their face - don't use "Close and comment" for issues, maybe explain your POV and then leave at least some window for yourself to actually be wrong.

That is unless you're suggesting all of your reasons are 100% right, all of the time, with no discussions entertained. You must admit at least some, if not an extremely large amount, of that is also very insulting.

@juliusv
Copy link
Member

juliusv commented May 31, 2018

@jamtur01 Oh totally agreed. The problem is known and we need to do a better job on one hand surfacing the "why"s, but also in the tone and verbosity of responses when there is a disagreement or rejection. As @RichiH mentioned, we are in the process of both spreading awareness around this, but also documenting processes and reasonings better. It would be great if in such a case, a reply would start with "Thank you for X, but please be aware that we have decided Y a while ago, and you can see the reasoning at link Z.". And also leave a method to appeal to a wider team decision if there is no clear team consensus on something yet. I think that would do a lot to make things more welcoming, and on the plus side, I think the team is also fundamentally behind that (it just has sometimes been difficult for people in the heat of things).

That said, it should still be ok for an open-source project to make its own decisions on what to accept or reject, as long as it happens in a reasoned and polite fashion. I do think there is a big value in saying "no" in a good way. But I hope we can all agree on that part.

Btw., thank you for your contributions so far!

@jamtur01
Copy link
Contributor

@juliusv Thanks. Gracious as ever. I fully agree the project needs autonomy and a decision-making process, just that the process needs to default to open, reasonable, polite, transparent, and documented. It's not a small challenge as I well know! :) I look forward to seeing what evolves.

@juliusv
Copy link
Member

juliusv commented Jun 2, 2018

Relevant: prometheus/docs#1038

If anyone has great examples of other OSS projects documenting this kind of thing, please let us know.

@RichiH
Copy link
Member

RichiH commented Jun 7, 2018

@calston did you have time to spec something out, already? If yes, I would be interested to have a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants