Adding to Grafana dashboards #520

Closed
tsloughter opened this issue Jul 28, 2017 · 28 comments

Comments

@tsloughter
Contributor

First, is there a chat where operators are discussed? Questions like this I would first bring there if there was a gitter/slack/irc channel for discussing operators. I try to bring the helm specific questions to the helm-users channel on slack first, but often can't find answers.

So I'm trying to figure out a good way of adding dashboards to a grafana deployed through the prometheus-operator's grafana helm chart. When I had been deploying prometheus-operator through kube-prometheus scripts I would have my application's helm chart write over the grafana configmap with a new one. However, this is not possible when grafana was deployed with a separate helm chart.

I wanted to know if anyone had solved this issue. Not necessarily a full solution for dynamically adding dashboards for resources, but at least being able to recreate/rewrite what grafana's dashboard volume is made of.

@brancz
Contributor

brancz commented Jul 31, 2017

We don't have a channel dedicated to the Prometheus Operator; however, lots of people use the #prometheus IRC channel (on freenode) or sig-instrumentation on the Kubernetes Slack.

As I don't use the grafana helm chart, I cannot comment reliably on its functionality. As far as I understand, the grafana chart should be independent of the dashboards it is deployed with; only when deploying the kube-prometheus chart should the grafana chart be deployed with a set of dashboards.

@weiwei04, being the maintainer of this chart, can probably answer your questions more reliably.

@tsloughter
Contributor Author

@brancz thanks!

Sure, the issue is, at least for me, wanting to have unmodified kube-prometheus and grafana charts that get installed, and then when I install another app through a chart that needs specific dashboards, they have to be manually added each time. Or grafana has to be a dep of each chart, which then includes its dashboards, and the user switches between grafanas to see the different dashboards.

A separate issue is it seems values.yaml can't be templated... So I haven't found a way to populate the grafana serverDashboardFiles variable from a set of json files.

@weiwei04
Contributor

weiwei04 commented Aug 1, 2017

@tsloughter

  1. The grafana chart under github.com/coreos/prometheus-operator/helm is mainly for kube-prometheus. When you helm install kube-prometheus, prometheus (with predefined k8s monitoring targets and alert rules), alertmanager and grafana (with pre-installed dashboards for k8s) are installed together with one command.

  2. "when I install another app through a chart that needs specific dashboards they have to be manually added each time. Or grafana has to be a dep of each chart, and then include its dashboards, and then the user switches grafanas to see the different dashboards." I have the same need you mention, to reuse an already installed grafana for dashboarding other apps, but so far I haven't figured out an appropriate way to add a dashboard to an already installed grafana besides the grafana web UI import.

Here's my current thinking. In our scenario, there are two kinds of users: 1) the k8s administrator, who only cares about k8s system metrics and some addon metrics (ingress, storage-class provisioner, etc.); 2) the k8s user, who cares about their containers' resource metrics and some other business metrics. The need for dashboarding can be divided into a), b), c), d):
a) K8s system metrics. Dashboarding these is easy since they are very specific and common, so they can be pre-installed by the prometheus-operator's grafana chart (the serverDashboardFiles value in the grafana chart's values.yaml).
b) K8s addon metrics. These depend on the cluster: some k8s clusters may use nginx as ingress, some may use træfik. They all provide prometheus metrics and have community-contributed grafana dashboards. The user need is that when these addons are installed, the dashboards are automatically installed into a grafana instance.
c) User-installed apps (like mongodb, redis). Some have an exporter that provides prometheus metrics and a community-contributed grafana dashboard. The user need is that when these apps are installed, the dashboards are automatically installed into a grafana instance.
d) Business metrics.

For a), b) and c), maybe a grafana-operator (defining Kind: Grafana and Kind: GrafanaDashboard), with each app's chart including a Kind: GrafanaDashboard yaml that describes its grafana dashboard. @brancz @tsloughter what do you think? In about 2 weeks I will have full time to improve the grafana chart (since we use it a lot). Adding dashboards to grafana is a feature to consider; advice and desired features are welcome.

[Image: Grafana Operator]

For d), I think users may create their dashboards from scratch.

For now, the manual solution for reusing a grafana instance to include more dashboards is the grafana web UI import.

  3. "values.yaml can't be templated": yes, values.yaml can't be templated. If you want to pre-install your own dashboards, cp values.yaml, replace serverDashboardFiles with your own, and run helm install grafana --name $RELEASE_NAME --values values.yaml. But I don't think it's an easy way: since grafana dashboard definitions are very long (TL;DR), the values.yaml will be very long and hard to debug. I recommend you use the grafana web UI import.
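
One way to avoid editing a long values file by hand is to generate it from the dashboard JSON files. The following is only a minimal sketch. It assumes that serverDashboardFiles is a mapping from file name to file content (worth confirming against the chart's own values.yaml), and the function names and paths are hypothetical. It writes JSON, which should also parse as YAML, so no YAML library is needed.

```python
import json
from pathlib import Path


def build_values(dashboard_dir: str) -> dict:
    """Collect every *.json dashboard in a directory into a values structure."""
    files = {}
    for path in sorted(Path(dashboard_dir).glob("*.json")):
        # Assumption: the chart expects {file name: file content as a string}.
        files[path.name] = path.read_text(encoding="utf-8")
    return {"serverDashboardFiles": files}


def write_values(dashboard_dir: str, out_path: str) -> None:
    """Write the generated values to a file that can be passed to --values."""
    values = build_values(dashboard_dir)
    Path(out_path).write_text(json.dumps(values, indent=2), encoding="utf-8")
```

The generated file could then be passed alongside the chart defaults, for example with an extra --values flag on helm install.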

@brancz
Contributor

brancz commented Aug 1, 2017

There was already some thought on adding Grafana support to the Prometheus Operator, see: #115.

Our general idea around using Grafana is that it needs to be completely stateless. Therefore any solution involving web UI import is highly discouraged (same applies for the current Grafana + grafana-watcher usage). It is important to understand that when a Grafana Pod dies, it will not have the same database, but it will again be provisioned from the ConfigMap.

The idea is that new dashboards are either created through a meta language (still to be implemented, but possibly weaveworks/grafanalib), or via the web UI, and immediately after completion exported (and deleted) via the web UI and then stored in some form of version control, from where the dashboards will then be consistently deployed to the ConfigMap and then to the Grafana instances through the established workflow with the grafana-watcher.

In that sense, the Grafana chart itself needs to be completely bare-bone, and not make any assumptions, other than that it will get datasources and dashboards which are to be provisioned. We should not always include the pre-built kube-prometheus dashboards, those should be values that are chosen by a user when deploying that chart.

To recap, the grafana-watcher was implemented so that Grafana is stateless, therefore importing through the web UI is highly discouraged.

@weiwei04
Contributor

weiwei04 commented Aug 1, 2017

Thanks for the reply, my mistake. We use a PVC (backed by ceph rbd) to store grafana state; if the Pod dies, the same PV is attached to the new grafana Pod and all the state is recovered. Yes, importing through the web UI does not apply to everyone and should be highly discouraged.

Making grafana completely stateless, by using ConfigMaps to persist the dashboards and grafana-watcher to provision grafana, suits more people. I'll keep this in mind :)

Back to the question of adding to Grafana dashboards:

  1. "As far as I understand, the grafana chart should be independent of the dashboards it is deployed with; only when deploying the kube-prometheus chart should the grafana chart be deployed with a set of dashboards."

I should move the pre-installed serverDashboardFiles into the kube-prometheus values.yaml and keep the grafana chart clean.

  2. To add a dashboard to a grafana instance, one way is to add the dashboard to the corresponding ConfigMap (be aware of the ConfigMap size limit).

@brancz
Contributor

brancz commented Aug 1, 2017

That sounds perfect. That way we can ship sensible defaults with the kube-prometheus chart(s), but still leave them bare bone. Essentially as I view kube-prometheus, the only thing the "package" should include are the things that make the overlaying dashboards/rules work, eg. the Kubernetes services, and plugging together all the components like Prometheus and Alertmanager. These dashboards and rule files should purely be values that are configured by each user, but we can ship sensible but extensible example defaults.

@weiwei04 from the issues that people are having it seems that the kube-prometheus charts need some work, if you are interested in cleaning those up, I'm more than happy to move forward to progress on that end with your help.

Thanks for all your help @weiwei04 ! And always good to get a healthy discussion started @tsloughter, thanks for kicking off this one!

@brancz
Contributor

brancz commented Aug 1, 2017

Basically what I'm saying is, if you two could help review the helm chart PRs and issues we'd highly appreciate that as we're not using helm ourselves, but would still love to ship high quality packages. We're just lacking contributors on the helm end. Let me know how I can help.

@weiwei04
Contributor

weiwei04 commented Aug 1, 2017

Yes, I'd like to help maintain the kube-prometheus charts (PRs and issues are welcome), but I don't have a publicly available helm registry to publish these charts, since the charts in kube-prometheus are in @mgoodness's registry.

@brancz
Contributor

brancz commented Aug 1, 2017

@weiwei04 that won't be the problem. When we have them in a good state, I'm sure @ant31 can help us setup a helm repository in the quay helm registry (or before that to do pre-releases).

@tsloughter
Contributor Author

@brancz yea, I'm looking for a stateless solution as well. Happy to help any way I can (I still need to read through this whole thread). I'm currently working on a blog post for our company website that details using these helm charts and grafana dashboards was one of the last pieces to figure out.

"Simplest" solution (not sure how simple it actually would be) I can think of is either to build a single ConfigMap by selecting other ConfigMaps that declare dashboards, which is then mounted into the pod, or to build a list of ConfigMaps to mount (assuming the grafana program can accept multiple directories of dashboards?), modify the grafana deployment to mount them, and add them to the list of directories passed to the grafana executable.

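The first option, building one ConfigMap out of other ConfigMaps that declare dashboards, could look roughly like the sketch below. It works on plain dictionaries shaped like ConfigMap manifests rather than calling the Kubernetes API, and the label name grafana-dashboard is a hypothetical convention chosen for illustration, not something defined by the charts.

```python
from typing import Dict, List

# Hypothetical label that a chart would set on ConfigMaps carrying dashboards.
DASHBOARD_LABEL = "grafana-dashboard"


def merge_dashboard_configmaps(configmaps: List[dict],
                               name: str = "grafana-dashboards") -> dict:
    """Combine the data of all labelled ConfigMaps into one ConfigMap manifest."""
    merged: Dict[str, str] = {}
    for cm in configmaps:
        labels = cm.get("metadata", {}).get("labels", {})
        if labels.get(DASHBOARD_LABEL) != "true":
            continue  # this ConfigMap does not declare dashboards
        source = cm["metadata"]["name"]
        for key, value in cm.get("data", {}).items():
            # Prefix keys with the source name so two charts can both
            # ship a file called "overview.json" without colliding.
            merged[f"{source}-{key}"] = value
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": merged,
    }
```

The merged ConfigMap is still subject to the ConfigMap size limit mentioned earlier in the thread, so this only helps while the combined dashboards stay small.
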
@brancz
Contributor

brancz commented Aug 1, 2017

That's one thing where having a controller/operator would be useful 🙂 . It's essentially the same thing as we're doing with rule files in the Prometheus object.

@eedugon
Contributor

eedugon commented Aug 1, 2017

I'm very interested in this topic (more in the overall grafana solution you are discussing than in the helm charts, but both are important).

Regarding the grafana dashboards, as a short-term solution (to allow users to add/remove dashboards easily) I would propose a bash script/tool that puts all dashboards into the ConfigMap yaml definition; updating the ConfigMap with kubectl should then apply the changes. The tool could also allow "adding" or "removing" dashboards from the ConfigMap definition, or even dumping the ConfigMap definition from a running system in case the dashboard files are lost.

But as @weiwei04 has mentioned, if there's a size limit on the ConfigMap we might reach it.
The size limit could also be checked in that script.
If you are interested in that I could do it, while you work on a better/long-term solution defining new object types like "Grafana", "GrafanaDashboard", etc.

I think I will have to build the tool anyway to avoid using "make generate" for that activity.... unless you know a better procedure at the moment.
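
A tool along these lines could be sketched as follows. The proposal is for a bash script; Python is used here only for illustration. The size limit is an assumption (ConfigMaps are stored in etcd, and a cap of about 1 MiB is commonly cited), and the function names are hypothetical.

```python
import json
from pathlib import Path

# Assumption: roughly 1 MiB is a commonly cited cap for a ConfigMap.
SIZE_LIMIT_BYTES = 1024 * 1024


def build_configmap(dashboard_dir: str, name: str = "grafana-dashboards") -> dict:
    """Put every *.json dashboard from a directory into one ConfigMap manifest."""
    data = {}
    for path in sorted(Path(dashboard_dir).glob("*.json")):
        text = path.read_text(encoding="utf-8")
        json.loads(text)  # fail early if a dashboard is not valid JSON
        data[path.name] = text
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": data,
    }


def check_size(configmap: dict, limit: int = SIZE_LIMIT_BYTES) -> int:
    """Return the byte size of the ConfigMap data, or raise if it exceeds limit."""
    size = sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
               for k, v in configmap["data"].items())
    if size > limit:
        raise ValueError(f"ConfigMap data is {size} bytes, limit is {limit}")
    return size
```

Dumping the returned dictionary with json.dumps gives a manifest that kubectl apply -f can read, since kubectl accepts JSON as well as YAML.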

@tsloughter
Contributor Author

Yea, an operator/controller would be best since it would work no matter what way grafana is installed to kubernetes.

It shouldn't be a very complicated task for grafana (compared to prometheus/alertmanager/etcd), right? If no one else has started on it I will take a whack at it.

Any pointers on if it should just be a custom controller or an operator, and links to any resources that might be helpful are much appreciated :)

@brancz
Contributor

brancz commented Aug 2, 2017

@tsloughter we've thought about integrating that directly into the Prometheus Operator (which would make the name slightly odd, so it might actually be best to develop it outside). The great thing about having it in the Prometheus Operator would be that we could make a lot more assumptions about its usage, but given we're all on this thread I think we agree on the datasource type 🙂 .

The size limit is exactly what we hit with Prometheus rule files as well, which is why we made it a label selector; I'd imagine doing the exact same thing here. Other than that I agree @tsloughter, running grafana, especially in a stateless way, is much much easier than Prometheus/Alertmanager/etcd, as by definition we don't have to worry about state; it basically just templates the Deployment with the correct ConfigMaps.

Let me know when you've thrown something together @tsloughter ! Also I'd be happy to discuss design decisions as we've already thought about this a lot.

@eedugon regarding:

I would propose a bash script/tool that puts all dashboards into the ConfigMap yaml definition

This sounds like it could be an additional script in the hack/... collection of kube-prometheus, everything except the retrieval of the dashboard definitions actually already exists. See:

@ant31
Contributor

ant31 commented Aug 2, 2017

"@weiwei04 that won't be the problem. When we have them in a good state, I'm sure @ant31 can help us setup a helm repository in the quay helm registry (or before that to do pre-releases)."

  1. Install the helm plugin: https://github.com/app-registry/appr-helm-plugin/#install
  2. Log in to quay.io from the CLI:
     helm registry login -u USERNAME -p PASSWORD quay.io
  3. Go to the helm directory (where the Chart.yaml is):
     helm registry push quay.io/$NAMESPACE/
  4. Go to quay.io (the front-end) and turn the app to 'public' in the settings view.

To install it:

helm registry install quay.io/ns/app -- [HELM_OPTS]

@weiwei04
Contributor

I opened #558, a PR to delete the pre-installed dashboards for kube-prometheus.

And thanks @ant31 for the help setting up a registry; the grafana chart can now be installed with

helm registry install quay.io/alexwei/grafana [HELM_OPTS]

But the quay.io registry uses a private spec that differs from the helm HTTP registry spec, so for now I haven't found a proper way to include grafana in kube-prometheus. I will try to think of other ways.

@tsloughter
Contributor Author

I was out for a week but now looking at this again :). I noticed some PRs/issues that look related to grafana dashboards since then and will first look through those, but let me know if anyone knows of ones that definitely relate.

@brancz
Contributor

brancz commented Aug 15, 2017

#556 and #535 are the PRs/issues to watch in this regard. They're not helm related, but are most likely going to play nicely into your helm setup once integrated.

@tsloughter
Contributor Author

Cool. I'll look at those. But after looking at it I don't see myself getting around to working on an operator for this anytime soon, so hopefully someone else is interested :). Also, I wrote a short blog post about how we are using the operator and helm charts and installing dashboards for now https://spacetimeinsight.com/installing-monitoring-erlang-releases-kubernetes-helm-prometheus/ -- and if anyone reads it and sees something wrong with how I'm doing the operator configuration please let me know :)

@tsloughter
Contributor Author

Is there a reason the grafana-watcher watches a directory for ConfigMap changes instead of acting like an operator that watches kubernetes events and then POSTs the dashboards to grafana directly from the ConfigMap contents?

If the watcher watched for new ConfigMaps (or changes to existing ones), checked whether they were grafana dashboards, and then imported them through the grafana API directly from the ConfigMap, this would solve the issue of currently having only one source for dashboards.
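
The import step in this idea could be sketched as below. It only builds the HTTP requests from a ConfigMap-shaped dictionary and does not send them. It relies on Grafana's dashboard endpoint POST /api/dashboards/db, which takes a JSON body with dashboard and overwrite fields. Treating every key ending in .json as a dashboard is an assumption made for this sketch, and authentication is left out entirely.

```python
import json
import urllib.request
from typing import List


def import_requests(configmap: dict, grafana_url: str) -> List[urllib.request.Request]:
    """Build one Grafana import request per dashboard found in a ConfigMap."""
    requests = []
    for key, raw in sorted(configmap.get("data", {}).items()):
        if not key.endswith(".json"):
            continue  # assumption: only *.json keys hold dashboards
        dashboard = json.loads(raw)
        dashboard["id"] = None  # let Grafana assign its own numeric id
        payload = json.dumps({"dashboard": dashboard, "overwrite": True})
        requests.append(urllib.request.Request(
            url=grafana_url.rstrip("/") + "/api/dashboards/db",
            data=payload.encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        ))
    return requests
```

Each request could then be sent with urllib.request.urlopen once credentials are added. Setting overwrite to true means that re-importing an unchanged ConfigMap replaces the existing dashboard rather than failing.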

@tsloughter
Contributor Author

So I did a quick thing, https://github.com/tsloughter/grafana-operator

This is what I was thinking in terms of loading the dashboards from ConfigMaps based on an annotation: https://github.com/tsloughter/grafana-operator/blob/173b1fadcb97e3ba042aea5fd564314eb0726618/pkg/controller/controller.go#L73-L89

Not sure if it makes sense as a sidecar or as part of prometheus-operator?

@brancz
Contributor

brancz commented Sep 4, 2017

The reason why the grafana-watcher is a side-car is because this way it only needs to worry about consistency with the files on disk and the single "local" grafana instance. Whereas if you were to build this into an operator, then a) that operator needs to perform all of those requests to all grafana servers, which requires a NetworkPolicy I wouldn't want in my cluster, and b) Operators are scoped to managing Kubernetes objects rather than interacting directly with the software.

A grafana Operator would be neat anyways, because we can handle higher level concerns, like how do we handle multiple configmaps being mounted, sane deployment strategies, etc. (whether it would belong in the Prometheus Operator is an open question)

@tsloughter
Contributor Author

Ok, gotcha.

I'm going to continue working on the grafana operator when I have time. Hopefully others who are interested in Grafana being handled this way will see this issue and help out :). Going to add a list of what needs to be done for it to be actually usable to the readme.

@brancz
Contributor

brancz commented Sep 18, 2017

With the scripts recently added by @eedugon, I'm a bit skeptical about a full-blown Grafana Operator; it might be overdoing it. The scripts automatically split up the Grafana dashboard JSON definitions into multiple ConfigMaps and template the Grafana deployment accordingly.

With all this tooling in place all one needs to do is export a Grafana dashboard through the UI and drop it into the grafana assets folder and run make generate, and it's all sharded into ConfigMaps with the Deployment templated. Then just running a kubectl apply over the generated files does the trick.
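
The sharding idea can be illustrated with a small greedy sketch. This is not the actual kube-prometheus script, only an assumption about how such a split might work: dashboards are packed in name order into ConfigMaps until the next one would push a shard past the given byte limit. The function name and the ConfigMap name prefix are hypothetical.

```python
from typing import Dict, List


def shard_dashboards(dashboards: Dict[str, str], limit: int,
                     prefix: str = "grafana-dashboards") -> List[dict]:
    """Split {file name: JSON text} into ConfigMaps whose data stays under limit."""
    shards: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    current_size = 0
    for filename in sorted(dashboards):
        body = dashboards[filename]
        size = len(filename.encode("utf-8")) + len(body.encode("utf-8"))
        if size > limit:
            raise ValueError(f"{filename} alone exceeds the limit of {limit} bytes")
        if current and current_size + size > limit:
            # The current shard is full, so start a new one.
            shards.append(current)
            current, current_size = {}, 0
        current[filename] = body
        current_size += size
    if current:
        shards.append(current)
    return [
        {
            "apiVersion": "v1",
            "kind": "ConfigMap",
            "metadata": {"name": f"{prefix}-{index}"},
            "data": shard,
        }
        for index, shard in enumerate(shards)
    ]
```

Each resulting ConfigMap would need its own volume and mount in the Grafana Deployment, which is the templating step the comment refers to.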

I feel we're at a pretty good point in terms of tooling around this, so I'll close this issue at this point, but feel free to re-open or continue the discussion if you feel there is a need 🙂 .

@brancz brancz closed this as completed Sep 18, 2017
@tsloughter
Contributor Author

Ok. I disagree, but understand :). I'd want dashboards to work after a helm install pkg with no additional manual interaction. But I'm fine leaving this closed, for now at least.

@brancz
Contributor

brancz commented Sep 18, 2017

We figured that that would also be possible by tweaking the grafana-watcher slightly, similar to how the prometheus-config-reloader works.

@japaniel

@brancz Was there further work/thought on slightly tweaking the grafana-watcher to allow annotations to add dashboards? Perhaps there is a PR that my searching isn't turning up? I would really like to use this in conjunction with a helm-deployed grafana(watcher) and a dashboard deployed by an app. I understand the network policy concern with a singleton deploy of grafana; however, my current setup leans more towards allowing app devs to modify their own dashboards according to their needs, while the platform devs (myself included) worry about managing the rest of the stuff to make that work, i.e. network/RBAC policies and separate (or not) grafanas and prometheuses.

Again this is likely not a concern of the prometheus operator, but I am looking for a place that further work/discussion might have taken place.

@brancz
Contributor

brancz commented Jan 19, 2018

@japaniel modifying existing dashboards is not a problem - annotations included. The process may just be a bit different from what you're doing today. The process I recommend is to modify the dashboard either via the UI and export the result, or in the source directly (we're going to be working on some neat things here, so stay tuned). Then you open a pull request against your infra repo, or wherever the dashboard is versioned. Once merged, it's deployed from disk like any other dashboard.

Where the experience lacks a bit today is the modifying + export in the UI as well as the text editing one, but this is going to get better 🙂 .


6 participants