
grafana-dashboards ConfigMap is invalid (metadata.annotations too long) #535

eedugon opened this issue Aug 2, 2017 · 42 comments · Closed

@eedugon (Contributor) commented Aug 2, 2017

What did you do?
Added dashboards to the assets directory, then regenerated and applied the ConfigMap.

What did you expect to see?
ConfigMap updated and applied.

What did you see instead? Under which circumstances?
Error in ConfigMap:
The ConfigMap "grafana-dashboards" is invalid: metadata.annotations: Too long: must have at most 262144 characters

Environment

  • Kubernetes version information:
    1.7 and 1.6 (tested with both)

  • Kubernetes cluster kind:
    kops on AWS

  • Manifests:
    Not very relevant; it's a matter of size. With only one or two .json files it works fine, but when I try to add the five .json files I have, it fails.
    Here is the output I get:
$ ls -1 assets/grafana/
all-nodes-dashboard.json
deployment-dashboard.json
kafka-brokers-dashboard.json
kafka-brokers-topics-dashboard.json
kafka-jvm-dashboard.json
kafka-streams-dashboard.json
kafka-zookeepers-dashboard.json
kubernetes-pods-dashboard.json
node-dashboard.json
prometheus-datasource.json
resource-requests-dashboard.json

$ make generate
>> Compiling assets and generating Kubernetes manifests

$ kubectl -n monitoring apply -f manifests/grafana/grafana-dashboards.yaml
The ConfigMap "grafana-dashboards" is invalid: metadata.annotations: Too long: must have at most 262144 characters

346803 bytes --> manifests/grafana/grafana-dashboards.yaml

This looks like a hard Kubernetes limit, but I don't fully understand it; the generated ConfigMap manifest is not that big.
What would you suggest?

Is there a way to increase that limit?

Thanks!

@brancz (Contributor) commented Aug 2, 2017

A ConfigMap has a size limit of 1 MB, and to my knowledge this cannot be changed. The solution would be to shard the dashboards across multiple ConfigMaps. This is a bit more complicated but generally possible: we just need to bin-pack the dashboard JSON files into ConfigMaps (no fancy algorithm needed; just loop through the JSON files until 1 MB is reached), which then end with an increasing numeric suffix.
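
For illustration, a minimal shell sketch of that bin-packing loop (hypothetical, not the project's actual script; the byte budget is an assumption, and the `--dry-run` flag is the 2017-era boolean form):

```sh
#!/bin/bash
# Hypothetical bin-packing sketch: accumulate dashboard JSON files and start a
# new ConfigMap whenever the next file would exceed the byte budget.
# Assumes filenames contain no spaces.
LIMIT=1000000      # assumed per-ConfigMap data budget (~1 MB)
id=0; size=0; files=()

flush() {
  [ "${#files[@]}" -eq 0 ] && return
  # --dry-run prints the manifest instead of creating the object
  kubectl create configmap "grafana-dashboards-${id}" \
    $(printf -- '--from-file=%s ' "${files[@]}") \
    --dry-run -o yaml
  echo '---'
  id=$((id + 1)); size=0; files=()
}

for f in assets/grafana/*.json; do
  s=$(wc -c < "$f")
  [ $((size + s)) -gt "$LIMIT" ] && flush
  files+=("$f"); size=$((size + s))
done
flush
```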

Now, the problem you encountered is actually even trickier: when you use kubectl apply, the old content of the ConfigMap is saved in an annotation. You can disable that, but it exists to enable rollbacks. That means our effective limit is not 1 MB; it has been cut roughly in half, to ~500 KB. The actual numbers may differ slightly in practice, but that is the theory.
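
To see what apply keeps around, one can inspect the last-applied annotation that kubectl maintains (a sketch; the annotation key is standard kubectl behavior, namespace and name as used in this thread):

```sh
# Byte size of the copy that kubectl apply stores alongside the live object:
kubectl -n monitoring get configmap grafana-dashboards \
  -o go-template='{{index .metadata.annotations "kubectl.kubernetes.io/last-applied-configuration"}}' \
  | wc -c
```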

Do you think you can give this a go and/or investigate further?

@eedugon (Contributor, Author) commented Aug 2, 2017

Mmm, sure. Thanks for the hints. I will take a deeper look and write back with my conclusions.

@eedugon (Contributor, Author) commented Aug 2, 2017

Thanks to your hints, I have realized that the size limit reached in this case is the Kubernetes annotations size limit, which is 256 KB, not the ConfigMap limit, which is 1 MB.

That limit was discussed here:
kubernetes/kubernetes#15878

The only way I have found to skip the annotation is issuing kubectl replace instead of kubectl apply. That works perfectly (because my manifest is smaller than 1 MB), but as you said we lose the rollback option (I would assume the user has the dashboards under version control and could load a previous version of the YAML if something is lost).
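
For reference, the workaround in command form (same manifest path as earlier in this issue):

```sh
# replace swaps the whole object and does not write the
# last-applied-configuration annotation, so only the ~1 MB ConfigMap limit applies:
kubectl -n monitoring replace -f manifests/grafana/grafana-dashboards.yaml
```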

I will test the bin-pack proposal later. If I have understood correctly, you mean creating ConfigMaps like grafana-dashboards-0, grafana-dashboards-1, ..., each of them under 1 MB, right? Are you confident that will work? :-) Or should they be called "grafana-dashboards0", "grafana-dashboards1", ...?

With this bin-packing, can I really break a JSON file across two different ConfigMaps? If so, I could create a tool with a configurable limit that writes multiple ConfigMaps based on the JSON files plus the configured limit. If each ConfigMap must instead start and end with complete JSON files, let me know and I will build it accordingly.

@brancz (Contributor) commented Aug 2, 2017

We can get creative and actually cut the JSON strings if we ever hit the problem of a single dashboard being larger than the limit, but I'd worry about that when we have that problem. For now, bin-packing full dashboards into the enumerated ConfigMaps should suffice. Regarding the naming, I'm confident both schemes work; personally I'd prefer

grafana-dashboards-0, grafana-dashboards-1, ...

but that's just my taste, I guess.

Other than that, everything you wrote sounds good to me. Looking very much forward to this!

Also, if you haven't yet and are interested, I'd recommend checking out weaveworks/grafanalib. I believe we could build some nice abstractions on top of that library, since 98% of all graphs are the same and don't need the detailed configuration Grafana offers all the time. If we generate the dashboards with that library, we could write a nice little Python utility that does all of the above in one tool. (This is just an idea that has been floating around in our heads for a while; we haven't been able to look into it extensively.)

It's awesome that you're taking a go at this!

@eedugon (Contributor, Author) commented Aug 2, 2017

Hi @brancz,

I have prepared the tool, and I can generate and apply the multiple configmaps based on the files' sizes without problems, but it looks like the grafana pod doesn't actually start (status = ContainerCreating) when the configmaps are split.

These are the configmaps (with 6 and 5 files each):

$ kubectl -n monitoring get configmap
NAME                   DATA      AGE
grafana-dashboards-0   6         8s
grafana-dashboards-1   5         7s
prometheus-k8s-rules   10        4h

And this is the status of grafana pod:

grafana-513601738-255l2               0/2       ContainerCreating   0          7m

If, with the same tool, I prepare a manifest with only one ConfigMap (named grafana-dashboards), the pod starts properly.

Any guess? I also attach here one example of a generated manifest with 2 ConfigMaps.
grafana-dashboards-configMap-2017-08-02-203244.yaml.zip

And the output of the tool just in case...

# Grafana Dashboards ConfigMap will be created into file: grafana-dashboards-configMap-2017-08-02-203244.yaml
# Processing dashboard node-dashboard.json
# Processing dashboard all-nodes-dashboard.json
# Processing dashboard kafka-brokers-dashboard.json
# Processing dashboard kafka-jvm-dashboard.json
# Processing dashboard kafka-streams-dashboard.json
# Processing dashboard deployment-dashboard.json
# Processing dashboard kubernetes-pods-dashboard.json

# Size limit reached. Processing 226454 bytes. Creating configmap with id 0

# Processing dashboard kafka-zookeepers-dashboard.json
# Processing dashboard kafka-brokers-topics-dashboard.json
# Processing dashboard resource-requests-dashboard.json
# Processing datasource prometheus-datasource.json

# Size limit not reached (77250). Adding remaining files into configmap with id 1

# Process completed, configmap created: grafana-dashboards-configMap-2017-08-02-203244.yaml
# Summary
# Total files processed: 11
# Total amount of ConfigMaps inside the manifest: 2

configmap "grafana-dashboards-0" configured
configmap "grafana-dashboards-1" configured

# ConfigMap updated. Wait until grafana-watcher applies the changes and reloads the dashboards.

And that never happens :)

@brancz (Contributor) commented Aug 3, 2017

Hmm, I think this might happen because the grafana watcher is currently only built to watch a single directory for dashboard changes.

We may need to adapt the grafana watcher to allow watching multiple directories.

As a side note, the manifests look good, so nice job on the tool! 🙂

@eedugon (Contributor, Author) commented Aug 3, 2017

Ok, I understand.
Then, for the moment and as a workaround, we could use kubectl replace instead of apply for the dashboards in case of problems (with replace the effective limit becomes 1 MB instead of 256 KB).

We can keep this tool on hold until it's really necessary. For the moment I will configure a large size limit in the tool so it always generates a single configmap. If you want me to update or share the tool for your review, let me know. Currently it is a small standalone tool, but the algorithm could easily be integrated into one of the hack/scripts/ files and handled by make generate.

But if you want to move in another direction let me know.
Currently my coding skills are mainly bash, so I don't think I can help with updating or improving grafana-watcher, unless it is written in bash =)

@brancz (Contributor) commented Aug 3, 2017

Great! I'd love to integrate it nonetheless, so we know when we absolutely need this, and it may turn out to be useful for the Prometheus rule files as well. I can't promise when I'll get to improving the grafana-watcher to allow watching multiple directories, but I see the importance.

@eedugon (Contributor, Author) commented Aug 3, 2017

For the moment you can take a look at this:
https://github.com/eedugon/grafana-dashboards-configmap-generator

Feel free to try it and let me know your thoughts; it's completely harmless and leaves the generated manifest in the "output" directory.

Don't set APPLY_CONFIGMAP to "true" if you don't want kubectl apply to be executed :)

As you will see, with the current dashboards (only the 6 main files from your project) the tool generates exactly the same file as yours (only 1 configmap), but if you set DATA_SIZE_LIMIT to, for example, 100 KB, the tool generates 2 configmaps, like so:

$ DATA_SIZE_LIMIT="100000" bin/grafana_dashboards_generate.sh
# Starting execution of grafana_dashboards_generate on 2017-08-03-185512
# Configured size limit: 100000 bytes
# Grafana input dashboards and datasources will be read from: templates/grafana-dashboards
# Grafana Dashboards ConfigMap will be created into file:
output/grafana-dashboards-configMap-2017-08-03-185512.yaml

# File node-dashboard.json : added to queue
# File all-nodes-dashboard.json : added to queue
# File deployment-dashboard.json : added to queue

# Size limit (100000) reached. Processing queue with 91840 bytes. Creating configmap with id 0

# File kubernetes-pods-dashboard.json : added to queue
# File resource-requests-dashboard.json : added to queue
# File prometheus-datasource.json : added to queue

# Size limit not reached (29182). Adding remaining files into configmap with id 1

# Process completed, configmap created: grafana-dashboards-configMap-2017-08-03-185512.yaml
# Summary
# Total files processed: 6
# Total amount of ConfigMaps inside the manifest: 2

# To apply the new configMap to your k8s system do something like:
kubectl -n monitoring apply -f grafana-dashboards-configMap-2017-08-03-185512.yaml

@brancz (Contributor) commented Aug 4, 2017

This is great! Nice stuff, I'm hoping to get to improving the grafana-watcher soon!

@eedugon (Contributor, Author) commented Aug 4, 2017

Perfect, no rush. When it's ready I will check the best way to integrate it, or part of it, into the make generate process that you own.
That said, maybe the user doesn't want to regenerate all Prometheus rules plus the Grafana assets at the same time, so this approach (a separate script) is also valid.

For the moment I have changed the limit to 1 MB and am using "replace" instead of "apply", and everything is working fine on my system; I can update the dashboards smoothly with this script, which also performs the kubectl replace.

Actually, for a new deployment I deploy the prometheus operator with the official dashboards and update the dashboards with the tool right after, instead of changing the prometheus-operator default manifest.

@brancz (Contributor) commented Aug 4, 2017

That sounds fine. Whatever workflow works for you 🙂

@unguiculus (Contributor) commented

We had the same issue. The kubectl replace workaround is nice, but after adding more dashboards we hit this limit as well. We have to use multiple config maps. To make this work, I updated the Grafana Watcher so it can monitor multiple directories: #556
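
With that change, the watcher can presumably be pointed at one directory per mounted ConfigMap, along these lines (a hypothetical invocation; only the --watch-dir flag name is confirmed in this thread, and the mount paths are assumptions):

```sh
# One --watch-dir per ConfigMap volume mount (paths are assumptions):
grafana-watcher \
  --watch-dir=/var/grafana-dashboards-0 \
  --watch-dir=/var/grafana-dashboards-1
```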

@eedugon (Contributor, Author) commented Aug 14, 2017

Thanks a lot @unguiculus !
@brancz, in case @unguiculus's proposal works fine for multiple config maps like "grafana-dashboards-0", "grafana-dashboards-1", etc., should we integrate my code/tool into the project?

Let me know if you want me to propose something.
Btw, @unguiculus , have you tried https://github.com/eedugon/grafana-dashboards-configmap-generator ?

@unguiculus (Contributor) commented

@eedugon Looks interesting, but I have not tried it. I'm currently grouping dashboard files into multiple folders. I changed the shell scripts to create one configmap per folder. Works fine with the updated Grafana Watcher.

@brancz (Contributor) commented Aug 15, 2017

I was out sick yesterday, but so far this all sounds perfect. I'll get to your PR right away @unguiculus, and then we can work on integrating your work @eedugon. Nice work, everyone!

@brancz (Contributor) commented Aug 15, 2017

I merged #556. I think it would now be appropriate to merge your tool, @eedugon, into the kube-prometheus scripts directory and make it a seamless experience, wdyt?

@eedugon (Contributor, Author) commented Aug 15, 2017

Yes, @brancz, I will prepare it as part of kube-prometheus. Let me work on that and I will propose a PR soon for your validation.

@eedugon (Contributor, Author) commented Aug 15, 2017

@brancz : PR 567 raised.
We should test it together with the new grafana watcher to make sure they are really aligned, because I'm not sure after reading the latest comment from @unguiculus.

@unguiculus : What do you mean with one configmap per folder?

What I have prepared processes N dashboard files and splits them into multiple configmaps based on a configurable size limit. So, if the total size of the JSON files does not exceed the limit, only one configmap ("grafana-dashboards") is created, but if the limit is exceeded, multiple configmaps are generated.

What I was expecting is a grafana watcher able to work with configmaps named "grafana-dashboards-0", "grafana-dashboards-1", etc. Would that be supported in your modified release?

@unguiculus (Contributor) commented

@eedugon This is just how I set it up. For each folder, I create one configmap with multiple dashboards. You can use any number of configmaps.

@eedugon (Contributor, Author) commented Aug 30, 2017

@unguiculus, @brancz, quick update: I will test this early next week, as I'm still on vacation ;-)

@brancz (Contributor) commented Sep 4, 2017

Absolutely no worries! I just got back from vacation myself and am slowly working through things. Take the time off of GitHub 🙂 burnout is a real thing.

@stevenpall commented

I just ran into this issue as well. I'm wondering if another option might be to follow a configmap-per-dashboard model? That way it's clear which configmap contains what data, and one would never hit the max size limitation unless a single dashboard config exceeded 1 MB.

Looking at the current Grafana deployment, I think the only tricky part might be figuring out how to volume-mount n dashboard configmaps.

@brancz (Contributor) commented Sep 8, 2017

Absolutely. This is basically what a grafana operator could do, similar to what the Prometheus Operator does today with Prometheus objects selecting ConfigMaps containing rule files via a selector.

@eedugon (Contributor, Author) commented Sep 9, 2017

Hi @brancz, sorry for the delay.

I have tested the multiple configmaps I generate with the tool, which splits .json dashboards into multiple configmaps based on a configurable size limit. It works fine, as expected, but grafana-watcher doesn't detect the changes.

So this is not a transparent method: if we configure 4 configMaps (grafana-dashboards-0, grafana-dashboards-1, -2 and -3), grafana supports them, but not without manually configuring grafana-watcher.

So, in this case, and as @unguiculus explained, if we split the dashboards into multiple configmaps we need the following changes in the grafana statefulset definition (see the sketch after this list):

  • Add a volume in grafana for each new configmap.
  • Mount the volumes in the grafana-watcher container.
  • Add a --watch-dir option for each new configmap.
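
A sketch of a helper that prints those three manifest fragments for N shards (hypothetical; the mount paths are assumptions, while the grafana-dashboards-N naming and the --watch-dir flag come from this thread):

```sh
#!/bin/bash
# Print the volume / volumeMount / arg fragments to splice into the grafana manifest.
N=${1:-4}   # number of sharded ConfigMaps
echo "volumes:"
for i in $(seq 0 $((N - 1))); do
  printf -- '- name: grafana-dashboards-%d\n  configMap:\n    name: grafana-dashboards-%d\n' "$i" "$i"
done
echo "volumeMounts:   # in the grafana-watcher container"
for i in $(seq 0 $((N - 1))); do
  printf -- '- name: grafana-dashboards-%d\n  mountPath: /var/grafana-dashboards-%d\n' "$i" "$i"
done
echo "args:"
for i in $(seq 0 $((N - 1))); do
  printf -- '- "--watch-dir=/var/grafana-dashboards-%d"\n' "$i"
done
```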

So, overall we are heading in the right direction, but we need grafana-watcher to automatically detect multiple configmaps that share a naming scheme (xxxx-0, xxxx-1, xxxx-2, xxxx-3, ...).

Or, if we don't like that approach, we could develop a simple tool that sets up the grafana statefulset whenever it sets up grafana-dashboards.

@stevenpall: one configmap per file is also possible, just keeping the size limitation per file in mind.

@brancz (Contributor) commented Sep 13, 2017

Agreed, that's exactly what I could see an operator do. However, that might even be dramatically overdoing it; basically we just need something that generates the manifest when it generates the configmaps, which shouldn't be too hard to extend the scripts with.

@eedugon (Contributor, Author) commented Sep 13, 2017

@brancz: I can modify/generate the grafana manifest so that it is aligned with the configMaps; that won't be a problem at all.
Do you want me to include it in the proposal?
It will take just an hour or two to prepare: just a grafana manifest template to be processed after generating the configmaps.

@brancz (Contributor) commented Sep 13, 2017

If it's a quick one I'd say let's complete it now, while we're already at it.

@eedugon (Contributor, Author) commented Sep 14, 2017

Got it working... I will update the PR.

@brancz (Contributor) commented Sep 15, 2017

With #567 being merged, do you think we can close this @eedugon ?

@eedugon (Contributor, Author) commented Sep 16, 2017

Yeah, let's close this.
Just as a conclusion:
The issue about the ConfigMap size limit (metadata.annotations too long) should be fixed by this PR plus the previous update to grafana-watcher supporting multiple configmaps for dashboards.

With this PR, the make generate command will create N configmaps for grafana dashboards based on a size limit of 240000 bytes, and will prepare a grafana-deployment.yaml aligned with the configmaps previously created.
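
In practice the flow then becomes something like this (manifest path as used earlier in this thread; apply may still need to be swapped for replace if the per-ConfigMap size limit is set too high):

```sh
make generate
kubectl -n monitoring apply -f manifests/grafana/grafana-dashboards.yaml
```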

The tool can also be used standalone to maintain only the grafana dashboards rather than all prometheus-operator manifests.

If someone runs into an error with this new process, please open a new issue :)

Closing the issue! Thanks all in the thread for your help!

@eedugon eedugon closed this as completed Sep 16, 2017
@brancz (Contributor) commented Sep 17, 2017

Once more, thanks @eedugon for your effort and involvement - this is an excellent result!

@mbugeia commented Oct 3, 2017

I think the bug is still present in @eedugon's generator. Using the script contrib/kube-prometheus/hack/scripts/generate-manifests.sh, it generated a configmap that I wasn't able to deploy (same error, metadata.annotations too long).
By tuning the "-s" value down to a lower number (150000), I was able to generate a proper configmap file.

@brancz (Contributor) commented Oct 4, 2017

This depends heavily on the size of your previous configmap content (which is stored in an annotation). At some point I would recommend just replacing the manifest, which skips the annotation.

@eedugon (Contributor, Author) commented Oct 4, 2017

Thanks for sharing the info @mbugeia, good point. But I'm not sure about the best approach here...

As @brancz mentions, the reason is the size of the previous configmap stored in the system, not the new one (I believe).

Anyway, I configured the 240000 limit by default without deep investigation, so if 150000 works better we could consider it. The only caveat is that this limit also caps the maximum size of a single dashboard file, and we need to take that into consideration.
I don't know exactly what goes into the annotations section, but if it is somehow the previous configmap plus the new one, then the limit should probably be set closer to 256000 / 2, so around 125000. Since I don't know the first part of that statement, I can only guess.
(But I would swear I did some tests applying new configurations multiple times with configmaps close to 240000 bytes without problems.)

As a last comment, you could also try running the script directly with the -x --apply-type replace options. That will apply the changes directly using the kubectl replace -n monitoring ... method.
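
The rough arithmetic behind a safer -s value, under my assumption above about what the annotation holds:

```sh
# metadata.annotations hard limit: 262144 bytes (256 KiB).
# If the last-applied annotation holds roughly one full copy of the data,
# budget a bit under half the limit per ConfigMap:
echo $((262144 / 2))   # 131072 -> e.g. -s 125000 after YAML/escaping overhead
```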

@shadycuz commented

@mbugeia Can you elaborate on this? I'm having the same issue with the same generation script.

@shadycuz commented

@eedugon I'm having the same error as @mbugeia on the latest release, and I still get the error even in a new namespace (no previous configmap).

@eedugon (Contributor, Author) commented Nov 26, 2017

@shadycuz: Have you tried with -s 125000?
If you need help testing this, let me know.

Interesting info about the error in a fresh new namespace; it would be nice to see the output of generate-manifests.sh and the kubectl apply.

@shadycuz commented

@eedugon I don't know where to put -s or what it does. I ran kubectl create -f and that worked fine.

$ bash ./hack/cluster-monitoring/self-hosted-deploy
namespace "review-testing-7r9j96" created
clusterrolebinding "prometheus-operator" configured
clusterrole "prometheus-operator" configured
serviceaccount "prometheus-operator" created
service "prometheus-operator" created
deployment "prometheus-operator" created
Waiting for Operator to register custom resource definitions...done!
daemonset "node-exporter" created
service "node-exporter" created
clusterrolebinding "kube-state-metrics" configured
clusterrole "kube-state-metrics" configured
deployment "kube-state-metrics" created
rolebinding "kube-state-metrics" created
role "kube-state-metrics-resizer" created
serviceaccount "kube-state-metrics" created
service "kube-state-metrics" created
secret "grafana-credentials" created
secret "grafana-credentials" unchanged
deployment "grafana" created
Error from server (Invalid): error when creating "manifests/grafana/grafana-dashboards.yaml": ConfigMap "grafana-dashboards-0" is invalid: metadata.annotations: Too long: must have at most 262144 characters

@prune998 (Contributor) commented Aug 1, 2018

Sorry to come back here, but I still have the exact same issue as mentioned here with the latest prometheus-operator using the Helm chart deployment.
I'm on GKE 1.10.5/3.

I'm wondering why we're not using a PVC to hold the dashboards instead of a ConfigMap?

@Lord-Y commented Sep 29, 2019

@unguiculus thanks for the tip; grouping them makes it work.

@pingvinton commented

> Sorry to come back here, but I still have the exact same issue as mentioned here with the latest prometheus-operator using the Helm chart deployment. I'm on GKE 1.10.5/3.
>
> I'm wondering why we're not using a PVC to hold the dashboards instead of a ConfigMap?

I think in that case Grafana's fault tolerance is lost: if something happens to the server holding the PVC, then even if the grafana service moves to another node it will not be able to get the dashboards (when using local-storage, of course).
