-
Notifications
You must be signed in to change notification settings - Fork 661
Description
Describe the bug
The /api-metrics endpoint fails with the following response
$ curl -vvv --key metrics_server_key.crt --cacert metrics_server_ca.crt --cert metrics_server_cert.crt https://bosh-master.internal.paas:9091/api_metrics
< HTTP/1.1 500 Internal Server Error
< Server: nginx
< Date: Thu, 28 Oct 2021 13:53:40 GMT
< Content-Length: 5071
< Connection: keep-alive
<
Puma caught this error: undefined method `unpack' for nil:NilClass (NoMethodError)
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/client/data_stores/direct_file_store.rb:238:in `block (2 levels) in all_values'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/client/data_stores/direct_file_store.rb:236:in `each'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/client/data_stores/direct_file_store.rb:236:in `map'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/client/data_stores/direct_file_store.rb:236:in `block in all_values'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/client/data_stores/direct_file_store.rb:271:in `with_file_lock'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/client/data_stores/direct_file_store.rb:235:in `all_values'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/client/data_stores/direct_file_store.rb:124:in `block in all_values'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/client/data_stores/direct_file_store.rb:121:in `each'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/client/data_stores/direct_file_store.rb:121:in `all_values'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/client/histogram.rb:84:in `values'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/client/formats/text.rb:30:in `block in marshal'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/client/formats/text.rb:26:in `each'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/client/formats/text.rb:26:in `marshal'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/middleware/exporter.rb:69:in `respond_with'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/middleware/exporter.rb:30:in `call'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/middleware/collector.rb:32:in `block in call'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/middleware/collector.rb:61:in `block in trace'
/var/vcap/data/packages/ruby-2.6.5-r0.29.0/b20c017a80a93122d1c536bbd2cfe28f2f6aef97/lib/ruby/2.6.0/benchmark.rb:308:in `realtime'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/middleware/collector.rb:61:in `trace'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/middleware/collector.rb:32:in `call'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/rack-2.2.3/lib/rack/common_logger.rb:38:in `call'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/sinatra-2.0.8.1/lib/sinatra/base.rb:231:in `call'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/puma-3.12.6/lib/puma/configuration.rb:227:in `call'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/puma-3.12.6/lib/puma/server.rb:706:in `handle_request'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/puma-3.12.6/lib/puma/server.rb:476:in `process_client'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/puma-3.12.6/lib/puma/server.rb:334:in `block in run'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/puma-3.12.6/lib/puma/thread_pool.rb:135:in `block in spawn_thread'
This is not reproducing on another environment, so I'm suspecting this is related to a problem in this lab environment, but this is hard to diagnose.
To Reproduce
Enabled metrics server as suggested in https://bosh.io/docs/director-metrics/
Expected behavior
- Helpful diagnostic message about invalid output being parsed
- Hints on how to troubleshoot the issue
Logs
the metrics-server logs are not showing errors
==> metrics_server.stderr.log <==
127.0.0.1 - - [28/Oct/2021:13:58:06 +0000] "GET /metrics HTTP/1.1" 200 - 0.0062
==> metrics_server.stdout.log <==
I, [2021-10-28T13:58:22.017392 #11] [] INFO -- MetricsServer: populating metrics
D, [2021-10-28T13:58:22.019427 #11] [] DEBUG -- Director: (0.000547s) (conn: 47272601494620) SELECT * FROM "director_attributes" WHERE ("name" = 'resurrection_paused') LIMIT 1
D, [2021-10-28T13:58:22.028558 #11] [] DEBUG -- Director: (0.008083s) (conn: 47272601494620) SELECT DISTINCT "type", count(*) AS "count" FROM "tasks" GROUP BY "type"
D, [2021-10-28T13:58:22.030640 #11] [] DEBUG -- Director: (0.000506s) (conn: 47272601494620) SELECT "type", "state", count(*) AS "count" FROM "tasks" WHERE ("state" IN ('processing', 'queued')) GROUP BY "type", "state"
D, [2021-10-28T13:58:22.034120 #11] [] DEBUG -- Director: (0.000447s) (conn: 47272601494620) SELECT * FROM "configs" WHERE ("id" IN (SELECT max("id") FROM "configs" WHERE ("type" = 'cloud') GROUP BY "name"))
D, [2021-10-28T13:58:22.080868 #11] [] DEBUG -- Director: (0.000284s) (conn: 47272601494620) SELECT count(*) AS "count" FROM "ip_addresses" WHERE (("network_name" = 'redacted') AND ("static" IS FALSE)) LIMIT 1
[...]
I, [2021-10-28T13:58:22.091951 #11] [] INFO -- MetricsServer: populated metrics
The director stderr logs include the same stack trace
/var/vcap/sys/log/director# less director.stderr.log
2021-10-28 15:06:54 +0000: Rack app error handling request { GET /metrics }
#<NoMethodError: undefined method `unpack' for nil:NilClass>
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/client/data_stores/direct_file_store.rb:238:in `block (2 levels) in all_values'
/var/vcap/data/packages/director/4aade3c707199631e6d710c7f75a12f332b14bf3/gem_home/ruby/2.6.0/gems/prometheus-client-1.0.0/lib/prometheus/client/data_stores/direct_file_store.rb:236:in `each'
[...]
Versions (please complete the following information):
bosh 271.8.0*
running on vsphere
Deployment info:
If possible, share your (redacted) manifest and any ops files used to deploy
BOSH or any other releases on top of BOSH.
If you used any deployment strategy it'd be helpful to point it out and share as
much about it as possible (e.g. bosh-deployment, PCF, genesis, spiff, etc)
Additional context
The /var/vcap/store/director/metrics directory has associated metric_http_server_exceptions_total___25.bin file with NoMethodError content no much helping
bosh/c50e9533-15cf-447f-80c0-53c683c5a3cb:/var/vcap/store/director/metrics# ls -alrt /var/vcap/store/director/metrics | tail
-rw-r--r-- 1 vcap vcap 1048576 Oct 28 14:26 metric_http_server_request_duration_seconds___21.bin
-rw-r--r-- 1 vcap vcap 1048576 Oct 28 14:29 metric_http_server_exceptions_total___37.bin
drwxr-xr-x 2 vcap vcap 4096 Oct 28 14:29 .
-rw-r--r-- 1 vcap vcap 1048576 Oct 28 14:31 metric_http_server_exceptions_total___25.bin
-rw-r--r-- 1 vcap vcap 1048576 Oct 28 14:33 metric_http_server_requests_total___37.bin
-rw-r--r-- 1 vcap vcap 1048576 Oct 28 14:33 metric_http_server_request_duration_seconds___37.bin
-rw-r--r-- 1 vcap vcap 1048576 Oct 28 14:33 metric_http_server_requests_total___23.bin
-rw-r--r-- 1 vcap vcap 1048576 Oct 28 14:33 metric_http_server_request_duration_seconds___23.bin
-rw-r--r-- 1 vcap vcap 1048576 Oct 28 14:33 metric_http_server_requests_total___25.bin
-rw-r--r-- 1 vcap vcap 1048576 Oct 28 14:33 metric_http_server_request_duration_seconds___25.bin
bosh/c50e9533-15cf-447f-80c0-53c683c5a3cb:/var/vcap/store/director/metrics# cat metric_http_server_exceptions_total___22.bin
0exception=NoMethodError k@
The problem is'nt new on this environment.
bosh/c50e9533-15cf-447f-80c0-53c683c5a3cb:/var/vcap/store/director/metrics# grep NoMethodError *
Binary file metric_http_server_exceptions_total___20.bin matches
Binary file metric_http_server_exceptions_total___21.bin matches
Binary file metric_http_server_exceptions_total___22.bin matches
Binary file metric_http_server_exceptions_total___23.bin matches
Binary file metric_http_server_exceptions_total___24.bin matches
Binary file metric_http_server_exceptions_total___25.bin matches
Binary file metric_http_server_exceptions_total___28.bin matches
Binary file metric_http_server_exceptions_total___29.bin matches
Binary file metric_http_server_exceptions_total___31.bin matches
Binary file metric_http_server_exceptions_total___35.bin matches
Binary file metric_http_server_exceptions_total___37.bin matches
bosh/c50e9533-15cf-447f-80c0-53c683c5a3cb:/var/vcap/store/director/metrics# ls -alrt *exceptions*
-rw-r--r-- 1 vcap vcap 1048576 May 7 10:25 metric_http_server_exceptions_total___31.bin
-rw-r--r-- 1 vcap vcap 1048576 May 9 19:50 metric_http_server_exceptions_total___22.bin
-rw-r--r-- 1 vcap vcap 1048576 May 11 14:50 metric_http_server_exceptions_total___20.bin
-rw-r--r-- 1 vcap vcap 1048576 Oct 6 16:10 metric_http_server_exceptions_total___28.bin
-rw-r--r-- 1 vcap vcap 1048576 Oct 6 16:20 metric_http_server_exceptions_total___29.bin
-rw-r--r-- 1 vcap vcap 1048576 Oct 20 08:30 metric_http_server_exceptions_total___24.bin
-rw-r--r-- 1 vcap vcap 1048576 Oct 28 14:06 metric_http_server_exceptions_total___35.bin
-rw-r--r-- 1 vcap vcap 1048576 Oct 28 14:22 metric_http_server_exceptions_total___21.bin
-rw-r--r-- 1 vcap vcap 1048576 Oct 28 14:29 metric_http_server_exceptions_total___37.bin
-rw-r--r-- 1 vcap vcap 1048576 Oct 28 14:31 metric_http_server_exceptions_total___25.bin
-rw-r--r-- 1 vcap vcap 1048576 Oct 28 14:36 metric_http_server_exceptions_total___23.bin
How does the api-metrics endpoint collects its metrics ? Any way to inspect it to diagnose the root cause ?
Metadata
Metadata
Assignees
Labels
Type
Projects
Status