
emylincon/aws_quota_exporter



aws_quota_exporter

Export AWS service quotas as Prometheus metrics

Why?

A subset of AWS service quotas are labelled adjustable, and adjustments can be made at the account or region level. If a quota is adjusted in some regions but not in others, its value is no longer the same across regions. This makes it difficult to build monitoring or alerting logic in Prometheus based on service quotas.

The aim of aws_quota_exporter is to export these quotas to Prometheus, which solves the problem above. At the time of writing, this feature is not available in the Prometheus YACE (yet-another-cloudwatch-exporter) exporter.

Breaking Change! ⚠️

Version 1.0.0 and later introduce clustering functionality that groups similar metrics. This was requested here. The words common to a metric group are extracted as the metric name, and the unique words of each metric form a label. Two new labels are added:

  • kind: the words unique to this quota within its group (for example DL or G and VT).
  • name: the original AWS quota name.
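The grouping rule can be sketched in Go. This is a simplified, hypothetical version (groupQuotas is not the exporter's function): it treats a word as common only when it appears in every quota name of the group and uses the remaining words as kind. The real exporter's output can differ; in the example transformation it reports kind="P4, P3 P2", whereas this sketch would keep the word "and".

```go
package main

import (
	"fmt"
	"strings"
)

// groupQuotas is a hypothetical, simplified illustration of metric grouping.
// Words that appear in every quota name form the shared name; the remaining
// words of each quota name become its "kind" label value.
func groupQuotas(names []string) (string, map[string]string) {
	// Count how many quota names contain each word (at most once per name).
	counts := map[string]int{}
	for _, n := range names {
		seen := map[string]bool{}
		for _, w := range strings.Fields(n) {
			if !seen[w] {
				seen[w] = true
				counts[w]++
			}
		}
	}
	isCommon := func(w string) bool { return counts[w] == len(names) }

	// Build the shared name in the word order of the first quota name.
	var common []string
	for _, w := range strings.Fields(names[0]) {
		if isCommon(w) {
			common = append(common, w)
		}
	}

	// Every word that is not common becomes part of that quota's kind.
	kinds := make(map[string]string, len(names))
	for _, n := range names {
		var unique []string
		for _, w := range strings.Fields(n) {
			if !isCommon(w) {
				unique = append(unique, w)
			}
		}
		kinds[n] = strings.Join(unique, " ")
	}
	return strings.Join(common, " "), kinds
}

func main() {
	names := []string{
		"All DL Spot Instance Requests",
		"All G and VT Spot Instance Requests",
		"All P5 Spot Instance Requests",
	}
	common, kinds := groupQuotas(names)
	fmt.Println("name:", common) // name: All Spot Instance Requests
	for _, n := range names {
		fmt.Printf("kind=%q name=%q\n", kinds[n], n)
	}
}
```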

Example Transformation

Example transformation of the metric grouping can be seen below:

Before

# HELP aws_quota_ec2_all_dl_spot_instance_requests All DL Spot Instance Requests
aws_quota_ec2_all_dl_spot_instance_requests{account="126485599999",adjustable="true",global_quota="false",region="us-west-1",unit="None"} 0
# HELP aws_quota_ec2_all_f_spot_instance_requests All F Spot Instance Requests
aws_quota_ec2_all_f_spot_instance_requests{account="126485599999",adjustable="true",global_quota="false",region="us-west-1",unit="None"} 0
# HELP aws_quota_ec2_all_g_and_vt_spot_instance_requests All G and VT Spot Instance Requests
aws_quota_ec2_all_g_and_vt_spot_instance_requests{account="126485599999",adjustable="true",global_quota="false",region="us-west-1",unit="None"} 0
# HELP aws_quota_ec2_all_inf_spot_instance_requests All Inf Spot Instance Requests
aws_quota_ec2_all_inf_spot_instance_requests{account="126485599999",adjustable="true",global_quota="false",region="us-west-1",unit="None"} 0
# HELP aws_quota_ec2_all_p4__p3_and_p2_spot_instance_requests All P4, P3 and P2 Spot Instance Requests
aws_quota_ec2_all_p4__p3_and_p2_spot_instance_requests{account="126485599999",adjustable="true",global_quota="false",region="us-west-1",unit="None"} 0
# HELP aws_quota_ec2_all_p5_spot_instance_requests All P5 Spot Instance Requests
aws_quota_ec2_all_p5_spot_instance_requests{account="126485599999",adjustable="true",global_quota="false",region="us-west-1",unit="None"} 0
# HELP aws_quota_ec2_all_standard__a__c__d__h__i__m__r__t__z__spot_instance_requests All Standard (A, C, D, H, I, M, R, T, Z) Spot Instance Requests
aws_quota_ec2_all_standard__a__c__d__h__i__m__r__t__z__spot_instance_requests{account="126485599999",adjustable="true",global_quota="false",region="us-west-1",unit="None"} 5
# HELP aws_quota_ec2_all_trn_spot_instance_requests All Trn Spot Instance Requests
aws_quota_ec2_all_trn_spot_instance_requests{account="126485599999",adjustable="true",global_quota="false",region="us-west-1",unit="None"} 0
# HELP aws_quota_ec2_all_x_spot_instance_requests All X Spot Instance Requests
aws_quota_ec2_all_x_spot_instance_requests{account="126485599999",adjustable="true",global_quota="false",region="us-west-1",unit="None"} 0
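The metric names in the Before output appear to follow a simple rule: the quota name is lower-cased, every character that is not a letter or digit becomes an underscore, and the result is prefixed with aws_quota_ and the service code. That is why "P4, P3" turns into p4__p3. The Go sketch below reproduces the rule as inferred from this sample output; metricName is a hypothetical helper, not the exporter's code.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// metricName is a hypothetical helper inferred from the sample output:
// lower-case the quota name and replace every non-alphanumeric rune with "_".
func metricName(serviceCode, quotaName string) string {
	sanitized := strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return unicode.ToLower(r)
		}
		return '_'
	}, quotaName)
	return "aws_quota_" + serviceCode + "_" + sanitized
}

func main() {
	fmt.Println(metricName("ec2", "All P4, P3 and P2 Spot Instance Requests"))
	// aws_quota_ec2_all_p4__p3_and_p2_spot_instance_requests
}
```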

After

# HELP aws_quota_ec2_all_spot_instance_requests Amazon Elastic Compute Cloud (Amazon EC2): All Spot Instance Requests
# TYPE aws_quota_ec2_all_spot_instance_requests gauge
aws_quota_ec2_all_spot_instance_requests{account="126485599999",adjustable="true",global_quota="false",kind="DL",name="All DL Spot Instance Requests",region="us-west-1",unit="None"} 0
aws_quota_ec2_all_spot_instance_requests{account="126485599999",adjustable="true",global_quota="false",kind="F",name="All F Spot Instance Requests",region="us-west-1",unit="None"} 0
aws_quota_ec2_all_spot_instance_requests{account="126485599999",adjustable="true",global_quota="false",kind="G and VT",name="All G and VT Spot Instance Requests",region="us-west-1",unit="None"} 0
aws_quota_ec2_all_spot_instance_requests{account="126485599999",adjustable="true",global_quota="false",kind="Inf",name="All Inf Spot Instance Requests",region="us-west-1",unit="None"} 0
aws_quota_ec2_all_spot_instance_requests{account="126485599999",adjustable="true",global_quota="false",kind="P4, P3 P2",name="All P4, P3 and P2 Spot Instance Requests",region="us-west-1",unit="None"} 0
aws_quota_ec2_all_spot_instance_requests{account="126485599999",adjustable="true",global_quota="false",kind="P5",name="All P5 Spot Instance Requests",region="us-west-1",unit="None"} 0
aws_quota_ec2_all_spot_instance_requests{account="126485599999",adjustable="true",global_quota="false",kind="Standard (A, C, D, H, I, M, R, T, Z)",name="All Standard (A, C, D, H, I, M, R, T, Z) Spot Instance Requests",region="us-west-1",unit="None"} 5
aws_quota_ec2_all_spot_instance_requests{account="126485599999",adjustable="true",global_quota="false",kind="Trn",name="All Trn Spot Instance Requests",region="us-west-1",unit="None"} 0
aws_quota_ec2_all_spot_instance_requests{account="126485599999",adjustable="true",global_quota="false",kind="X",name="All X Spot Instance Requests",region="us-west-1",unit="None"} 0

Usage

  • Run the following command from the repository root
go run . --prom.port=10100 --config.file=config.yml
  • Example of config.yml
jobs:
  - serviceCode: lambda
    accountName: dev-account # optional
    regions:
      - us-west-1
      - us-east-1
    role: arn:aws:iam::ACCOUNT-ID:role/rolename # optional
  - serviceCode: cloudformation
    accountName: prod-account # optional
    regions:
      - us-west-1
      - us-east-1
  • Use the optional role key if you want the exporter to assume that IAM role when retrieving the metrics for that specific job.
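Each job pairs one serviceCode with a list of regions, so the number of (service, region) combinations to query grows with both. The Go sketch below illustrates that fan-out under stated assumptions: Job, Target and expand are hypothetical names whose fields mirror the YAML keys in the example, and the config is built in code instead of being parsed from config.yml so the sketch needs no third-party YAML library.

```go
package main

import "fmt"

// Job mirrors the keys of one entry under "jobs" in config.yml.
// The Go type and field names are hypothetical; only the YAML keys come from the example.
type Job struct {
	ServiceCode string   `yaml:"serviceCode"`
	AccountName string   `yaml:"accountName"` // optional
	Regions     []string `yaml:"regions"`
	Role        string   `yaml:"role"` // optional IAM role ARN to assume
}

// Target is one (service, region) combination whose quotas must be fetched.
type Target struct {
	ServiceCode, Region, Role string
}

// expand turns the job list into one target per configured region.
func expand(jobs []Job) []Target {
	var targets []Target
	for _, j := range jobs {
		for _, r := range j.Regions {
			targets = append(targets, Target{ServiceCode: j.ServiceCode, Region: r, Role: j.Role})
		}
	}
	return targets
}

func main() {
	jobs := []Job{
		{ServiceCode: "lambda", AccountName: "dev-account", Regions: []string{"us-west-1", "us-east-1"}, Role: "arn:aws:iam::ACCOUNT-ID:role/rolename"},
		{ServiceCode: "cloudformation", AccountName: "prod-account", Regions: []string{"us-west-1", "us-east-1"}},
	}
	for _, t := range expand(jobs) {
		fmt.Printf("%s in %s (assume role: %q)\n", t.ServiceCode, t.Region, t.Role)
	}
}
```

With the example config this yields four targets; presumably each metrics refresh repeats lookups of this kind, which is where the cache flags described under Help come in.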

Help

  • View program help:
$ ./aws_quota_exporter -h
Usage of ./aws_quota_exporter:
  -cache.duration duration
        Cache expiry time. (default 5m0s)
  -cache.serve-stale
        Serve stale cache data during cache refresh. This avoids delays in serving metrics. (default: false)
  -collect.usage
        Collect quotas usage where available (NOTE: CloudWatch calls aren't free, default: false)
  -config.file string
        Path to configuration file. (default "/etc/aqe/config.yml")
  -log.folder string
        Folder to store logfiles. logs to stdout if not specified. (default "stdout")
  -log.format string
        Format of log messages (text or json). (default "text")
  -log.level string
        Log level to log from (DEBUG|INFO|WARN|ERROR). (default "INFO")
  -prom.port int
        Port to expose prometheus metrics. (default 10100)
  -version
        Display aqe version

Version

  • Display version
$ ./aws_quota_exporter -version
{
  App: "AWS Quota Exporter (AQE)",
  Version: "dev",
  Date: "Sun Sep  3 17:54:45 UTC 2023",
  Platform: "darwin/arm64",
  Commit: "none",
  GoVersion: "go1.21.13"
}

Service Codes

The serviceCode is the AWS service identifier. To find the serviceCode for a particular service, use the following AWS CLI command:

aws service-quotas list-services

Quotas usage

You can enable quota usage collection with the -collect.usage flag (ℹ️ Not all quotas have usage; see docs). The exporter collects the latest usage value from CloudWatch using the GetMetricStatistics API method. ⚠️ CloudWatch API calls aren't free! However, there is no charge for up to 1 million GetMetricStatistics API requests (docs).

The label type="usage|quota" differentiates the metrics: "type": "usage" marks exported usage metrics, while "type": "quota" marks exported quota metrics. Example PromQL query to get the quota usage ratio:

{job="quota-exporter", type="usage"} / ignoring(type) {job="quota-exporter", type="quota"}

The ignoring(type) modifier is needed because the two sides differ in the type label, and PromQL otherwise only divides series with identical label sets.

NOTE: Usage collection requires the cloudwatch:GetMetricStatistics permission in the IAM policy.
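Because usage and quota series differ in their type label, computing a ratio means pairing series whose labels match on everything except type; in PromQL this is what the ignoring(type) modifier does. The Go sketch below spells out that matching step. Sample, key and usageRatio are hypothetical names, and the label values in main are made-up illustrative data, not real exporter output.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is one exported series: its labels and value.
type Sample struct {
	Labels map[string]string
	Value  float64
}

// key builds a matching signature from every label except those in ignore,
// mirroring PromQL's ignoring(...) one-to-one matching. "\xff" separates
// parts so label values containing commas cannot collide.
func key(labels map[string]string, ignore ...string) string {
	skip := map[string]bool{}
	for _, l := range ignore {
		skip[l] = true
	}
	var parts []string
	for k, v := range labels {
		if !skip[k] {
			parts = append(parts, k+"\xff"+v)
		}
	}
	sort.Strings(parts)
	return strings.Join(parts, "\xff")
}

// usageRatio divides each usage series by the quota series whose labels are
// identical apart from "type". Unlike PromQL, which would return +Inf or NaN,
// this sketch skips zero quotas.
func usageRatio(samples []Sample) map[string]float64 {
	quotas := map[string]float64{}
	for _, s := range samples {
		if s.Labels["type"] == "quota" {
			quotas[key(s.Labels, "type")] = s.Value
		}
	}
	ratios := map[string]float64{}
	for _, s := range samples {
		if s.Labels["type"] != "usage" {
			continue
		}
		k := key(s.Labels, "type")
		if q, ok := quotas[k]; ok && q != 0 {
			ratios[k] = s.Value / q
		}
	}
	return ratios
}

func main() {
	samples := []Sample{
		{map[string]string{"region": "us-west-1", "name": "Concurrent executions", "type": "quota"}, 1000},
		{map[string]string{"region": "us-west-1", "name": "Concurrent executions", "type": "usage"}, 250},
	}
	for _, r := range usageRatio(samples) {
		fmt.Println(r) // 0.25
	}
}
```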

Docker Image Usage

Using the Docker image available on Docker Hub:

docker run --name my-aqe -d -p 10100:10100 -e AWS_ACCESS_KEY_ID=111222 -e AWS_SECRET_ACCESS_KEY=secret ugwuanyi/aqe:main

AWS Authentication

This program relies on the AWS SDK for Go V2 for handling authentication. The AWS SDK uses its default credential chain to find AWS credentials. This default credential chain looks for credentials in the following order:

  1. Environment variables

    1. Static Credentials: (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN)
    2. Web Identity Token: (AWS_WEB_IDENTITY_TOKEN_FILE)
  2. Shared configuration files

    • The SDK defaults to the credentials and config files in the .aws folder in the user's home directory (~/.aws/credentials and ~/.aws/config).
  3. IAM role for tasks (Amazon ECS).

  4. IAM role for Amazon EC2.
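The lookup order above is a provider chain: each source is tried in turn and the first one that yields credentials wins. The Go sketch below shows that pattern in isolation. It is not the AWS SDK's implementation; provider, resolve and envStatic are hypothetical names, only the environment check does real work (it reads the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables listed above), and the other sources are stubs.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// provider is one credential source; retrieve describes the credentials it
// found, or returns an error if that source is not configured.
type provider struct {
	name     string
	retrieve func() (string, error)
}

var errNotFound = errors.New("not configured")

// resolve walks the providers in order and returns the first that succeeds.
func resolve(chain []provider) (string, error) {
	for _, p := range chain {
		if creds, err := p.retrieve(); err == nil {
			return p.name + ": " + creds, nil
		}
	}
	return "", errors.New("no credentials found in chain")
}

// envStatic succeeds only when both static key variables are set.
func envStatic() (string, error) {
	if os.Getenv("AWS_ACCESS_KEY_ID") != "" && os.Getenv("AWS_SECRET_ACCESS_KEY") != "" {
		return "static keys from environment", nil
	}
	return "", errNotFound
}

func main() {
	stub := func() (string, error) { return "", errNotFound } // placeholder sources
	chain := []provider{
		{"environment", envStatic},
		{"shared config files", stub},
		{"IAM role for tasks", stub},
		{"IAM role for Amazon EC2", stub},
	}
	fmt.Println(resolve(chain))
}
```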

By default, the SDK checks the AWS_PROFILE environment variable to determine which profile to use. If no AWS_PROFILE variable is set, the SDK uses the default profile.

To select a profile, export AWS_PROFILE before starting the exporter:

$ export AWS_PROFILE=test_profile

Helm Chart Usage

Steps to use the Helm chart:

  • Add the chart repository to your local Helm configuration
helm repo add aws_quota_exporter https://emylincon.github.io/aws_quota_exporter
  • View the configurable values. You can override any of them in your own values file.
helm show values aws_quota_exporter/aqe
  • In this example, we set the AWS credentials in values.yaml
secret:
  # base64 encoded secrets
  AWS_ACCESS_KEY_ID: QVdTX0FDQ0VTU19LRVlfSUQK
  AWS_SECRET_ACCESS_KEY: QVdTX1NFQ1JFVF9BQ0NFU1NfS0VZCg==
  • We will create a new namespace and install the chart in the namespace
kubectl create namespace aqe
helm install -n aqe -f values.yaml aqe aws_quota_exporter/aqe
  • View installed chart
helm list -A
  • Uninstall the chart
helm uninstall -n aqe aqe

AWS Permission Required

The exporter requires the permissions in the AWS managed policy ServiceQuotasReadOnlyAccess. Which of them are actually needed depends on the jobs specified in the config.yml file, so not all of them are likely to be required. The permissions included in ServiceQuotasReadOnlyAccess are listed in the following policy document:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeAccountLimits",
                "cloudformation:DescribeAccountLimits",
                "cloudwatch:DescribeAlarmsForMetric",
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "cloudwatch:GetMetricStatistics",
                "dynamodb:DescribeLimits",
                "elasticloadbalancing:DescribeAccountLimits",
                "iam:GetAccountSummary",
                "kinesis:DescribeLimits",
                "organizations:DescribeAccount",
                "organizations:DescribeOrganization",
                "organizations:ListAWSServiceAccessForOrganization",
                "rds:DescribeAccountAttributes",
                "route53:GetAccountLimit",
                "tag:GetTagKeys",
                "tag:GetTagValues",
                "servicequotas:GetAssociationForServiceQuotaTemplate",
                "servicequotas:GetAWSDefaultServiceQuota",
                "servicequotas:GetRequestedServiceQuotaChange",
                "servicequotas:GetServiceQuota",
                "servicequotas:GetServiceQuotaIncreaseRequestFromTemplate",
                "servicequotas:ListAWSDefaultServiceQuotas",
                "servicequotas:ListRequestedServiceQuotaChangeHistory",
                "servicequotas:ListRequestedServiceQuotaChangeHistoryByQuota",
                "servicequotas:ListServices",
                "servicequotas:ListServiceQuotas",
                "servicequotas:ListServiceQuotaIncreaseRequestsInTemplate",
                "servicequotas:ListTagsForResource"
            ],
            "Resource": "*"
        }
    ]
}

Please remove any permissions that you do not need.

Grafana Dashboard

Visualizing Quotas & Usage Dashboard
