
Rundeck

Supported OS: Linux, Windows, Mac OS

Integration version: 1.1.0

Overview

Rundeck further enhances Datadog notifications with automated workflows that help diagnose issues and, optionally, remediate them.

Learn more on the Rundeck website about automating your runbooks to reduce incident time.

Some example use cases are:

  • If a Windows/Linux service is down, attempt to restart it
  • If NTP sync is off, restart the NTP service on that machine
  • Clean up logs and other file waste when disk space becomes full
  • Restart services in response to hung work queues
  • Provision capacity in response to high utilization

Use the instructions below to configure your Datadog/Rundeck integration.

Setup

Installation

Prepare at least one Rundeck job that you would like to trigger using a Datadog alert.

Configuration

Rundeck

  1. In your Rundeck Project, click the Webhooks navigation option.
  2. Click Add.
  3. Give the webhook a name, for example: Datadog-Restart Service.
  4. Click the Choose Webhook Plugin button and select Run Job.
  5. Select the job you’d like to run when this webhook is triggered.
  6. [optional] In the Options line, enter the following text: -raw ${raw} -event_type ${data.event_type} (This makes the full Datadog payload available as part of the job input options.)
  7. Click Create Webhook. The URL field is automatically populated after the webhook is created.
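
Before connecting the webhook to Datadog, it can help to send it a test payload by hand. The sketch below builds such a request with Python's standard library. The URL and the payload fields are placeholders, not values taken from Rundeck or Datadog: substitute the URL that Rundeck generated in step 7. The `event_type` field is included only so that `${data.event_type}` from step 6 has something to resolve.

```python
import json
import urllib.request

# Placeholder: replace with the value of the Rundeck webhook's URL field.
RUNDECK_WEBHOOK_URL = "https://rundeck.example.com/REPLACE_WITH_WEBHOOK_PATH"


def build_test_request(url, payload):
    """Build (but do not send) a JSON POST request for a Rundeck webhook."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload for illustration; a real Datadog payload has more fields.
sample_payload = {"event_type": "test", "title": "Manual webhook test"}
request = build_test_request(RUNDECK_WEBHOOK_URL, sample_payload)

# To send it: urllib.request.urlopen(request, timeout=10)
```

Sending is left as a commented-out line so that the request can be inspected first. If the job runs, the webhook and the selected job are wired together correctly.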


Note: If you are using a firewall, add the Datadog IP ranges to your allowlist.
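
One way to sanity-check an allowlist is to test a source address against the published ranges. The sketch below assumes the ranges are available as a JSON document with a `webhooks` section containing `prefixes_ipv4` and `prefixes_ipv6` lists. That layout is an assumption here, so verify it against the IP ranges document Datadog publishes for your site. The sample prefixes are reserved documentation ranges, not real Datadog addresses.

```python
import ipaddress

# Made-up sample in the assumed layout; uses reserved documentation ranges.
sample_ranges = {
    "webhooks": {
        "prefixes_ipv4": ["192.0.2.0/24", "198.51.100.16/28"],
        "prefixes_ipv6": ["2001:db8::/32"],
    }
}


def is_allowed_source(address, ranges, section="webhooks"):
    """Return True if address falls inside any prefix listed for the section."""
    ip = ipaddress.ip_address(address)
    entry = ranges.get(section, {})
    prefixes = entry.get("prefixes_ipv4", []) + entry.get("prefixes_ipv6", [])
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so checking against the combined list is safe.
    return any(ip in ipaddress.ip_network(prefix) for prefix in prefixes)
```

For example, `is_allowed_source("192.0.2.10", sample_ranges)` is true for the sample data, while an address outside every listed prefix returns false.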

Datadog setup

  1. Open Datadog and go to Integrations > Integrations.

  2. Search for “webhooks”.


  3. Click the Webhooks entry in the search results to open the configuration window.


  4. Click the New button and fill out the form:

    a. Give the webhook a name.

    b. Paste the URL from your Rundeck webhook into the URL field. This is the URL generated in step 7 of the Rundeck section above.

    c. Click Save.
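
The same webhook can, in principle, be registered through the Datadog API instead of the form. The sketch below builds such a request without sending it. The endpoint path, the header names, and the `name` and `url` body fields reflect the Datadog Webhooks integration API as understood here and should be treated as assumptions to confirm against the API reference for your Datadog site. All keys and URLs are placeholders.

```python
import json
import urllib.request

# Assumed endpoint for the default Datadog site; other sites use a different host.
DATADOG_WEBHOOKS_ENDPOINT = (
    "https://api.datadoghq.com/api/v1/integration/webhooks/configuration/webhooks"
)


def build_create_webhook_request(name, url, api_key, app_key):
    """Build (but do not send) a request that registers a webhook in Datadog."""
    body = json.dumps({"name": name, "url": url}).encode("utf-8")
    return urllib.request.Request(
        DATADOG_WEBHOOKS_ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
        method="POST",
    )


# Placeholder values throughout; supply your own keys and Rundeck URL.
request = build_create_webhook_request(
    name="Rundeck_Restart_Service",
    url="https://rundeck.example.com/REPLACE_WITH_WEBHOOK_PATH",
    api_key="REPLACE_WITH_API_KEY",
    app_key="REPLACE_WITH_APP_KEY",
)

# To send it: urllib.request.urlopen(request, timeout=10)
```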


To use this integration, add the webhook as a recipient in any Datadog alert notification, for example @webhook-Rundeck_Restart_Service. The handle depends on the name you gave the webhook in step 4a. When the monitor triggers an alert, the webhook runs the associated job.

Other plugins, such as Advanced Run Job, can also be used, depending on your use case.
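
To illustrate the notification handle described above, the sketch below derives the handle from a webhook name and places it in a monitor message. The helper names are hypothetical. Wrapping the handle in `{{#is_alert}}` is one possible choice, made here so that the Rundeck job runs only when the monitor enters the alert state rather than on every notification.

```python
def webhook_handle(webhook_name):
    """Return the @-mention for a Datadog webhook with the given name."""
    return "@webhook-" + webhook_name


def build_monitor_message(summary, webhook_name):
    """Compose a monitor message that notifies the webhook only on alert."""
    handle = webhook_handle(webhook_name)
    return summary + "\n{{#is_alert}}" + handle + "{{/is_alert}}"


message = build_monitor_message(
    "Service is down; attempting an automated restart.",
    "Rundeck_Restart_Service",
)
```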

Data Collected

Metrics

rundeck.project.executions.duration.completed
(gauge)
The duration of completed executions for a project.
Shown as millisecond
rundeck.project.executions.duration.running
(gauge)
The duration of currently running executions for a project.
Shown as millisecond
rundeck.project.executions.status
(gauge)
The status of Rundeck job executions.
Shown as job
rundeck.system.stats.cpu.load_average.average
(gauge)
JVM load average percentage for the system for the previous minute.
Shown as percent
rundeck.system.stats.cpu.processors
(gauge)
Number of available system processors. The load_average might be calculated based on the total number of available processors.
rundeck.system.stats.memory.free
(gauge)
Amount of the JVM's allocated memory that is currently free.
Shown as byte
rundeck.system.stats.memory.max
(gauge)
Maximum JVM memory that can be allocated.
Shown as byte
rundeck.system.stats.memory.total
(gauge)
Total allocated memory for the JVM.
Shown as byte
rundeck.system.stats.scheduler.running
(gauge)
Number of running jobs in the scheduler.
Shown as job
rundeck.system.stats.scheduler.thread_pool_size
(gauge)
Size of the scheduler threadPool (maximum number of concurrent Rundeck executions).
Shown as thread
rundeck.system.stats.threads.active
(gauge)
Number of active threads in the JVM.
Shown as thread
rundeck.metrics.metrics.com.dtolabs.rundeck.server.auth_context_evaluator_cache_manager.auth_context_evaluator_cache.eviction_count
(gauge)
The number of entries evicted from the auth context cache.
Shown as entry
rundeck.metrics.metrics.com.dtolabs.rundeck.server.auth_context_evaluator_cache_manager.auth_context_evaluator_cache.hit_count
(gauge)
The number of times auth context cache lookup data was found.
Shown as hit
rundeck.metrics.metrics.com.dtolabs.rundeck.server.auth_context_evaluator_cache_manager.auth_context_evaluator_cache.hit_rate
(gauge)
The ratio of cache hits to total lookups for auth context.
Shown as percent
rundeck.metrics.metrics.com.dtolabs.rundeck.server.auth_context_evaluator_cache_manager.auth_context_evaluator_cache.load_exception_count
(gauge)
The number of times the auth context cache encountered an error loading data.
Shown as exception
rundeck.metrics.metrics.com.dtolabs.rundeck.server.auth_context_evaluator_cache_manager.auth_context_evaluator_cache.miss_count
(gauge)
The number of times auth context cache lookup data was not found.
Shown as miss
rundeck.metrics.metrics.data_source.connection.ping_time
(gauge)
The time taken to ping the backend database connection.
Shown as millisecond
rundeck.metrics.metrics.scheduler.quartz.running_executions
(gauge)
The number of jobs currently being executed by the Quartz scheduler.
Shown as execution
rundeck.metrics.metrics.scheduler.quartz.thread_pool_size
(gauge)
The total number of threads available in the Quartz thread pool.
Shown as thread
rundeck.metrics.metrics.services.authorization_service.source_cache.eviction_count
(gauge)
Number of evictions from the authorization source cache.
Shown as entry
rundeck.metrics.metrics.services.authorization_service.source_cache.hit_count
(gauge)
Number of hits in the authorization source cache.
Shown as hit
rundeck.metrics.metrics.services.authorization_service.source_cache.hit_rate
(gauge)
Hit rate for the authorization source cache.
Shown as percent
rundeck.metrics.metrics.services.authorization_service.source_cache.load_exception_count
(gauge)
Number of load exceptions in the authorization source cache.
Shown as exception
rundeck.metrics.metrics.services.authorization_service.source_cache.miss_count
(gauge)
Number of misses in the authorization source cache.
Shown as miss
rundeck.metrics.metrics.services.node_service.node_cache.eviction_count
(gauge)
Number of evictions from the node information cache.
Shown as entry
rundeck.metrics.metrics.services.node_service.node_cache.hit_count
(gauge)
Number of hits in the node information cache.
Shown as hit
rundeck.metrics.metrics.services.node_service.node_cache.hit_rate
(gauge)
Hit rate for the node information cache.
Shown as percent
rundeck.metrics.metrics.services.node_service.node_cache.load_exception_count
(gauge)
Number of load exceptions in the node information cache.
Shown as exception
rundeck.metrics.metrics.services.node_service.node_cache.miss_count
(gauge)
Number of misses in the node information cache.
Shown as miss
rundeck.metrics.metrics.services.project_manager_service.file_cache.eviction_count
(gauge)
Number of evictions from the project file cache.
Shown as entry
rundeck.metrics.metrics.services.project_manager_service.file_cache.hit_count
(gauge)
Number of hits in the project file cache.
Shown as hit
rundeck.metrics.metrics.services.project_manager_service.file_cache.hit_rate
(gauge)
Hit rate for the project file cache.
Shown as percent
rundeck.metrics.metrics.services.project_manager_service.file_cache.load_exception_count
(gauge)
Number of load exceptions in the project file cache.
Shown as exception
rundeck.metrics.metrics.services.project_manager_service.file_cache.miss_count
(gauge)
Number of misses in the project file cache.
Shown as miss
rundeck.metrics.metrics.services.project_manager_service.project_cache.eviction_count
(gauge)
Number of evictions from the project metadata cache.
Shown as entry
rundeck.metrics.metrics.services.project_manager_service.project_cache.hit_count
(gauge)
Number of hits in the project metadata cache.
Shown as hit
rundeck.metrics.metrics.services.project_manager_service.project_cache.hit_rate
(gauge)
Hit rate for the project metadata cache.
Shown as percent
rundeck.metrics.metrics.services.project_manager_service.project_cache.load_exception_count
(gauge)
Number of load exceptions in the project metadata cache.
Shown as exception
rundeck.metrics.metrics.services.project_manager_service.project_cache.miss_count
(gauge)
Number of misses in the project metadata cache.
Shown as miss
rundeck.metrics.metrics.services.project_manager_service.source_cache.eviction_count
(gauge)
Number of evictions from the project source cache.
Shown as entry
rundeck.metrics.metrics.services.project_manager_service.source_cache.hit_count
(gauge)
Number of hits in the project source cache.
Shown as hit
rundeck.metrics.metrics.services.project_manager_service.source_cache.hit_rate
(gauge)
Hit rate for the project source cache.
Shown as percent
rundeck.metrics.metrics.services.project_manager_service.source_cache.load_exception_count
(gauge)
Number of load exceptions in the project source cache.
Shown as exception
rundeck.metrics.metrics.services.project_manager_service.source_cache.miss_count
(gauge)
Number of misses in the project source cache.
Shown as miss
rundeck.metrics.metrics.api_authorization.failure
(count)
The number of failed API authorization attempts.
Shown as occurrence
rundeck.metrics.metrics.api_authorization.success
(count)
The number of successful API authorization attempts.
Shown as occurrence
rundeck.metrics.metrics.scheduler.quartz.scheduled_jobs
(count)
The total number of jobs scheduled in Quartz.
Shown as job
rundeck.metrics.metrics.user_login.failure
(count)
The number of failed user login attempts.
rundeck.metrics.metrics.user_login.success
(count)
The number of successful user login attempts.
rundeck.metrics.metrics.user_logout.success
(count)
The number of successful user logout events.
rundeck.metrics.metrics.com.dtolabs.rundeck.core.execution.workflow.workflow_execution_listener_step_metrics.finish_workflow_step_failed_meter
(count)
Rate of workflow steps that finished with a failure.
Shown as event
rundeck.metrics.metrics.com.dtolabs.rundeck.core.execution.workflow.workflow_execution_listener_step_metrics.finish_workflow_step_succeeded_meter
(count)
Rate of workflow steps that finished successfully.
Shown as event
rundeck.metrics.metrics.com.dtolabs.rundeck.core.execution.workflow.workflow_execution_listener_step_metrics.start_workflow_step_meter
(count)
Rate of workflow steps being started.
Shown as event
rundeck.metrics.metrics.controllers.framework_controller.create_project_post
(count)
Number of project creation POST requests.
Shown as event
rundeck.metrics.metrics.services.authorization_service.system_authorization.evaluate_meter
(count)
Rate of system-level authorization evaluations.
Shown as event
rundeck.metrics.metrics.services.authorization_service.system_authorization.evaluate_set_meter
(count)
Rate of system-level set authorization evaluations.
Shown as event
rundeck.metrics.metrics.services.execution_service.execution_failure_meter
(count)
Rate of general execution failures.
Shown as event
rundeck.metrics.metrics.services.execution_service.execution_job_run_failed_meter
(count)
Rate of job-specific execution failures.
Shown as event
rundeck.metrics.metrics.services.execution_service.execution_job_run_succeeded_meter
(count)
Rate of job-specific execution successes.
Shown as event
rundeck.metrics.metrics.services.execution_service.execution_job_start_meter
(count)
Rate of job executions starting.
Shown as event
rundeck.metrics.metrics.services.execution_service.execution_job_start_succeeded_meter
(count)
Rate of jobs that successfully started.
Shown as event
rundeck.metrics.metrics.services.execution_service.execution_start_meter
(count)
Rate of overall executions starting.
Shown as event
rundeck.metrics.metrics.services.execution_service.execution_success_meter
(count)
Rate of overall execution successes.
Shown as event
rundeck.metrics.metrics.counter.status.200.unmapped
(count)
Count of HTTP 200 responses not mapped to a specific endpoint metric.
rundeck.metrics.metrics.counter.status.201.unmapped
(count)
Count of HTTP 201 responses not mapped to a specific endpoint metric.
rundeck.metrics.metrics.gauge.response.unmapped
(gauge)
Response time for unmapped HTTP responses.
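
The hit_rate metrics above are described as the ratio of cache hits to total lookups. As a worked illustration, the sketch below computes that ratio from a hit_count and miss_count pair, assuming total lookups equal hits plus misses and that both values were sampled at the same time. It returns a fraction between 0 and 1; whether the reported metric uses a fraction or a 0 to 100 scale is not stated here, so compare against your own data before relying on it.

```python
def cache_hit_rate(hit_count, miss_count):
    """Return hits / (hits + misses) as a fraction, or None with no lookups."""
    total = hit_count + miss_count
    if total == 0:
        # No lookups yet, so the ratio is undefined.
        return None
    return hit_count / total
```

With 90 hits and 10 misses, the function returns 0.9. A low value alongside a rising eviction_count can suggest that the cache is too small for the workload.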

Service Checks

The Rundeck integration does not include any service checks.

Events

The Rundeck integration does not include any events.

Troubleshooting

Need help? Contact Datadog support.