
Deployment Guide for v4.12

This page has instructions for collecting Kubernetes logs, metrics, and events; enriching them with deployment-, pod-, and service-level metadata; and sending them to Sumo Logic. See our documentation guide for details on our Kubernetes Solution.

Documentation for other versions can be found in the main README file.



Solution overview

The diagrams below illustrate the components of the Kubernetes collection solution.

Log Collection

[Diagram: log collection]

Metrics Collection

[Diagram: metrics collection]

Kubernetes Events Collection

[Diagram: Kubernetes events collection]

Minimum Requirements

| Name | Version |
|------|---------|
| K8s  | 1.21+   |
| Helm | 3.5+    |
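To check a cluster against these minimums before installing, something like the following bash sketch can be used. It assumes GNU coreutils `sort -V` and that `helm` and `kubectl` are on the `PATH` (each check is skipped if its tool is missing); the `version_ge` helper is a hypothetical name introduced here, not part of the chart.

```shell
#!/usr/bin/env bash
# Sketch: compare installed versions against the documented minimums
# (K8s 1.21+, Helm 3.5+). Assumes GNU coreutils `sort -V` (version sort).

MIN_K8S="1.21"
MIN_HELM="3.5"

# version_ge A B: succeeds when version A is greater than or equal to version B.
# (Hypothetical helper, not part of the Helm chart.)
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# `helm version --short` prints e.g. "v3.14.3+gf03cc04"; strip the "v" and build metadata.
if command -v helm >/dev/null 2>&1; then
  helm_version="$(helm version --short | sed 's/^v//; s/+.*//')"
  if version_ge "$helm_version" "$MIN_HELM"; then
    echo "Helm $helm_version: OK"
  else
    echo "Helm $helm_version is older than $MIN_HELM" >&2
  fi
fi

# The API server's /version endpoint reports gitVersion, e.g. "v1.29.3-eks-...".
if command -v kubectl >/dev/null 2>&1; then
  k8s_version="$(kubectl get --raw /version 2>/dev/null \
    | sed -n 's/.*"gitVersion": *"v\([0-9.]*\).*/\1/p')"
  if [ -z "$k8s_version" ]; then
    echo "Could not read the Kubernetes server version" >&2
  elif version_ge "$k8s_version" "$MIN_K8S"; then
    echo "Kubernetes $k8s_version: OK"
  else
    echo "Kubernetes $k8s_version is older than $MIN_K8S" >&2
  fi
fi
```

Version sort is used instead of plain string comparison so that, for example, 3.14 correctly compares as newer than 3.5.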

Support Matrix

The following table displays the tested Kubernetes platform, Helm, and kubectl versions.

| Name                   | Version                                   |
|------------------------|-------------------------------------------|
| K8s with EKS           | 1.26, 1.27, 1.28, 1.29, 1.30, 1.31, 1.32  |
| K8s with EKS (Fargate) | 1.26, 1.27, 1.28, 1.29, 1.30, 1.31        |
| K8s with Kops          | 1.26, 1.27, 1.28, 1.29, 1.30              |
| K8s with GKE           | 1.26, 1.27, 1.28, 1.29, 1.30, 1.31, 1.32  |
| K8s with AKS           | 1.26, 1.27, 1.28, 1.29, 1.30, 1.31        |
| OpenShift              | 4.12, 4.13, 4.14, 4.15, 4.16              |
| Helm                   | 3.14.3 (Linux)                            |
| kubectl                | 1.29.3                                    |

The following table displays the currently used software versions for our Helm chart.

| Name                                      | Version |
|-------------------------------------------|---------|
| OpenTelemetry Collector                   | 0.118.0 |
| OpenTelemetry Operator                    | 0.76.0  |
| kube-prometheus-stack/Prometheus Operator | 40.5.0  |
| Falco                                     | 3.8.7   |
| Metrics Server                            | 6.11.2  |
| Telegraf Operator                         | 1.4.0   |
| Tailing Sidecar Operator                  | 0.16.0  |

ARM support

The collection Helm Chart supports AWS Graviton CPUs and has been tested in ARM-based EKS clusters. In principle, it should run on any ARM64 node, but there is currently no official support for non-AWS ARM environments. If you do run into problems in such an environment, please open an issue describing them.

Falco support

Falco is embedded in this Helm Chart for user convenience only; Sumo Logic does not provide production support for it.

Windows node support

Support for Windows is experimental.

Metrics collection

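Windows nodes are supported for metrics collection. To enable it, add the following configuration to your user-values.yaml:

```yaml
prometheus-windows-exporter:
  enabled: true
```

This sends the Windows exporter's metrics to Sumo Logic. Most of them are prefixed with `windows_`; the exporter also reports its own `go_` and `process_` metrics.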
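If you keep your configuration in a local values file, the setting can be added with a small bash sketch like the one below. It assumes the file is `./user-values.yaml` (override with the `VALUES_FILE` environment variable) and only appends the block when no top-level `prometheus-windows-exporter:` key exists yet, so it never creates a duplicate key.

```shell
#!/usr/bin/env bash
# Sketch: add the Windows metrics setting to a Helm values file.
# VALUES_FILE defaults to ./user-values.yaml (an assumption; override as needed).
VALUES_FILE="${VALUES_FILE:-user-values.yaml}"

touch "$VALUES_FILE"

# Skip if a top-level prometheus-windows-exporter key already exists,
# since appending a second copy would produce a duplicate YAML key.
if grep -q '^prometheus-windows-exporter:' "$VALUES_FILE"; then
  echo "$VALUES_FILE already has a prometheus-windows-exporter section; edit it by hand" >&2
else
  # Make sure the appended block starts on its own line.
  if [ -s "$VALUES_FILE" ] && [ -n "$(tail -c1 "$VALUES_FILE")" ]; then
    echo >> "$VALUES_FILE"
  fi
  cat >> "$VALUES_FILE" <<'EOF'
prometheus-windows-exporter:
  enabled: true
EOF
  echo "Enabled Windows metrics collection in $VALUES_FILE"
fi
```

The resulting file is then passed to Helm with `-f user-values.yaml` when installing or upgrading the chart.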

List of metrics:
go_gc_duration_seconds summary
go_goroutines gauge
go_info gauge
go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes_total counter
go_memstats_buck_hash_sys_bytes gauge
go_memstats_frees_total counter
go_memstats_gc_sys_bytes gauge
go_memstats_heap_alloc_bytes gauge
go_memstats_heap_idle_bytes gauge
go_memstats_heap_inuse_bytes gauge
go_memstats_heap_objects gauge
go_memstats_heap_released_bytes gauge
go_memstats_heap_sys_bytes gauge
go_memstats_last_gc_time_seconds gauge
go_memstats_lookups_total counter
go_memstats_mallocs_total counter
go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_sys_bytes gauge
go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_sys_bytes gauge
go_memstats_next_gc_bytes gauge
go_memstats_other_sys_bytes gauge
go_memstats_stack_inuse_bytes gauge
go_memstats_stack_sys_bytes gauge
go_memstats_sys_bytes gauge
go_threads gauge
process_cpu_seconds_total counter
process_max_fds gauge
process_open_fds gauge
process_resident_memory_bytes gauge
process_start_time_seconds gauge
process_virtual_memory_bytes gauge
windows_container_available counter
windows_container_count gauge
windows_container_cpu_usage_seconds_kernelmode counter
windows_container_cpu_usage_seconds_total counter
windows_container_cpu_usage_seconds_usermode counter
windows_container_memory_usage_commit_bytes gauge
windows_container_memory_usage_commit_peak_bytes gauge
windows_container_memory_usage_private_working_set_bytes gauge
windows_container_network_receive_bytes_total counter
windows_container_network_receive_packets_dropped_total counter
windows_container_network_receive_packets_total counter
windows_container_network_transmit_bytes_total counter
windows_container_network_transmit_packets_dropped_total counter
windows_container_network_transmit_packets_total counter
windows_container_storage_read_count_normalized_total counter
windows_container_storage_read_size_bytes_total counter
windows_container_storage_write_count_normalized_total counter
windows_container_storage_write_size_bytes_total counter
windows_cpu_clock_interrupts_total counter
windows_cpu_core_frequency_mhz gauge
windows_cpu_cstate_seconds_total counter
windows_cpu_dpcs_total counter
windows_cpu_idle_break_events_total counter
windows_cpu_interrupts_total counter
windows_cpu_parking_status gauge
windows_cpu_processor_mperf_total counter
windows_cpu_processor_performance_total counter
windows_cpu_processor_privileged_utility_total counter
windows_cpu_processor_rtc_total counter
windows_cpu_processor_utility_total counter
windows_cpu_time_total counter
windows_cs_hostname gauge
windows_cs_logical_processors gauge
windows_cs_physical_memory_bytes gauge
windows_exporter_build_info gauge
windows_exporter_collector_duration_seconds gauge
windows_exporter_collector_success gauge
windows_exporter_collector_timeout gauge
windows_exporter_perflib_snapshot_duration_seconds gauge
windows_logical_disk_avg_read_requests_queued gauge
windows_logical_disk_avg_write_requests_queued gauge
windows_logical_disk_free_bytes gauge
windows_logical_disk_idle_seconds_total counter
windows_logical_disk_read_bytes_total counter
windows_logical_disk_read_latency_seconds_total counter
windows_logical_disk_read_seconds_total counter
windows_logical_disk_read_write_latency_seconds_total counter
windows_logical_disk_reads_total counter
windows_logical_disk_requests_queued gauge
windows_logical_disk_size_bytes gauge
windows_logical_disk_split_ios_total counter
windows_logical_disk_write_bytes_total counter
windows_logical_disk_write_latency_seconds_total counter
windows_logical_disk_write_seconds_total counter
windows_logical_disk_writes_total counter
windows_memory_available_bytes gauge
windows_memory_cache_bytes gauge
windows_memory_cache_bytes_peak gauge
windows_memory_cache_faults_total counter
windows_memory_commit_limit gauge
windows_memory_committed_bytes gauge
windows_memory_demand_zero_faults_total counter
windows_memory_free_and_zero_page_list_bytes gauge
windows_memory_free_system_page_table_entries gauge
windows_memory_modified_page_list_bytes gauge
windows_memory_page_faults_total counter
windows_memory_pool_nonpaged_allocs_total gauge
windows_memory_pool_nonpaged_bytes gauge
windows_memory_pool_paged_allocs_total counter
windows_memory_pool_paged_bytes gauge
windows_memory_pool_paged_resident_bytes gauge
windows_memory_standby_cache_core_bytes gauge
windows_memory_standby_cache_normal_priority_bytes gauge
windows_memory_standby_cache_reserve_bytes gauge
windows_memory_swap_page_operations_total counter
windows_memory_swap_page_reads_total counter
windows_memory_swap_page_writes_total counter
windows_memory_swap_pages_read_total counter
windows_memory_swap_pages_written_total counter
windows_memory_system_cache_resident_bytes gauge
windows_memory_system_code_resident_bytes gauge
windows_memory_system_code_total_bytes gauge
windows_memory_system_driver_resident_bytes gauge
windows_memory_system_driver_total_bytes gauge
windows_memory_transition_faults_total counter
windows_memory_transition_pages_repurposed_total counter
windows_memory_write_copies_total counter
windows_net_bytes_received_total counter
windows_net_bytes_sent_total counter
windows_net_bytes_total counter
windows_net_current_bandwidth_bytes gauge
windows_net_output_queue_length_packets gauge
windows_net_packets_outbound_discarded_total counter
windows_net_packets_outbound_errors_total counter
windows_net_packets_received_discarded_total counter
windows_net_packets_received_errors_total counter
windows_net_packets_received_total counter
windows_net_packets_received_unknown_total counter
windows_net_packets_sent_total counter
windows_net_packets_total counter
windows_os_info gauge
windows_os_paging_free_bytes gauge
windows_os_paging_limit_bytes gauge
windows_os_physical_memory_free_bytes gauge
windows_os_process_memory_limit_bytes gauge
windows_os_processes gauge
windows_os_processes_limit gauge
windows_os_time gauge
windows_os_timezone gauge
windows_os_users gauge
windows_os_virtual_memory_bytes gauge
windows_os_virtual_memory_free_bytes gauge
windows_os_visible_memory_bytes gauge
windows_physical_disk_idle_seconds_total counter
windows_physical_disk_read_bytes_total counter
windows_physical_disk_read_latency_seconds_total counter
windows_physical_disk_read_seconds_total counter
windows_physical_disk_read_write_latency_seconds_total counter
windows_physical_disk_reads_total counter
windows_physical_disk_requests_queued gauge
windows_physical_disk_split_ios_total counter
windows_physical_disk_write_bytes_total counter
windows_physical_disk_write_latency_seconds_total counter
windows_physical_disk_write_seconds_total counter
windows_physical_disk_writes_total counter
windows_service_info gauge
windows_service_start_mode gauge
windows_service_state gauge
windows_service_status gauge
windows_system_context_switches_total counter
windows_system_exception_dispatches_total counter
windows_system_processor_queue_length gauge
windows_system_system_calls_total counter
windows_system_system_up_time gauge
windows_system_threads gauge
windows_textfile_scrape_error gauge

[!NOTE] We currently do not have dashboards using these metrics.

Logs collection

Only container logs are supported. To enable log collection, add the following configuration to your user-values.yaml:

```yaml
sumologic:
  logs:
    collector:
      otellogswindows:
        enabled: true
otellogswindows:
  daemonset:
    nameservers:
      - ${NAMESERVER_IP}
```

where ${NAMESERVER_IP} is the IP address of the cluster DNS Service. For example, given the following output:

```
$ kubectl get service kube-dns -n kube-system
NAME       TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)         AGE
kube-dns   ClusterIP   10.100.0.10   <none>        53/UDP,53/TCP   13d
```

the nameserver IP is 10.100.0.10.
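Rather than copying the IP by hand, it can be read with a kubectl JSONPath query and substituted into the values snippet. This is a minimal bash sketch, assuming the DNS Service is named `kube-dns` in `kube-system` as in the example above (common, but not guaranteed on every distribution). The `render_windows_logs_values` helper and the `windows-logs-values.yaml` output file are hypothetical names introduced here.

```shell
#!/usr/bin/env bash
# Sketch: render the Windows logs values with the cluster DNS IP filled in.
# Assumes the DNS Service is kube-dns in kube-system (as in the example above).

# render_windows_logs_values IP: print the values snippet for the given nameserver IP.
# (Hypothetical helper, not part of the Helm chart.)
render_windows_logs_values() {
  local nameserver_ip="$1"
  cat <<EOF
sumologic:
  logs:
    collector:
      otellogswindows:
        enabled: true
otellogswindows:
  daemonset:
    nameservers:
      - ${nameserver_ip}
EOF
}

# Look up the Service's ClusterIP only when kubectl is available.
if command -v kubectl >/dev/null 2>&1; then
  NAMESERVER_IP="$(kubectl get service kube-dns -n kube-system \
    -o jsonpath='{.spec.clusterIP}' 2>/dev/null)"
  if [ -n "$NAMESERVER_IP" ]; then
    render_windows_logs_values "$NAMESERVER_IP" > windows-logs-values.yaml
    echo "Wrote windows-logs-values.yaml with nameserver $NAMESERVER_IP"
  else
    echo "Could not read the kube-dns ClusterIP" >&2
  fi
fi
```

The generated file can be passed to Helm with an extra `-f windows-logs-values.yaml` alongside your other values files; Helm merges the files, with later `-f` files taking precedence.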

[!NOTE] The nameserver will be forcibly used as the primary DNS server for the whole node. This is due to a Kubernetes limitation.