This repository contains a fully automated Elastic Stack (Elasticsearch, Kibana, Logstash) used to ingest normalized rsync data and visualize transfer statistics via a preconfigured Kibana dashboard. Built with docker-compose.
To find out how to produce the rsync data for visualization, refer to the scripts' separate documentation: scripts/README.md
The stack is designed for:
- Reproducible local deployment
- Easy access to elastic-stack tooling without worrying about platform
- Clear separation of ingestion, storage, and visualization
- Easy maintenance and setup
This stack is intended for:
- Visualizing rsync output generated via rsync_runner and rsync_normalize shellscripts
- Analysis and experimentation with the elastic-stack
- Extension beyond rsync: while rsync is the focus of this project, the containers provide a fully usable elastic-stack that can also be used for log analysis of e.g. security-related logs, and can therefore be extended without any issue
- Local development
It is not hardened nor intended for direct public internet exposure in its current form. In addition, it is a one-node cluster and not built with high availability in mind.
- Quickstart Guide
- Repository Layout
- Architecture Overview
- Requirements
- Initial Setup
- Data Ingestion
- Kibana Content
- Configuration Files
- Security
- Maintenance
- Troubleshooting
- Elastic-Stack specific Design Decisions
- Docker Compose and rsync installed
For the elastic-stack to work, it expects normalized output of rsync runs in the data directory. Using the provided scripts, that data can be generated; it will then automatically be indexed and visualized by the elastic-stack. For the stack itself to come up correctly, some environment variables need to be available in your shell environment when running the compose up command. After everything is started, log in to Kibana and take a look at the visualization of the content inside the data directory.
This guide assumes you need to also get the normalized rsync output first. For this we provide scripts documented in their own README. For now it is fine to simply follow the steps here in the quickstart guide and use our defaults.
If you already have normalized data, skip the "Getting the data" steps and move on to "Starting the elastic-stack and using kibana". Don't forget to copy your files to the data directory so Logstash can ingest them.
- Set the needed configuration in the rsync.conf file found in the scripts directory:
  RSYNC_SOURCE="./data/source"
  RSYNC_DEST="./data/destination"
  RSYNC_MODE="normalize"
- Run rsync via our script and wait for it to complete:
  ./scripts/rsync_runner.sh
- A file matching the "*.jsonl" naming should now have been created in the data directory.
We recommend running this script via cron or systemd timers, replacing direct manual invocations of rsync, so that data is continuously produced for our stack to visualize. For the purpose of this quickstart guide, this step can be skipped.
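A cron-based setup could look like the following crontab entry (a sketch; the repository path is an example and must be adjusted to your checkout):

```
# run the rsync runner every hour from the repository root, appending output to the log directory
0 * * * * cd /opt/rsync-elastic-stack && ./scripts/rsync_runner.sh >> rsync_runner-logs/cron.log 2>&1
```

A systemd timer paired with a oneshot service achieves the same on systems where cron is unavailable.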
Should you experience any issues during this stage, consult the README inside the scripts directory to learn more about our scripts.
- Follow the three steps outlined here: Initial Setup
- After logging in to Kibana as user elastic with your defined password, head over to the Dashboards tab and open the rsync dashboard.
- Adjust the timeframe in the top right corner of the page. If you don't see the data you expect, the timeframe is most likely wrong. We recommend setting it to "Last week".
- The visualizations inside the dashboard should now show the rsync run generated by the scripts beforehand. Once a new file matching the "*.jsonl" naming appears in the data directory, it will instantly be indexed and available in the visualizations as well.
- Feel free to head to the "Discover" section of kibana, and take a look at the fields available to you. This is a fully working elastic-stack after all, so you can leverage the kibana query language or build your own visualizations as you see fit.
With your stack now working, keep in mind that all configuration made inside the Kibana web interface is only preserved inside the Docker volumes; once these are wiped, so are the manual changes. Therefore, use the declarative approach and persist any Kibana changes as saved objects inside the NDJSON file that gets imported. Consult this README for more information.
.
├── docker-compose.yml
├── README.md
├── diagrams/ # diagrams used in this README
├── data/ # logstash looks for normalized data here
├── rsync_runner-logs/ # Non-normalized rsync run output lives here
├── scripts/ # contains all scripts to get data
├── es-setup/ # elastic-stack specific configs
│ ├── logs-rsync-index-template.json
│ ├── generate_certs.sh
│ └── kibana/
│ └── rsync-kibana-objects.ndjson
└── logstash/ # logstash config
└── pipeline/
└── logstash.conf
- Elasticsearch
- Stores all rsync event data
- Security enabled
- Uses HTTP internally (no TLS) to reduce bootstrap complexity
- Kibana
- Visualization and dashboard UI
- Exposed on https://localhost:5601
- Uses HTTPS with a self-signed certificate
- Connects to Elasticsearch using the kibana_system user
- Logstash
- Reads rsync JSON files from disk
- Transforms and enriches fields
- Writes documents into daily Elasticsearch indices
- elastic-certs
- Generates a self-signed CA and certificates
- Used only for Kibana HTTPS
- Certificates are stored in a Docker volume
- es-init
- Applies the Elasticsearch index template
- Performs cluster initialization tasks
- kibana-init
- Imports Kibana saved objects (data view, visualizations, dashboard)
- Uses overwrite mode to ensure a known-good state
- Docker Engine
- Docker Compose v2
- Open ports on the host:
  - 9200 (Elasticsearch)
  - 5601 (Kibana HTTPS)
We tested the stack and guarantee it works with the following container images:
docker.elastic.co/elasticsearch/elasticsearch:8.15.3
docker.elastic.co/kibana/kibana:8.15.3
docker.elastic.co/logstash/logstash:8.15.3
curlimages/curl:8.10.1
Generally, other minor versions of the elastic images work as well; however, we cannot guarantee that the visualizations work across different major versions. All elastic images need to have the same version to function together.
Create a .env file, source it, and ensure the variables are set as desired. The project does not look for this file anywhere; it simply assumes these variables are set in your shell environment. We recommend keeping them in an env file for easy sourcing.
export ELASTIC_PASSWORD=change-me-elastic
export KIBANA_SYSTEM_PASSWORD=change-me-kibana-system
export KIBANA_ENCRYPTION_KEY=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
Notes:
- KIBANA_ENCRYPTION_KEY must be at least 32 characters
- Keep the encryption key stable across restarts or Kibana will fail to start, unless you destroy the Docker volumes for a fresh initial setup
- The ELASTIC_PASSWORD is used to log in to Kibana
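A sufficiently long random encryption key can be generated with openssl (a sketch; any stable secret of at least 32 characters works):

```shell
# 32 random bytes, hex-encoded, yield a 64-character key (well above the 32-character minimum)
export KIBANA_ENCRYPTION_KEY="$(openssl rand -hex 32)"
echo "${#KIBANA_ENCRYPTION_KEY}"   # prints the key length
```

Remember to persist the generated value in your .env file so it stays stable across restarts.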
In the project root where the docker-compose.yml file is located, run:
docker compose up -d
and wait for all services to spin up correctly.
Keep in mind that if the env file is sourced by a non-root user but docker is run as root via sudo, root's environment will not contain the needed variables. Ensure the command is run by a user whose environment has all variables correctly set.
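If you do need sudo, one option is to preserve the calling user's environment for the root process (a sketch; your sudoers policy may restrict which variables survive):

```shell
# -E keeps the invoking user's environment (ELASTIC_PASSWORD etc.) visible to the root process
sudo -E docker compose up -d
```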
If you run into the above issue, run the following before retrying:
docker compose down --volumes
This removes all containers and destroys the volumes as well, providing a clean slate.
- URL: https://localhost:5601
- Browser warning about the self-signed certificate is expected
- Login credentials:
  - Username: elastic
  - Password: value of ELASTIC_PASSWORD
Logstash reads rsync JSON files from:
./data/*.jsonl
Behavior:
- Files are read once (mode => read)
- Processing state is persisted via sincedb
sincedb - Restarting Logstash does not reprocess existing files, because the sincedb is stored on a docker-volume
- The data directory is bind-mounted inside the logstash container, meaning you will not lose the normalized rsync data when the docker-volumes are wiped.
- However, this means you are responsible for ensuring the data directory of this repository is populated with the normalized rsync runs, and for keeping a backup in case you want to index those files again.
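The behavior above corresponds to a Logstash file input roughly along these lines (a sketch only; paths and the codec are assumptions here, and the authoritative version lives in logstash/pipeline/logstash.conf):

```
input {
  file {
    path => "/data/*.jsonl"       # assumed bind-mount target of ./data inside the container
    mode => "read"                # read each file once instead of tailing it
    sincedb_path => "/usr/share/logstash/data/sincedb"   # assumed location, persisted on a docker-volume
    codec => "json_lines"         # one JSON document per line
  }
}
```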
Documents are written to daily indices:
rsync-YYYY.MM.dd
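You can confirm that documents are arriving by listing the daily indices directly against Elasticsearch (assumes ELASTIC_PASSWORD is set in your shell and the stack is running):

```shell
# list all rsync-* indices with health, document counts, and sizes
curl -u "elastic:${ELASTIC_PASSWORD}" "https://localhost:9200/_cat/indices/rsync-*?v"
```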
All Kibana content is stored as code in:
es-setup/kibana/rsync-kibana-objects.ndjson
This includes:
- Data view (rsync-*)
- Visualizations
- Dashboard layout
The objects are imported automatically, with overwrite enabled, by the kibana-init container, making it easy to re-apply the configuration.
- Modify dashboards or visualizations in the Kibana UI
- Export saved objects (Stack Management → Saved Objects → Export)
- Replace the NDJSON file
- Re-run the kibana-init container:
docker compose up -d kibana-init
Naturally, the NDJSON file can also be edited directly if desired; to apply the changes, use the command above.
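The manual export steps can also be scripted against Kibana's saved-objects export API (a sketch; -k accepts the self-signed certificate, and the object types to export are an assumption):

```shell
# export all dashboards plus every object they reference into the tracked NDJSON file
curl -k -u "elastic:${ELASTIC_PASSWORD}" \
  -X POST "https://localhost:5601/api/saved_objects/_export" \
  -H "kbn-xsrf: true" -H "Content-Type: application/json" \
  -d '{"type": ["dashboard"], "includeReferencesDeep": true}' \
  > es-setup/kibana/rsync-kibana-objects.ndjson
```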
| File | Description |
|---|---|
| `docker-compose.yml` | Full stack definition |
| `.env` | Local secrets and credentials (optional as long as the ENV vars are set in your environment when running the `docker compose up` command) |
| `logstash/pipeline/logstash.conf` | Logstash pipeline configuration |
| `es-setup/logs-rsync-index-template.json` | Elasticsearch index template |
| `es-setup/kibana/rsync-kibana-objects.ndjson` | Kibana saved objects, e.g. visualizations |
| `es-setup/generate_certs.sh` | Self-signed certificate generation helper script |
Since this project mainly concerns itself with visualizing rsync data, and the stack runs locally anyway, we improved maintainability and easy automation at the cost of HTTPS everywhere and proper secrets management for our env vars. This was deemed better than having no auth or HTTPS whatsoever. So we provide the following:
- Security enabled
- HTTP only (no TLS internally)
- Authentication required
- HTTPS enabled using self-signed certificates
- Uses kibana_system internally to handle Elasticsearch operations
- Human users authenticate via basic auth to Kibana's web interface as user "elastic"
- Encrypts browser traffic
- Browser warnings are expected
- Suitable for local or internal deployments
Basically, since we have neither a public IP nor DNS, none of the "Let's Encrypt" challenges work for us. Therefore we opted to simply generate a self-signed certificate for now.
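Generating such a certificate boils down to a single openssl invocation (illustrative only; the repository's es-setup/generate_certs.sh is authoritative, and the filenames here are examples):

```shell
# create a self-signed certificate and key for Kibana, valid for one year, bound to localhost
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout kibana.key -out kibana.crt \
  -subj "/CN=localhost" \
  -addext "subjectAltName=DNS:localhost,IP:127.0.0.1"
```

The subjectAltName entry matters: modern browsers validate it rather than the CN, so without it the certificate warning cannot even be accepted permanently in some browsers.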
# for more possibilities take a look at the docker compose logs command help
docker compose logs elasticsearch
docker compose logs kibana
docker compose logs logstash
docker compose restart
Reinstall Elasticsearch index template:
docker compose up -d es-init
Reimport Kibana saved objects:
docker compose up -d kibana-init
Removes all Elasticsearch data and Kibana state:
# shut down the containers and remove their volumes, then create all containers and their volumes again
docker compose down --volumes
docker compose up -d
You should not run into issues as long as the required environment variables are set when running any docker commands that interact with the stack. So always verify first that they are set properly, e.g. by running env in your shell.
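A quick check for the three variables named in Initial Setup:

```shell
# print only the variables the stack expects; missing lines mean missing variables
env | grep -E '^(ELASTIC_PASSWORD|KIBANA_SYSTEM_PASSWORD|KIBANA_ENCRYPTION_KEY)='
```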
- Verify KIBANA_SYSTEM_PASSWORD in .env
- Check initialization logs:
docker compose logs es-init
- Ensure kibana-init authenticates using elastic in your request
- Ensure Logstash output (found in logstash's config file) uses:
  user => "elastic"
  password => ${ELASTIC_PASSWORD}
- If the logs show permission-denied issues, ensure that the logstash container has sufficient permissions to read the mounted data directory
- Elasticsearch runs on HTTP to reduce bootstrap complexity
- Kibana uses HTTPS to protect browser traffic
- Built-in users are used instead of service account tokens to reduce bootstrap complexity
- We consider the user's shell environment safe for secrets for use with this tooling
- Obviously, in a production-grade deployment of the elastic-stack, secrets management would be handled differently; however, for the scope and complexity of this project, this is a good tradeoff versus having no authentication whatsoever
- Kibana content is treated as declarative state and can be reapplied by restarting the kibana-init container
- No manual post-install steps are required to see rsync visualizations
- Logstash takes as input specifically formatted rsync output, generated by our provided scripts
- This is to guarantee an automated and working solution without having to deal with any edge cases