Is this a bug report or feature request?
- Bug Report
Deviation from expected behavior:
The affected OSD is not added back to the cluster after an ungraceful K8s node restart.
Expected behavior:
- during the K8s node outage, the ceph cluster status should be degraded
- after the K8s node has been started again, the OSD should be reintegrated into the ceph cluster
- no data loss, minimal replication effort
How to reproduce it (minimal and precise):
- Shutdown the K8s worker node: `virsh destroy --graceful --domain k8s-worker-04`
- `ceph status` will report `HEALTH_WARN` after a while
  - one mon and one osd reported as lost
  - reduced total volume capacity
  - ceph volume access still possible
- Start the K8s worker node: `virsh start --domain k8s-worker-04`
- `ceph status` still reports `HEALTH_WARN`
  - one osd reported as lost
  - reduced total volume capacity
  - ceph volume access still possible
- `rook-ceph-osd-2-xyz` ends up in `Init:CrashLoopBackOff` because of the failing `activate-osd` init container:
```
Controlled By: ReplicaSet/rook-ceph-osd-2-85967dc998
Init Containers:
activate-osd:
Container ID: docker://60636a068071e44c9600d251a0015f873352faf6b79394c99ed5910f52160073
Image: ceph/ceph:v14.2.8
Image ID: docker-pullable://ceph/ceph@sha256:a3d6360ee9685447bb316b1e4ce10229580ba81e37d111c479788446e7233eef
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
set -ex
OSD_ID=2
OSD_UUID=de214744-b37b-44ff-a5f0-5522102babb5
OSD_STORE_FLAG="--bluestore"
TMP_DIR=$(mktemp -d)
OSD_DATA_DIR=/var/lib/ceph/osd/ceph-"$OSD_ID"
# activate the osd with ceph-volume
ceph-volume lvm activate --no-systemd "$OSD_STORE_FLAG" "$OSD_ID" "$OSD_UUID"
# copy the tmpfs directory to a temporary directory
# this is needed because when the init container exits, the tmpfs goes away and its content with it
# this will result in the emptydir to be empty when accessed by the main osd container
cp --verbose --no-dereference "$OSD_DATA_DIR"/* "$TMP_DIR"/
# unmount the tmpfs since we don't need it anymore
umount "$OSD_DATA_DIR"
# copy back the content of the tmpfs into the original osd directory
cp --verbose --no-dereference "$TMP_DIR"/* "$OSD_DATA_DIR"
# retain ownership of files to the ceph user/group
chown --verbose --recursive ceph:ceph "$OSD_DATA_DIR"
# remove the temporary directory
rm --recursive --force "$TMP_DIR"
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sun, 15 Mar 2020 18:06:29 +0100
Finished: Sun, 15 Mar 2020 18:06:30 +0100
Ready: False
Restart Count: 5
```
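The actual error behind the `CrashLoopBackOff` can be pulled from the init container's logs; roughly like this (the pod name suffix is a placeholder, take the real one from `kubectl get pods`):

```
# list the OSD pods to find the exact pod name
kubectl -n rook-ceph get pods | grep rook-ceph-osd-2
# fetch the output of the last crashed activate-osd run
kubectl -n rook-ceph logs rook-ceph-osd-2-85967dc998-xxxxx -c activate-osd --previous
```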
File(s) to submit:
- `kubectl apply -f https://raw.githubusercontent.com/rook/rook/[v1.2.5|v1.2.6]/cluster/examples/kubernetes/ceph/common.yaml`
- `kubectl apply -f ceph-operator.yaml`
  - based on https://raw.githubusercontent.com/rook/rook/[v1.2.5|v1.2.6]/cluster/examples/kubernetes/ceph/operator.yaml
  - rke specific kubelet path added:
    ```
    - name: ROOK_CSI_KUBELET_DIR_PATH
      value: "/opt/rke/var/lib/kubelet"
    ```
- `kubectl apply -f ./rke/rook.io/ceph-cluster.yaml`
  - based on https://raw.githubusercontent.com/rook/rook/[v1.2.5|v1.2.6]/cluster/examples/kubernetes/ceph/cluster.yaml
  - filter for k8s worker Ceph disks added:
    ```
    storage:
      deviceFilter: "^vd[b]"
    ```
  - pod disruption budgets enabled:
    ```
    disruptionManagement:
      managePodBudgets: true
    ```
- `kubectl apply -f https://raw.githubusercontent.com/rook/rook/v1.2.6/cluster/examples/kubernetes/ceph/enable-csi-2.0-rbac.yaml`
- `kubectl apply -f ceph-storageclass-erasurecoding.yaml` (a sample PVC for this class is sketched after this list):
```
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block-erasurecoding
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  # clusterID is the namespace where the rook cluster is running
  # If you change this namespace, also change the namespace below where the secret namespaces are defined
  clusterID: rook-ceph
  # If you want to use erasure coded pool with RBD, you need to create
  # two pools. one erasure coded and one replicated.
  # You need to specify the replicated pool here in the `pool` parameter, it is
  # used for the metadata of the images.
  # The erasure coded pool must be set as the `dataPool` parameter below.
  dataPool: ec-data-pool
  pool: replicated-metadata-pool
  # RBD image format. Defaults to "2".
  imageFormat: "2"
  # RBD image features. Available for imageFormat: "2". CSI RBD currently supports only `layering` feature.
  imageFeatures: layering
  # The secrets contain Ceph admin credentials. These are generated automatically by the operator
  # in the same namespace as the cluster.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  # Specify the filesystem type of the volume. If not specified, csi-provisioner
  # will set default as `ext4`.
  csi.storage.k8s.io/fstype: xfs
  # uncomment the following to use rbd-nbd as mounter on supported nodes
  # **IMPORTANT**: If you are using rbd-nbd as the mounter, during upgrade you will hit a ceph-csi
  # issue that causes the mount to be disconnected. You will need to follow special upgrade steps
  # to restart your application pods. Therefore, this option is not recommended.
  #mounter: rbd-nbd
allowVolumeExpansion: true
reclaimPolicy: Delete
```
- `kubectl apply -f ceph-erasurecodingpool.yaml`:
```
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicated-metadata-pool
  namespace: rook-ceph
spec:
  replicated:
    size: 2
---
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: ec-data-pool
  namespace: rook-ceph
spec:
  # Make sure you have enough nodes and OSDs running bluestore to support the replica size or erasure code chunks.
  # For the below settings, you need at least 3 OSDs on different nodes (because the `failureDomain` is `host` by default).
  erasureCoded:
    dataChunks: 2
    codingChunks: 1
```
- `kubectl apply -f https://raw.githubusercontent.com/rook/rook/[v1.2.5|v1.2.6]/cluster/examples/kubernetes/ceph/dashboard-loadbalancer.yaml`
- `kubectl apply -f https://raw.githubusercontent.com/rook/rook/[v1.2.5|v1.2.6]/cluster/examples/kubernetes/ceph/toolbox.yaml`
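For completeness, a minimal PVC sketch against the `rook-ceph-block-erasurecoding` class defined above (the claim name, namespace, and size are illustrative, not part of the deployed manifests):

```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # illustrative name and namespace
  name: ec-test-pvc
  namespace: default
spec:
  storageClassName: rook-ceph-block-erasurecoding
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```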
Environment:
- OS: RancherOS 1.5.5
- Kernel: 4.14.138-rancher
- Hardware configuration:
  - one physical server (8 cores, 64GB RAM, 2x SSD)
  - KVM
  - 3x Master Node VMs
  - 4x Worker Node VMs
  - same SSD for all VMs
- Rook version: 1.2.5 and later 1.2.6
- Storage backend version: 14.2.7 and later 14.2.8
- Kubernetes version: v1.15.9-rancher1-1 and later v1.16.6-rancher1-2
- Kubernetes cluster type: rke 1.0.4
  - kubelet settings for rook paths added:
    ```
    kubelet:
      extra_args:
        volume-plugin-dir: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
        root-dir: /opt/rke/var/lib/kubelet
      extra_binds:
        - "/usr/libexec/kubernetes/kubelet-plugins/volume/exec:/usr/libexec/kubernetes/kubelet-plugins/volume/exec"
        - "/var/lib/kubelet/plugins_registry:/var/lib/kubelet/plugins_registry"
        - "/var/lib/kubelet/pods:/var/lib/kubelet/pods:shared,z"
        - "/opt/rke/var/lib/kubelet:/opt/rke/var/lib/kubelet:shared,z"
    ```
- Storage backend status: HEALTH_WARN
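The `HEALTH_WARN` details were read through the toolbox pod; for reference, a sketch (assuming the default `app=rook-ceph-tools` label from toolbox.yaml):

```
# find the toolbox pod and query cluster health from inside it
TOOLS_POD=$(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}')
kubectl -n rook-ceph exec -it "$TOOLS_POD" -- ceph health detail
kubectl -n rook-ceph exec -it "$TOOLS_POD" -- ceph osd tree
```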
Troubleshooting:
- I have tried to clean up the ceph config on the worker node, but without success (a more thorough wipe sequence is sketched at the end of this report):
  ```
  sudo shred -n 1 -z /dev/vdb
  sudo lvremove --select lv_name=~'osd-.*'
  sudo vgremove --select vg_name=~'ceph-.*'
  sudo pvremove /dev/vdb
  sudo rm -rfv /dev/ceph-*
  sudo rm -rfv /var/lib/rook
  ```
- I have also tried to remove the OSD from the ceph cluster config, but then the operator complains about the missing OSD:
  ```
  ceph osd out osd.2
  ceph osd crush remove osd.2
  ceph auth del osd.2
  ceph osd rm osd.2
  kubectl -n rook-ceph delete rook-ceph-osd-2-xyz
  ```
- Deleting the `rook-ceph-osd-2` deployment also has not fixed the problem:
  ```
  kubectl -n rook-ceph delete deployment rook-ceph-osd-2
  ```

Sorry for the lengthy description, but there are a lot of moving parts involved.
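For reference, the more thorough wipe mentioned above would follow the disk-zapping sequence from the Rook teardown documentation; a sketch, assuming `/dev/vdb` from the `deviceFilter` (not verified here):

```
DISK=/dev/vdb
# clear GPT/MBR data structures created by ceph-volume
sudo sgdisk --zap-all "$DISK"
# overwrite the start of the disk to remove leftover bluestore metadata
sudo dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
# have the kernel re-read the now-empty partition table
sudo partprobe "$DISK"
```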