2022-01-05_07 Incident
The issue was the same as in #6, #9, #12 and #15: first, pve-main's drive 2I:1:5 fell into the Rebuilding state, then 2I:1:9 failed too, causing the whole system to freeze and reboot. This time the reboots were very frequent. I don't have all the details, but here is an approximate timeline:
- 2022-01-05 07:57 CET: pve-main reboots
- 2022-01-05 10:23 CET: pve-main reboots
  I then decided to move the few VMs still backed by this RAID array off it, to reduce the I/O load as much as possible. This caused a few more freezes/reboots.
- 2022-01-06, morning: I replaced 2I:1:5 with a new SATA drive. For about 4 h the rebuild status alternated between "Failed" and 67-70%; it finally went past 74%, so I decided to keep the drives in and wait to see how it would evolve.
- 2022-01-07 00:41 CET: pve-main reboots
I discovered it on Friday morning because I (sometimes) sleep at night. I then swapped the 6 SSDs for 2 brand new SSDs, reinstalled Proxmox on one of them, and restored the configuration files from the backup hosted on storage-2.