fix(aula-08): prevent volume stalling with CSI tolerations and PDB

- Add hcloud-csi-values.yaml with tolerations for node failures
- Configure 2 replicas of the CSI controller for HA
- Create statefulset-pdb.yaml to protect StatefulSets during drain
- Document troubleshooting of stuck volumes in the README
2026-01-23 18:45:00 -03:00
parent 9f96e97205
commit 2480c82944
4 changed files with 74 additions and 3 deletions
--- a/aula-08/README.md
+++ b/aula-08/README.md
@@ -202,7 +202,47 @@ aula-08/
 ├── install-nginx-ingress.sh   # Installs NGINX Ingress with LB
 ├── install-metrics-server.sh  # Installs Metrics Server (kubectl top, HPA)
 ├── nginx-ingress-values.yaml  # NGINX Ingress configuration
-└── talos-patches/             # Talos configuration patches
-    ├── control-plane.yaml
-    └── worker.yaml
+├── talos-patches/             # Talos configuration patches
+│   ├── control-plane.yaml
+│   └── worker.yaml
+├── hcloud-csi-values.yaml     # CSI driver configuration
+└── statefulset-pdb.yaml       # PDB to protect StatefulSets
 ```
+
+## Troubleshooting: Stuck Volumes
+
+If a pod stays `Pending` waiting for a volume:
+
+### 1. Check the VolumeAttachment
+
+```bash
+kubectl get volumeattachments
+kubectl describe volumeattachment <name>
+```
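To link a stuck pod's PVC to its VolumeAttachment, you can list the attachments with their PV, node, and attach status, then filter on the PV bound to the PVC. This is a minimal sketch that assumes `kubectl` points at the cluster. `list_attachments` and `find_attachment` are hypothetical helper names and are not part of this repo:

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: one line per VolumeAttachment with
# name, PV, node and attach status (no header row).
list_attachments() {
  kubectl get volumeattachments --no-headers \
    -o custom-columns='NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached'
}

# Hypothetical helper: print the attachment(s) for the PV bound to
# PVC "$2" in namespace "$1".
find_attachment() {
  local ns="$1"
  local pvc="$2"
  local pv
  # A bound PVC stores its PV name in spec.volumeName.
  pv="$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.spec.volumeName}')"
  # Column 2 of list_attachments is the PV name.
  list_attachments | awk -v pv="$pv" '$2 == pv'
}

# Example: find_attachment <namespace> <pvc-name>
```

The NODE column is what matters for the next step: it tells you which node Kubernetes believes still holds the volume.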
+
+### 2. If the source node no longer exists
+
+```bash
+# Delete the orphaned VolumeAttachment (safe because the node no longer exists)
+kubectl delete volumeattachment <name>
+```
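Before deleting anything, confirm that the node is really gone. The sketch below prints the VolumeAttachments whose `spec.nodeName` does not match any Node in the cluster. `orphaned_attachments` is a hypothetical helper, so review its output before acting on it:

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: print the names of VolumeAttachments whose
# spec.nodeName does not match any Node currently in the cluster.
orphaned_attachments() {
  local nodes
  # Existing node names, one per line.
  nodes="$(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}')"
  # "<attachment> <node>" pairs, one per line.
  kubectl get volumeattachments \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.nodeName}{"\n"}{end}' |
    while read -r va node; do
      [ -n "$va" ] || continue
      # -x: whole-line match, -F: fixed string (no regex).
      if ! printf '%s\n' "$nodes" | grep -qxF "$node"; then
        echo "$va"
      fi
    done
}

# Review the list first, then delete:
# for va in $(orphaned_attachments); do kubectl delete volumeattachment "$va"; done
```

If `kubectl delete volumeattachment` hangs, the object may still carry a finalizer added by the CSI external-attacher. You can clear it with `kubectl patch volumeattachment <name> --type=merge -p '{"metadata":{"finalizers":null}}'`, but treat this as a last resort and use it only after confirming the node is gone.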
+
+### 3. If the node exists but the pod died
+
+```bash
+# Wait: Kubernetes releases the volume automatically
+# Default timeout: 6 minutes
+```
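Instead of polling by hand, you can use `kubectl wait --for=delete` to block until the VolumeAttachment disappears. This sketch assumes you already know the attachment name from step 1. The 7-minute timeout is an arbitrary margin over the 6-minute default, and `wait_for_detach` is a hypothetical helper:

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: block until VolumeAttachment "$1" is deleted.
# The "7m" default is an assumed margin over the 6-minute default;
# kubectl exits non-zero if the object still exists when it expires.
wait_for_detach() {
  local va="$1"
  local timeout="${2:-7m}"
  kubectl wait --for=delete "volumeattachment/${va}" --timeout="$timeout"
}

# Example: wait_for_detach <volumeattachment-name>
```

If the wait times out, check the node state again (step 2) and the Hetzner side (step 4).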
+
+### 4. Check on the Hetzner side
+
+```bash
+hcloud volume list
+# If the volume shows as attached to a server that no longer exists, open a support ticket
+```
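To compare what Hetzner reports with the servers that actually exist, print the volume's details next to the project's server list. This sketch assumes the `hcloud` CLI is installed with an active context. `inspect_volume` is a hypothetical helper:

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: print what Hetzner reports for volume "$1"
# (including the server it is attached to), followed by the servers
# that exist in the project, so a dangling attachment stands out.
inspect_volume() {
  local volume="$1"
  hcloud volume describe "$volume"
  printf '%s\n' '--- servers in this project ---'
  hcloud server list
}

# Example: inspect_volume <volume-name-or-id>
```

The CLI also provides `hcloud volume detach <volume>`, which may be worth trying before you open a ticket. This README does not confirm whether it succeeds when the server is already gone.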
+
+### Block Storage Limitations
+
+- Hetzner volumes are **RWO** (ReadWriteOnce): single-attach by design
+- They can stay stuck for up to 6 min (Kubernetes timeout)
+- If a node dies abruptly, recovery may be manual (delete the VolumeAttachment)