aula-07 and aula-08: HA Talos cluster on Hetzner with Autoscaler

aula-07: Building a custom Talos image on Hetzner Cloud
- Uses Talos Factory to generate the ARM64/AMD64 image
- Includes extensions: qemu-guest-agent, hcloud

aula-08: Provisioning a Talos Kubernetes cluster via OpenTofu
- 3 Control Planes in HA (CAX11 ARM64)
- 1 Worker Node (CAX11 ARM64)
- Private network, Floating IP, Firewall
- Cluster Autoscaler for Hetzner (0-5 extra workers)
- Interactive setup with prerequisite validation
- Estimated cost: ~€18/month (base)

Also includes:
- .gitignore to ignore sensitive files
- CLAUDE.md with project instructions
30
.gitignore
vendored
Normal file
@@ -0,0 +1,30 @@
# OpenTofu / Terraform
**/.terraform/
**/.tofu/
**/*.tfstate
**/*.tfstate.*
**/tfplan
**/tfplan.out
**/.terraform.lock.hcl

# Credentials and sensitive configs
**/terraform.tfvars
**/kubeconfig
**/kubeconfig-*
**/talosconfig
**/*.pem
**/*.key

# OS
.DS_Store
Thumbs.db

# Editor
*.swp
*.swo
*~
.idea/
.vscode/

# Node (aula-01)
node_modules/
114
CLAUDE.md
Normal file
@@ -0,0 +1,114 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a workshop repository for teaching Docker and Kubernetes concepts, specifically focusing on container health checks and liveness probes. It contains a deliberately "buggy" Node.js app that hangs after a configurable number of requests to demonstrate how container orchestration handles unhealthy containers.

## Repository Structure

- **aula-01/**: Docker Compose lesson - basic container deployment with restart policies
- **aula-02/**: Kubernetes lesson - deployment with liveness probes and ConfigMaps
- **aula-03/**: Kubernetes lesson - high availability with replicas and readiness probes
- **aula-04/**: Kubernetes lesson - NGINX Ingress with Keep Request (Lua) for zero-downtime
- **aula-05/**: Kubernetes lesson - KEDA + Victoria Metrics for metrics-based auto-scaling
- **aula-06/**: Kubernetes lesson - n8n deployment via Helm with Queue Mode (workers, webhooks, PostgreSQL, Redis)
- **aula-07/**: Talos Linux - creating a custom Talos image for Hetzner Cloud
- **aula-08/**: OpenTofu - provisioning an HA Talos Kubernetes cluster on Hetzner Cloud

## Running the Examples

### Aula 01 (Docker Compose)

```bash
cd aula-01
docker-compose up
```

The app runs on port 3000. After MAX_REQUESTS (default 3), the app stops responding.
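The failure mode can be simulated without running the app at all; a minimal shell sketch of the same "stops responding after N requests" behavior, where `handle` is a hypothetical stand-in for the real request handler in `app.js`:

```shell
# Sketch of the app's failure mode: serve normally until MAX_REQUESTS,
# then stop answering on every later request.
MAX_REQUESTS=3
count=0

handle() {
  count=$((count + 1))
  if [ "$count" -le "$MAX_REQUESTS" ]; then
    echo "ok #$count"
  else
    # the real app would block forever here instead of printing anything
    echo "(no response - app is stuck)"
  fi
}

for i in 1 2 3 4 5; do handle; done
```

The process keeps running the whole time, which is exactly why `restart: always` alone cannot detect the problem.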
### Aula 02 (Kubernetes)

```bash
cd aula-02
kubectl apply -f configmap.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
```

Access via NodePort 30080. The liveness probe at `/health` will detect when the app hangs and restart the container.
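What the kubelet does with that probe can be sketched in plain shell: probe periodically, count consecutive failures, and restart the container once `failureThreshold` is reached (3 is the kubelet default; the numbers below are illustrative, not read from the lesson's deployment.yaml):

```shell
# Simulated liveness loop: probe() stands in for an HTTP GET on /health.
FAILURE_THRESHOLD=3   # kubelet default
failures=0
restarted=no
probe_calls=0

probe() {             # fails from the 4th call on, like the stuck app
  probe_calls=$((probe_calls + 1))
  [ "$probe_calls" -le 3 ]
}

for i in 1 2 3 4 5 6; do
  if probe; then
    failures=0        # any success resets the consecutive-failure count
  else
    failures=$((failures + 1))
    if [ "$failures" -ge "$FAILURE_THRESHOLD" ]; then
      restarted=yes   # the kubelet would kill and restart the container here
      break
    fi
  fi
done

echo "restarted=$restarted after $probe_calls probes"
```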
### Aula 03 (Kubernetes - High Availability)

```bash
cd aula-03
kubectl apply -f configmap.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
```

Builds on Aula 02 with multiple replicas and a readiness probe. When one pod hangs, the others continue serving requests. The readiness probe removes unhealthy pods from the Service immediately, while the liveness probe restarts them.
### Aula 04 (Kubernetes - NGINX Ingress with Keep Request)

Requires NGINX Ingress Controller with Lua support.

```bash
cd aula-04
kubectl apply -f configmap.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ingress-nginx.yaml
```

Access via NGINX Ingress. The Keep Request pattern uses Lua to hold requests when backends are unavailable, waiting up to 99s for a pod to become ready instead of returning 503 immediately. This eliminates user-visible failures during pod restarts.
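The hold-and-retry idea boils down to a bounded retry loop; a shell analogue of that control flow (the 99s deadline is shortened here, and `ready_at` simulates the Lua upstream-readiness check, which is not shown in this file):

```shell
# Retry until the backend is ready or the deadline expires, instead of
# failing the request on the first unavailable upstream.
DEADLINE=5      # the Lua version waits up to 99s
ready_at=3      # simulated: backend becomes ready on the 3rd attempt

attempt=0
status=503
while [ "$attempt" -lt "$DEADLINE" ]; do
  attempt=$((attempt + 1))
  if [ "$attempt" -ge "$ready_at" ]; then   # stand-in for the readiness check
    status=200
    break
  fi
  # sleep 1    # real code would pause between attempts
done

echo "HTTP $status after $attempt attempts"
```

The client only ever sees the final 200; the waiting happens inside the proxy.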
### Aula 05 (Kubernetes - KEDA Auto-scaling)

```bash
cd aula-05
./setup.sh
```

Installs Victoria Metrics (metrics collection), KEDA (event-driven autoscaling), and NGINX Ingress. The ScaledObject monitors metrics like unavailable pods and restart counts, automatically scaling the deployment from 5 to 30 replicas based on demand.
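Under the hood, a ScaledObject drives a regular HPA, whose scaling rule is `desired = ceil(current * metricValue / target)`, clamped to the min/max replica bounds. A quick shell check of that arithmetic (the metric values below are made-up examples, not the lesson's actual thresholds):

```shell
# HPA scaling arithmetic: ceil(current * metric / target), clamped to min/max.
hpa_desired() {
  current=$1; metric=$2; target=$3; min=$4; max=$5
  desired=$(( (current * metric + target - 1) / target ))   # integer ceil
  [ "$desired" -lt "$min" ] && desired=$min
  [ "$desired" -gt "$max" ] && desired=$max
  echo "$desired"
}

hpa_desired 5 200 100 5 30   # metric at 2x target: 5 replicas -> 10
hpa_desired 5 40 100 5 30    # metric below target: clamped at min 5
```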
### Aula 06 (Kubernetes - n8n via Helm)

```bash
cd aula-06
./setup.sh
```

Deploys n8n workflow automation platform via Helm chart with Queue Mode architecture: main node, workers (2-5 replicas with HPA), webhooks (1-3 replicas with HPA), PostgreSQL, and Redis. Access via http://n8n.localhost (requires NGINX Ingress).
### Aula 07 (Talos Linux - Custom Image)

Follow the instructions in `aula-07/README.md` to create a custom Talos Linux image on Hetzner Cloud using Talos Factory. This is a prerequisite for Aula 08.
### Aula 08 (OpenTofu - Talos Cluster on Hetzner Cloud)

```bash
cd aula-08
./setup.sh
```

Provisions a full HA Kubernetes cluster on Hetzner Cloud using OpenTofu:

- 3x Control Plane nodes (CAX11 ARM64)
- 1x Worker node (CAX11 ARM64)
- Private network, Floating IP, Firewall
- Cluster Autoscaler support (1-5 workers)
- Estimated cost: ~€18/month (base), up to ~€33/month with max autoscaling

Prerequisites:

- OpenTofu (`brew install opentofu`)
- talosctl (`brew install siderolabs/tap/talosctl`)
- kubectl
- Hetzner Cloud API token
- Talos image ID from Aula 07

Optional - Enable cluster autoscaling:

```bash
./install-autoscaler.sh
```

This installs the Kubernetes Cluster Autoscaler configured for Hetzner Cloud, automatically scaling workers from 1 to 5 based on pending pods.
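The trigger here is simpler than the HPA's: pods that stay Pending because no node has room make the autoscaler add nodes, capped at the pool maximum. A rough shell sketch of that sizing decision (the pods-per-node capacity is purely illustrative; the real autoscaler bin-packs against actual CAX11 resources):

```shell
# How many extra nodes for the pending pods? Capacity arithmetic,
# capped at the worker pool's configured maximum.
pending_pods=7
pods_per_node=3     # illustrative fit per node, not a real CAX11 figure
pool_max=5
current_nodes=1

needed=$(( (pending_pods + pods_per_node - 1) / pods_per_node ))  # ceil
total=$(( current_nodes + needed ))
[ "$total" -gt "$pool_max" ] && total=$pool_max

echo "scale worker pool: $current_nodes -> $total nodes"
```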
To destroy the infrastructure: `./cleanup.sh`
## App Behavior

The Node.js app (`app.js`) is intentionally designed to:

1. Accept requests normally until `MAX_REQUESTS` is reached
2. Stop responding (hang) after the limit, simulating a crashed but running process
3. Stop serving the `/health` endpoint as well while the app is "stuck"

This behavior demonstrates why process-level monitoring (restart: always) is insufficient and why application-level health checks (liveness probes) are necessary.

## Environment Variables

- `MAX_REQUESTS`: Number of requests before the app hangs (default: 3)
@@ -1,26 +0,0 @@
# Retry middleware - tries other pods when one fails
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: retry-middleware
spec:
  retry:
    attempts: 5              # 5 attempts
    initialInterval: 500ms   # 500ms between cycles
---
# IngressRoute
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: node-bugado
spec:
  entryPoints:
    - web
  routes:
    - match: PathPrefix(`/`)
      kind: Rule
      middlewares:
        - name: retry-middleware
      services:
        - name: node-bugado
          port: 3000
@@ -204,7 +204,7 @@ echo " # Ver todos os pods"
 echo " kubectl get pods -n n8n"
 echo ""
 echo " # Ver logs do n8n"
-echo " kubectl logs -f -l app.kubernetes.io/component=main -n n8n"
+echo " kubectl logs -f -n n8n deployment/n8n"
 echo ""
 echo " # Ver HPA (autoscaler)"
 echo " kubectl get hpa -n n8n"
@@ -218,6 +218,12 @@ echo ""
 echo " # Fazer upgrade do helm chart"
 echo " helm upgrade --reuse-values --values --custom-values.yaml n8n community-charts/n8n --namespace n8n"
 echo ""
+echo " # Verificar historico de releases"
+echo " helm history n8n -n n8n"
+echo ""
+echo " # Fazer rollback do historico de releases"
+echo " helm rollback n8n <nº da release>"
+echo ""
 echo "=============================================="
 echo ""
63
aula-07/README.md
Normal file
@@ -0,0 +1,63 @@
---
criado: 2025-12-27T01:10:54-03:00
atualizado: 2025-12-27T02:25:34-03:00
---

Hetzner Cloud does not support uploading custom images; it is only possible through a support request. See [issue 3599](https://github.com/siderolabs/talos/issues/3599#issuecomment-841172018).

Workaround:

1. Boot an instance in rescue mode and overwrite the operating system with the Talos image.
2. 🚧 According to the official documentation it was possible to use [Hashicorp Packer](https://www.packer.io/docs/builders/hetzner-cloud) to prepare an image, but that page has since been removed from the builders documentation, and in my tests it ended in a kernel panic...

Step 1 -> https://factory.talos.dev/

- [ ] siderolabs/amd-ucode / siderolabs/intel-ucode
  - Spectre / Meltdown (V1, V2, V4)
    - Branch prediction
    - Privileged memory reads from userland
  - Zenbleed (CVE-2023-20593)
    - AMD Zen 2 CPUs
    - Register leakage via speculative execution
    - Affects **VMs and containers**
  - Speculative Return Stack Overflow (SRSO)
    - Modern AMD CPUs
    - Jailbreak
- [ ] siderolabs/qemu-guest-agent (Hetzner uses QEMU / KVM)
- [ ] siderolabs/stargz-snapshotter (https://github.com/containerd/stargz-snapshotter)
- [ ] siderolabs/util-linux-tools (lsblk, mount, findmnt)
- [ ] siderolabs/binfmt-misc (if you will use multi-arch images)
- siderolabs/tailscale OR cloudflared -> https://spot.rackspace.com/
- zfs -> for bare metal (~50% faster than ext4)

bootloader: dual-boot

https://factory.talos.dev/?arch=amd64&board=undefined&bootloader=dual-boot&cmdline-set=true&extensions=-&extensions=siderolabs%2Famd-ucode&extensions=siderolabs%2Fbinfmt-misc&extensions=siderolabs%2Fintel-ucode&extensions=siderolabs%2Fqemu-guest-agent&extensions=siderolabs%2Fstargz-snapshotter&extensions=siderolabs%2Futil-linux-tools&platform=hcloud&secureboot=undefined&target=cloud&version=1.12.0

```bash
# Check that you are actually in Rescue mode
df

### The output will look like:
# Filesystem          1K-blocks      Used Available Use% Mounted on
# udev                   987432         0    987432   0% /dev
# 213.133.99.101:/nfs 308577696 247015616  45817536  85% /root/.oldroot/nfs
# overlay                995672      8340    987332   1% /
# tmpfs                  995672         0    995672   0% /dev/shm
# tmpfs                  398272       572    397700   1% /run
# tmpfs                    5120         0      5120   0% /run/lock
# tmpfs                  199132         0    199132   0% /run/user/0

# Download the Talos image
cd /tmp
wget -O /tmp/talos.raw.xz https://factory.talos.dev/image/c4f17c623d4ac547a243489f1b3285afd64a76b491b1c5c24ef6363587cef55f/v1.12.0/hcloud-amd64.raw.xz

# Write the image to disk (takes about 4 to 5 minutes)
xz -d -c /tmp/talos.raw.xz | dd of=/dev/sda && sync

# Power off the instance before taking the snapshot
shutdown -h now
```
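The `xz -d -c ... | dd` step above is a plain streaming decompress onto the raw disk; the same pipeline can be rehearsed safely against a scratch file instead of `/dev/sda` (the temp paths and the fake image content below are illustrative):

```shell
# Same decompress-and-write pipeline as in the rescue-mode steps,
# but against a temp file instead of the instance's /dev/sda.
tmp=$(mktemp -d)
printf 'fake-talos-image' > "$tmp/talos.raw"
xz -k "$tmp/talos.raw"     # produces talos.raw.xz, keeping the original

xz -d -c "$tmp/talos.raw.xz" | dd of="$tmp/disk" 2>/dev/null
cat "$tmp/disk"
```

On the real rescue instance the only differences are the wget source and `of=/dev/sda`.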
132
aula-08/cleanup.sh
Executable file
@@ -0,0 +1,132 @@
#!/bin/bash

############################################################
# Aula 08 - Cleanup
# Destroys the provisioned infrastructure
############################################################

set -e

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"

log_info() { echo -e "${BLUE}[INFO]${NC} $1"; }
log_success() { echo -e "${GREEN}[OK]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }

echo ""
echo "============================================"
echo " Cleanup - Destruir Infraestrutura"
echo "============================================"
echo ""

# Check that tofu is installed
if ! command -v tofu &> /dev/null; then
    log_error "OpenTofu não encontrado!"
    exit 1
fi

# Check that there is state to destroy
if [ ! -f "terraform.tfstate" ] && [ ! -d ".terraform" ]; then
    log_warn "Nenhuma infraestrutura encontrada para destruir."
    exit 0
fi

# Check for autoscaler workers (created outside of OpenTofu)
if [ -f "kubeconfig" ]; then
    export KUBECONFIG="$SCRIPT_DIR/kubeconfig"

    AUTOSCALER_WORKERS=$(kubectl get nodes -l node.kubernetes.io/instance-type=cax11 \
        --no-headers 2>/dev/null | wc -l | tr -d ' ' || echo "0")

    if [ "$AUTOSCALER_WORKERS" -gt "1" ]; then
        log_warn "Detectados $AUTOSCALER_WORKERS workers (incluindo os do autoscaler)"
        log_warn "Workers criados pelo autoscaler serão removidos via API Hetzner"
        echo ""
    fi
fi

log_warn "ATENÇÃO: Esta operação irá DESTRUIR todos os recursos!"
echo ""
echo "Recursos que serão removidos:"
echo " - 3x Control Plane nodes"
echo " - Workers (incluindo os criados pelo autoscaler)"
echo " - Rede privada"
echo " - Floating IP"
echo " - Firewall"
echo " - Placement Group"
echo ""

read -p "Tem certeza que deseja continuar? (digite 'sim' para confirmar): " confirm

if [ "$confirm" != "sim" ]; then
    log_info "Operação cancelada"
    exit 0
fi

echo ""

# Remove autoscaler-created workers first (if any exist)
if [ -f "terraform.tfvars" ]; then
    HCLOUD_TOKEN=$(grep 'hcloud_token' terraform.tfvars | cut -d'"' -f2)
    CLUSTER_NAME=$(tofu output -raw cluster_name 2>/dev/null || echo "")

    if [ -n "$HCLOUD_TOKEN" ] && [ -n "$CLUSTER_NAME" ]; then
        log_info "Verificando workers do autoscaler..."

        # List servers carrying the cluster label that are NOT managed by tofu
        AUTOSCALER_SERVERS=$(HCLOUD_TOKEN="$HCLOUD_TOKEN" hcloud server list \
            -l cluster="$CLUSTER_NAME" \
            -o noheader -o columns=id,name 2>/dev/null | \
            grep -E "worker-pool" || true)

        if [ -n "$AUTOSCALER_SERVERS" ]; then
            log_warn "Removendo workers criados pelo autoscaler..."
            echo "$AUTOSCALER_SERVERS" | while read -r server_id server_name; do
                log_info " Removendo $server_name (ID: $server_id)..."
                HCLOUD_TOKEN="$HCLOUD_TOKEN" hcloud server delete "$server_id" --quiet 2>/dev/null || true
            done
            log_success "Workers do autoscaler removidos"
        fi
    fi
fi

echo ""
log_info "Destruindo infraestrutura via OpenTofu..."
echo ""

tofu destroy -auto-approve

echo ""
log_success "Infraestrutura destruída!"
echo ""

# Clean up local files (keep .terraform for quick re-deploys)
log_info "Limpando arquivos gerados..."

rm -f kubeconfig talosconfig tfplan terraform.tfstate terraform.tfstate.backup

log_success "Arquivos removidos"
echo ""

# Ask about terraform.tfvars
if [ -f "terraform.tfvars" ]; then
    read -p "Remover terraform.tfvars também? (s/N): " remove_tfvars
    if [[ "$remove_tfvars" =~ ^[Ss]$ ]]; then
        rm -f terraform.tfvars
        log_success "terraform.tfvars removido"
    else
        log_info "terraform.tfvars mantido (útil para re-deploy)"
    fi
fi

echo ""
log_success "Cleanup concluído!"
158
aula-08/cluster-autoscaler.yaml
Normal file
@@ -0,0 +1,158 @@
############################################################
# Cluster Autoscaler for Hetzner Cloud + Talos
# Automatically scales workers from 1 to 5 nodes
############################################################

---
apiVersion: v1
kind: Namespace
metadata:
  name: cluster-autoscaler

# Secret is created via install-autoscaler.sh (kubectl create secret)
# to properly handle base64 encoding of cloud-init

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-autoscaler
  namespace: cluster-autoscaler

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler
rules:
  - apiGroups: [""]
    resources: ["events", "endpoints"]
    verbs: ["create", "patch"]
  - apiGroups: [""]
    resources: ["pods/eviction"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["pods/status"]
    verbs: ["update"]
  - apiGroups: [""]
    resources: ["endpoints"]
    resourceNames: ["cluster-autoscaler"]
    verbs: ["get", "update"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["watch", "list", "get", "update"]
  - apiGroups: [""]
    resources: ["namespaces", "pods", "services", "replicationcontrollers", "persistentvolumeclaims", "persistentvolumes"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["batch", "extensions"]
    resources: ["jobs"]
    verbs: ["get", "list", "patch", "watch"]
  - apiGroups: ["extensions"]
    resources: ["replicasets", "daemonsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["policy"]
    resources: ["poddisruptionbudgets"]
    verbs: ["watch", "list"]
  - apiGroups: ["apps"]
    resources: ["statefulsets", "replicasets", "daemonsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses", "csinodes", "csidrivers", "csistoragecapacities"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["create"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    resourceNames: ["cluster-autoscaler"]
    verbs: ["get", "update"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["create", "get", "update", "delete", "list", "watch"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: cluster-autoscaler

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: cluster-autoscaler
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      # Use host network to access external APIs (Hetzner)
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      # Workaround: Talos DNS proxy doesn't forward to upstream correctly
      hostAliases:
        - ip: "213.239.246.73"
          hostnames:
            - "api.hetzner.cloud"
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.31.0
          command:
            - ./cluster-autoscaler
            - --cloud-provider=hetzner
            - --nodes=0:5:CAX11:nbg1:worker-pool
            - --nodes=0:0:CAX11:nbg1:draining-node-pool
            - --scale-down-enabled=true
            - --scale-down-delay-after-add=5m
            - --scale-down-unneeded-time=3m
            - --scale-down-utilization-threshold=0.5
            - --skip-nodes-with-local-storage=false
            - --skip-nodes-with-system-pods=false
            - --balance-similar-node-groups=true
            - --v=4
          env:
            - name: HCLOUD_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hcloud-autoscaler
                  key: token
            - name: HCLOUD_CLOUD_INIT
              valueFrom:
                secretKeyRef:
                  name: hcloud-autoscaler
                  key: cloud-init
            - name: HCLOUD_IMAGE
              value: "${TALOS_IMAGE_ID}"
            - name: HCLOUD_NETWORK
              value: "${NETWORK_NAME}"
            - name: HCLOUD_FIREWALL
              value: "${FIREWALL_NAME}"
            - name: HCLOUD_SSH_KEY
              value: "${SSH_KEY_NAME}"
          resources:
            requests:
              cpu: 100m
              memory: 300Mi
            limits:
              cpu: 500m
              memory: 500Mi
139
aula-08/install-autoscaler.sh
Executable file
@@ -0,0 +1,139 @@
#!/bin/bash

############################################################
# Installs the Cluster Autoscaler on the Talos cluster
# Requires: cluster provisioned via setup.sh
############################################################

set -e

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"

log_info() { echo -e "${BLUE}[INFO]${NC} $1"; }
log_success() { echo -e "${GREEN}[OK]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }

echo ""
echo "============================================"
echo " Instalando Cluster Autoscaler"
echo "============================================"
echo ""

# Check prerequisites
if [ ! -f "kubeconfig" ]; then
    log_error "kubeconfig não encontrado! Execute setup.sh primeiro."
    exit 1
fi

if [ ! -f "terraform.tfvars" ]; then
    log_error "terraform.tfvars não encontrado!"
    exit 1
fi

export KUBECONFIG="$SCRIPT_DIR/kubeconfig"

# Check connectivity to the cluster
log_info "Verificando conexão com o cluster..."
if ! kubectl get nodes &>/dev/null; then
    log_error "Não foi possível conectar ao cluster!"
    exit 1
fi
log_success "Conectado ao cluster"

# Read values from the OpenTofu outputs
log_info "Obtendo configurações do OpenTofu..."

WORKER_CONFIG_BASE64=$(tofu output -raw autoscaler_worker_config 2>/dev/null)
TALOS_IMAGE_ID=$(tofu output -raw autoscaler_image_id 2>/dev/null)
CLUSTER_NAME=$(tofu output -raw cluster_name 2>/dev/null)
NETWORK_ID=$(tofu output -raw network_id 2>/dev/null)
FIREWALL_ID=$(tofu output -raw firewall_id 2>/dev/null)
SSH_KEY_NAME=$(tofu output -raw ssh_key_name 2>/dev/null)

# Read the token from terraform.tfvars
HCLOUD_TOKEN=$(grep 'hcloud_token' terraform.tfvars | cut -d'"' -f2)

if [ -z "$WORKER_CONFIG_BASE64" ] || [ -z "$HCLOUD_TOKEN" ]; then
    log_error "Não foi possível obter as configurações necessárias!"
    exit 1
fi

log_success "Configurações obtidas"
echo " - Cluster: $CLUSTER_NAME"
echo " - Image ID: $TALOS_IMAGE_ID"
echo " - Network ID: $NETWORK_ID"
echo " - SSH Key: $SSH_KEY_NAME"
echo ""

# Create the namespace with the privileged policy (required for hostNetwork)
log_info "Criando namespace cluster-autoscaler..."
kubectl create namespace cluster-autoscaler --dry-run=client -o yaml | kubectl apply -f -
kubectl label namespace cluster-autoscaler pod-security.kubernetes.io/enforce=privileged --overwrite

# Create the secret with credentials
log_info "Criando secret com credenciais..."
kubectl create secret generic hcloud-autoscaler \
    --namespace cluster-autoscaler \
    --from-literal=token="$HCLOUD_TOKEN" \
    --from-literal=cloud-init="$WORKER_CONFIG_BASE64" \
    --dry-run=client -o yaml | kubectl apply -f -

log_success "Secret criado"

# Apply RBAC and Deployment
log_info "Aplicando manifesto do cluster-autoscaler..."

# Substitute the variables in the template and apply
cat cluster-autoscaler.yaml | \
    sed "s|\${TALOS_IMAGE_ID}|$TALOS_IMAGE_ID|g" | \
    sed "s|\${NETWORK_NAME}|$CLUSTER_NAME-network|g" | \
    sed "s|\${FIREWALL_NAME}|$CLUSTER_NAME-firewall|g" | \
    sed "s|\${SSH_KEY_NAME}|$SSH_KEY_NAME|g" | \
    kubectl apply -f -

log_success "Cluster Autoscaler instalado!"

# Wait for the autoscaler pod to become ready
log_info "Aguardando pod do autoscaler..."
kubectl wait --for=condition=ready pod \
    -l app=cluster-autoscaler \
    -n cluster-autoscaler \
    --timeout=120s

echo ""
log_success "Cluster Autoscaler pronto!"

echo ""
echo "============================================"
echo " Configuração do Autoscaler"
echo "============================================"
echo ""
echo " Pool: worker-pool"
echo " Tipo: CAX11 (ARM64)"
echo " Região: nbg1 (Nuremberg)"
echo " Min nodes: 1"
echo " Max nodes: 5"
echo ""
echo " Scale down após: 5 minutos"
echo " Utilização mínima: 50%"
echo ""
echo "Comandos úteis:"
echo ""
echo " # Ver logs do autoscaler"
echo " kubectl logs -n cluster-autoscaler -l app=cluster-autoscaler -f"
echo ""
echo " # Ver status dos nodes"
echo " kubectl get nodes"
echo ""
echo " # Testar scale up (criar pods pending)"
echo " kubectl create deployment test --image=nginx --replicas=10"
echo ""
391
aula-08/main.tf
Normal file
@@ -0,0 +1,391 @@
|
|||||||
|
############################################################
|
||||||
|
# Hetzner Talos Kubernetes Cluster - Base Infrastructure
|
||||||
|
# Using custom Talos image created from ISO
|
||||||
|
############################################################
|
||||||
|
|
||||||
|
############################################################
|
||||||
|
# PROVIDERS CONFIGURATION
|
||||||
|
############################################################
|
||||||
|
|
||||||
|
provider "hcloud" {
|
||||||
|
token = var.hcloud_token
|
||||||
|
}
|
||||||
|
|
||||||
|
############################################################
|
||||||
|
# DATA SOURCES
|
||||||
|
############################################################
|
||||||
|
|
||||||
|
# Use the custom Talos image created in aula-07
|
||||||
|
data "hcloud_image" "talos" {
|
||||||
|
id = var.talos_image_id
|
||||||
|
}
|
||||||
|
|
||||||
|
############################################################
|
||||||
|
# RANDOM RESOURCES
|
||||||
|
############################################################
|
||||||
|
|
||||||
|
resource "random_string" "cluster_id" {
|
||||||
|
length = 6
|
||||||
|
special = false
|
||||||
|
lower = true
|
||||||
|
upper = false
|
||||||
|
}
|
||||||
|
|
||||||
|
locals {
|
||||||
|
cluster_name = "talos-${random_string.cluster_id.result}"
|
||||||
|
common_labels = {
|
||||||
|
cluster = local.cluster_name
|
||||||
|
environment = var.environment
|
||||||
|
managed_by = "terraform"
|
||||||
|
}
|
||||||
|
}

############################################################
# SSH KEY (for emergency access only)
############################################################

data "hcloud_ssh_keys" "all" {}

locals {
  ssh_key_normalized = trimspace(split(" ", var.ssh_public_key)[0] == "ssh-rsa" ?
    join(" ", slice(split(" ", var.ssh_public_key), 0, 2)) :
    var.ssh_public_key)

  ssh_key_matches = [
    for key in data.hcloud_ssh_keys.all.ssh_keys : key.id
    if key.public_key == local.ssh_key_normalized || key.public_key == var.ssh_public_key
  ]

  ssh_key_id = length(local.ssh_key_matches) > 0 ? local.ssh_key_matches[0] : hcloud_ssh_key.admin[0].id
}

resource "hcloud_ssh_key" "admin" {
  count      = length(local.ssh_key_matches) == 0 ? 1 : 0
  name       = "${local.cluster_name}-admin"
  public_key = var.ssh_public_key
  labels     = local.common_labels
}
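The normalization above strips the trailing comment field from an `ssh-rsa` key so it compares equal to what Hetzner stores. The same two-field slice in shell (a sketch, assuming a well-formed key line; the key value is a placeholder):

```shell
# Keep only "<type> <base64-key>" from a public key line, dropping the
# optional comment, like the slice(split(...), 0, 2) expression above.
normalize_key() {
  local key="$1"
  if [ "${key%% *}" = "ssh-rsa" ]; then
    echo "$key" | awk '{print $1, $2}'
  else
    echo "$key"
  fi
}
normalize_key "ssh-rsa AAAAB3Nza... user@host"   # prints: ssh-rsa AAAAB3Nza...
```

Non-`ssh-rsa` keys (e.g. `ssh-ed25519`) pass through untouched, matching the ternary in the locals block.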

############################################################
# NETWORK CONFIGURATION
############################################################

resource "hcloud_network" "cluster" {
  name     = "${local.cluster_name}-network"
  ip_range = "10.0.0.0/16"
  labels   = local.common_labels
}

resource "hcloud_network_subnet" "cluster" {
  type         = "cloud"
  network_id   = hcloud_network.cluster.id
  network_zone = "eu-central"
  ip_range     = "10.0.1.0/24"
}

############################################################
# FIREWALL CONFIGURATION
############################################################

resource "hcloud_firewall" "cluster" {
  name   = "${local.cluster_name}-firewall"
  labels = local.common_labels

  # Talos API access
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "50000"
    source_ips = ["0.0.0.0/0", "::/0"]
  }

  # Kubernetes API
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "6443"
    source_ips = ["0.0.0.0/0", "::/0"]
  }

  # Allow HTTP/HTTPS for Ingress
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "80"
    source_ips = ["0.0.0.0/0", "::/0"]
  }

  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "443"
    source_ips = ["0.0.0.0/0", "::/0"]
  }

  # Allow NodePort range (for services)
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "30000-32767"
    source_ips = ["0.0.0.0/0", "::/0"]
  }

  # Allow all outbound traffic
  rule {
    direction       = "out"
    protocol        = "tcp"
    port            = "any"
    destination_ips = ["0.0.0.0/0", "::/0"]
  }

  rule {
    direction       = "out"
    protocol        = "udp"
    port            = "any"
    destination_ips = ["0.0.0.0/0", "::/0"]
  }

  rule {
    direction       = "out"
    protocol        = "icmp"
    destination_ips = ["0.0.0.0/0", "::/0"]
  }
}

############################################################
# PLACEMENT GROUP (spread nodes across hosts for availability)
############################################################

resource "hcloud_placement_group" "cluster" {
  name   = "${local.cluster_name}-pg"
  type   = "spread"
  labels = local.common_labels
}

############################################################
# CONTROL PLANE NODES (HA with 3 CAX11 nodes)
############################################################

resource "hcloud_server" "control_plane" {
  count       = 3
  name        = "${local.cluster_name}-cp-${count.index}"
  server_type = "cax11"
  image       = data.hcloud_image.talos.id
  location    = "nbg1" # CAX11 only available in Nuremberg
  ssh_keys    = [local.ssh_key_id]

  firewall_ids       = [hcloud_firewall.cluster.id]
  placement_group_id = hcloud_placement_group.cluster.id

  labels = merge(local.common_labels, {
    role = "control-plane"
    node = "cp-${count.index}"
    arch = "arm64"
  })

  public_net {
    ipv4_enabled = true
    ipv6_enabled = true
  }

  lifecycle {
    ignore_changes = [ssh_keys]
  }
}

resource "hcloud_server_network" "control_plane" {
  count      = 3
  server_id  = hcloud_server.control_plane[count.index].id
  network_id = hcloud_network.cluster.id
  ip         = "10.0.1.${10 + count.index}"
}
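Control planes get deterministic private addresses (`10.0.1.10` through `10.0.1.12`; workers start at `10.0.1.20`), so the etcd peer list can be computed statically. The layout in shell (illustrative only):

```shell
# Reproduce the static private-IP layout used above:
# control planes at 10.0.1.(10+i), workers at 10.0.1.(20+i).
for i in 0 1 2; do
  echo "cp-${i}: 10.0.1.$((10 + i))"
done
echo "worker-0: 10.0.1.$((20 + 0))"
# prints cp-0: 10.0.1.10 ... cp-2: 10.0.1.12, worker-0: 10.0.1.20
```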

# Floating IP for stable control plane access
resource "hcloud_floating_ip" "control_plane" {
  type          = "ipv4"
  name          = "${local.cluster_name}-cp-ip"
  home_location = "nbg1"
  labels        = local.common_labels
}

resource "hcloud_floating_ip_assignment" "control_plane" {
  floating_ip_id = hcloud_floating_ip.control_plane.id
  server_id      = hcloud_server.control_plane[0].id
}

############################################################
# WORKER NODE (Single CAX11)
############################################################

resource "hcloud_server" "worker" {
  count       = 1
  name        = "${local.cluster_name}-worker-${count.index}"
  server_type = "cax11"
  image       = data.hcloud_image.talos.id
  location    = "nbg1"
  ssh_keys    = [local.ssh_key_id]

  firewall_ids       = [hcloud_firewall.cluster.id]
  placement_group_id = hcloud_placement_group.cluster.id

  labels = merge(local.common_labels, {
    role = "worker"
    node = "worker-${count.index}"
    arch = "arm64"
  })

  public_net {
    ipv4_enabled = true
    ipv6_enabled = true
  }

  lifecycle {
    ignore_changes = [ssh_keys]
  }
}

resource "hcloud_server_network" "worker" {
  count      = 1
  server_id  = hcloud_server.worker[count.index].id
  network_id = hcloud_network.cluster.id
  ip         = "10.0.1.${20 + count.index}"
}

############################################################
# TALOS CONFIGURATION
############################################################

# Generate Talos machine secrets
resource "talos_machine_secrets" "this" {
  talos_version = var.talos_version
}

# Generate Talos client configuration
data "talos_client_configuration" "this" {
  cluster_name         = local.cluster_name
  client_configuration = talos_machine_secrets.this.client_configuration
  endpoints            = [hcloud_floating_ip.control_plane.ip_address]
}

# Control plane configuration
data "talos_machine_configuration" "control_plane" {
  count            = 3
  cluster_name     = local.cluster_name
  machine_type     = "controlplane"
  cluster_endpoint = "https://${hcloud_floating_ip.control_plane.ip_address}:6443"
  machine_secrets  = talos_machine_secrets.this.machine_secrets
  talos_version    = var.talos_version

  config_patches = [
    templatefile("${path.module}/talos-patches/control-plane.yaml", {
      cluster_name = local.cluster_name
      node_name    = hcloud_server.control_plane[count.index].name
      is_ha        = true
      is_first_cp  = count.index == 0
      etcd_peers   = [for i in range(3) : "10.0.1.${10 + i}"]
      floating_ip  = hcloud_floating_ip.control_plane.ip_address
    })
  ]

  depends_on = [
    hcloud_server.control_plane,
    hcloud_floating_ip_assignment.control_plane
  ]
}

# Worker configuration
data "talos_machine_configuration" "worker" {
  count            = 1
  cluster_name     = local.cluster_name
  machine_type     = "worker"
  cluster_endpoint = "https://${hcloud_floating_ip.control_plane.ip_address}:6443"
  machine_secrets  = talos_machine_secrets.this.machine_secrets
  talos_version    = var.talos_version

  config_patches = [
    templatefile("${path.module}/talos-patches/worker.yaml", {
      cluster_name = local.cluster_name
      node_name    = hcloud_server.worker[count.index].name
    })
  ]

  depends_on = [
    hcloud_server.worker,
    hcloud_floating_ip_assignment.control_plane
  ]
}

############################################################
# APPLY TALOS CONFIGURATION
############################################################

resource "talos_machine_configuration_apply" "control_plane" {
  count                       = 3
  client_configuration        = talos_machine_secrets.this.client_configuration
  machine_configuration_input = data.talos_machine_configuration.control_plane[count.index].machine_configuration
  endpoint                    = hcloud_server.control_plane[count.index].ipv4_address
  node                        = hcloud_server.control_plane[count.index].ipv4_address

  depends_on = [
    hcloud_server_network.control_plane,
    data.talos_machine_configuration.control_plane
  ]
}

resource "talos_machine_configuration_apply" "worker" {
  count                       = 1
  client_configuration        = talos_machine_secrets.this.client_configuration
  machine_configuration_input = data.talos_machine_configuration.worker[count.index].machine_configuration
  endpoint                    = hcloud_server.worker[count.index].ipv4_address
  node                        = hcloud_server.worker[count.index].ipv4_address

  depends_on = [
    hcloud_server_network.worker,
    data.talos_machine_configuration.worker,
    talos_machine_configuration_apply.control_plane
  ]
}

############################################################
# BOOTSTRAP KUBERNETES
############################################################

resource "talos_machine_bootstrap" "this" {
  client_configuration = talos_machine_secrets.this.client_configuration
  node                 = hcloud_server.control_plane[0].ipv4_address

  depends_on = [
    talos_machine_configuration_apply.control_plane,
    talos_machine_configuration_apply.worker
  ]
}

############################################################
# GET KUBECONFIG
############################################################

resource "talos_cluster_kubeconfig" "this" {
  client_configuration = talos_machine_secrets.this.client_configuration
  node                 = hcloud_server.control_plane[0].ipv4_address

  depends_on = [talos_machine_bootstrap.this]
}

############################################################
# SAVE CONFIGURATIONS
############################################################

resource "local_sensitive_file" "kubeconfig" {
  # Replace the internal hostname with the floating IP for external access
  content = replace(
    talos_cluster_kubeconfig.this.kubeconfig_raw,
    "https://${local.cluster_name}.local:6443",
    "https://${hcloud_floating_ip.control_plane.ip_address}:6443"
  )
  filename = "${path.root}/kubeconfig"
}
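OpenTofu performs the endpoint substitution with `replace()`; the equivalent edit for a kubeconfig already on disk is a single `sed` (a sketch; the name and IP below are placeholders):

```shell
# Swap the cluster-internal API hostname for the floating IP, like the
# replace() call in the resource above. Values here are illustrative.
CLUSTER_NAME="talos-a1b2c3"
FLOATING_IP="203.0.113.10"
printf 'server: https://%s.local:6443\n' "$CLUSTER_NAME" > kubeconfig.example
sed -i.bak "s|https://${CLUSTER_NAME}.local:6443|https://${FLOATING_IP}:6443|" kubeconfig.example
cat kubeconfig.example   # server: https://203.0.113.10:6443
```

Using `|` as the `sed` delimiter avoids escaping the slashes in the URLs.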

resource "local_sensitive_file" "talosconfig" {
  content  = data.talos_client_configuration.this.talos_config
  filename = "${path.root}/talosconfig"
}
153
aula-08/outputs.tf
Normal file
@@ -0,0 +1,153 @@
############################################################
# Outputs for Hetzner Talos Kubernetes Cluster
############################################################

# Cluster Information
output "cluster_name" {
  description = "The name of the Kubernetes cluster"
  value       = local.cluster_name
}

output "cluster_id" {
  description = "The unique identifier for the cluster"
  value       = random_string.cluster_id.result
}

# Network Information
output "network_id" {
  description = "The ID of the cluster's private network"
  value       = hcloud_network.cluster.id
}

output "network_cidr" {
  description = "The CIDR range of the cluster network"
  value       = hcloud_network_subnet.cluster.ip_range
}

# Control Plane Information
output "control_plane_ip" {
  description = "Public IP address of the control plane"
  value       = hcloud_floating_ip.control_plane.ip_address
}

output "control_plane_private_ips" {
  description = "Private IP addresses of control plane nodes"
  value       = [for cp in hcloud_server_network.control_plane : cp.ip]
}

output "control_plane_ids" {
  description = "Server IDs of control plane nodes"
  value       = [for cp in hcloud_server.control_plane : cp.id]
}

# Worker Nodes Information
output "worker_ips" {
  description = "Public IP addresses of worker nodes"
  value       = [for w in hcloud_server.worker : w.ipv4_address]
}

output "worker_private_ips" {
  description = "Private IP addresses of worker nodes"
  value       = [for w in hcloud_server_network.worker : w.ip]
}

output "worker_ids" {
  description = "Server IDs of worker nodes"
  value       = [for w in hcloud_server.worker : w.id]
}

# Kubernetes Access
output "kubeconfig_path" {
  description = "Path to the generated kubeconfig file"
  value       = local_sensitive_file.kubeconfig.filename
}

output "talosconfig_path" {
  description = "Path to the generated talosconfig file"
  value       = local_sensitive_file.talosconfig.filename
}

# API Endpoints
output "kubernetes_api_endpoint" {
  description = "Kubernetes API server endpoint"
  value       = "https://${hcloud_floating_ip.control_plane.ip_address}:6443"
}

output "talos_api_endpoint" {
  description = "Talos API endpoint for management"
  value       = "https://${hcloud_floating_ip.control_plane.ip_address}:50000"
}

# Cost Information
output "estimated_monthly_cost" {
  description = "Estimated monthly cost for the infrastructure (EUR)"
  value = {
    control_plane = 3 * 3.79          # 3x CAX11
    worker        = 1 * 3.79          # 1x CAX11
    floating_ip   = 3.00              # Floating IPv4
    total         = (4 * 3.79) + 3.00 # ~€18.16
  }
}
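The estimate is plain arithmetic over the per-resource prices hard-coded above (Hetzner's net list prices at the time of writing; they will drift):

```shell
# Recompute the estimate from the prices used in the output above (EUR, net).
awk 'BEGIN {
  cp     = 3 * 3.79   # 3x CAX11 control planes
  worker = 1 * 3.79   # 1x CAX11 worker
  fip    = 3.00       # floating IPv4
  printf "total: %.2f EUR/month\n", cp + worker + fip
}'
# prints: total: 18.16 EUR/month
```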

# Connection Instructions
output "connection_instructions" {
  description = "Instructions for connecting to the cluster"
  value       = <<-EOT

    ====================================
    Kubernetes Cluster Ready!
    ====================================

    1. Configure kubectl:
       export KUBECONFIG=${local_sensitive_file.kubeconfig.filename}
       kubectl get nodes

    2. Configure talosctl:
       export TALOSCONFIG=${local_sensitive_file.talosconfig.filename}
       talosctl --nodes ${hcloud_floating_ip.control_plane.ip_address} health

    3. Access Kubernetes API:
       ${"https://${hcloud_floating_ip.control_plane.ip_address}:6443"}

    4. Nodes:
       Control Plane: 3x CAX11 (ARM64)
       Workers:       1x CAX11 (ARM64)

    5. Total Monthly Cost: ~€18/month

    ====================================
  EOT
}

# Cluster Autoscaler Configuration
output "autoscaler_worker_config" {
  description = "Worker machine config for cluster autoscaler (base64)"
  value       = base64encode(data.talos_machine_configuration.worker[0].machine_configuration)
  sensitive   = true
}
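The autoscaler consumes the worker machine config as a base64 string (the exact wiring into the autoscaler deployment is done by `install-autoscaler.sh`); decoding the output recovers the YAML handed to newly created workers. The round-trip itself is plain base64 (a sketch; the config snippet is a stand-in, and `base64 -d` assumes GNU coreutils):

```shell
# The output above is base64encode(machine_configuration); decoding it
# recovers the YAML the autoscaler hands to new worker nodes.
cfg='machine:
  type: worker'
encoded=$(printf '%s' "$cfg" | base64)
decoded=$(printf '%s' "$encoded" | base64 -d)
[ "$decoded" = "$cfg" ] && echo "round-trip ok"
```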

output "autoscaler_image_id" {
  description = "Talos image ID for cluster autoscaler"
  value       = var.talos_image_id
}

# Resource Labels
output "resource_labels" {
  description = "Labels applied to all resources"
  value       = local.common_labels
}

# Firewall Information
output "firewall_id" {
  description = "ID of the firewall protecting the cluster"
  value       = hcloud_firewall.cluster.id
}

# SSH Key Information (for autoscaler)
output "ssh_key_name" {
  description = "Name of the SSH key used by the cluster"
  value = length(local.ssh_key_matches) > 0 ? [
    for key in data.hcloud_ssh_keys.all.ssh_keys : key.name
    if key.id == local.ssh_key_matches[0]
  ][0] : "${local.cluster_name}-admin"
}
361
aula-08/setup.sh
Executable file
@@ -0,0 +1,361 @@
#!/bin/bash

############################################################
# Aula 08 - OpenTofu + Talos + Hetzner Cloud
# Provisions an HA Talos Kubernetes cluster
############################################################

set -e

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Script directory
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"

# Logging helpers
log_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

log_success() {
    echo -e "${GREEN}[OK]${NC} $1"
}

log_warn() {
    echo -e "${YELLOW}[WARN]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

############################################################
# PREREQUISITE CHECKS
############################################################

echo ""
echo "============================================"
echo "  Aula 08 - Talos Cluster via OpenTofu"
echo "============================================"
echo ""

log_info "Checking prerequisites..."

# Check OpenTofu
if ! command -v tofu &> /dev/null; then
    log_error "OpenTofu not found!"
    echo ""
    echo "Install OpenTofu:"
    echo "  brew install opentofu    # macOS"
    echo "  snap install opentofu    # Linux"
    echo ""
    echo "More info: https://opentofu.org/docs/intro/install/"
    exit 1
fi
log_success "OpenTofu $(tofu version | head -1)"

# Check talosctl
if ! command -v talosctl &> /dev/null; then
    log_error "talosctl not found!"
    echo ""
    echo "Install talosctl:"
    echo "  brew install siderolabs/tap/talosctl      # macOS"
    echo "  curl -sL https://talos.dev/install | sh   # Linux"
    echo ""
    exit 1
fi
log_success "talosctl $(talosctl version --client 2>/dev/null | grep 'Client' | awk '{print $2}' || echo 'installed')"

# Check kubectl
if ! command -v kubectl &> /dev/null; then
    log_error "kubectl not found!"
    echo ""
    echo "Install kubectl:"
    echo "  brew install kubectl    # macOS"
    echo "  snap install kubectl    # Linux"
    echo ""
    exit 1
fi
log_success "kubectl $(kubectl version --client -o yaml 2>/dev/null | grep gitVersion | awk '{print $2}' || echo 'installed')"

# Check hcloud CLI (optional, but handy)
if command -v hcloud &> /dev/null; then
    log_success "hcloud CLI installed"
else
    log_warn "hcloud CLI not installed (optional)"
    echo "  To list images: brew install hcloud"
fi

echo ""

############################################################
# CREDENTIAL COLLECTION
############################################################

# Check whether terraform.tfvars already exists
if [ -f "terraform.tfvars" ]; then
    log_warn "terraform.tfvars already exists!"
    read -p "Overwrite it? (y/N): " overwrite
    if [[ ! "$overwrite" =~ ^[Yy]$ ]]; then
        log_info "Using existing terraform.tfvars"
        SKIP_CREDENTIALS=true
    fi
fi

if [ "$SKIP_CREDENTIALS" != "true" ]; then
    echo "============================================"
    echo "  Credential Configuration"
    echo "============================================"
    echo ""

    # Hetzner token
    echo "1. Hetzner Cloud API token"
    echo "   Get one at: https://console.hetzner.cloud/projects/*/security/tokens"
    echo ""
    read -sp "   Enter the token: " HCLOUD_TOKEN
    echo ""

    if [ -z "$HCLOUD_TOKEN" ]; then
        log_error "Token cannot be empty!"
        exit 1
    fi
    log_success "Token configured"
    echo ""

    # SSH key
    echo "2. SSH public key"
    DEFAULT_SSH_KEY="$HOME/.ssh/id_rsa.pub"
    if [ -f "$DEFAULT_SSH_KEY" ]; then
        echo "   Found: $DEFAULT_SSH_KEY"
        read -p "   Use this key? (Y/n): " use_default
        if [[ ! "$use_default" =~ ^[Nn]$ ]]; then
            SSH_PUBLIC_KEY=$(cat "$DEFAULT_SSH_KEY")
        fi
    fi

    if [ -z "$SSH_PUBLIC_KEY" ]; then
        read -p "   Path to the public key: " ssh_path
        if [ -f "$ssh_path" ]; then
            SSH_PUBLIC_KEY=$(cat "$ssh_path")
        else
            log_error "File not found: $ssh_path"
            exit 1
        fi
    fi
    log_success "SSH key configured"
    echo ""

    # Talos image ID
    echo "3. Talos image ID (snapshot from aula-07)"
    echo "   To list: hcloud image list --type snapshot"
    echo ""
    read -p "   Enter the image ID: " TALOS_IMAGE_ID

    if [ -z "$TALOS_IMAGE_ID" ]; then
        log_error "Image ID cannot be empty!"
        exit 1
    fi

    # Validate that it is a number
    if ! [[ "$TALOS_IMAGE_ID" =~ ^[0-9]+$ ]]; then
        log_error "ID must be a number!"
        exit 1
    fi
    log_success "Image ID: $TALOS_IMAGE_ID"
    echo ""

    # Create terraform.tfvars
    log_info "Creating terraform.tfvars..."
    cat > terraform.tfvars << EOF
# Generated automatically by setup.sh
# $(date)

hcloud_token   = "$HCLOUD_TOKEN"
ssh_public_key = "$SSH_PUBLIC_KEY"
talos_image_id = $TALOS_IMAGE_ID

environment       = "workshop"
enable_monitoring = true
EOF
    log_success "terraform.tfvars created"
fi

echo ""

############################################################
# OPENTOFU INITIALIZATION
############################################################

echo "============================================"
echo "  Initializing OpenTofu"
echo "============================================"
echo ""

log_info "Running tofu init..."
tofu init

log_success "OpenTofu initialized"
echo ""

############################################################
# PLANNING
############################################################

echo "============================================"
echo "  Planning Infrastructure"
echo "============================================"
echo ""

log_info "Running tofu plan..."
tofu plan -out=tfplan

echo ""
log_success "Plan created!"
echo ""

# Show summary
echo "============================================"
echo "  Resources to be created:"
echo "============================================"
echo ""
echo "  - 4x CAX11 (3 CP + 1 Worker) = 4 x €3.79 = €15.16"
echo "  - 1x Floating IPv4           = €3.00"
echo "  - Network/Firewall/Placement = Free"
echo ""
echo "  Estimated cost: ~€18.16/month (excluding VAT)"
echo ""

############################################################
# APPLY
############################################################

read -p "Apply the plan? (y/N): " apply
if [[ ! "$apply" =~ ^[Yy]$ ]]; then
    log_warn "Operation cancelled by the user"
    echo ""
    echo "To apply manually:"
    echo "  tofu apply tfplan"
    echo ""
    exit 0
fi

echo ""
log_info "Applying infrastructure..."
echo ""

tofu apply tfplan

echo ""
log_success "Infrastructure provisioned!"
echo ""

############################################################
# POST-DEPLOY CONFIGURATION
############################################################

echo "============================================"
echo "  Post-Deploy Configuration"
echo "============================================"
echo ""

# Wait for the cluster to become ready
log_info "Waiting for the Talos cluster to become ready..."
sleep 10

# Configure talosctl
if [ -f "talosconfig" ]; then
    log_info "Configuring talosctl..."
    export TALOSCONFIG="$SCRIPT_DIR/talosconfig"

    # Get the control plane IP
    CP_IP=$(tofu output -raw control_plane_ip 2>/dev/null || echo "")

    if [ -n "$CP_IP" ]; then
        log_info "Waiting for the Talos API at $CP_IP..."

        # Try a health check (may take a few minutes)
        for i in {1..30}; do
            if talosctl --talosconfig talosconfig -n "$CP_IP" health --wait-timeout 10s 2>/dev/null; then
                log_success "Talos cluster healthy!"
                break
            fi
            echo -n "."
            sleep 10
        done
        echo ""
    fi
fi
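Both wait loops in the script poll up to 30 times with a 10 s sleep (roughly 5 minutes). The same retry pattern as a reusable helper (a sketch, not part of the original script; the `check` function is a stand-in for a real health probe):

```shell
# retry <max_attempts> <sleep_seconds> <command...>
# Runs the command until it succeeds or attempts run out,
# mirroring the health-check loops used in setup.sh.
retry() {
    local max=$1 delay=$2; shift 2
    local i
    for ((i = 1; i <= max; i++)); do
        if "$@"; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}

# Example: a probe that succeeds on the 3rd attempt.
tries=0
check() { tries=$((tries + 1)); [ "$tries" -ge 3 ]; }
retry 5 0 check && echo "healthy after $tries attempts"
# prints: healthy after 3 attempts
```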

# Configure kubectl
if [ -f "kubeconfig" ]; then
    log_info "Configuring kubectl..."
    export KUBECONFIG="$SCRIPT_DIR/kubeconfig"

    log_info "Waiting for nodes to become Ready..."
    for i in {1..30}; do
        if kubectl get nodes 2>/dev/null | grep -q "Ready"; then
            log_success "Nodes ready!"
            kubectl get nodes
            break
        fi
        echo -n "."
        sleep 10
    done
    echo ""
fi

echo ""

############################################################
# FINAL SUMMARY
############################################################

echo "============================================"
echo "  Cluster Provisioned Successfully!"
echo "============================================"
echo ""

# Show outputs
echo "Endpoints:"
tofu output -raw kubernetes_api_endpoint 2>/dev/null && echo "" || true
tofu output -raw talos_api_endpoint 2>/dev/null && echo "" || true
echo ""

echo "Generated files:"
echo "  - kubeconfig  : kubectl configuration"
echo "  - talosconfig : talosctl configuration"
echo ""

echo "Useful commands:"
echo ""
echo "  # Use kubectl with this cluster"
echo "  export KUBECONFIG=$SCRIPT_DIR/kubeconfig"
echo "  kubectl get nodes"
echo ""
echo "  # Use talosctl with this cluster"
echo "  export TALOSCONFIG=$SCRIPT_DIR/talosconfig"
echo "  talosctl -n <IP> health"
echo ""
echo "  # Show OpenTofu outputs"
echo "  tofu output"
echo ""
echo "  # Destroy the infrastructure (CAREFUL!)"
echo "  ./cleanup.sh"
echo ""

log_success "Setup complete!"

echo ""
echo "============================================"
echo "  Next step (optional)"
echo "============================================"
echo ""
echo "  To enable autoscaling from 1 to 5 workers:"
echo "  ./install-autoscaler.sh"
echo ""
63
aula-08/talos-patches/control-plane.yaml
Normal file
@@ -0,0 +1,63 @@
# Talos Control Plane Configuration Patch
# Base configuration for HA control plane
machine:
  # Network configuration for Floating IP
  network:
    interfaces:
      - interface: eth0
        dhcp: true
        %{ if is_first_cp ~}
        addresses:
          - ${floating_ip}/32
        %{ endif ~}

  # Network optimizations
  sysctls:
    net.core.somaxconn: "8192"
    net.ipv4.tcp_max_syn_backlog: "8192"
    net.core.netdev_max_backlog: "5000"
    net.ipv4.ip_local_port_range: "1024 65535"
    net.ipv4.tcp_tw_reuse: "1"
    net.ipv4.tcp_fin_timeout: "15"
    fs.file-max: "2097152"
    fs.inotify.max_user_watches: "524288"
    vm.max_map_count: "262144"

  # Kubelet configuration
  kubelet:
    extraArgs:
      max-pods: "110"
      kube-reserved: "cpu=200m,memory=300Mi"
      system-reserved: "cpu=200m,memory=200Mi"

  # Time sync
  time:
    servers:
      - ntp1.hetzner.de
      - ntp2.hetzner.com
      - ntp3.hetzner.net

  # Features
  features:
    rbac: true
    stableHostname: true

cluster:
  # Control plane configuration
  controlPlane:
    endpoint: https://${floating_ip}:6443

  # Network configuration
  network:
    cni:
      name: flannel
    dnsDomain: cluster.local
    serviceSubnets:
      - 10.96.0.0/12
    podSubnets:
      - 10.244.0.0/16

  # Etcd configuration for HA
  etcd:
    advertisedSubnets:
      - 10.0.1.0/24
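The `${floating_ip}` and `%{ if is_first_cp ~}` markers in this patch are OpenTofu template directives, presumably rendered with `templatefile()` before the result is handed to the Talos provider. As a minimal local sketch of what the rendered excerpt looks like for the first control plane (the IP below is a documentation placeholder from TEST-NET-3, not a real Floating IP):

```shell
# Emulate the templatefile() substitution for is_first_cp = true.
# 203.0.113.10 is a placeholder address, not a real Floating IP.
cat > /tmp/cp-patch-excerpt.yaml <<'EOF'
machine:
  network:
    interfaces:
      - interface: eth0
        dhcp: true
        addresses:
          - ${floating_ip}/32
EOF
floating_ip="203.0.113.10"
# Replace the template variable the way templatefile() would:
sed "s/\${floating_ip}/${floating_ip}/" /tmp/cp-patch-excerpt.yaml
# Last output line ends with "- 203.0.113.10/32"
```

With the real patch, the `~` in `%{ if … ~}` additionally strips the directive's trailing newline, so the conditional block leaves no blank lines behind when `is_first_cp` is false.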
aula-08/talos-patches/worker.yaml (new file, 44 lines)
@@ -0,0 +1,44 @@
# Talos Worker Configuration Patch
# Base configuration for worker nodes
machine:
  # Network optimizations
  sysctls:
    net.core.somaxconn: "8192"
    net.ipv4.tcp_max_syn_backlog: "8192"
    net.core.netdev_max_backlog: "5000"
    net.ipv4.ip_local_port_range: "1024 65535"
    net.ipv4.tcp_tw_reuse: "1"
    net.ipv4.tcp_fin_timeout: "15"
    fs.file-max: "2097152"
    fs.inotify.max_user_watches: "524288"
    vm.max_map_count: "262144"

  # Kubelet configuration
  kubelet:
    extraArgs:
      max-pods: "110"
      kube-reserved: "cpu=100m,memory=200Mi"
      system-reserved: "cpu=100m,memory=100Mi"

  # Time sync
  time:
    servers:
      - ntp1.hetzner.de
      - ntp2.hetzner.com
      - ntp3.hetzner.net

  # Features
  features:
    rbac: true
    stableHostname: true

cluster:
  # Network configuration
  network:
    cni:
      name: flannel
    dnsDomain: cluster.local
    serviceSubnets:
      - 10.96.0.0/12
    podSubnets:
      - 10.244.0.0/16
aula-08/terraform.tfvars.example (new file, 53 lines)
@@ -0,0 +1,53 @@
# Example terraform.tfvars file
# Copy this file to terraform.tfvars and fill in your values

# ============================================
# CREDENTIALS (REQUIRED)
# ============================================

# Hetzner Cloud API token
# Get it at: https://console.hetzner.cloud/projects/[PROJECT_ID]/security/tokens
hcloud_token = "seu_token_hetzner_aqui"

# Public SSH key for emergency access to the nodes
# Get it with: cat ~/.ssh/id_rsa.pub
ssh_public_key = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC... seu@email.com"

# ID of the custom Talos image (created in aula-07)
# Get it with: hcloud image list --type snapshot
talos_image_id = 123456789

# ============================================
# CLUSTER CONFIGURATION
# ============================================

# Environment (prod, staging, dev)
environment = "workshop"

# Talos OS version (optional - default: v1.11.2)
# talos_version = "v1.11.2"

# ============================================
# MONITORING
# ============================================

# Enable Victoria Metrics
enable_monitoring = true

# ============================================
# AUTO-SCALING
# ============================================

# CPU thresholds for scaling
scale_up_threshold   = 70 # Scale up when CPU > 70%
scale_down_threshold = 30 # Scale down when CPU < 30%

# ============================================
# CUSTOM LABELS (OPTIONAL)
# ============================================

# Additional labels for all resources
custom_labels = {
  projeto     = "k8s-base"
  responsavel = "devops"
}
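A quick pre-flight sanity check — a sketch, not part of the repo — can confirm that a filled-in `terraform.tfvars` defines the three required variables before `tofu plan` is run. The key names come from this example file; the file path and values below are stand-ins:

```shell
# Hypothetical pre-flight check for the three required tfvars keys.
cat > /tmp/terraform.tfvars <<'EOF'
hcloud_token   = "redacted"
ssh_public_key = "ssh-rsa AAAA... user@host"
talos_image_id = 123456789
EOF
for key in hcloud_token ssh_public_key talos_image_id; do
  if grep -Eq "^${key}[[:space:]]*=" /tmp/terraform.tfvars; then
    echo "ok: ${key}"
  else
    echo "missing: ${key}" >&2
  fi
done
# Prints "ok: <key>" once per required key when all three are set.
```

Since `terraform.tfvars` is gitignored in this repo, a check like this also helps catch the case where the example file was edited but never copied to its real name.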
aula-08/test-autoscaler.yaml (new file, 38 lines)
@@ -0,0 +1,38 @@
############################################################
# Test deployment for the Cluster Autoscaler
# Creates pods that consume resources to force a scale-up
############################################################

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-autoscaler
  namespace: default
spec:
  replicas: 10
  selector:
    matchLabels:
      app: test-autoscaler
  template:
    metadata:
      labels:
        app: test-autoscaler
    spec:
      containers:
        - name: nginx
          image: nginx:alpine
          resources:
            requests:
              cpu: 400m      # Each pod requests 0.4 CPU
              memory: 512Mi  # Each pod requests 512MB RAM
            limits:
              cpu: 500m
              memory: 640Mi
      # Keep the pods off the control planes
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: DoesNotExist
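With `replicas: 10` at 400m each, the deployment requests 4000m of CPU in total, while a single CAX11 worker (2 vCPU, minus the 100m + 100m kube/system reservations from worker.yaml) can fit only about four such pods — the rest stay Pending and trigger the autoscaler. The arithmetic as a sketch (CAX11 = 2 vCPU is the Hetzner spec; the other numbers come from the manifests above):

```shell
# Why 10 replicas overflow one CAX11 worker (2000m of vCPU).
requested_per_pod=400                   # cpu request per pod, in millicores
replicas=10
total_requested=$((requested_per_pod * replicas))
allocatable=$((2000 - 100 - 100))       # 2 vCPU minus kube-/system-reserved
fit_per_node=$((allocatable / requested_per_pod))
echo "total requested: ${total_requested}m"
echo "allocatable per worker: ${allocatable}m (~${fit_per_node} pods)"
# → total requested: 4000m
# → allocatable per worker: 1800m (~4 pods)
```

So even ignoring memory, roughly two extra CAX11 workers are needed for the remaining pods — enough to watch the autoscaler add nodes within its 0-5 range.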
aula-08/variables.tf (new file, 63 lines)
@@ -0,0 +1,63 @@
############################################################
# Variables for Hetzner Talos Kubernetes Cluster
############################################################

# Authentication
variable "hcloud_token" {
  type        = string
  description = "Hetzner Cloud API token"
  sensitive   = true
}

# Cluster Configuration
variable "environment" {
  type        = string
  description = "Environment name (prod, staging, dev)"
  default     = "prod"
}

# SSH Configuration
variable "ssh_public_key" {
  type        = string
  description = "Public SSH key for emergency access to nodes"
}

# Talos Configuration
variable "talos_image_id" {
  type        = number
  description = "ID of the custom Talos image on Hetzner (created in aula-07). Get it with: hcloud image list --type snapshot"
}

variable "talos_version" {
  type        = string
  description = "Talos version to use"
  default     = "v1.11.2" # Match the official image version
}

# Monitoring Configuration
variable "enable_monitoring" {
  type        = bool
  description = "Enable Victoria Metrics monitoring stack"
  default     = true
}

# Auto-scaling Configuration
variable "scale_up_threshold" {
  type        = number
  description = "CPU percentage to trigger scale up"
  default     = 70
}

variable "scale_down_threshold" {
  type        = number
  description = "CPU percentage to trigger scale down"
  default     = 30
}

# Tags for resource management
variable "custom_labels" {
  type        = map(string)
  description = "Custom labels to add to all resources"
  default     = {}
}
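The two thresholds are only meaningful when `scale_up_threshold` is greater than `scale_down_threshold`; nothing in the variable declarations enforces that. A guard like the following is a sketch of a check a setup script could perform — it is not code from this repo, and the values are the defaults above:

```shell
# Hypothetical guard: reject inverted autoscaling thresholds.
scale_up_threshold=70
scale_down_threshold=30
if [ "$scale_up_threshold" -le "$scale_down_threshold" ]; then
  echo "error: scale_up_threshold must be greater than scale_down_threshold" >&2
  exit 1
fi
echo "thresholds ok: up=${scale_up_threshold}% down=${scale_down_threshold}%"
# → thresholds ok: up=70% down=30%
```

Alternatively, OpenTofu's `variable` blocks support a `validation { condition = … }` clause, which could express the same constraint declaratively inside variables.tf.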
aula-08/versions.tf (new file, 35 lines)
@@ -0,0 +1,35 @@
############################################################
# OpenTofu Version and Provider Requirements
# Compatible with OpenTofu >= 1.6.0
############################################################

terraform {
  required_version = ">= 1.6.0"

  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = "~> 1.45"
    }

    talos = {
      source  = "siderolabs/talos"
      version = "0.6.0"
    }

    random = {
      source  = "hashicorp/random"
      version = "~> 3.5"
    }

    null = {
      source  = "hashicorp/null"
      version = "~> 3.2"
    }

    local = {
      source  = "hashicorp/local"
      version = "~> 2.4"
    }
  }
}