aula-07 and aula-08: HA Talos cluster on Hetzner with Autoscaler
aula-07: Custom Talos image creation on Hetzner Cloud
- Uses Talos Factory to generate ARM64/AMD64 images
- Includes extensions: qemu-guest-agent, hcloud

aula-08: Talos Kubernetes cluster provisioning via OpenTofu
- 3 HA Control Planes (CAX11 ARM64)
- 1 Worker Node (CAX11 ARM64)
- Private network, Floating IP, Firewall
- Cluster Autoscaler for Hetzner (0-5 extra workers)
- Interactive setup with prerequisite validation
- Estimated cost: ~€18/month (base)

Also includes:
- .gitignore to keep sensitive files out of the repo
- CLAUDE.md with project instructions
30 .gitignore vendored Normal file
@@ -0,0 +1,30 @@
# OpenTofu / Terraform
**/.terraform/
**/.tofu/
**/*.tfstate
**/*.tfstate.*
**/tfplan
**/tfplan.out
**/.terraform.lock.hcl

# Credentials and sensitive configs
**/terraform.tfvars
**/kubeconfig
**/kubeconfig-*
**/talosconfig
**/*.pem
**/*.key

# OS
.DS_Store
Thumbs.db

# Editor
*.swp
*.swo
*~
.idea/
.vscode/

# Node (aula-01)
node_modules/
114 CLAUDE.md Normal file
@@ -0,0 +1,114 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a workshop repository for teaching Docker and Kubernetes concepts, specifically focusing on container health checks and liveness probes. It contains a deliberately "buggy" Node.js app that hangs after a configurable number of requests to demonstrate how container orchestration handles unhealthy containers.

## Repository Structure

- **aula-01/**: Docker Compose lesson - basic container deployment with restart policies
- **aula-02/**: Kubernetes lesson - deployment with liveness probes and ConfigMaps
- **aula-03/**: Kubernetes lesson - high availability with replicas and readiness probes
- **aula-04/**: Kubernetes lesson - NGINX Ingress with Keep Request (Lua) for zero-downtime
- **aula-05/**: Kubernetes lesson - KEDA + Victoria Metrics for metrics-based auto-scaling
- **aula-06/**: Kubernetes lesson - n8n deployment via Helm with Queue Mode (workers, webhooks, PostgreSQL, Redis)
- **aula-07/**: Talos Linux - creating a custom Talos image for Hetzner Cloud
- **aula-08/**: OpenTofu - provisioning an HA Talos Kubernetes cluster on Hetzner Cloud

## Running the Examples

### Aula 01 (Docker Compose)

```bash
cd aula-01
docker-compose up
```

The app runs on port 3000. After `MAX_REQUESTS` requests (default 3), the app stops responding.

### Aula 02 (Kubernetes)

```bash
cd aula-02
kubectl apply -f configmap.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
```

Access via NodePort 30080. The liveness probe at `/health` will detect when the app hangs and restart the container.
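
The probe wiring can be sketched like this (a minimal fragment, not the repo's actual `deployment.yaml`; the port and timings are assumptions):

```yaml
# Hypothetical container-spec fragment; aula-02's manifest may use other values.
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3   # after 3 consecutive failures, kubelet restarts the container
```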

### Aula 03 (Kubernetes - High Availability)

```bash
cd aula-03
kubectl apply -f configmap.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
```

Builds on Aula 02 with multiple replicas and a readiness probe. When one pod hangs, the others continue serving requests. The readiness probe removes unhealthy pods from the Service immediately, while the liveness probe restarts them.
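
A readiness probe for that behavior might look like the following (a sketch with illustrative values, not the actual aula-03 manifest):

```yaml
# Hypothetical fragment; tuned for fast removal from the Service.
readinessProbe:
  httpGet:
    path: /health
    port: 3000
  periodSeconds: 2      # check often so a hung pod leaves the Service quickly
  failureThreshold: 1   # one failed check is enough to stop routing traffic to it
```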

### Aula 04 (Kubernetes - NGINX Ingress with Keep Request)

Requires NGINX Ingress Controller with Lua support.

```bash
cd aula-04
kubectl apply -f configmap.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ingress-nginx.yaml
```

Access via NGINX Ingress. The Keep Request pattern uses Lua to hold requests when backends are unavailable, waiting up to 99s for a pod to become ready instead of returning 503 immediately. This eliminates user-visible failures during pod restarts.

### Aula 05 (Kubernetes - KEDA Auto-scaling)

```bash
cd aula-05
./setup.sh
```

Installs Victoria Metrics (metrics collection), KEDA (event-driven autoscaling), and NGINX Ingress. The ScaledObject monitors metrics like unavailable pods and restart counts, automatically scaling the deployment from 5 to 30 replicas based on demand.
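
A ScaledObject of that shape could look like this (a sketch, not the repo's manifest; the resource names, server address, and query are assumptions):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: node-bugado-scaler          # hypothetical name
spec:
  scaleTargetRef:
    name: node-bugado               # deployment to scale (assumed)
  minReplicaCount: 5
  maxReplicaCount: 30
  triggers:
    - type: prometheus
      metadata:
        # Victoria Metrics exposes a Prometheus-compatible query API (address assumed)
        serverAddress: http://victoria-metrics.default.svc:8428
        query: sum(kube_deployment_status_replicas_unavailable{deployment="node-bugado"})
        threshold: "1"
```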

### Aula 06 (Kubernetes - n8n via Helm)

```bash
cd aula-06
./setup.sh
```

Deploys the n8n workflow automation platform via Helm chart with a Queue Mode architecture: main node, workers (2-5 replicas with HPA), webhooks (1-3 replicas with HPA), PostgreSQL, and Redis. Access via http://n8n.localhost (requires NGINX Ingress).

### Aula 07 (Talos Linux - Custom Image)

Follow the instructions in `aula-07/README.md` to create a custom Talos Linux image on Hetzner Cloud using Talos Factory. This is a prerequisite for Aula 08.

### Aula 08 (OpenTofu - Talos Cluster on Hetzner Cloud)

```bash
cd aula-08
./setup.sh
```

Provisions a full HA Kubernetes cluster on Hetzner Cloud using OpenTofu:

- 3x Control Plane nodes (CAX11 ARM64)
- 1x Worker node (CAX11 ARM64)
- Private network, Floating IP, Firewall
- Cluster Autoscaler support (1-5 workers)
- Estimated cost: ~€18/month (base), up to ~€33/month with max autoscaling

Prerequisites:

- OpenTofu (`brew install opentofu`)
- talosctl (`brew install siderolabs/tap/talosctl`)
- kubectl
- Hetzner Cloud API token
- Talos image ID from Aula 07
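
The CLI prerequisites above can be checked with a quick pre-flight loop (a sketch; the tool names come from the list, nothing else is assumed):

```shell
#!/bin/sh
# Print any prerequisite tool that is missing from PATH.
for tool in tofu talosctl kubectl; do
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "missing: $tool"
  fi
done
```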

Optional - enable cluster autoscaling:

```bash
./install-autoscaler.sh
```

This installs the Kubernetes Cluster Autoscaler configured for Hetzner Cloud, automatically scaling workers from 1 to 5 based on pending pods.
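
Scale-up only triggers when pods stay Pending for lack of resources, so a test workload should request more capacity than the existing worker offers; a sketch (names and sizes are illustrative, not from the repo):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: autoscaler-test             # hypothetical test workload
spec:
  replicas: 10
  selector:
    matchLabels: {app: autoscaler-test}
  template:
    metadata:
      labels: {app: autoscaler-test}
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: 500m             # 10 x 500m cannot fit on a single CAX11 (2 vCPU)
              memory: 256Mi
```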

To destroy the infrastructure: `./cleanup.sh`

## App Behavior

The Node.js app (`app.js`) is intentionally designed to:

1. Accept requests normally until `MAX_REQUESTS` is reached
2. Stop responding (hang) after the limit, simulating a crashed but still-running process
3. Stop serving the `/health` endpoint as well once the app is "stuck"

This behavior demonstrates why process-level monitoring (`restart: always`) is insufficient and why application-level health checks (liveness probes) are necessary.
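
The core of that behavior can be modeled in a few lines (a hypothetical sketch of the idea, not the repository's actual `app.js`; the real app simply never ends the HTTP response):

```javascript
// Hypothetical model of the request counter in app.js.
const MAX_REQUESTS = 3;
let served = 0;

// Returns a body while healthy; returns null once the app "hangs".
// (The real app leaves the socket open instead of returning null.)
function handle() {
  served += 1;
  return served > MAX_REQUESTS ? null : `ok #${served}`;
}
```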

## Environment Variables

- `MAX_REQUESTS`: Number of requests before the app hangs (default: 3)
@@ -1,26 +0,0 @@
# Retry middleware - tries other pods when one fails
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: retry-middleware
spec:
  retry:
    attempts: 5              # 5 attempts
    initialInterval: 500ms   # 500ms between cycles
---
# IngressRoute
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: node-bugado
spec:
  entryPoints:
    - web
  routes:
    - match: PathPrefix(`/`)
      kind: Rule
      middlewares:
        - name: retry-middleware
      services:
        - name: node-bugado
          port: 3000
@@ -204,7 +204,7 @@ echo " # Ver todos os pods"
echo " kubectl get pods -n n8n"
echo ""
echo " # Ver logs do n8n"
echo " kubectl logs -f -l app.kubernetes.io/component=main -n n8n"
echo " kubectl logs -f -n n8n deployment/n8n"
echo ""
echo " # Ver HPA (autoscaler)"
echo " kubectl get hpa -n n8n"
@@ -218,6 +218,12 @@ echo ""
echo " # Fazer upgrade do helm chart"
echo " helm upgrade --reuse-values --values custom-values.yaml n8n community-charts/n8n --namespace n8n"
echo ""
echo " # Verificar historico de releases"
echo " helm history n8n -n n8n"
echo ""
echo " # Fazer rollback do historico de releases"
echo " helm rollback n8n <nº da release>"
echo ""
echo "=============================================="
echo ""
63 aula-07/README.md Normal file
@@ -0,0 +1,63 @@
---
criado: 2025-12-27T01:10:54-03:00
atualizado: 2025-12-27T02:25:34-03:00
---

Hetzner Cloud does not support uploading custom images; it is only possible through support — see [issue 3599](https://github.com/siderolabs/talos/issues/3599#issuecomment-841172018).

Workaround:

1. Run an instance in rescue mode and overwrite the operating system with the Talos image.
2. 🚧 According to the official documentation it is possible to use [HashiCorp Packer](https://www.packer.io/docs/builders/hetzner-cloud) to prepare an image, but that page has been removed from the builders docs, and in my tests it ended in a kernel panic.

Step 1 -> https://factory.talos.dev/

- [ ] siderolabs/amd-ucode / siderolabs/intel-ucode
  - Spectre / Meltdown (V1, V2, V4)
    - Branch prediction
    - Privileged memory reads from userland
  - Zenbleed (CVE-2023-20593)
    - AMD Zen 2 CPUs
    - Register leakage via speculative execution
    - Affects **VMs and containers**
  - Speculative Return Stack Overflow (SRSO)
    - Modern AMD CPUs
    - Jailbreak
- [ ] siderolabs/qemu-guest-agent (Hetzner uses QEMU / KVM)
- [ ] siderolabs/stargz-snapshotter (https://github.com/containerd/stargz-snapshotter)
- [ ] siderolabs/util-linux-tools (lsblk, mount, findmnt)
- [ ] siderolabs/binfmt-misc (if you plan to use multi-arch images)
- siderolabs/tailscale OR cloudflared -> https://spot.rackspace.com/
- zfs -> for bare metal only (~50% faster than ext4)

bootloader: dual-boot

https://factory.talos.dev/?arch=amd64&board=undefined&bootloader=dual-boot&cmdline-set=true&extensions=-&extensions=siderolabs%2Famd-ucode&extensions=siderolabs%2Fbinfmt-misc&extensions=siderolabs%2Fintel-ucode&extensions=siderolabs%2Fqemu-guest-agent&extensions=siderolabs%2Fstargz-snapshotter&extensions=siderolabs%2Futil-linux-tools&platform=hcloud&secureboot=undefined&target=cloud&version=1.12.0

```bash
# Check that you are actually in rescue mode
df

### The output will look something like:
# Filesystem          1K-blocks      Used Available Use% Mounted on
# udev                   987432         0    987432   0% /dev
# 213.133.99.101:/nfs 308577696 247015616  45817536  85% /root/.oldroot/nfs
# overlay                995672      8340    987332   1% /
# tmpfs                  995672         0    995672   0% /dev/shm
# tmpfs                  398272       572    397700   1% /run
# tmpfs                    5120         0      5120   0% /run/lock
# tmpfs                  199132         0    199132   0% /run/user/0

# Download the Talos image
cd /tmp
wget -O /tmp/talos.raw.xz https://factory.talos.dev/image/c4f17c623d4ac547a243489f1b3285afd64a76b491b1c5c24ef6363587cef55f/v1.12.0/hcloud-amd64.raw.xz

# Write the system to disk (takes about 4-5 minutes)
xz -d -c /tmp/talos.raw.xz | dd of=/dev/sda && sync

# Shut down the instance before taking the snapshot
shutdown -h now
```
132 aula-08/cleanup.sh Executable file
@@ -0,0 +1,132 @@
#!/bin/bash

############################################################
# Aula 08 - Cleanup
# Destroys the provisioned infrastructure
############################################################

set -e

# Output colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"

log_info() { echo -e "${BLUE}[INFO]${NC} $1"; }
log_success() { echo -e "${GREEN}[OK]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }

echo ""
echo "============================================"
echo " Cleanup - Destruir Infraestrutura"
echo "============================================"
echo ""

# Check that tofu is installed
if ! command -v tofu &> /dev/null; then
  log_error "OpenTofu não encontrado!"
  exit 1
fi

# Check for existing state
if [ ! -f "terraform.tfstate" ] && [ ! -d ".terraform" ]; then
  log_warn "Nenhuma infraestrutura encontrada para destruir."
  exit 0
fi

# Check for autoscaler workers (created outside OpenTofu)
if [ -f "kubeconfig" ]; then
  export KUBECONFIG="$SCRIPT_DIR/kubeconfig"

  AUTOSCALER_WORKERS=$(kubectl get nodes -l node.kubernetes.io/instance-type=cax11 \
    --no-headers 2>/dev/null | wc -l | tr -d ' ' || echo "0")

  if [ "$AUTOSCALER_WORKERS" -gt "1" ]; then
    log_warn "Detectados $AUTOSCALER_WORKERS workers (incluindo os do autoscaler)"
    log_warn "Workers criados pelo autoscaler serão removidos via API Hetzner"
    echo ""
  fi
fi

log_warn "ATENÇÃO: Esta operação irá DESTRUIR todos os recursos!"
echo ""
echo "Recursos que serão removidos:"
echo " - 3x Control Plane nodes"
echo " - Workers (incluindo os criados pelo autoscaler)"
echo " - Rede privada"
echo " - Floating IP"
echo " - Firewall"
echo " - Placement Group"
echo ""

read -p "Tem certeza que deseja continuar? (digite 'sim' para confirmar): " confirm

if [ "$confirm" != "sim" ]; then
  log_info "Operação cancelada"
  exit 0
fi

echo ""

# Remove autoscaler-created workers first (if any)
if [ -f "terraform.tfvars" ]; then
  HCLOUD_TOKEN=$(grep 'hcloud_token' terraform.tfvars | cut -d'"' -f2)
  CLUSTER_NAME=$(tofu output -raw cluster_name 2>/dev/null || echo "")

  if [ -n "$HCLOUD_TOKEN" ] && [ -n "$CLUSTER_NAME" ]; then
    log_info "Verificando workers do autoscaler..."

    # List servers labeled with the cluster name that are NOT managed by tofu
    AUTOSCALER_SERVERS=$(HCLOUD_TOKEN="$HCLOUD_TOKEN" hcloud server list \
      -l cluster="$CLUSTER_NAME" \
      -o noheader -o columns=id,name 2>/dev/null | \
      grep -E "worker-pool" || true)

    if [ -n "$AUTOSCALER_SERVERS" ]; then
      log_warn "Removendo workers criados pelo autoscaler..."
      echo "$AUTOSCALER_SERVERS" | while read -r server_id server_name; do
        log_info " Removendo $server_name (ID: $server_id)..."
        HCLOUD_TOKEN="$HCLOUD_TOKEN" hcloud server delete "$server_id" --quiet 2>/dev/null || true
      done
      log_success "Workers do autoscaler removidos"
    fi
  fi
fi

echo ""
log_info "Destruindo infraestrutura via OpenTofu..."
echo ""

tofu destroy -auto-approve

echo ""
log_success "Infraestrutura destruída!"
echo ""

# Remove generated local files (keep .terraform for fast re-deploy)
log_info "Limpando arquivos gerados..."

rm -f kubeconfig talosconfig tfplan terraform.tfstate terraform.tfstate.backup

log_success "Arquivos removidos"
echo ""

# Ask about terraform.tfvars
if [ -f "terraform.tfvars" ]; then
  read -p "Remover terraform.tfvars também? (s/N): " remove_tfvars
  if [[ "$remove_tfvars" =~ ^[Ss]$ ]]; then
    rm -f terraform.tfvars
    log_success "terraform.tfvars removido"
  else
    log_info "terraform.tfvars mantido (útil para re-deploy)"
  fi
fi

echo ""
log_success "Cleanup concluído!"
158 aula-08/cluster-autoscaler.yaml Normal file
@@ -0,0 +1,158 @@
############################################################
# Cluster Autoscaler for Hetzner Cloud + Talos
# Automatically scales workers from 0 to 5 nodes
############################################################

---
apiVersion: v1
kind: Namespace
metadata:
  name: cluster-autoscaler

# Secret is created via install-autoscaler.sh (kubectl create secret)
# to properly handle base64 encoding of cloud-init

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-autoscaler
  namespace: cluster-autoscaler

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler
rules:
  - apiGroups: [""]
    resources: ["events", "endpoints"]
    verbs: ["create", "patch"]
  - apiGroups: [""]
    resources: ["pods/eviction"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["pods/status"]
    verbs: ["update"]
  - apiGroups: [""]
    resources: ["endpoints"]
    resourceNames: ["cluster-autoscaler"]
    verbs: ["get", "update"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["watch", "list", "get", "update"]
  - apiGroups: [""]
    resources: ["namespaces", "pods", "services", "replicationcontrollers", "persistentvolumeclaims", "persistentvolumes"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["batch", "extensions"]
    resources: ["jobs"]
    verbs: ["get", "list", "patch", "watch"]
  - apiGroups: ["extensions"]
    resources: ["replicasets", "daemonsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["policy"]
    resources: ["poddisruptionbudgets"]
    verbs: ["watch", "list"]
  - apiGroups: ["apps"]
    resources: ["statefulsets", "replicasets", "daemonsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses", "csinodes", "csidrivers", "csistoragecapacities"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["create"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    resourceNames: ["cluster-autoscaler"]
    verbs: ["get", "update"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["create", "get", "update", "delete", "list", "watch"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: cluster-autoscaler

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: cluster-autoscaler
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      # Use host network to access external APIs (Hetzner)
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      # Workaround: Talos DNS proxy doesn't forward to upstream correctly
      hostAliases:
        - ip: "213.239.246.73"
          hostnames:
            - "api.hetzner.cloud"
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.31.0
          command:
            - ./cluster-autoscaler
            - --cloud-provider=hetzner
            - --nodes=0:5:CAX11:nbg1:worker-pool
            - --nodes=0:0:CAX11:nbg1:draining-node-pool
            - --scale-down-enabled=true
            - --scale-down-delay-after-add=5m
            - --scale-down-unneeded-time=3m
            - --scale-down-utilization-threshold=0.5
            - --skip-nodes-with-local-storage=false
            - --skip-nodes-with-system-pods=false
            - --balance-similar-node-groups=true
            - --v=4
          env:
            - name: HCLOUD_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hcloud-autoscaler
                  key: token
            - name: HCLOUD_CLOUD_INIT
              valueFrom:
                secretKeyRef:
                  name: hcloud-autoscaler
                  key: cloud-init
            - name: HCLOUD_IMAGE
              value: "${TALOS_IMAGE_ID}"
            - name: HCLOUD_NETWORK
              value: "${NETWORK_NAME}"
            - name: HCLOUD_FIREWALL
              value: "${FIREWALL_NAME}"
            - name: HCLOUD_SSH_KEY
              value: "${SSH_KEY_NAME}"
          resources:
            requests:
              cpu: 100m
              memory: 300Mi
            limits:
              cpu: 500m
              memory: 500Mi
139 aula-08/install-autoscaler.sh Executable file
@@ -0,0 +1,139 @@
#!/bin/bash

############################################################
# Installs the Cluster Autoscaler on the Talos cluster
# Requires: cluster provisioned via setup.sh
############################################################

set -e

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"

log_info() { echo -e "${BLUE}[INFO]${NC} $1"; }
log_success() { echo -e "${GREEN}[OK]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }

echo ""
echo "============================================"
echo " Instalando Cluster Autoscaler"
echo "============================================"
echo ""

# Check prerequisites
if [ ! -f "kubeconfig" ]; then
  log_error "kubeconfig não encontrado! Execute setup.sh primeiro."
  exit 1
fi

if [ ! -f "terraform.tfvars" ]; then
  log_error "terraform.tfvars não encontrado!"
  exit 1
fi

export KUBECONFIG="$SCRIPT_DIR/kubeconfig"

# Verify connectivity to the cluster
log_info "Verificando conexão com o cluster..."
if ! kubectl get nodes &>/dev/null; then
  log_error "Não foi possível conectar ao cluster!"
  exit 1
fi
log_success "Conectado ao cluster"

# Read values from the OpenTofu outputs
log_info "Obtendo configurações do OpenTofu..."

WORKER_CONFIG_BASE64=$(tofu output -raw autoscaler_worker_config 2>/dev/null)
TALOS_IMAGE_ID=$(tofu output -raw autoscaler_image_id 2>/dev/null)
CLUSTER_NAME=$(tofu output -raw cluster_name 2>/dev/null)
NETWORK_ID=$(tofu output -raw network_id 2>/dev/null)
FIREWALL_ID=$(tofu output -raw firewall_id 2>/dev/null)
SSH_KEY_NAME=$(tofu output -raw ssh_key_name 2>/dev/null)

# Read the token from terraform.tfvars
HCLOUD_TOKEN=$(grep 'hcloud_token' terraform.tfvars | cut -d'"' -f2)

if [ -z "$WORKER_CONFIG_BASE64" ] || [ -z "$HCLOUD_TOKEN" ]; then
  log_error "Não foi possível obter as configurações necessárias!"
  exit 1
fi

log_success "Configurações obtidas"
echo " - Cluster: $CLUSTER_NAME"
echo " - Image ID: $TALOS_IMAGE_ID"
echo " - Network ID: $NETWORK_ID"
echo " - SSH Key: $SSH_KEY_NAME"
echo ""

# Create the namespace with the privileged policy (required for hostNetwork)
log_info "Criando namespace cluster-autoscaler..."
kubectl create namespace cluster-autoscaler --dry-run=client -o yaml | kubectl apply -f -
kubectl label namespace cluster-autoscaler pod-security.kubernetes.io/enforce=privileged --overwrite

# Create the secret with credentials
log_info "Criando secret com credenciais..."
kubectl create secret generic hcloud-autoscaler \
  --namespace cluster-autoscaler \
  --from-literal=token="$HCLOUD_TOKEN" \
  --from-literal=cloud-init="$WORKER_CONFIG_BASE64" \
  --dry-run=client -o yaml | kubectl apply -f -

log_success "Secret criado"

# Apply RBAC and Deployment
log_info "Aplicando manifesto do cluster-autoscaler..."

# Substitute the template variables and apply
sed -e "s|\${TALOS_IMAGE_ID}|$TALOS_IMAGE_ID|g" \
    -e "s|\${NETWORK_NAME}|$CLUSTER_NAME-network|g" \
    -e "s|\${FIREWALL_NAME}|$CLUSTER_NAME-firewall|g" \
    -e "s|\${SSH_KEY_NAME}|$SSH_KEY_NAME|g" \
    cluster-autoscaler.yaml | kubectl apply -f -

log_success "Cluster Autoscaler instalado!"

# Wait for the pod to become ready
log_info "Aguardando pod do autoscaler..."
kubectl wait --for=condition=ready pod \
  -l app=cluster-autoscaler \
  -n cluster-autoscaler \
  --timeout=120s

echo ""
log_success "Cluster Autoscaler pronto!"

echo ""
echo "============================================"
echo " Configuração do Autoscaler"
echo "============================================"
echo ""
echo " Pool: worker-pool"
echo " Tipo: CAX11 (ARM64)"
echo " Região: nbg1 (Nuremberg)"
echo " Min nodes: 1"
echo " Max nodes: 5"
echo ""
echo " Scale down após: 5 minutos"
echo " Utilização mínima: 50%"
echo ""
echo "Comandos úteis:"
echo ""
echo " # Ver logs do autoscaler"
echo " kubectl logs -n cluster-autoscaler -l app=cluster-autoscaler -f"
echo ""
echo " # Ver status dos nodes"
echo " kubectl get nodes"
echo ""
echo " # Testar scale up (criar pods pending)"
echo " kubectl create deployment test --image=nginx --replicas=10"
echo ""
391 aula-08/main.tf Normal file
@@ -0,0 +1,391 @@
############################################################
|
||||
# Hetzner Talos Kubernetes Cluster - Base Infrastructure
|
||||
# Using custom Talos image created from ISO
|
||||
############################################################
|
||||
|
||||
############################################################
|
||||
# PROVIDERS CONFIGURATION
|
||||
############################################################
|
||||
|
||||
provider "hcloud" {
|
||||
token = var.hcloud_token
|
||||
}
|
||||
|
||||
############################################################
|
||||
# DATA SOURCES
|
||||
############################################################
|
||||
|
||||
# Use the custom Talos image created in aula-07
|
||||
data "hcloud_image" "talos" {
|
||||
id = var.talos_image_id
|
||||
}
|
||||
|
||||
############################################################
|
||||
# RANDOM RESOURCES
|
||||
############################################################
|
||||
|
||||
resource "random_string" "cluster_id" {
|
||||
length = 6
|
||||
special = false
|
||||
lower = true
|
||||
upper = false
|
||||
}
|
||||
|
||||
locals {
|
||||
cluster_name = "talos-${random_string.cluster_id.result}"
|
||||
common_labels = {
|
||||
cluster = local.cluster_name
|
||||
environment = var.environment
|
||||
managed_by = "terraform"
|
||||
}
|
||||
}
|
||||
|
||||
############################################################
|
||||
# SSH KEY (for emergency access only)
|
||||
############################################################
|
||||
|
||||
data "hcloud_ssh_keys" "all" {}
|
||||
|
||||
locals {
|
||||
ssh_key_normalized = trimspace(split(" ", var.ssh_public_key)[0] == "ssh-rsa" ?
|
||||
join(" ", slice(split(" ", var.ssh_public_key), 0, 2)) :
|
||||
var.ssh_public_key)
|
||||
|
||||
ssh_key_matches = [
|
||||
for key in data.hcloud_ssh_keys.all.ssh_keys : key.id
|
||||
if key.public_key == local.ssh_key_normalized || key.public_key == var.ssh_public_key
|
||||
]
|
||||
|
||||
ssh_key_id = length(local.ssh_key_matches) > 0 ? local.ssh_key_matches[0] : hcloud_ssh_key.admin[0].id
|
||||
}
|
||||
|
||||
resource "hcloud_ssh_key" "admin" {
|
||||
count = length(local.ssh_key_matches) == 0 ? 1 : 0
|
||||
name = "${local.cluster_name}-admin"
|
||||
public_key = var.ssh_public_key
|
||||
labels = local.common_labels
|
||||
}
|
||||
|
||||
############################################################
|
||||
# NETWORK CONFIGURATION
|
||||
############################################################
|
||||
|
||||
resource "hcloud_network" "cluster" {
|
||||
name = "${local.cluster_name}-network"
|
||||
ip_range = "10.0.0.0/16"
|
||||
labels = local.common_labels
|
||||
}
|
||||
|
||||
resource "hcloud_network_subnet" "cluster" {
|
||||
type = "cloud"
|
||||
network_id = hcloud_network.cluster.id
|
||||
network_zone = "eu-central"
|
||||
ip_range = "10.0.1.0/24"
|
||||
}
|
||||
|
||||
############################################################
|
||||
# FIREWALL CONFIGURATION
|
||||
############################################################
|
||||
|
||||
resource "hcloud_firewall" "cluster" {
|
||||
name = "${local.cluster_name}-firewall"
|
||||
labels = local.common_labels
|
||||
|
||||
# Talos API access
|
||||
rule {
|
||||
direction = "in"
|
||||
protocol = "tcp"
|
||||
port = "50000"
|
||||
source_ips = ["0.0.0.0/0", "::/0"]
|
||||
}
|
||||
|
||||
# Kubernetes API
|
||||
rule {
|
||||
direction = "in"
|
||||
protocol = "tcp"
|
||||
port = "6443"
|
||||
source_ips = ["0.0.0.0/0", "::/0"]
|
||||
}
|
||||
|
||||
# Allow HTTP/HTTPS for Ingress
|
||||
rule {
|
||||
direction = "in"
|
||||
protocol = "tcp"
|
||||
port = "80"
|
||||
source_ips = ["0.0.0.0/0", "::/0"]
|
||||
}
|
||||
|
||||
rule {
|
||||
direction = "in"
|
||||
protocol = "tcp"
|
||||
port = "443"
|
||||
source_ips = ["0.0.0.0/0", "::/0"]
|
||||
}
|
||||
|
||||
# Allow NodePort range (for services)
|
||||
rule {
|
||||
direction = "in"
|
||||
protocol = "tcp"
|
||||
port = "30000-32767"
|
||||
source_ips = ["0.0.0.0/0", "::/0"]
|
||||
}
|
||||
|
||||
# Allow all outbound traffic
|
||||
rule {
|
||||
direction = "out"
|
||||
protocol = "tcp"
|
||||
port = "any"
|
||||
destination_ips = ["0.0.0.0/0", "::/0"]
|
||||
}
|
||||
|
||||
rule {
|
||||
direction = "out"
|
||||
protocol = "udp"
|
||||
port = "any"
|
||||
destination_ips = ["0.0.0.0/0", "::/0"]
|
||||
}
|
||||
|
||||
rule {
|
||||
direction = "out"
|
||||
protocol = "icmp"
|
||||
destination_ips = ["0.0.0.0/0", "::/0"]
|
||||
}
|
||||
}

############################################################
# PLACEMENT GROUP (spread nodes across hosts for resilience)
############################################################

resource "hcloud_placement_group" "cluster" {
  name   = "${local.cluster_name}-pg"
  type   = "spread"
  labels = local.common_labels
}

############################################################
# CONTROL PLANE NODES (HA with 3 CAX11 nodes)
############################################################

resource "hcloud_server" "control_plane" {
  count       = 3
  name        = "${local.cluster_name}-cp-${count.index}"
  server_type = "cax11"
  image       = data.hcloud_image.talos.id
  location    = "nbg1" # ARM (CAX) servers are only offered in select locations
  ssh_keys    = [local.ssh_key_id]

  firewall_ids       = [hcloud_firewall.cluster.id]
  placement_group_id = hcloud_placement_group.cluster.id

  labels = merge(local.common_labels, {
    role = "control-plane"
    node = "cp-${count.index}"
    arch = "arm64"
  })

  public_net {
    ipv4_enabled = true
    ipv6_enabled = true
  }

  lifecycle {
    ignore_changes = [ssh_keys]
  }
}

resource "hcloud_server_network" "control_plane" {
  count      = 3
  server_id  = hcloud_server.control_plane[count.index].id
  network_id = hcloud_network.cluster.id
  ip         = "10.0.1.${10 + count.index}"
}
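The static private IPs above follow a simple offset scheme: control planes at `10.0.1.${10 + count.index}` and (further below) workers at `10.0.1.${20 + count.index}`. A quick shell sketch of the addresses those expressions produce:

```shell
# Derive the private IPs the same way the HCL interpolation does.
for i in 0 1 2; do
  echo "cp-$i    -> 10.0.1.$((10 + i))"
done
echo "worker-0 -> 10.0.1.$((20 + 0))"
```

Keeping a fixed, predictable gap between the control-plane range (.10+) and the worker range (.20+) leaves room to grow either group without renumbering.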

# Floating IP for stable control plane access
resource "hcloud_floating_ip" "control_plane" {
  type          = "ipv4"
  name          = "${local.cluster_name}-cp-ip"
  home_location = "nbg1"
  labels        = local.common_labels
}

resource "hcloud_floating_ip_assignment" "control_plane" {
  floating_ip_id = hcloud_floating_ip.control_plane.id
  server_id      = hcloud_server.control_plane[0].id
}

############################################################
# WORKER NODE (single CAX11)
############################################################

resource "hcloud_server" "worker" {
  count       = 1
  name        = "${local.cluster_name}-worker-${count.index}"
  server_type = "cax11"
  image       = data.hcloud_image.talos.id
  location    = "nbg1"
  ssh_keys    = [local.ssh_key_id]

  firewall_ids       = [hcloud_firewall.cluster.id]
  placement_group_id = hcloud_placement_group.cluster.id

  labels = merge(local.common_labels, {
    role = "worker"
    node = "worker-${count.index}"
    arch = "arm64"
  })

  public_net {
    ipv4_enabled = true
    ipv6_enabled = true
  }

  lifecycle {
    ignore_changes = [ssh_keys]
  }
}

resource "hcloud_server_network" "worker" {
  count      = 1
  server_id  = hcloud_server.worker[count.index].id
  network_id = hcloud_network.cluster.id
  ip         = "10.0.1.${20 + count.index}"
}

############################################################
# TALOS CONFIGURATION
############################################################

# Generate Talos machine secrets
resource "talos_machine_secrets" "this" {
  talos_version = var.talos_version
}

# Generate Talos client configuration
data "talos_client_configuration" "this" {
  cluster_name         = local.cluster_name
  client_configuration = talos_machine_secrets.this.client_configuration
  endpoints            = [hcloud_floating_ip.control_plane.ip_address]
}

# Control plane configuration
data "talos_machine_configuration" "control_plane" {
  count            = 3
  cluster_name     = local.cluster_name
  machine_type     = "controlplane"
  cluster_endpoint = "https://${hcloud_floating_ip.control_plane.ip_address}:6443"
  machine_secrets  = talos_machine_secrets.this.machine_secrets
  talos_version    = var.talos_version

  config_patches = [
    templatefile("${path.module}/talos-patches/control-plane.yaml", {
      cluster_name = local.cluster_name
      node_name    = hcloud_server.control_plane[count.index].name
      is_ha        = true
      is_first_cp  = count.index == 0
      etcd_peers   = [for i in range(3) : "10.0.1.${10 + i}"]
      floating_ip  = hcloud_floating_ip.control_plane.ip_address
    })
  ]

  depends_on = [
    hcloud_server.control_plane,
    hcloud_floating_ip_assignment.control_plane
  ]
}

# Worker configuration
data "talos_machine_configuration" "worker" {
  count            = 1
  cluster_name     = local.cluster_name
  machine_type     = "worker"
  cluster_endpoint = "https://${hcloud_floating_ip.control_plane.ip_address}:6443"
  machine_secrets  = talos_machine_secrets.this.machine_secrets
  talos_version    = var.talos_version

  config_patches = [
    templatefile("${path.module}/talos-patches/worker.yaml", {
      cluster_name = local.cluster_name
      node_name    = hcloud_server.worker[count.index].name
    })
  ]

  depends_on = [
    hcloud_server.worker,
    hcloud_floating_ip_assignment.control_plane
  ]
}

############################################################
# APPLY TALOS CONFIGURATION
############################################################

resource "talos_machine_configuration_apply" "control_plane" {
  count                       = 3
  client_configuration        = talos_machine_secrets.this.client_configuration
  machine_configuration_input = data.talos_machine_configuration.control_plane[count.index].machine_configuration
  endpoint                    = hcloud_server.control_plane[count.index].ipv4_address
  node                        = hcloud_server.control_plane[count.index].ipv4_address

  depends_on = [
    hcloud_server_network.control_plane,
    data.talos_machine_configuration.control_plane
  ]
}

resource "talos_machine_configuration_apply" "worker" {
  count                       = 1
  client_configuration        = talos_machine_secrets.this.client_configuration
  machine_configuration_input = data.talos_machine_configuration.worker[count.index].machine_configuration
  endpoint                    = hcloud_server.worker[count.index].ipv4_address
  node                        = hcloud_server.worker[count.index].ipv4_address

  depends_on = [
    hcloud_server_network.worker,
    data.talos_machine_configuration.worker,
    talos_machine_configuration_apply.control_plane
  ]
}

############################################################
# BOOTSTRAP KUBERNETES
############################################################

resource "talos_machine_bootstrap" "this" {
  client_configuration = talos_machine_secrets.this.client_configuration
  node                 = hcloud_server.control_plane[0].ipv4_address

  depends_on = [
    talos_machine_configuration_apply.control_plane,
    talos_machine_configuration_apply.worker
  ]
}

############################################################
# GET KUBECONFIG
############################################################

resource "talos_cluster_kubeconfig" "this" {
  client_configuration = talos_machine_secrets.this.client_configuration
  node                 = hcloud_server.control_plane[0].ipv4_address

  depends_on = [talos_machine_bootstrap.this]
}

############################################################
# SAVE CONFIGURATIONS
############################################################

resource "local_sensitive_file" "kubeconfig" {
  # Replace the internal hostname with the floating IP for external access
  content = replace(
    talos_cluster_kubeconfig.this.kubeconfig_raw,
    "https://${local.cluster_name}.local:6443",
    "https://${hcloud_floating_ip.control_plane.ip_address}:6443"
  )
  filename = "${path.root}/kubeconfig"
}

resource "local_sensitive_file" "talosconfig" {
  content  = data.talos_client_configuration.this.talos_config
  filename = "${path.root}/talosconfig"
}
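The `replace()` call above swaps the internal API hostname for the floating IP so the saved kubeconfig works from outside the private network. The same rewrite can be sketched with sed; the hostname and IP below are illustrative placeholders, not values from this module:

```shell
# Rewrite the kubeconfig server line, as replace() does in HCL.
# "talos-ha.local" and "203.0.113.10" are stand-ins for the real cluster
# hostname and floating IP.
line='server: https://talos-ha.local:6443'
echo "$line" | sed 's|https://talos-ha.local:6443|https://203.0.113.10:6443|'
# -> server: https://203.0.113.10:6443
```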

153
aula-08/outputs.tf
Normal file
@@ -0,0 +1,153 @@
############################################################
# Outputs for Hetzner Talos Kubernetes Cluster
############################################################

# Cluster Information
output "cluster_name" {
  description = "The name of the Kubernetes cluster"
  value       = local.cluster_name
}

output "cluster_id" {
  description = "The unique identifier for the cluster"
  value       = random_string.cluster_id.result
}

# Network Information
output "network_id" {
  description = "The ID of the cluster's private network"
  value       = hcloud_network.cluster.id
}

output "network_cidr" {
  description = "The CIDR range of the cluster network"
  value       = hcloud_network_subnet.cluster.ip_range
}

# Control Plane Information
output "control_plane_ip" {
  description = "Public IP address of the control plane"
  value       = hcloud_floating_ip.control_plane.ip_address
}

output "control_plane_private_ips" {
  description = "Private IP addresses of control plane nodes"
  value       = [for cp in hcloud_server_network.control_plane : cp.ip]
}

output "control_plane_ids" {
  description = "Server IDs of control plane nodes"
  value       = [for cp in hcloud_server.control_plane : cp.id]
}

# Worker Nodes Information
output "worker_ips" {
  description = "Public IP addresses of worker nodes"
  value       = [for w in hcloud_server.worker : w.ipv4_address]
}

output "worker_private_ips" {
  description = "Private IP addresses of worker nodes"
  value       = [for w in hcloud_server_network.worker : w.ip]
}

output "worker_ids" {
  description = "Server IDs of worker nodes"
  value       = [for w in hcloud_server.worker : w.id]
}

# Kubernetes Access
output "kubeconfig_path" {
  description = "Path to the generated kubeconfig file"
  value       = local_sensitive_file.kubeconfig.filename
}

output "talosconfig_path" {
  description = "Path to the generated talosconfig file"
  value       = local_sensitive_file.talosconfig.filename
}

# API Endpoints
output "kubernetes_api_endpoint" {
  description = "Kubernetes API server endpoint"
  value       = "https://${hcloud_floating_ip.control_plane.ip_address}:6443"
}

output "talos_api_endpoint" {
  description = "Talos API endpoint for management"
  value       = "https://${hcloud_floating_ip.control_plane.ip_address}:50000"
}

# Cost Information
output "estimated_monthly_cost" {
  description = "Estimated monthly cost for the infrastructure (EUR)"
  value = {
    control_plane = 3 * 3.79          # 3x CAX11
    worker        = 1 * 3.79          # 1x CAX11
    floating_ip   = 3.00              # Floating IPv4
    total         = (4 * 3.79) + 3.00 # ~€18.16
  }
}
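The totals in `estimated_monthly_cost` can be cross-checked with a line of arithmetic: four CAX11 servers at €3.79 plus one Floating IPv4 at €3.00.

```shell
# Verify the estimated monthly total from the output above.
awk 'BEGIN { printf "total: EUR %.2f/month\n", 4 * 3.79 + 3.00 }'
# -> total: EUR 18.16/month
```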

# Connection Instructions
output "connection_instructions" {
  description = "Instructions for connecting to the cluster"
  value       = <<-EOT

    ====================================
     Kubernetes Cluster Ready!
    ====================================

    1. Configure kubectl:
       export KUBECONFIG=${local_sensitive_file.kubeconfig.filename}
       kubectl get nodes

    2. Configure talosctl:
       export TALOSCONFIG=${local_sensitive_file.talosconfig.filename}
       talosctl --nodes ${hcloud_floating_ip.control_plane.ip_address} health

    3. Access Kubernetes API:
       ${"https://${hcloud_floating_ip.control_plane.ip_address}:6443"}

    4. Nodes:
       Control Plane: 3x CAX11 (ARM64)
       Workers:       1x CAX11 (ARM64)

    5. Total Monthly Cost: ~€18/month

    ====================================
  EOT
}

# Cluster Autoscaler Configuration
output "autoscaler_worker_config" {
  description = "Worker machine config for cluster autoscaler (base64)"
  value       = base64encode(data.talos_machine_configuration.worker[0].machine_configuration)
  sensitive   = true
}
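The output above hands the worker machine config to the autoscaler base64-encoded (as `base64encode()` produces); the consumer simply decodes it back. A round-trip sketch with a stand-in config string — the YAML snippet is illustrative, not the real machine config:

```shell
# Encode and decode a stand-in config, mirroring base64encode() above.
config='machine: { type: worker }'             # illustrative stand-in
encoded=$(printf '%s' "$config" | base64)
decoded=$(printf '%s' "$encoded" | base64 -d)  # -d assumes GNU coreutils base64
printf '%s\n' "$decoded"
# -> machine: { type: worker }
```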

output "autoscaler_image_id" {
  description = "Talos image ID for cluster autoscaler"
  value       = var.talos_image_id
}

# Resource Labels
output "resource_labels" {
  description = "Labels applied to all resources"
  value       = local.common_labels
}

# Firewall Information
output "firewall_id" {
  description = "ID of the firewall protecting the cluster"
  value       = hcloud_firewall.cluster.id
}

# SSH Key Information (for autoscaler)
output "ssh_key_name" {
  description = "Name of the SSH key used by the cluster"
  value = length(local.ssh_key_matches) > 0 ? [
    for key in data.hcloud_ssh_keys.all.ssh_keys : key.name
    if key.id == local.ssh_key_matches[0]
  ][0] : "${local.cluster_name}-admin"
}
361
aula-08/setup.sh
Executable file
@@ -0,0 +1,361 @@
#!/bin/bash

############################################################
# Aula 08 - OpenTofu + Talos + Hetzner Cloud
# Provisions an HA Talos Kubernetes cluster
############################################################

set -e

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Script directory
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"

# Logging helpers
log_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

log_success() {
    echo -e "${GREEN}[OK]${NC} $1"
}

log_warn() {
    echo -e "${YELLOW}[WARN]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

############################################################
# PREREQUISITE CHECKS
############################################################

echo ""
echo "============================================"
echo "  Aula 08 - Talos Cluster via OpenTofu"
echo "============================================"
echo ""

log_info "Checking prerequisites..."

# Check OpenTofu
if ! command -v tofu &> /dev/null; then
    log_error "OpenTofu not found!"
    echo ""
    echo "Install OpenTofu:"
    echo "  brew install opentofu   # macOS"
    echo "  snap install opentofu   # Linux"
    echo ""
    echo "More info: https://opentofu.org/docs/intro/install/"
    exit 1
fi
log_success "OpenTofu $(tofu version | head -1)"

# Check talosctl
if ! command -v talosctl &> /dev/null; then
    log_error "talosctl not found!"
    echo ""
    echo "Install talosctl:"
    echo "  brew install siderolabs/tap/talosctl      # macOS"
    echo "  curl -sL https://talos.dev/install | sh   # Linux"
    echo ""
    exit 1
fi
log_success "talosctl $(talosctl version --client 2>/dev/null | grep 'Client' | awk '{print $2}' || echo 'installed')"

# Check kubectl
if ! command -v kubectl &> /dev/null; then
    log_error "kubectl not found!"
    echo ""
    echo "Install kubectl:"
    echo "  brew install kubectl   # macOS"
    echo "  snap install kubectl   # Linux"
    echo ""
    exit 1
fi
log_success "kubectl $(kubectl version --client -o yaml 2>/dev/null | grep gitVersion | awk '{print $2}' || echo 'installed')"

# Check hcloud CLI (optional, but useful)
if command -v hcloud &> /dev/null; then
    log_success "hcloud CLI installed"
else
    log_warn "hcloud CLI not installed (optional)"
    echo "  To list images: brew install hcloud"
fi

echo ""

############################################################
# CREDENTIAL COLLECTION
############################################################

# Check whether terraform.tfvars already exists
if [ -f "terraform.tfvars" ]; then
    log_warn "terraform.tfvars already exists!"
    read -p "Overwrite it? (y/N): " overwrite
    if [[ ! "$overwrite" =~ ^[YySs]$ ]]; then
        log_info "Using existing terraform.tfvars"
        SKIP_CREDENTIALS=true
    fi
fi

if [ "$SKIP_CREDENTIALS" != "true" ]; then
    echo "============================================"
    echo "  Credential Configuration"
    echo "============================================"
    echo ""

    # Hetzner token
    echo "1. Hetzner Cloud API token"
    echo "   Get one at: https://console.hetzner.cloud/projects/*/security/tokens"
    echo ""
    read -sp "   Enter the token: " HCLOUD_TOKEN
    echo ""

    if [ -z "$HCLOUD_TOKEN" ]; then
        log_error "Token cannot be empty!"
        exit 1
    fi
    log_success "Token configured"
    echo ""

    # SSH key
    echo "2. Public SSH key"
    DEFAULT_SSH_KEY="$HOME/.ssh/id_rsa.pub"
    if [ -f "$DEFAULT_SSH_KEY" ]; then
        echo "   Found: $DEFAULT_SSH_KEY"
        read -p "   Use this key? (Y/n): " use_default
        if [[ ! "$use_default" =~ ^[Nn]$ ]]; then
            SSH_PUBLIC_KEY=$(cat "$DEFAULT_SSH_KEY")
        fi
    fi

    if [ -z "$SSH_PUBLIC_KEY" ]; then
        read -p "   Path to public key: " ssh_path
        if [ -f "$ssh_path" ]; then
            SSH_PUBLIC_KEY=$(cat "$ssh_path")
        else
            log_error "File not found: $ssh_path"
            exit 1
        fi
    fi
    log_success "SSH key configured"
    echo ""

    # Talos image ID
    echo "3. Talos image ID (snapshot from aula-07)"
    echo "   To list: hcloud image list --type snapshot"
    echo ""
    read -p "   Enter the image ID: " TALOS_IMAGE_ID

    if [ -z "$TALOS_IMAGE_ID" ]; then
        log_error "Image ID cannot be empty!"
        exit 1
    fi

    # Validate that it is numeric
    if ! [[ "$TALOS_IMAGE_ID" =~ ^[0-9]+$ ]]; then
        log_error "ID must be a number!"
        exit 1
    fi
    log_success "Image ID: $TALOS_IMAGE_ID"
    echo ""

    # Create terraform.tfvars
    log_info "Creating terraform.tfvars..."
    cat > terraform.tfvars << EOF
# Generated automatically by setup.sh
# $(date)

hcloud_token   = "$HCLOUD_TOKEN"
ssh_public_key = "$SSH_PUBLIC_KEY"
talos_image_id = $TALOS_IMAGE_ID

environment       = "workshop"
enable_monitoring = true
EOF
    log_success "terraform.tfvars created"
fi

echo ""

############################################################
# OPENTOFU INITIALIZATION
############################################################

echo "============================================"
echo "  Initializing OpenTofu"
echo "============================================"
echo ""

log_info "Running tofu init..."
tofu init

log_success "OpenTofu initialized"
echo ""

############################################################
# PLANNING
############################################################

echo "============================================"
echo "  Planning Infrastructure"
echo "============================================"
echo ""

log_info "Running tofu plan..."
tofu plan -out=tfplan

echo ""
log_success "Plan created!"
echo ""

# Show summary
echo "============================================"
echo "  Resources to be created:"
echo "============================================"
echo ""
echo "  - 4x CAX11 (3 CP + 1 Worker) = 4 x €3.79 = €15.16"
echo "  - 1x Floating IPv4           = €3.00"
echo "  - Network/Firewall/Placement = Free"
echo ""
echo "  Estimated cost: ~€18.16/month (excluding VAT)"
echo ""

############################################################
# APPLY
############################################################

read -p "Apply the plan? (y/N): " apply
if [[ ! "$apply" =~ ^[YySs]$ ]]; then
    log_warn "Operation cancelled by the user"
    echo ""
    echo "To apply manually:"
    echo "  tofu apply tfplan"
    echo ""
    exit 0
fi

echo ""
log_info "Applying infrastructure..."
echo ""

tofu apply tfplan

echo ""
log_success "Infrastructure provisioned!"
echo ""

############################################################
# POST-DEPLOY CONFIGURATION
############################################################

echo "============================================"
echo "  Post-Deploy Configuration"
echo "============================================"
echo ""

# Wait for the cluster to become ready
log_info "Waiting for the Talos cluster to become ready..."
sleep 10

# Configure talosctl
if [ -f "talosconfig" ]; then
    log_info "Configuring talosctl..."
    export TALOSCONFIG="$SCRIPT_DIR/talosconfig"

    # Get the control plane IP
    CP_IP=$(tofu output -raw control_plane_ip 2>/dev/null || echo "")

    if [ -n "$CP_IP" ]; then
        log_info "Waiting for the Talos API at $CP_IP..."

        # Try a health check (may take a few minutes)
        for i in {1..30}; do
            if talosctl --talosconfig talosconfig -n "$CP_IP" health --wait-timeout 10s 2>/dev/null; then
                log_success "Talos cluster healthy!"
                break
            fi
            echo -n "."
            sleep 10
        done
        echo ""
    fi
fi

# Configure kubectl
if [ -f "kubeconfig" ]; then
    log_info "Configuring kubectl..."
    export KUBECONFIG="$SCRIPT_DIR/kubeconfig"

    log_info "Waiting for nodes to become Ready..."
    for i in {1..30}; do
        if kubectl get nodes 2>/dev/null | grep -q "Ready"; then
            log_success "Nodes ready!"
            kubectl get nodes
            break
        fi
        echo -n "."
        sleep 10
    done
    echo ""
fi

echo ""

############################################################
# FINAL SUMMARY
############################################################

echo "============================================"
echo "  Cluster Provisioned Successfully!"
echo "============================================"
echo ""

# Show outputs
echo "Endpoints:"
tofu output -raw kubernetes_api_endpoint 2>/dev/null && echo "" || true
tofu output -raw talos_api_endpoint 2>/dev/null && echo "" || true
echo ""

echo "Generated files:"
echo "  - kubeconfig  : kubectl configuration"
echo "  - talosconfig : talosctl configuration"
echo ""

echo "Useful commands:"
echo ""
echo "  # Use kubectl with this cluster"
echo "  export KUBECONFIG=$SCRIPT_DIR/kubeconfig"
echo "  kubectl get nodes"
echo ""
echo "  # Use talosctl with this cluster"
echo "  export TALOSCONFIG=$SCRIPT_DIR/talosconfig"
echo "  talosctl -n <IP> health"
echo ""
echo "  # Show OpenTofu outputs"
echo "  tofu output"
echo ""
echo "  # Destroy the infrastructure (CAREFUL!)"
echo "  ./cleanup.sh"
echo ""

log_success "Setup complete!"

echo ""
echo "============================================"
echo "  Next step (optional)"
echo "============================================"
echo ""
echo "  To enable autoscaling of 1-5 workers:"
echo "  ./install-autoscaler.sh"
echo ""
63
aula-08/talos-patches/control-plane.yaml
Normal file
@@ -0,0 +1,63 @@
# Talos Control Plane Configuration Patch
# Base configuration for HA control plane
machine:
  # Network configuration for the Floating IP
  network:
    interfaces:
      - interface: eth0
        dhcp: true
%{ if is_first_cp ~}
        addresses:
          - ${floating_ip}/32
%{ endif ~}

  # Network optimizations
  sysctls:
    net.core.somaxconn: "8192"
    net.ipv4.tcp_max_syn_backlog: "8192"
    net.core.netdev_max_backlog: "5000"
    net.ipv4.ip_local_port_range: "1024 65535"
    net.ipv4.tcp_tw_reuse: "1"
    net.ipv4.tcp_fin_timeout: "15"
    fs.file-max: "2097152"
    fs.inotify.max_user_watches: "524288"
    vm.max_map_count: "262144"

  # Kubelet configuration
  kubelet:
    extraArgs:
      max-pods: "110"
      kube-reserved: "cpu=200m,memory=300Mi"
      system-reserved: "cpu=200m,memory=200Mi"

  # Time sync
  time:
    servers:
      - ntp1.hetzner.de
      - ntp2.hetzner.com
      - ntp3.hetzner.net

  # Features
  features:
    rbac: true
    stableHostname: true

cluster:
  # Control plane configuration
  controlPlane:
    endpoint: https://${floating_ip}:6443

  # Network configuration
  network:
    cni:
      name: flannel
    dnsDomain: cluster.local
    serviceSubnets:
      - 10.96.0.0/12
    podSubnets:
      - 10.244.0.0/16

  # Etcd configuration for HA
  etcd:
    advertisedSubnets:
      - 10.0.1.0/24
44
aula-08/talos-patches/worker.yaml
Normal file
@@ -0,0 +1,44 @@
# Talos Worker Configuration Patch
# Base configuration for worker nodes
machine:
  # Network optimizations
  sysctls:
    net.core.somaxconn: "8192"
    net.ipv4.tcp_max_syn_backlog: "8192"
    net.core.netdev_max_backlog: "5000"
    net.ipv4.ip_local_port_range: "1024 65535"
    net.ipv4.tcp_tw_reuse: "1"
    net.ipv4.tcp_fin_timeout: "15"
    fs.file-max: "2097152"
    fs.inotify.max_user_watches: "524288"
    vm.max_map_count: "262144"

  # Kubelet configuration
  kubelet:
    extraArgs:
      max-pods: "110"
      kube-reserved: "cpu=100m,memory=200Mi"
      system-reserved: "cpu=100m,memory=100Mi"

  # Time sync
  time:
    servers:
      - ntp1.hetzner.de
      - ntp2.hetzner.com
      - ntp3.hetzner.net

  # Features
  features:
    rbac: true
    stableHostname: true

cluster:
  # Network configuration
  network:
    cni:
      name: flannel
    dnsDomain: cluster.local
    serviceSubnets:
      - 10.96.0.0/12
    podSubnets:
      - 10.244.0.0/16
53
aula-08/terraform.tfvars.example
Normal file
@@ -0,0 +1,53 @@
# Example terraform.tfvars file
# Copy this file to terraform.tfvars and fill in your values

# ============================================
# CREDENTIALS (REQUIRED)
# ============================================

# Hetzner Cloud API token
# Get one at: https://console.hetzner.cloud/projects/[PROJECT_ID]/security/tokens
hcloud_token = "your_hetzner_token_here"

# Public SSH key for emergency access to the nodes
# Get it with: cat ~/.ssh/id_rsa.pub
ssh_public_key = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC... seu@email.com"

# ID of the custom Talos image (created in aula-07)
# Get it with: hcloud image list --type snapshot
talos_image_id = 123456789

# ============================================
# CLUSTER CONFIGURATION
# ============================================

# Environment (prod, staging, dev)
environment = "workshop"

# Talos OS version (optional - default: v1.11.2)
# talos_version = "v1.11.2"

# ============================================
# MONITORING
# ============================================

# Enable Victoria Metrics
enable_monitoring = true

# ============================================
# AUTO-SCALING
# ============================================

# CPU thresholds for scaling
scale_up_threshold   = 70 # Scale up when CPU > 70%
scale_down_threshold = 30 # Scale down when CPU < 30%

# ============================================
# CUSTOM LABELS (OPTIONAL)
# ============================================

# Additional labels for all resources
custom_labels = {
  projeto     = "k8s-base"
  responsavel = "devops"
}
38
aula-08/test-autoscaler.yaml
Normal file
@@ -0,0 +1,38 @@
############################################################
# Test deployment for the Cluster Autoscaler
# Creates pods that consume resources to force a scale-up
############################################################

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-autoscaler
  namespace: default
spec:
  replicas: 10
  selector:
    matchLabels:
      app: test-autoscaler
  template:
    metadata:
      labels:
        app: test-autoscaler
    spec:
      containers:
        - name: nginx
          image: nginx:alpine
          resources:
            requests:
              cpu: 400m     # Each pod requests 0.4 CPU
              memory: 512Mi # Each pod requests 512MB RAM
            limits:
              cpu: 500m
              memory: 640Mi
      # Keep pods off the control planes
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: DoesNotExist
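The request sizes above are chosen so the deployment cannot fit on the single worker: ten replicas add up to far more CPU and memory than one CAX11 (2 vCPUs, 4 GB RAM) offers, so pods go Pending and the autoscaler must add nodes. The arithmetic:

```shell
# Aggregate requests for 10 replicas vs. a single CAX11's capacity.
awk 'BEGIN {
  printf "requested: %dm CPU, %dMi RAM (one CAX11: ~2000m CPU, ~4096Mi RAM)\n",
         10 * 400, 10 * 512
}'
```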

63
aula-08/variables.tf
Normal file
@@ -0,0 +1,63 @@
############################################################
# Variables for Hetzner Talos Kubernetes Cluster
############################################################

# Authentication
variable "hcloud_token" {
  type        = string
  description = "Hetzner Cloud API token"
  sensitive   = true
}

# Cluster Configuration
variable "environment" {
  type        = string
  description = "Environment name (prod, staging, dev)"
  default     = "prod"
}

# SSH Configuration
variable "ssh_public_key" {
  type        = string
  description = "Public SSH key for emergency access to nodes"
}

# Talos Configuration
variable "talos_image_id" {
  type        = number
  description = "ID of the custom Talos image on Hetzner (created in aula-07). Get it with: hcloud image list --type snapshot"
}

variable "talos_version" {
  type        = string
  description = "Talos version to use"
  default     = "v1.11.2" # Match the official image version
}

# Monitoring Configuration
variable "enable_monitoring" {
  type        = bool
  description = "Enable Victoria Metrics monitoring stack"
  default     = true
}

# Auto-scaling Configuration
variable "scale_up_threshold" {
  type        = number
  description = "CPU percentage to trigger scale up"
  default     = 70
}

variable "scale_down_threshold" {
  type        = number
  description = "CPU percentage to trigger scale down"
  default     = 30
}

# Tags for resource management
variable "custom_labels" {
  type        = map(string)
  description = "Custom labels to add to all resources"
  default     = {}
}
35
aula-08/versions.tf
Normal file
@@ -0,0 +1,35 @@
############################################################
# OpenTofu Version and Provider Requirements
# Compatible with OpenTofu >= 1.6.0
############################################################

terraform {
  required_version = ">= 1.6.0"

  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = "~> 1.45"
    }

    talos = {
      source  = "siderolabs/talos"
      version = "0.6.0"
    }

    random = {
      source  = "hashicorp/random"
      version = "~> 3.5"
    }

    null = {
      source  = "hashicorp/null"
      version = "~> 3.2"
    }

    local = {
      source  = "hashicorp/local"
      version = "~> 2.4"
    }
  }
}