Manually managing infrastructure through a cloud console (often referred to as "ClickOps") is a significant bottleneck. It results in untracked configurations, environment drift, and difficulty in reproducing environments during disaster recovery. Infrastructure as Code (IaC) solves these issues by representing your networks, virtual machines, databases, and firewalls as text files in version control. However, to maximize velocity and prevent state file corruption, you must adhere to structured patterns. Here are the core principles of professional IaC.
1. Declarative Design Over Imperative Scripting
Always prioritize declarative tools like Terraform, OpenTofu, or Pulumi over imperative orchestration scripts (e.g., custom Bash scripts, AWS CLI commands). Declarative tools describe the desired end-state of your architecture: "I want a VPC with three subnets and a PostgreSQL database." The IaC engine then calculates the diff and safely updates the cloud provider. Imperative scripts require you to define every action step (e.g., "create vpc, then wait, then create subnet..."), which makes rollbacks complex and environments fragile.
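As a rough illustration of the "VPC with three subnets" statement above, a declarative definition might look like the sketch below. The resource names and CIDR ranges are assumptions chosen for the example, and the database is left out to keep it short.

```hcl
# Hypothetical names and address ranges, for illustration only.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# Three subnets carved out of the VPC range:
# 10.0.0.0/24, 10.0.1.0/24 and 10.0.2.0/24.
resource "aws_subnet" "private" {
  count = 3

  vpc_id     = aws_vpc.main.id
  cidr_block = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
}
```

Nothing here says how to create the resources or in what order. The reference to aws_vpc.main.id is enough for the engine to work out that the VPC must exist before the subnets.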
2. Eliminate Configuration Drift
Configuration drift occurs when manual updates are made directly in the cloud console, bypassing the code. This creates a dangerous mismatch: the code no longer describes what is actually running, and the next apply could silently revert or delete the manual changes. To combat drift:
- Revoke Write Access: Restrict manual console edit permissions. The CI/CD deployment pipeline should be the only entity with credentials to modify infrastructure.
- Automated Drift Detection: Schedule a daily job (cron or a scheduled pipeline) that runs terraform plan and sends an alert on any mismatch between the code and the live cloud state. Running the plan with -detailed-exitcode makes this easy to script, since an exit code of 2 means the plan found changes.
- Policy-as-Code: Audit your configurations for security flaws (like open SSH ports or unencrypted storage buckets) using policy tools like Checkov, OPA (Open Policy Agent), or tfsec.
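The "Revoke Write Access" point can itself be expressed in code. The sketch below assumes that engineers sign in through an existing IAM role, whose name is passed in through the hypothetical variable engineer_role_name, and attaches only AWS's managed ReadOnlyAccess policy to it. Your own permission model may well need something more granular.

```hcl
# Hypothetical variable: the IAM role that human operators assume.
variable "engineer_role_name" {
  type        = string
  description = "Name of the IAM role used by engineers in the console."
}

# Humans can inspect resources but not change them;
# write permissions belong to the CI/CD pipeline's own role.
resource "aws_iam_role_policy_attachment" "engineers_read_only" {
  role       = var.engineer_role_name
  policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
}
```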
3. Separate Infrastructure Into Decoupled Layers
Do not package your entire cloud environment into a single root module backed by one state file. A minor update to a web application server should never risk modifying your core database. Separate your configuration into independent stacks, each with its own state and clean boundaries:
- Base Network Stack: Handles VPCs, subnets, route tables, internet gateways, and VPN access. This layer changes very rarely.
- Stateful Data Stack: Handles relational databases, cache layers (like Redis), and object storage buckets. This layer changes occasionally and requires strict backup policies.
- Stateless Application Stack: Handles Kubernetes clusters, virtual machine scale sets, load balancers, DNS records, and SSL certificates. This layer changes frequently.
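One common way to connect the layers without merging them is for an upper stack to read the outputs of a lower one. The sketch below assumes that the network stack keeps its state in an S3 bucket and exports an output called public_subnet_ids. The bucket name, key, region, and output name are all hypothetical.

```hcl
# Application stack: read-only view of the network stack's outputs.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"   # hypothetical bucket
    key    = "network/terraform.tfstate" # hypothetical key
    region = "us-east-1"
  }
}

# The load balancer lives in the application stack but is placed
# in subnets owned by the network stack.
resource "aws_lb" "app" {
  name               = "app"
  load_balancer_type = "application"
  subnets            = data.terraform_remote_state.network.outputs.public_subnet_ids
}
```

Because the application stack only reads the network state, applying it cannot modify the VPC or the subnets.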
4. Reusable Terraform Module Example
Here is an example of a reusable AWS S3 bucket module that incorporates security best practices: server-side encryption, a public access block, and object versioning.
# modules/secure_s3_bucket/main.tf

# The bucket itself; name and tags are supplied by the caller.
resource "aws_s3_bucket" "this" {
  bucket = var.bucket_name
  tags   = var.tags
}

# Block every form of public access at the bucket level.
resource "aws_s3_bucket_public_access_block" "this" {
  bucket = aws_s3_bucket.this.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Keep previous object versions so overwrites and deletes are recoverable.
resource "aws_s3_bucket_versioning" "this" {
  bucket = aws_s3_bucket.this.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt objects at rest with S3-managed keys (SSE-S3).
resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
  bucket = aws_s3_bucket.this.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}
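The module refers to var.bucket_name and var.tags, so it needs matching variable declarations before it can be used. A minimal sketch of those declarations and of a call from a root module follows. The file layout, the module label audit_logs, and the bucket name are assumptions made for the example.

```hcl
# modules/secure_s3_bucket/variables.tf
variable "bucket_name" {
  type        = string
  description = "Globally unique name of the bucket."
}

variable "tags" {
  type        = map(string)
  description = "Tags applied to the bucket."
  default     = {}
}

# main.tf in a root module (hypothetical caller)
module "audit_logs" {
  source = "./modules/secure_s3_bucket"

  bucket_name = "example-audit-logs" # hypothetical name
  tags = {
    Environment = "production"
  }
}
```

Every bucket created through the module gets the same encryption, versioning, and public access settings, so individual teams do not have to remember them.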
5. State File Management and Locking
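The state file is the memory of your infrastructure: it maps the resources in your code to the real objects in the cloud. Store it in a remote, encrypted backend (e.g., AWS S3, Google Cloud Storage) and enable state locking so that concurrent runs cannot corrupt it. With the S3 backend, locking has traditionally been provided by a DynamoDB table, while the Google Cloud Storage backend locks natively. Turn on object versioning for the state bucket so that you can recover quickly from an accidental deletion or a bad write.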
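A minimal sketch of an S3 backend with DynamoDB locking is shown below. The bucket, key, region, and table names are hypothetical, and the lock table is assumed to exist already with a string partition key named LockID.

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"   # hypothetical bucket
    key            = "network/terraform.tfstate" # one key per stack
    region         = "us-east-1"
    encrypt        = true              # server-side encryption of the state object
    dynamodb_table = "terraform-locks" # hypothetical lock table
  }
}
```

Recent Terraform releases also offer S3-native locking through the use_lockfile argument, so check the backend documentation for the version you run before choosing between the two.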
Infinity DevOps
Sharing practical DevOps knowledge with the community.
