01 Multi-AZ deployments on AWS, Google Cloud, and Azure

Cloud providers

We deploy production workloads across all three major cloud providers (AWS, Google Cloud Platform, and Microsoft Azure) and choose among them based on the constraints of the engagement: existing investment, regulatory requirements, regional coverage, managed-service availability, and pricing for the specific workload shape. Cross-cloud and hybrid deployments are common when latency, cost, or data-residency requirements pull the architecture in different directions.

Deployments are typically multi-AZ from day one:

  • Web tier behind a load balancer (ALB, Cloud Load Balancing, Azure Load Balancer) with auto-scaling groups across availability zones.
  • Database tier on managed Postgres or MySQL (RDS, Cloud SQL, Azure Database) with automated backups, point-in-time recovery, and read replicas in additional AZs for failover.
  • In-memory cache on managed Redis (ElastiCache, Memorystore, Azure Cache for Redis) for session state, hot-read caching, and rate limiting.
  • Search backends on managed OpenSearch or Elasticsearch (Amazon OpenSearch Service, Elastic Cloud on GCP, Azure AI Search) with index replication across AZs.
  • Object storage on the provider's native service (S3, GCS, Azure Blob) for static assets, file uploads, and backup destinations.
  • Queues and pub/sub on managed services (SQS, Pub/Sub, Service Bus) for async work and integration plumbing.
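A quick way to keep that posture honest is to check it after provisioning. A minimal sketch, assuming the AWS variant of the stack with boto3 credentials already in place; the database identifier and load balancer name are illustrative placeholders:

    # Post-provision check that the data and web tiers are actually multi-AZ.
    import boto3

    rds = boto3.client("rds")
    elbv2 = boto3.client("elbv2")

    # The primary database should have Multi-AZ failover and at least one read replica.
    db = rds.describe_db_instances(DBInstanceIdentifier="app-primary")["DBInstances"][0]
    print("Multi-AZ:", db["MultiAZ"])
    print("Read replicas:", db.get("ReadReplicaDBInstanceIdentifiers", []))

    # The load balancer should be attached to more than one availability zone.
    lb = elbv2.describe_load_balancers(Names=["app-web"])["LoadBalancers"][0]
    zones = [az["ZoneName"] for az in lb["AvailabilityZones"]]
    assert len(zones) >= 2, "web tier is not spread across availability zones"
    print("Load balancer zones:", zones)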

The pattern that holds across providers: stateless application tier, stateful data tier on managed services where the provider's failure-handling is better than what we'd build, and an operational layer (logs, metrics, alerts, distributed tracing) that gives the on-call engineer the same visibility regardless of which cloud is underneath.

02 Terraform and Ansible for testable, repeatable, recoverable deployments

Infrastructure as code

Every deployment we ship is described in code, version-controlled, and reviewable in a pull request. The split is consistent: Terraform owns the provisioning layer (VPCs, subnets, security groups, load balancers, managed databases, IAM, DNS, CDN) across whatever cloud or combination of clouds the project uses. Ansible owns the configuration layer (OS packages, system users, application runtimes, service configuration, certificate provisioning, log forwarding) on the instances Terraform stands up. Both stay in the same repository as the application code where that fits, or in a sibling infrastructure repo when it doesn't.
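The seam between the two layers is mechanical: Terraform reports what it built, and Ansible configures exactly those hosts. A minimal sketch of that handoff, assuming a Terraform output named web_ips and a playbook called site.yml, both of which are illustrative:

    # Bridge the provisioning layer (Terraform) to the configuration layer (Ansible).
    import json
    import subprocess

    # Read the addresses of the instances Terraform just stood up.
    out = subprocess.run(
        ["terraform", "output", "-json"],
        check=True, capture_output=True, text=True,
    )
    web_ips = json.loads(out.stdout)["web_ips"]["value"]

    # Write a short-lived inventory so the playbook targets only those hosts.
    with open("inventory.ini", "w") as fh:
        fh.write("[web]\n")
        fh.writelines(f"{ip}\n" for ip in web_ips)

    # --diff shows exactly what changed on each host, which keeps runs reviewable.
    subprocess.run(
        ["ansible-playbook", "-i", "inventory.ini", "site.yml", "--diff"],
        check=True,
    )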

The process is built around three properties that the client gets to verify, not just take on faith:

  • Testability. Terraform plans run on every pull request through CI, with the proposed change visible before merge. Ansible playbooks run against ephemeral test environments (or container-based test harnesses with Molecule) before the production environment sees them. Drift detection runs on a schedule so an out-of-band change doesn't sit unnoticed (a drift-check sketch follows this list).
  • Repeatability. A clean checkout of the repo plus the cloud credentials is enough to reconstruct the environment from scratch. Staging, production, and disaster-recovery regions all derive from the same modules with environment-specific variables. Manual changes in the cloud console are reverted by the next apply, on purpose.
  • Recoverability. Because the entire environment is described in code and the data tier sits behind managed-service backups, recovery from a region-level event is a documented runbook: restore the most recent backup into the target region, run the IaC apply, verify, cut over DNS. We rehearse the runbook periodically rather than discover its rough edges during an outage.
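The drift check referenced above is small enough to show in full. A sketch of the scheduled job, assuming it runs from an initialized Terraform working directory; the alerting hook is whatever the CI or cron system already provides:

    # Scheduled drift detection: exit non-zero when live infrastructure differs from code.
    import subprocess
    import sys

    # terraform plan -detailed-exitcode returns 0 (no changes), 1 (error), 2 (changes pending).
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        capture_output=True, text=True,
    )

    if result.returncode == 2:
        print("Drift detected between code and live infrastructure:")
        print(result.stdout)
        sys.exit(1)        # fail the job so it alerts the on-call engineer
    if result.returncode == 1:
        print(result.stderr)
        sys.exit(1)        # the plan itself failed
    print("No drift.")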

Ansible playbooks are written for idempotency, with role-based composition so a host's purpose (web, db-proxy, cache, search, jump-host) is declared in inventory rather than baked into per-host scripts. Terraform modules are versioned and pinned, with state stored remotely (S3 + DynamoDB lock, GCS, Azure Blob with state-locking) so multiple engineers can apply changes safely.
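Declaring a host's purpose in inventory can be as simple as a dynamic inventory script that emits the group structure Ansible expects. A sketch with hostnames and groups that are purely illustrative; a real version would read them from the cloud API or Terraform state:

    #!/usr/bin/env python3
    # Minimal Ansible dynamic inventory: groups carry each host's purpose.
    import json
    import sys

    INVENTORY = {
        "web":      {"hosts": ["web-1.internal", "web-2.internal"]},
        "cache":    {"hosts": ["cache-1.internal"]},
        "db_proxy": {"hosts": ["dbproxy-1.internal"]},
        "_meta":    {"hostvars": {}},   # per-host variables would go here
    }

    if __name__ == "__main__":
        if len(sys.argv) > 1 and sys.argv[1] == "--host":
            print(json.dumps({}))          # no extra per-host vars beyond _meta
        else:
            print(json.dumps(INVENTORY))   # --list: full group structure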

03 Containerized workloads since the beginning

Docker

Docker has been our default for containerized workloads since the early days of the Docker project. We've been building and publishing our own images since the first stable releases, with a focus on small, hardened, purpose-built containers rather than bloated kitchen-sink base images. Our public Docker Hub namespace at hub.docker.com/u/blueoakinteractive hosts fifteen maintained images covering the web tier (nginx, php-fpm, nginx-proxy), database tier (mariadb), search backend (elasticsearch), file transfer (sftp), an Alpine base image, and a CI image (php-ci) with roughly ten thousand pulls, used for Drupal continuous-integration pipelines on GitLab. Andy's personal Docker Hub namespace at hub.docker.com/u/andyg5000 has eleven additional images going back over a decade, covering media servers, sync tools, and infrastructure experiments.

We've deployed Docker-hosted workloads on every major hosting target: Cloudflare Containers (the newer managed runtime co-located with Workers), AWS (ECS, Fargate, EKS, and EC2 with Docker for legacy stacks), Google Cloud (Cloud Run for stateless services, GKE for orchestrated workloads), DigitalOcean (App Platform and Droplets with Docker installed), and Linode (LKE for Kubernetes and managed Docker hosts). The image is the same regardless of where it lands; the orchestration layer is what changes.

The advantage of long Docker experience is in the unsexy parts: image-layer cache discipline so builds stay fast, multi-stage builds that strip build-time tooling out of the runtime image, distroless and Alpine bases where the application's dependency surface allows, security scanning in CI so CVEs in upstream layers fail the pipeline, and registry hygiene so old image tags don't accumulate forever. We've shipped enough containers in production to have opinions about every part of the lifecycle.
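The CI scanning step is representative of how those habits get enforced rather than remembered. A sketch of the gate, assuming Trivy is available on the runner; the image tag is a placeholder for whatever the pipeline just built:

    # CI gate: fail the pipeline when the candidate image carries known high/critical CVEs.
    import subprocess
    import sys

    IMAGE = "registry.example.com/app:candidate"   # illustrative tag from the build step

    # --exit-code 1 makes Trivy return non-zero when findings match the severity filter.
    result = subprocess.run(
        ["trivy", "image", "--severity", "HIGH,CRITICAL", "--exit-code", "1", IMAGE]
    )
    sys.exit(result.returncode)   # a non-zero exit fails the CI job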

04 Security-first deployments, secrets management, and network isolation

Security

Security is the first concern on every engagement, not the last. We design infrastructure to fail closed by default: private subnets for everything except the public load balancer, no SSH keys distributed by hand, no production secrets in environment files, no shared admin accounts.

Secrets management runs through HashiCorp Vault for engagements that warrant it. Static credentials are rotated on schedule and on access; dynamic credentials (database, cloud IAM, SSH) are issued per session with short TTLs and revoked automatically. The application instances authenticate to Vault through the platform's identity (instance profile, workload identity, service account) so no bootstrap secret has to live on disk. For smaller deployments, the cloud-native equivalents (AWS Secrets Manager, Google Secret Manager, Azure Key Vault) follow the same pattern: rotate on schedule, retrieve at runtime, never commit to a repo.
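The runtime-retrieval pattern looks roughly the same regardless of backend. A minimal sketch of the cloud-native path (AWS Secrets Manager via boto3), where the instance profile supplies the identity and the secret name is illustrative:

    # Fetch a database credential at runtime; nothing secret is committed or written to disk.
    import json
    import boto3

    # boto3 resolves credentials from the instance profile / workload identity,
    # so no bootstrap secret has to live in an environment file.
    sm = boto3.client("secretsmanager")
    secret = json.loads(sm.get_secret_value(SecretId="prod/app/database")["SecretString"])

    dsn = (
        f"postgresql://{secret['username']}:{secret['password']}"
        f"@{secret['host']}/{secret['dbname']}"
    )
    # Hand the DSN to the application's connection pool at startup.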

Network isolation is the second pillar. Public traffic enters through a hardened reverse proxy or load balancer with WAF rules in front of it. Internal traffic between application tiers stays inside the VPC, with security groups scoped to the minimum required source. Administrative access goes through a VPN (WireGuard, OpenVPN, or the cloud-managed equivalent) into a private subnet, with no public SSH on production instances. Database tiers are not internet-routable; they're reachable only from the application tier and the bastion host through the VPN. Egress is controlled too: production workloads talk to a known set of external services with allowlisted destinations, rather than running with default-allow internet access.
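Checks like the following keep those rules from eroding over time. A sketch for the AWS case, assuming boto3; flagging port 22 open to the world is the simplest version of the audit:

    # Audit pass: no security group should expose SSH to 0.0.0.0/0.
    import boto3

    ec2 = boto3.client("ec2")
    offenders = []

    for sg in ec2.describe_security_groups()["SecurityGroups"]:
        for rule in sg.get("IpPermissions", []):
            # All-traffic rules have no FromPort/ToPort; treat them as covering SSH too.
            covers_ssh = (
                rule.get("FromPort") is None
                or rule["FromPort"] <= 22 <= rule["ToPort"]
            )
            open_world = any(r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", []))
            if covers_ssh and open_world:
                offenders.append(sg["GroupId"])

    if offenders:
        raise SystemExit(f"Public SSH found on: {sorted(set(offenders))}")
    print("No publicly reachable SSH.")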

The remaining standard practices fill in around those two pillars: full disk encryption (LUKS, EBS encryption, persistent-disk encryption) by default, TLS certificates issued and renewed automatically through Let's Encrypt or the cloud-native CA, host-level intrusion detection and audit logging shipping to a central log store, regular patch cadence on the OS and the application runtime, immutable infrastructure where the application tier is concerned (replace, don't patch in place), and security scanning baked into the CI pipeline so vulnerabilities surface at build time rather than after deployment.
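Some of those defaults can be verified at the account level before any workload lands. A small sketch for the AWS case: volume encryption should already be on by default in the target region:

    # Confirm the account/region creates encrypted EBS volumes by default.
    import boto3

    ec2 = boto3.client("ec2")

    if not ec2.get_ebs_encryption_by_default()["EbsEncryptionByDefault"]:
        raise SystemExit("EBS encryption-by-default is OFF in this account/region")
    print("EBS encryption-by-default is enabled.")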