I build resilient infrastructure with automated recovery, observability, and failure-driven engineering.
I spent two years supporting enterprise production billing systems, debugging failures, tracing edge-case behavior, and validating live deployments under SLA pressure. Working in production automation taught me how fragile distributed systems become under real operational conditions.
That experience directly shaped how I approach infrastructure engineering today. I design systems around failure visibility, rollback safety, observability, and recovery behavior because I've already seen what happens when those layers are missing.
My projects focus on reliability engineering across Linux systems, containerized workloads, Terraform-managed AWS infrastructure, secure CI/CD delivery, Kubernetes orchestration, and production-style observability patterns.
Every project includes failure injection, recovery validation, and documented incident analysis, so it reflects real operational behavior rather than a tutorial-style deployment.
These are real debugging scenarios from my projects. I break things intentionally to understand how they fail, then document the investigation and fix.
If your team manages production systems and values an engineer who thinks about failure before it happens, I'd like to connect.