Design and evolve scalable, highly available cloud infrastructure spanning multiple regions and cloud providers (AWS, Azure, GCP)
Lead infrastructure-as-code (IaC) initiatives using CloudFormation, Terraform, and other tooling; establish patterns and standards for infrastructure management
Architect and implement disaster recovery, business continuity, and infrastructure resilience strategies
Drive cloud cost optimization initiatives, implementing governance, monitoring, and rightsizing strategies to achieve operational efficiency
Establish and maintain security baselines, compliance requirements (SOC 2, ISO 27001, etc.), and infrastructure hardening standards
Mentor and guide junior and mid-level DevOps engineers; establish technical standards, code review practices, and best practices
Collaborate with software engineering, security, database, and network teams to align on infrastructure requirements
Drive technical documentation, architecture decision records (ADRs), and knowledge-sharing initiatives
Participate in architectural reviews, capacity planning, and technology evaluation for infrastructure modernization
Advocate for operational excellence, an automation-first mindset, and a culture of continuous improvement
Bachelor's degree in Computer Science, Engineering, or a related field (Master's degree preferred)
15+ years of overall experience in software engineering, systems engineering, or infrastructure operations
At least 5 years (ideally 10 or more) of hands-on DevOps, Site Reliability Engineering (SRE), or platform engineering experience
Proven expertise with cloud platforms (AWS, Azure, GCP) in production environments at scale
Proficiency with Infrastructure as Code tools (Terraform, CloudFormation, or equivalent)
Deep experience with Kubernetes in production; strong understanding of container technologies (Docker, containerd)
Expert-level proficiency with Linux/Unix system administration, networking, and security fundamentals
Strong scripting and programming skills in Python, Bash, Go, or similar languages
Extensive experience designing and maintaining CI/CD pipelines at enterprise scale
Demonstrated expertise with monitoring, logging, and observability solutions (Prometheus, ELK, Datadog, etc.)
Exceptional problem-solving, troubleshooting, and diagnostic skills
Strong verbal and written communication skills; ability to document processes and complex system architectures
Demonstrated experience leading and mentoring junior and mid-level engineers
Experience working in fast-paced, collaborative environments with cross-functional teams
AWS Certified Solutions Architect, AWS Certified DevOps Engineer, or Certified Kubernetes Administrator (CKA)
Knowledge of service mesh technologies (Istio, Linkerd, Consul)
Experience with advanced configuration management (Helm, Kustomize) and templating solutions
Exposure to distributed systems, microservices architecture, and event-driven design
Experience with multi-cloud or hybrid cloud strategies
Knowledge of advanced security practices (DevSecOps, supply chain security, secrets management)
Experience with backup strategies and infrastructure disaster recovery
Familiarity with software-defined networking (SDN) and advanced networking concepts
Experience with observability and cost management tools (Prometheus, Grafana, CloudHealth, etc.)
Track record of implementing cost optimization initiatives resulting in measurable savings
Open source contributions or involvement in DevOps/infrastructure communities