We are seeking a mid- to senior-level CloudOps / DevOps Engineer to design, build, and operate resilient, scalable cloud infrastructure on AWS. This role is ideal for someone who treats infrastructure as code, thinks in terms of reliability and recoverability, and can translate complex business and technical requirements into well-architected cloud solutions.
You will partner with engineering, platform, and security teams to own core infrastructure, automation, and operational excellence across cloud environments. You must be comfortable communicating with both technical and non-technical stakeholders, working independently on defined initiatives, and collaborating closely with distributed teams when cross-functional alignment is required.
Design and implement complex cloud architectures on AWS, including networking, compute, storage, identity, and observability components
Build and maintain Infrastructure as Code (IaC) using Terraform, with modular, reusable, and version-controlled configurations
Design, implement, and maintain CI/CD pipelines and release practices that support safe, repeatable infrastructure and application deployments
Build and operate Docker-based workloads and support containerization initiatives, including migration from VM-based operating models
Write automation and tooling in Python, Go, and Node.js to support infrastructure, pipelines, integrations, and operational workflows
Leverage AI development tooling (e.g., Cursor, Claude, and related platforms) as part of day-to-day development
Define and implement disaster recovery (DR) and business continuity strategies, including backup, failover, RTO/RPO planning, playbook creation, and recovery testing
Execute and advance roadmap initiatives spanning account reorganization, hub/spoke networking, DR strategy, Git migrations, EKS runners, database migrations, security hardening, and cost optimization
Establish and improve operational practices for monitoring, alerting, incident response, metrics, and post-incident review
Automate provisioning, configuration management, and operational tasks to reduce manual toil and improve consistency
Provide ongoing DevOps support for development teams across multiple business units
Document architecture decisions, runbooks, and operational procedures for knowledge sharing and audit readiness
Participate in on-call rotation and lead or support production incident resolution as needed
Deploy, administer, manage, and optimize LAMP (Linux, Apache, MariaDB/MySQL, PHP) environments supporting business-critical and high-availability applications
Troubleshoot complex issues across the full application stack, including Linux OS, Apache, PHP, databases, networking, storage, and application integrations
Install, configure, secure, and troubleshoot Apache web servers, including virtual hosts, SSL/TLS certificates, reverse proxy configurations, URL rewrites, and performance tuning
5+ years of experience in DevOps, CloudOps, Site Reliability Engineering, or a related infrastructure role at a mid to senior level
Strong hands-on experience with Terraform for provisioning and managing cloud infrastructure
Production experience with AWS, including core services such as VPC, IAM, EC2/EKS/ECS, S3, RDS, Lambda, EFS, CloudWatch, Route 53, KMS, WAF, and related networking/security services
Demonstrated ability to design complex, production-grade cloud architectures that balance scalability, security, availability, and cost
Practical experience with disaster recovery practices, including DR strategy, playbook development, backup strategies, multi-region or multi-zone design, failover planning, and DR testing
Strong working knowledge of CI/CD concepts and tooling (e.g., GitHub Actions, GitLab CI, Jenkins, Argo CD, or similar), including pipeline design, release automation, and repository governance
Hands-on experience with Docker and containerized deployments; familiarity with ECS and/or Kubernetes/EKS in production environments
Proficiency in scripting and development using Python, Go, or Node.js for automation, Lambda functions, tooling, and pipeline integrations
Experience with IaC beyond a single toolset — designing reusable modules, managing environment replication, and supporting DR-ready infrastructure
Solid grasp of networking concepts (DNS, load balancing, firewalls, VPNs, hub/spoke models, private connectivity, TLS)
Familiarity with security best practices for cloud infrastructure, including least-privilege IAM, secrets management, WAF management, and compliance considerations (e.g., SOC 2)
Ability to communicate clearly with engineering, security, and leadership stakeholders and to work independently on roadmap initiatives with minimal day-to-day direction
Strong problem-solving skills, sound judgment, and a track record of delivering infrastructure work across multiple concurrent priorities
Cloud certifications in AWS (e.g., AWS Solutions Architect, AWS DevOps Engineer, or related associate/professional-level credentials)
Production experience with GCP, including core services such as VPC, IAM, GKE, Cloud Storage, Cloud SQL, Cloud Functions, Cloud Monitoring, and Cloud DNS
GCP certifications (e.g., Google Professional Cloud Architect, Google Professional DevOps Engineer, or related credentials)
Experience designing, deploying, or operating AI/ML workloads in the cloud, including model hosting, inference pipelines, vector/search infrastructure, model serving infrastructure, GPU/compute provisioning, data pipelines, and integration with managed AI services on AWS and/or GCP (e.g., SageMaker, Bedrock, Vertex AI)
Experience with additional IaC or configuration tools (e.g., Ansible, Pulumi, CloudFormation, Deployment Manager)
Experience implementing Infrastructure as Code governance, including policy-as-code (OPA, Sentinel, or similar) and drift detection
Familiarity with Mirth Connect or healthcare integration platforms
Experience with observability stacks (Prometheus, Grafana, Datadog, OpenTelemetry, ELK, or similar)
Experience supporting regulated environments (SOC 2, HIPAA, PCI, or similar)
Prior involvement in chaos engineering, game days, or formal resilience testing programs
Experience leading VM-to-container migration programs or multi-account AWS organization redesigns
Infrastructure is provisioned consistently through Terraform with clear module boundaries and reviewable change workflows
CI/CD pipelines and release practices are standardized, secure, and adopted by development teams
Critical systems have documented DR plans with tested recovery procedures and defined RTO/RPO targets
Containerization and Lambda-based modernization efforts reduce operational overhead and improve deployment velocity
Cloud architectures are scalable, secure, and observable, with measurable improvements in reliability and cost efficiency
The candidate operates effectively as a self-directed contributor while keeping stakeholders informed and aligned
Remote-first role with home office as the primary working location
Required overlap of 9:00 AM – 12:00 PM ET to collaborate with US-based teams
Participation in an on-call rotation is expected for production support
Collaboration with distributed engineering and platform teams across Surgimate, ImplantBase, and openDoctor
Expectation of independent execution on assigned roadmap work, with regular communication on progress, risks, and dependencies
Flexibility to work extended hours as needed to support key initiatives
We are an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, age, disability status, protected veteran status, sexual orientation, gender identity, or any other characteristic protected by law.
Learn more about openDoctor
Our Product & Mission:
At openDoctor, we’re one company with three powerful platforms — Surgimate, ImplantBase, and openDoctor — working together to transform how surgeries are coordinated and delivered. Each platform supports a different phase of the surgical journey, from patient access and scheduling to implant management and post-operative care. United under one mission, we’re building the essential operating system for surgical orchestration — helping providers deliver faster, smarter, and safer care.
Our Team:
At openDoctor, we are a remote-first team distributed across the US, with R&D centers in Israel and India. We offer opportunities for our team to spend time together at meetups, volunteer, and work flexibly.
Read more about our team & values here:
https://www.surgimate.com/
https://us.implantbase.com/
https://opendr.com/
Compensation Range: ₹15L - ₹20L