Overview
As a Senior DevOps Engineer, you will own the AWS infrastructure and DevOps toolchain for a high-scale ad-serving system composed of asynchronous Java microservices (Akka framework). Targets include <50 ms response time, up to 5M concurrent users, and 99.99% uptime.
Responsibilities
- Design and stand up AWS environments end-to-end (landing zone, VPCs, networking, security, automation).
- Build immutable infrastructure and CI/CD for Java microservices (Maven/Gradle), including blue/green and canary releases and automated rollbacks.
- Implement observability: metrics, logs, traces, SLOs/SLIs, alerting, on-call runbooks.
- Engineer reliability and performance: autoscaling, caching layers, multi-AZ/region DR, and capacity planning to support 5M+ concurrent users and p95/p99 latency goals.
- Establish security-by-design: IAM least privilege, KMS/Secrets Manager, WAF/Shield, image/signing policies, CIS benchmarks.
- Partner with EY developers and the Performance Test Engineer to tune JVM/Akka, thread pools, GC, and infra limits based on load-testing feedback.
- Champion cost governance and tagging; produce dashboards and weekly reports.
Tech you’ll use (you don’t need every single one, but you should know most)
- AWS: EKS/ECS, EC2, ALB/NLB, API Gateway/Lambda, S3/CloudFront, DynamoDB/ElastiCache (Redis), Aurora/RDS, MSK/Kinesis, OpenSearch, Route 53, VPC, NAT/GW, WAF/Shield, CloudWatch/X-Ray, IAM, KMS, Secrets Manager.
- IaC & CI/CD: Terraform/CloudFormation, Helm, Argo CD or Flux, GitHub Actions/Jenkins/GitLab CI, Docker.
- Observability: CloudWatch, OpenTelemetry, Prometheus/Grafana, log pipelines.
- Languages/Build: Bash/Python for automation; familiarity with Java build/release workflows.
What makes you a great fit
- 3–5+ years total experience; Senior/Manager-level depth in AWS platform engineering for high-throughput, low-latency services.
- Proven ownership of production systems at 10k–1M+ concurrent users (or comparable high RPS) with 99.9%+ SLOs.
- Hands-on experience with Akka/Java microservice delivery pipelines (a plus if you’ve tuned JVM, GC, or Akka dispatchers).
- Strong grounding in scaling patterns (event-driven, async IO, caching, backpressure, rate limiting) and resilience (circuit breakers, retries, chaos testing).
- Excellent collaboration, documentation, and stakeholder communication.
Logistics
- Location: Remote (candidates based in India preferred).
- Schedule: Must join US morning calls (Eastern Time) as needed.
- Start: 1–3 weeks from offer.
- Term: Through end of January (extension likely).