Level: Senior · Type: Full-time, hands-on
Primary stack: Python · FastAPI · PostgreSQL · CI/CD · Infrastructure-as-Code · Containers · Observability
The role
A senior, hands-on engineer who owns infrastructure, CI/CD, environments, and observability — and who has enough backend-security depth to build and review the permission and data-isolation code, not just deploy around it. You own integration boundaries between independent services, and you keep an AI-assisted build moving by removing bottlenecks.
This is an AI-assisted build: AI coding tools generate most of the implementation. Your value is judgment — platform and architecture decisions, rigorous review, system seams, and unblocking — not lines typed. Because generation is fast, review is the bottleneck, and a defining duty of this role is being the second competent reviewer of security-critical code so it is never reviewed by only one person.
The depth requirements below are deliberately specific. We expect hands-on, production-grade command of most of them.
Technical depth we expect
Containers & orchestration (expert)
- Docker: multi-stage builds, minimal/secure base images, layer caching, non-root containers, image scanning, docker compose for local dev parity.
- Orchestration/runtime: experience running containerized services in production (a managed platform such as ECS/Fargate, a PaaS such as Railway/Render/Fly, and/or Kubernetes); health checks, readiness/liveness, graceful shutdown, resource limits, autoscaling signals.
CI/CD (expert)
- Design pipelines (GitHub Actions or equivalent): build → test → scan → deploy, with caching, matrix builds, and required status checks.
- Deployment safety: blue/green or rolling deploys, migration gating, automated rollback, environment promotion (dev → staging → prod), feature-flagged releases.
- Branch protection, CODEOWNERS-style review routing, and merge-queue/throughput thinking.
Infrastructure-as-Code (deep)
- Terraform (or equivalent): modules, state management and locking, plan/apply discipline, drift detection, secrets-free state.
- Networking fundamentals: VPC/subnets, security groups/firewalls, TLS termination, DNS, load balancers, private vs public service exposure.
- Secrets management (a vault/secrets-manager), key rotation, least-privilege IAM.
Observability (deep)
- Structured logging (JSON, correlation/trace IDs), centralized log aggregation, and log-based alerting.
- Metrics & tracing: RED/USE methods, dashboards, SLOs, distributed tracing across services.
- Error tracking and severity routing (e.g., Sentry-class tooling), on-call-grade alerting that minimizes noise.
- Cost & usage observability for AI workloads: per-operation token/usage accounting, budget thresholds and hard stops, spend dashboards.
PostgreSQL & data layer (strong)
- Operating Postgres in production: backups/PITR, connection pooling (PgBouncer — transaction vs session pooling and what breaks in each), replication/failover basics, pgvector operational considerations.
- Migration operations: running zero-downtime migrations safely in CI/CD, expand/contract, rollback of schema changes.
- EXPLAIN ANALYZE literacy sufficient to catch a pathological query before it ships.
Backend & security (must genuinely review, not just deploy)
- Build and maintain services in Python/FastAPI on PostgreSQL — real application code, not infra-only.
- Row-Level Security (RLS) and multi-tenant isolation: CREATE POLICY, USING vs WITH CHECK, FORCE ROW LEVEL SECURITY, session-variable/role-based tenant scoping — enough fluency to authoritatively review isolation code and find the leak another reviewer misses.
- Auth mechanics: JWT verification and key rotation, API-key hashing/revocation, secure handling of credentials in transit and at rest.
- Append-only/audit patterns, REVOKE-based immutability, and tamper-evidence.
- The OWASP Top 10 risks and their mitigations in this stack.
Integration boundaries & system seams
- Define and police interface contracts between independent services/codebases (including multi-project / multi-database setups that merge through a defined contract).
- Reason about boundary failure modes: schema drift, version skew, contract violations, partial outages; design for graceful degradation across the seam.
Engineering practice
- Strong Git discipline; small reviewable PRs; ability to review AI-generated code critically — spotting subtly wrong infra, concurrency, permission, and edge-case handling that looks plausible.
- Test strategy for infrastructure and backend (integration tests, ephemeral environments, contract tests).
- Scripting/automation fluency (Python and shell).
Core tasks
- Stand up and own infrastructure-as-code, CI/CD, environments, secrets, and rollback.
- Build observability: logging, metrics, tracing, error/severity routing, and AI-workload cost/usage monitoring.
- Define and enforce integration contracts between independent codebases and resolve seam failures.
- Build backend services and serve as the designated second reviewer of security-critical (RLS/auth/isolation) code.
- Find and remove bottlenecks: review-queue backlog, environment friction, integration failures.
Requirements
Must-have
- 6+ years in platform / DevOps / SRE, with strong IaC (Terraform or equivalent), CI/CD, container, and observability experience in production.
- Full-stack/backend capability: able to build and review Python/FastAPI on PostgreSQL application code, not infrastructure-only.
- Database-security depth: genuine RLS and multi-tenant-isolation fluency sufficient to authoritatively review it.
- Experience owning integration boundaries between independent services/codebases.
Nice-to-have
- Multi-project / multi-database architectures with contract-based merges.
- Cost/observability tooling for LLM workloads (token accounting, severity routing).
- Kubernetes at production scale; managed-cloud (AWS/GCP) depth.
- Experience in AI-assisted / AI-pair-programming and review-bottleneck environments.
How we'll screen
We do not accept an infra-only profile. Expect:
- Live RLS / isolation review: given a schema and policies, find the cross-boundary leak and fix it.
- Deployment-safety design: design zero-downtime deploy + migration + rollback for a schema-changing release.
- AI-code review: critique a plausible-but-subtly-wrong AI-generated infra or permission PR.
If a candidate can't review backend security, they are not a fit for this role.
Pay: ₹332,027.25 - ₹1,620,665.88 per year
Work Location: In person