A Platform Engineer - Infrastructure builds and operates the foundational platform that product engineering teams rely on, including internal developer tools, automation systems, abstraction layers, and resilient infrastructure.
InfraPlatform engineers work as software engineers for infrastructure, focusing on scale, reliability, developer productivity, and cost efficiency.
They design internal tools and services that improve developer experience, automate infrastructure lifecycle, enforce security/compliance, and maintain high availability and performance across cloud environments.
InfraPlatform engineers create tooling that empowers 250+ developers to move faster with less friction:
In-house debugging or triage tools
Self-service portals for infrastructure, logs, metrics, access
Secrets management abstractions
These tools reduce friction, enforce standards, and boost developer velocity.
InfraPlatform engineers think like SREs but build systems, not dashboards.
They are responsible for:
Designing highly available systems (multi-AZ, multi-region)
Automated failover, health checking, circuit breaking
SLOs, error budgets, reliability KPIs
Observability architecture (metrics, logs, tracing, profiling)
Scaling infrastructure through automation (autoscalers, KEDA, controllers)
Continuous chaos testing in staging
Policy-as-code to enforce guardrails
Goal: platform downtime should stay within the error budget allowed by the company SLO.
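The relationship between an SLO target and allowed downtime can be sketched as a small error-budget calculation. This is a minimal illustration, not a description of any existing internal tool: the `ErrorBudget` class, its field names, and the 99.9% target used in the usage note are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Availability error budget for one SLO window (hypothetical helper)."""

    slo_target: float      # fraction of the window that must be available, e.g. 0.999
    window_minutes: float  # length of the SLO window in minutes

    @property
    def allowed_downtime_minutes(self) -> float:
        # The budget is the share of the window the SLO permits to be unavailable.
        return (1.0 - self.slo_target) * self.window_minutes

    def remaining_minutes(self, downtime_minutes: float) -> float:
        # Negative values mean the budget has been overspent.
        return self.allowed_downtime_minutes - downtime_minutes

    def is_exhausted(self, downtime_minutes: float) -> bool:
        return self.remaining_minutes(downtime_minutes) <= 0
```

For instance, an assumed 99.9% target over a 30-day window (43,200 minutes) leaves roughly 43.2 minutes of allowed downtime; once recorded downtime reaches that figure, `is_exhausted` returns `True`.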
InfraPlatform treats infra as a software product:
Write Terraform/Pulumi/OpenTofu modules for teams to reuse
Build opinionated abstractions (e.g., “standard service” module)
Write Kubernetes operators/controllers to automate infra
Maintain GitOps pipelines (ArgoCD/GitHub)
Multi-account orchestration, permissions boundary management
Goal: infra provisioning becomes predictable, fast, error-free.
Platform engineers create internal services that orchestrate entire workflows:
Deployment orchestration service
Resource orchestration control plane (namespace creation, quotas, IAM, networking)
Internal service discovery
Audit/Access management tooling
Environment-as-a-Service API
These are custom-built services, not manual scripts.
Ensure engineers get first-class troubleshooting tools:
Unified logs, metrics, traces ingestion pipeline
Debugging utilities (e.g., "log stream" CLI, distributed trace browser)
Stateful dashboards for performance & cost hotspots
Anomaly detection for infra issues
Postmortem automation
Goal: reduce MTTR drastically.
Key responsibility in fintech:
Design cost-efficient scaling policies
Write analyzers/checkers to detect expensive patterns
Automate cost guardrails (e.g., TTL for resources, cleanup pipelines)
Build cost dashboards (VMs, S3, CloudWatch, EKS) that surface optimization opportunities
Drive “zero waste infra” initiatives
Goal: reduce infra costs aggressively without impacting reliability.
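A TTL-based cost guardrail like the one mentioned above could look roughly like the following. This is a sketch under stated assumptions: the `Resource` record, its fields, and `find_expired` are hypothetical names, and a real cleanup pipeline would read creation times and TTL tags from the cloud provider's inventory rather than from in-memory objects.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Resource:
    """Minimal inventory record (hypothetical shape)."""

    resource_id: str
    created_at: datetime
    ttl: Optional[timedelta]  # None means the resource carries no TTL tag


def find_expired(resources: List[Resource], now: datetime) -> List[Resource]:
    """Return resources whose TTL has elapsed; untagged resources are skipped."""
    return [
        r for r in resources
        if r.ttl is not None and r.created_at + r.ttl <= now
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    inventory = [
        Resource("preview-env-1", now - timedelta(days=10), timedelta(days=7)),
        Resource("preview-env-2", now - timedelta(days=1), timedelta(days=7)),
        Resource("shared-db", now - timedelta(days=400), None),
    ]
    for resource in find_expired(inventory, now):
        # A real pipeline would notify the owner or delete the resource here.
        print(f"expired: {resource.resource_id}")
```

Skipping untagged resources is a deliberate choice in this sketch: long-lived infrastructure is never deleted by accident, and a separate checker can flag resources that are missing a TTL.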
Platform engineers own:
IAM automation and least-privilege patterns
Secrets & key rotation automation
Enforcement of standards using OPA / Gatekeeper / SCPs
Vulnerability scanning pipelines
Infrastructure compliance (PCI DSS, SOC 2) baked into workflows
Goal: compliance is automatic, not a manual activity.
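As one illustration of an automated least-privilege check, the sketch below scans a policy document shaped like AWS IAM policy JSON and reports `Allow` statements that grant wildcard actions. It is a simplified stand-in written in plain Python, not OPA/Gatekeeper or any real policy engine; the function names and the rule itself (flag `*` and `service:*` actions) are assumptions chosen for the example.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    # IAM policy JSON allows a single value or a list for Statement and Action.
    return value if isinstance(value, list) else [value]


def _is_wildcard_action(action: str) -> bool:
    # Treat "*" and "service:*" as overly broad for this sketch.
    return action == "*" or action.endswith(":*")


def find_wildcard_statements(policy: Dict[str, Any]) -> List[int]:
    """Return indexes of Allow statements that grant wildcard actions."""
    flagged = []
    for index, statement in enumerate(_as_list(policy.get("Statement", []))):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        if any(_is_wildcard_action(a) for a in actions):
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    example_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
            {"Effect": "Allow", "Action": ["s3:*"], "Resource": "*"},
        ],
    }
    print(find_wildcard_statements(example_policy))
```

A check like this could run in a CI pipeline so that overly broad policies are rejected before they are applied, which is the sense in which compliance becomes part of the workflow rather than a separate review.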
Experience & Background
3 to 6 years of hands-on experience in Software Engineering, Platform Engineering, Site Reliability Engineering (SRE), or DevOps.
Proven track record of treating infrastructure as a software product and building platforms at scale rather than just managing servers manually.
Software Engineering & Development
Strong Programming Skills: Proficiency in systems-level or backend programming languages such as Go, Python, or Rust. You should be comfortable building APIs, CLI tools, and background worker services.
Systems Design: Ability to design, build, and maintain internal RESTful APIs, control planes, and microservices for deployment orchestration and resource management.
Infrastructure & Cloud Architecture
Cloud Expertise: Deep understanding of at least one major cloud provider (AWS, GCP, or Azure), including multi-region and multi-AZ architectures, networking, and IAM.
Infrastructure as Code (IaC): Advanced experience with Terraform, Pulumi, or OpenTofu. You should be able to write, test, and maintain opinionated, reusable modules rather than just basic scripts.
Containerization & Kubernetes
Kubernetes Mastery: Deep understanding of Kubernetes architecture. Experience writing custom K8s operators/controllers and managing multi-cluster setups.
GitOps: Hands-on experience with continuous delivery and GitOps workflows using tools like ArgoCD or Flux.
Reliability & Observability
SRE Mindset: Practical experience defining and managing SLOs, SLIs, and error budgets. Understanding of circuit breaking, health checking, and automated failover.
Observability Stack: Experience setting up and managing unified logging, metrics, and distributed tracing pipelines (e.g., Prometheus, Grafana, OpenTelemetry, Datadog, or Jaeger).
Security, Governance & Cost Efficiency
Policy & Security as Code: Familiarity with implementing guardrails using OPA (Open Policy Agent), Gatekeeper, or Kyverno. Experience with secret management systems (e.g., HashiCorp Vault).
Cloud FinOps (Bonus): Experience designing auto-scaling architectures (e.g., KEDA, Karpenter) and implementing strategies or automation to reduce cloud waste and optimize infrastructure costs.