DevOps Engineer 2

Cashfree Payments - Bellandur, Karnataka

Job details

Full-time
1 day ago

Qualifications

Azure
Rust (programming language)
Go
Cloud architecture
PCI
Kubernetes
Git
AWS
Terraform
Back-end development
REST
GitHub
APIs
S3
Budgeting
Python
Identity & access management

Full job description

A Platform Engineer - Infrastructure builds and operates the foundational platform that product engineering teams rely on, including internal developer tools, automation systems, abstraction layers, and resilient infrastructure.

InfraPlatform engineers work as software engineers for infrastructure, focusing on scale, reliability, developer productivity and cost efficiency.

They design internal tools and services that improve developer experience, automate infrastructure lifecycle, enforce security/compliance, and maintain high availability and performance across cloud environments.

✅ Core Responsibilities (Expanded)

1. Build & Maintain Internal Developer Tools (Dev Productivity)

InfraPlatform engineers create tooling that empowers 250+ developers to move faster with less friction:

In-house debugging or triage tools
Self-service portals for infrastructure, logs, metrics, access
Secrets management abstractions

These tools reduce friction, enforce standards, and boost developer velocity.

2. Platform Reliability, Uptime & Resilience Engineering

InfraPlatform engineers think like SREs but build systems, not dashboards.

They are responsible for:

Designing highly available systems (multi-AZ, multi-region)
Automated failover, health checking, circuit breaking
SLOs, error budgets, reliability KPIs
Observability architecture (metrics, logs, tracing, profiling)
Scaling infrastructure through automation (autoscalers, KEDA, controllers)
Continuous chaos testing in staging
Policy-as-code to enforce guardrails

Goal: platform downtime stays within the error budget allowed by the company SLO.

3. Infrastructure as Code — Modules, Libraries & Abstractions

InfraPlatform treats infra as a software product:

Write Terraform/Pulumi/OpenTofu modules for teams to reuse
Build opinionated abstractions (e.g., “standard service” module)
Write Kubernetes operators/controllers to automate infra
Maintain GitOps pipelines (ArgoCD/GitHub)
Multi-account orchestration, permissions boundary management

Goal: infra provisioning becomes predictable, fast, error-free.

4. Internal Platform Services (Control planes & APIs)

Platform engineers create internal services that orchestrate entire workflows:

Deployment orchestration service
Resource orchestration control plane (namespace creation, quotas, IAM, networking)
Internal service discovery
Audit/Access management tooling
Environment-as-a-Service API

These are custom-built services, not manual scripts.

5. Observability, Diagnostics & Production Tooling

Ensure engineers get first-class troubleshooting tools:

Unified logs, metrics, traces ingestion pipeline
Debugging utilities (e.g., "log stream" CLI, distributed trace browser)
Stateful dashboards for performance & cost hotspots
Anomaly detection for infra issues
Postmortem automation

Goal: reduce MTTR drastically.

6. Cost Efficiency & Infrastructure Optimization

A key responsibility in fintech:

Design cost-efficient scaling policies
Write analyzers/checkers to detect expensive patterns
Automate cost guardrails (e.g., TTL for resources, cleanup pipelines)
Build cost dashboards (VM, S3, CloudWatch, EKS) that surface optimization opportunities
Drive “zero waste infra” initiatives

Goal: reduce infra costs aggressively without impacting reliability.

7. Security, Governance & Compliance Automation

Platform engineers own:

IAM automation and least-privilege patterns
Secrets & key rotation automation
Enforcement of standards using OPA / Gatekeeper / SCPs
Vulnerability scanning pipelines
Infrastructure compliance (PCI DSS, SOC 2) baked into workflows

Goal: compliance is automatic, not a manual activity.

✅ Job Requirements

Experience & Background

3 to 6 years of hands-on experience in Software Engineering, Platform Engineering, Site Reliability Engineering (SRE), or DevOps.
Proven track record of treating infrastructure as a software product and building platforms at scale rather than just managing servers manually.

Software Engineering & Development

Strong Programming Skills: Proficiency in systems-level or backend programming languages such as Go, Python, or Rust. You should be comfortable building APIs, CLI tools, and background worker services.
Systems Design: Ability to design, build, and maintain internal RESTful APIs, control planes, and microservices for deployment orchestration and resource management.

Infrastructure & Cloud Architecture

Cloud Expertise: Deep understanding of at least one major cloud provider (AWS, GCP, or Azure), including multi-region and multi-AZ architectures, networking, and IAM.
Infrastructure as Code (IaC): Advanced experience with Terraform, Pulumi, or OpenTofu. You should be able to write, test, and maintain opinionated, reusable modules rather than just basic scripts.

Containerization & Kubernetes

Kubernetes Mastery: Deep understanding of Kubernetes architecture. Experience writing custom K8s operators/controllers and managing multi-cluster setups.
GitOps: Hands-on experience with continuous delivery and GitOps workflows using tools like ArgoCD or Flux.

Reliability & Observability

SRE Mindset: Practical experience defining and managing SLOs, SLIs, and error budgets. Understanding of circuit breaking, health checking, and automated failover.
Observability Stack: Experience setting up and managing unified logging, metrics, and distributed tracing pipelines (e.g., Prometheus, Grafana, OpenTelemetry, Datadog, or Jaeger).

Security, Governance & Cost Efficiency

Policy & Security as Code: Familiarity with implementing guardrails using OPA (Open Policy Agent), Gatekeeper, or Kyverno. Experience with secret management systems (e.g., HashiCorp Vault).
Cloud FinOps (Bonus): Experience designing auto-scaling architectures (e.g., KEDA, Karpenter) and implementing strategies or automation to reduce cloud waste and optimize infrastructure costs.
