Bengaluru, Karnataka
Job Summary
As a Platform Engineer, you are responsible for ensuring the stability, performance, and automation of the cloud platform’s core services, including API automation layers, observability components, CI/CD workflows, IaC toolchains, and QA/Documentation systems.
Key Responsibilities
Core Responsibilities

1. Platform Operations & Reliability (Run Engineering)
• Operate and maintain key platform services such as the Terraform Registry, tracing infrastructure, SGCP Quality & Observability resources, and documentation and chat support systems.
• Ensure availability, performance, resilience, and secure lifecycle management for all production components.
• Perform patching, upgrades, and vulnerability remediation, aiming for minimal human intervention on production systems.
• Lead incident response, perform deep root cause analysis, and implement long-term corrective actions.
• Reduce operational toil through automation, workflow industrialization, and proactive reliability engineering.

2. CI/CD & Delivery Platforming
• Operate and evolve the cloud platform’s CI/CD pipelines and reusable workflows used by ~300 developers.
• Manage the lifecycle of base Docker images: security hardening, automated build pipelines, versioning, and distribution.
• Maintain and extend the platform’s IaC toolchain, including Terraform workflows, deployment pipelines, and registry management.
• Continuously improve delivery performance, deployment reliability, and overall developer experience.
• Contribute to the technical roadmap with an engineering-driven mindset.

3. Observability Engineering
• Maintain and enhance the cloud platform’s observability stack across traces and dashboards.
• Ensure full visibility into system behaviour, performance drifts, errors, and capacity indicators.
• Build automation for alerting, anomaly detection, and platform health insights, improving signal quality and reducing noise.
• Support SRE practices to strengthen platform reliability through data-driven insights.

4. User Support & Platform Adoption
• Participate in system demos, validation sessions, and operational readiness reviews.
• Act as a partner for SG Cloud engineering teams in troubleshooting and platform enablement.
Skill Requirements
Other Requirements
Key Skills & Competencies

Technical Skills
• Strong experience with CI/CD tooling (GitHub Actions, GitLab CI, Jenkins)
• Solid expertise in Infrastructure as Code (Terraform; Ansible preferred)
• Hands-on experience with platform automation, scripting/coding (Python), and workflow orchestration
• Proficiency in containerized environments (Docker, Kubernetes, registries, build pipelines)
• Understanding of monitoring and observability at scale (metrics, logs, traces)

Engineering Mindset
• Reliability-first mindset with strong operational discipline
• Ability to automate, industrialize, and eliminate manual processes
• Strong troubleshooting capabilities across distributed systems
• Clear communication and collaborative problem solving across global teams