Job Title: Site Reliability Engineer (SRE) – GCP Expert
Experience: 5 to 6 years
Location: [Insert Location or "Remote"]
Employment Type: Full-time
Job Summary:
We are seeking a skilled and proactive Site Reliability Engineer (SRE) with 5 to 6 years of experience and deep expertise in Google Cloud Platform (GCP). The ideal candidate will be responsible for the reliability, availability, and performance of cloud-based applications and infrastructure. You will collaborate with development, operations, and security teams to build and maintain scalable, secure, and highly available systems.
Key Responsibilities:
- Design, develop, and maintain reliable, scalable, and highly available systems on GCP.
- Build and manage CI/CD pipelines, infrastructure as code (IaC), and monitoring solutions.
- Proactively monitor and manage system performance, uptime, and capacity using observability tools.
- Troubleshoot and resolve infrastructure and application-level issues in real time.
- Implement and maintain disaster recovery, failover mechanisms, and backup strategies.
- Automate repetitive tasks and processes to improve efficiency and reduce toil.
- Participate in on-call rotations, incident management, and root cause analysis (RCA).
- Ensure compliance with security standards, privacy regulations, and governance policies.
- Collaborate with cross-functional teams to support DevOps and SRE best practices.
- Drive improvements in SLAs, SLOs, and error budgets through data-driven insights.
Required Qualifications:
- 4–6 years of relevant experience as an SRE, DevOps Engineer, or Cloud Infrastructure Engineer.
- Cloud Experience (GCP/AWS): Virtual Machines, Pub/Sub, Networking (Cloud NAT, VPC, Subnets, Load Balancers, Firewalls), Kubernetes (GKE), Buckets, Databases (Postgres, Cloud SQL, BigQuery).
- Incident Management and Reliability: SLAs, SLOs, and SLIs; error budgeting; incident metrics (MTTR, MTTD).
- Observability: Dynatrace, Prometheus, Grafana, and dashboarding.
- Automation: Shell Scripting, Terraform.
- Good to have: CI/CD (Jenkins, ArgoCD), Cloud Security & Compliance, Vulnerability Management, Business Continuity & Disaster Recovery.
- Strong problem-solving and communication skills.
Preferred Qualifications:
- GCP certifications (e.g., Professional Cloud DevOps Engineer, Cloud Architect).
- Exposure to multi-cloud environments or hybrid cloud infrastructure.
- Familiarity with Agile and ITIL frameworks.
- Experience working in regulated environments with compliance standards (e.g., ISO, SOC2).