Project Role : Application Support Engineer
Project Role Description : Act as a software detective, providing a dynamic service that identifies and solves issues within multiple components of critical business systems.
Must have skills : Google Cloud Container Services
Good to have skills : DevOps, DevOps Architecture
Minimum 7.5 year(s) of experience is required
Educational Qualification : 15 years full time education
Summary:
As an Application Support Engineer, a typical day involves acting as a software detective by proactively monitoring and diagnosing issues across various components of essential business systems. This role requires dynamic problem-solving to ensure the smooth operation of critical applications, collaborating with different teams to maintain system stability and performance. The position demands vigilance and adaptability to swiftly identify root causes and implement effective solutions, contributing to the overall reliability and efficiency of the organization's technology infrastructure.
The Google Cloud SRE is responsible for maintaining reliable, performant, and scalable services through engineering, automation, and proactive monitoring. Core practices include defining and tracking SLIs/SLOs, managing error budgets, automating infrastructure, responding to incidents, optimizing performance, and collaborating cross-functionally. SREs leverage these practices alongside cloud-native tools and GDC to ensure systems operate efficiently at massive scale.
1. Google Cloud Platform (GCP) Proficiency
Google Cloud Monitoring (formerly Stackdriver): Understanding metrics, dashboards, uptime monitoring, and alert policies.
Familiarity with GCP resources, services, and quotas for effective monitoring setup.
IAM & permissions management: Ensuring secure access to monitoring dashboards and alerting channels.
2. PromQL (Prometheus Query Language)
Writing and optimizing PromQL queries: Selecting metrics, using functions like rate(), sum(), avg_over_time(), increase().
Understanding time-series data and aggregation principles.
Translating monitoring requirements into queries that power dashboards and alerts.
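To illustrate the kind of time-series reasoning behind functions such as rate(), the sketch below approximates a per-second rate over raw counter samples, including counter-reset handling. It is a simplified illustration only: the function name simple_rate and the sample layout are assumptions made here, and real PromQL rate() additionally extrapolates to the edges of the query window.

```python
from typing import List, Tuple

# Hypothetical sample layout: (unix_timestamp_seconds, counter_value)
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample]) -> float:
    """Approximate the per-second increase of a counter across the samples.

    Simplification: unlike PromQL's rate(), this does not extrapolate to
    the boundaries of the query window.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")

    increase = 0.0
    prev_value = samples[0][1]
    for _, value in samples[1:]:
        if value < prev_value:
            # Counter reset: the counter restarted from zero, so the
            # whole current value counts as new increase.
            increase += value
        else:
            increase += value - prev_value
        prev_value = value

    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    return increase / elapsed
```

For example, samples taken every 15 seconds with values 100, 130, 160, 10, 40 contain one reset; the total increase is 100 over 60 seconds.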
3. Terraform & Infrastructure as Code
Knowledge of Terraform modules, resources, and providers.
Specifically, experience with the Terraform Google Cloud provider to define Google Cloud Monitoring dashboards and alert policies programmatically.
Understanding Terraform state management and the plan/apply lifecycle for safe deployment.
Writing reusable, maintainable code for monitoring infrastructure.
4. Observability & Monitoring Concepts
Metrics, logs, traces: Knowing the difference and when to use them.
Service Level Indicators (SLIs), Service Level Objectives (SLOs), Service Level Agreements (SLAs): Defining meaningful thresholds.
Understanding alerting best practices, noise reduction, and appropriate severity levels.
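As a rough illustration of how an SLO translates into an error budget, the sketch below computes the downtime a given availability target allows over a window, and the fraction of a request-based budget still unspent. The function names, the 30-day default window, and the request-based formulation are assumptions chosen for the example, not a prescribed method.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO.

    slo_target is a fraction, e.g. 0.999 for "three nines".
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent.

    A negative result means the budget is overspent.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures
```

Under these assumptions, a 99.9% target over 30 days allows roughly 43.2 minutes of downtime, and 250 failures out of 1,000,000 requests leave about 75% of the budget.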
5. Alerting & Notification Skills
Configuring alerting policies effectively.
Integrating with notification channels such as email, Slack, PagerDuty, or Webhooks.
Knowledge of escalation policies and response tools.
6. Programming & Scripting
Python, Go, or Bash: Useful for automating metrics collection or response actions.
Writing scripts for metric transformations or exporting data to Prometheus/Stackdriver.
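A script of the kind described above might look like the following minimal sketch, which renders a single gauge in the Prometheus text exposition format using only the standard library. The helper names and the example metric queue_depth are hypothetical; in practice an official client library would usually be preferred over hand-built output.

```python
from typing import Dict


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline in a label value."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n"))


def render_gauge(name: str, help_text: str, value: float,
                 labels: Dict[str, str]) -> str:
    """Render one gauge sample as Prometheus text exposition lines."""
    # Sort labels so the output is deterministic.
    label_str = ",".join(
        f'{key}="{escape_label_value(val)}"'
        for key, val in sorted(labels.items())
    )
    series = f"{name}{{{label_str}}}" if label_str else name
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{series} {value}\n"
    )


if __name__ == "__main__":
    # Hypothetical metric, printed to stdout for a scraper or file collector.
    print(render_gauge("queue_depth", "Jobs waiting in queue", 12.0,
                       {"env": "prod"}), end="")
```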
7. Version Control & CI/CD
Managing monitoring infrastructure using Git.
Using CI/CD pipelines for Terraform code to ensure consistent and tested deployments.
Additional Information:
- The candidate should have a minimum of 7.5 years of experience in Google Cloud Container Services.
- This position is based at our Bengaluru office.
- 15 years of full-time education is required.