Senior Site Reliability Engineer

MetAntz
Bangalore City, Bengaluru, Karnataka

Job details

Full-time
2 days ago

Qualifications

BCS
CI/CD
TCP
System administration
Azure
Computer Science
Incident management
Software troubleshooting
Salt
Kubernetes
Ansible
Software deployment
Load balancing
Encryption
Windows
Firewall
AWS
Docker
Bachelor's degree
SSL
ITIL
JavaScript
Distributed systems
Continuous integration
New Relic
Perl
Scripting
DNS
Financial services
Puppet
Linux
Jenkins
Communication skills
Python
Debugging

Full job description

About

We are recruiting on behalf of a leading global financial technology organization (full-time opportunity on the client's payroll) that is committed to delivering best-in-class service reliability and performance.

As part of its continued investment in platform stability and operational excellence, the organization is expanding its Site Reliability Engineering (SRE) team to support a highly available 24x7 FX trading environment.

We are looking for a highly motivated and technically talented Senior Application SRE who will focus on application monitoring, automation, and optimization to enhance system stability, minimize downtime, and improve overall user experience.

The ideal candidate will bring strong problem-solving skills, experience working with large-scale distributed systems, and a deep understanding of software and infrastructure reliability principles. This role offers an opportunity to work on mission-critical applications in a fast-paced, high-performance environment where reliability and operational excellence are key priorities.

Responsibilities

Ensure the reliability, performance, and availability of applications through proactive monitoring and automation.
Develop and maintain real-time monitoring, alerting, and logging systems to detect and resolve issues before they impact customers.
Automate manual operations, including application deployment, configuration, scaling, and recovery.
Collaborate with software engineering teams to integrate reliability best practices into the development lifecycle.
Conduct root cause analysis (RCA) and implement preventive measures to mitigate recurring issues.
Support a 24x7 distributed enterprise environment across multiple global data centers.
Work closely with Support to enhance incident response processes, ensuring fast and effective resolution of technical escalations.
Participate in on-call rotations to support critical application issues and outages.
Maintain and optimize CI/CD pipelines to ensure fast and reliable application releases.
Enhance system security by managing SSL certificates, encryption, and authentication mechanisms.
Foster a culture of continuous improvement by evaluating new tools, frameworks, and methodologies to enhance system reliability.

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
5+ years of experience in a similar role, focusing on application reliability, automation, and performance optimization.
Strong expertise in Linux and Windows system administration.
Proficiency in at least one scripting language (e.g., Python, Shell, Perl, JavaScript).
Experience with Docker, Kubernetes, or containerization technologies.
Familiarity with CI/CD tools like Jenkins and deployment automation frameworks.
Hands-on experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK Stack, New Relic, Datadog).
Understanding of networking concepts (TCP/IP, DNS, load balancing, firewalls).
Experience with configuration management tools like Ansible, Salt, or Puppet.
Strong debugging and troubleshooting skills across application, database, and infrastructure layers.
Ability to work in a fast-paced, high-pressure environment with multiple priorities.
Excellent communication and collaboration skills to work effectively with engineering and support teams.

Nice-to-Have Skills

Experience in the financial services or trading industry.
Knowledge of distributed computing and cloud platforms (AWS, GCP, Azure).
Exposure to security best practices and compliance standards.
Familiarity with incident management frameworks (ITIL, SRE best practices, or similar methodologies).

Work Location: Hybrid remote in Bangalore City, Bengaluru, Karnataka
