Staff TechOps Engineer
Summary / Key Responsibilities
The Staff TechOps Engineer is responsible for ensuring the stability, availability, performance, security, and reliability of production systems and enterprise platforms. The role focuses on managing infrastructure, middleware, cloud environments, monitoring solutions, incident response, automation, deployments, backups, disaster recovery, and operational excellence while supporting mission-critical business applications.
Key Functions & Responsibilities
Infrastructure & Platform Administration
- Deploy, configure, administer, and maintain:
  - Linux Servers
  - Windows Servers
  - Cloud Services
  - Middleware Platforms
  - Production Environments
- Ensure systems meet high availability, scalability, performance, and security requirements.
IBM Cloud Pak Support
Deploy, configure, administer, and support IBM Cloud Pak for Business Automation (CP4BA) components, including:
- Business Automation Workflow (BAW)
- Operational Decision Manager (ODM)
- FileNet Content Manager
- Business Automation Navigator
- Associated platform services and integrations
Middleware & Integration Support
- Troubleshoot and resolve issues related to:
  - Integration Flows
  - Messaging Platforms
  - API Gateways
  - Workflow Processing
  - Middleware Services
  - Enterprise Application Integrations
- Support and optimize enterprise integrations across distributed environments.
Monitoring, Observability & Reliability
- Monitor and maintain:
  - Infrastructure
  - Cloud environments
  - Virtualization platforms
  - Networking components
  - Storage systems
  - Container platforms
- Implement and manage:
  - Monitoring Solutions
  - Logging Platforms
  - Alerting Systems
  - Observability Tools
- Proactively identify and prevent:
  - Service outages
  - Performance degradation
  - Capacity bottlenecks
  - Infrastructure failures
Incident & Problem Management
- Manage and coordinate:
  - Incident Response
  - Troubleshooting Activities
  - Escalation Management
  - Root Cause Analysis (RCA)
  - Post-Incident Reviews
- Implement:
  - Corrective Actions
  - Preventive Measures
  - Continuous service improvement initiatives
Automation & DevOps
- Automate operational and administrative activities using:
  - Infrastructure as Code (IaC)
  - Automation Frameworks
  - Scripting Languages
- Develop and maintain automation for deployments, monitoring, backups, and platform administration.
Cross-Functional Collaboration
- Collaborate with:
  - Development Teams
  - QA Teams
  - Support Teams
  - Infrastructure Teams
  - Security Teams
  - Network Teams
- Ensure smooth service delivery and operational excellence across environments.
Production Support
- Provide Tier-2 and Tier-3 Support for:
  - Production environments
  - Middleware platforms
  - Infrastructure services
  - Enterprise integrations
  - Critical business applications
Documentation & Knowledge Management
- Maintain:
  - Technical Documentation
  - Runbooks
  - Standard Operating Procedures (SOPs)
  - Knowledge Base Articles
  - Incident Records
  - Operational Guides
Qualification & Experience
Education
- Degree or Diploma in:
  - Information Technology
  - Computer Science
  - Engineering
  - Or an equivalent technical discipline
Experience
- Minimum of 4 years of experience in a TechOps, Production Support, Middleware Support, Platform Operations, Site Reliability Engineering (SRE), or Infrastructure Engineering role.
Required Technical Skills & Expertise
Operating Systems
Strong hands-on experience with:
- Linux Administration
- Windows Server Administration
DevOps & Automation
Experience with:
- CI/CD Pipelines
- Jenkins
- GitLab CI/CD
- GitHub Actions
- Infrastructure as Code (IaC)
- Strong scripting skills
- Configuration management experience
Containers & Orchestration
Strong experience with:
- Red Hat OpenShift
- Docker
- Kubernetes
- Containerized Production Environments
Java Platform Support
Strong expertise in:
- JVM Dump Analysis
- Thread Dump Analysis
- Heap Dump Analysis
- Java Performance Tuning
- Troubleshooting Java-Based Applications
- Garbage Collection (GC) Analysis
IBM Cloud Platforms
Hands-on experience supporting at least two of the following:
- IBM Cloud Pak for Business Automation (CP4BA)
- IBM Cloud Pak for Integration (CP4I)
- IBM Cloud Pak for Data (CP4D)
Monitoring & Observability
Experience with:
- Prometheus
- Grafana
- ELK Stack
- Enterprise Monitoring Solutions
- Application Performance Monitoring (APM)
Networking & Infrastructure
Strong understanding of:
- TCP/IP
- DNS
- VPN Technologies
- Load Balancers
- Network Troubleshooting
- Infrastructure Connectivity
Security & Compliance
Knowledge of:
- Identity & Access Management (IAM)
- Secrets Management
- Security Best Practices
- Compliance Standards
- Access Control Models
Production Operations
Strong experience managing:
- High Availability Environments
- Performance Tuning
- Capacity Planning
- Backup & Recovery Solutions
- Disaster Recovery (DR) Planning
- System Resilience & Business Continuity
Site Reliability Engineering (SRE)
Strong knowledge of:
- Service Level Agreements (SLAs)
- Service Level Objectives (SLOs)
- Service Level Indicators (SLIs)
- Reliability Engineering Principles
- Operational Excellence Practices
Soft Skills
- Strong communication and stakeholder management skills
- Ability to work effectively in cross-functional teams
- Strong analytical and problem-solving abilities
- Ability to perform under pressure in critical production environments
- Good command of English (Reading, Writing, and Communication)
Pay: ₹1,252,322.78 - ₹2,545,233.81 per year
Application Question(s):
- Do you have experience supporting critical production applications?
- Have you worked with IBM Cloud Pak solutions?
- Which IBM products have you supported? (Multi-select: CP4BA, CP4I, CP4D, BAW, FileNet, ODM, IBM MQ)
- Have you performed JVM Heap Dump or Thread Dump analysis?
- Have you troubleshot Java-based enterprise applications in production?
- Which monitoring tools have you used? (Multi-select: Grafana, Prometheus, ELK, Splunk, Dynatrace, AppDynamics)
- Do you have experience with incident management and RCA?
- Have you worked with networking concepts such as DNS, TCP/IP, VPNs, and load balancers?
- What is your notice period?
Experience:
- Production Support: 5 years (Preferred)
- TechOps: 5 years (Preferred)
- Middleware Support: 5 years (Preferred)
- Platform Operations: 5 years (Preferred)
- OpenShift: 5 years (Preferred)
Location:
- Mumbai, Maharashtra (Preferred)
Work Location: In person