At Charles Schwab, our purpose is simple: we champion clients’ goals with passion and integrity. Guided by honesty, mutual respect, and a commitment to doing what’s right, we bring innovation, education, and service together to help shape financial futures. Our people are the foundation of our success – they approach their work with curiosity and collaboration, coming together to create solutions that make a meaningful impact for clients and communities. As we expand into India, we are bringing this same culture of inclusion, learning, and opportunity to new talent. Joining us means becoming part of a global team where your work matters and your future can take shape.
Our Hyderabad location is central to Schwab’s growth, bringing together talented people and technology to drive innovation, scale, and efficiency. Here, you will work alongside teams who create solutions that support millions of clients every day. The work you do is more than daily operations – it’s a chance to experiment, learn, and build within a value-driven, supportive environment. This is a unique opportunity to be part of our early growth phase and shape something new, backed by the stability and strength of a Fortune 500 company. Your impact begins on day one, and your contributions will help define our future in the region.
We are seeking a Senior Manager, AI Site Reliability Engineering to lead a team that builds intelligent observability, monitoring, and deployment automation solutions using AI-augmented development practices. This is not traditional production support. You will lead AI SRE engineers who build software solutions for operational challenges, leveraging Gen AI to accelerate workflows, automate incident response, and drive systems toward five 9s (99.999%) availability while ensuring capacity planning, redundancy, and the highest standards of reliability, security, and scalability. This is a hands-on technical leadership role where you will actively contribute to architecture, code, and AI-driven tooling alongside your team. It is ideal for engineering leaders who are curious, adaptable, and excited about working at the intersection of software engineering and AI.
Key Responsibilities:
People Leadership
Lead, mentor, and grow a high-performing team of AI Site Reliability Engineers
Foster a culture of ownership, accountability, AI-first thinking, and psychological safety
Drive hiring, onboarding, career development, and performance management
Remove blockers and advocate for team needs across the organization
Hands-On Technical Leadership
Stay hands-on with architecture, code reviews, AI tooling evaluation, and critical system design decisions
Set and enforce engineering standards for code quality, AI-generated code review, and operational excellence
Serve as the technical escalation point for high-severity incidents, leveraging AI-driven diagnostics
High Availability & Resilience
Architect critical systems for 99.999% uptime and drive them toward that target
Define, measure, and enforce SLOs, SLIs, and error budgets
Champion AI-powered self-healing systems with automated failover, redundancy, and graceful degradation
Drive AI-assisted capacity planning, demand forecasting, and load testing
Establish chaos engineering practices enhanced with AI-driven failure prediction
Observability & Monitoring
Lead design of AI-enhanced observability platforms covering metrics, logs, traces, and intelligent alerting
Drive AI-powered anomaly detection, predictive alerting, and proactive system health management
Leverage Gen AI to auto-generate and refine dashboards, alert rules, and runbooks
Build real-time availability dashboards with AI-driven trend analysis that track performance against 99.999% targets
Root Cause Analysis
Lead AI-accelerated root cause analysis with thorough postmortems and actionable remediation
Build AI-powered diagnostic tools that automatically correlate logs, metrics, and traces
Use Gen AI to analyze incident patterns, predict recurring failures, and recommend preventive actions
Build AI agents that automate initial triage and preliminary RCA for common incident types
Continuously reduce MTTD and MTTR through AI-assisted workflows
Deployment Automation
Oversee AI-enhanced CI/CD pipelines with AI-driven deployment risk scoring
Drive progressive delivery with canary releases, blue-green deployments, and automated rollbacks triggered by AI anomaly detection
Leverage Gen AI to generate deployment scripts, IaC templates, and pipeline configurations
Eliminate manual toil by building AI agents for repetitive operational tasks
AI-Driven Development
Drive team-wide adoption of GenAI tools (GitHub Copilot, Cursor, ChatGPT) for coding, debugging, documentation, and operations
Developer ownership is non-negotiable. All code, whether human or AI-generated, must be reviewed, tested, and understood before merging
Lead design of AI agents for incident response, log analysis, capacity management, and operational automation
Define agent goals, tool use, memory, and orchestration logic for multi-step SRE workflows
Champion spec-driven development and continuously evaluate emerging Gen AI tools
Testing & Quality
Leverage Gen AI to generate tests, identify coverage gaps, and create edge-case scenarios
Drive AI-powered security scanning, performance testing, and reliability validation early in development
Modernization & Strategy
Lead AI-assisted modernization of monitoring, alerting, and deployment systems
Define and execute the AI SRE roadmap aligned with business and technology objectives
Partner with product, platform, and application teams to embed AI-driven reliability practices
Establish team-level standards for prompt engineering, custom instructions, and AI agent development
Champion AI-enabled practices and foster knowledge sharing across the organization
Required Qualifications:
- 12+ years of software engineering experience, with 4+ years in people leadership
- Bachelor's degree in Computer Science or equivalent
- Remains technically hands-on, with the ability to architect systems, review code, and contribute to critical decisions
- Proven experience operating systems at 99.99% or higher availability
- Hands-on experience with GenAI coding tools and demonstrated ability to drive team adoption
- Proficiency in Java or .NET, with scripting skills in Python, Bash, or PowerShell
- Experience building or leading development of AI agents for operational automation
- Strong prompt engineering skills with the ability to define team-level AI practices
- Deep understanding of high-availability (HA) patterns, including active-active, multi-region failover, and distributed consensus
- Hands-on experience with observability platforms (Datadog, Splunk, Grafana, Prometheus, ELK, or similar)
- Strong experience with CI/CD pipelines (Jenkins, GitLab CI, GitHub Actions, ArgoCD, or similar)
- Proven track record leading RCA efforts and driving reliability improvements
- Strong CS fundamentals, including system design, networking, concurrency, and algorithms
- Demonstrated ability to hire, develop, and retain top engineering talent
Preferred Qualifications:
- Experience with cloud platforms (GCP, AWS, or Azure) and IaC (Terraform, Pulumi, or similar)
- Experience with containerization and orchestration (Docker, Kubernetes)
- Knowledge of chaos engineering and ML-powered anomaly detection
- Experience with legacy system modernization
What Success Looks Like:
- Builds and retains a high-performing AI SRE team delivering reliable, AI-driven solutions
- Stays hands-on and earns technical credibility through direct contributions to architecture and code
- Achieves and sustains 99.999% availability through AI-enhanced reliability engineering
- Delivers AI-powered observability that predicts issues before they impact customers
- Continuously reduces MTTD and MTTR through AI-accelerated RCA and automated diagnostics
- Builds fully automated deployment pipelines with zero-downtime releases and minimal toil
- Establishes the team's AI-enabled engineering practices as a model for the organization
At Schwab India, you’re empowered to shape your future. We support your growth through meaningful work, continuous learning, and a culture rooted in trust and collaboration – so you can build the skills to make a lasting impact. Our benefits are designed to care for your wellbeing, your family, and your long-term financial security.
- Competitive compensation and retirement programs, including Employee Provident Fund (EPF), Gratuity, and optional National Pension System (NPS) contributions
- Robust paid time off, including annual/privilege leave, sick and casual leave, public holidays, maternity/paternity leave, and more
- Education assistance for continued learning to help you grow
- Comprehensive medical insurance with Outpatient Department (OPD) services, including vaccination, pharmacy, dental, and vision coverage
- Annual reimbursement for health check-ups and mental health support through our Employee Assistance Program (EAP)
- Childcare (creche) reimbursement for eligible employees
- Transportation and meal benefits that support your day-to-day work
- Group life, personal accident, and critical illness insurance