Site Reliability Engineer

Billtrust Careers - Hyderabad District, Telangana

Job details

1 day ago

Qualifications

CI/CD
System administration
Azure
Go
Kubernetes
UNIX
Git
Bash (Unix shell)
AWS
Terraform
Continuous integration
Scripting
GitHub
Linux
AI
Communication skills
Python
Shell Scripting

Full job description

Who We Are

Finance leaders choose Billtrust to get paid faster, control costs, and maximize customer satisfaction. As the leader in B2B accounts receivable workflow and payment software, we provide the world’s leading brands with AI-powered solutions across the full AR lifecycle—from invoice presentment and payment processing to cash application and collections. With over 2,600 global customers, more than $1 trillion in invoice dollars processed, and a proprietary network of 13 million buyers, Billtrust delivers business value through deep industry expertise and a culture relentlessly focused on meaningful customer outcomes.

We’re an AI-first company, not just in what we build for our customers, but in how we work. Across every function, our teams use AI tools daily to work faster, make better decisions, and deliver higher-quality outcomes. We hire exceptional people, give them cutting-edge AI capabilities, and measure success by the impact they create. If you want to do the best work of your career at the frontier of AI and fintech, Billtrust is the place to do it.

Our Values

Customers

We relentlessly increase value for our customers and do the right thing for them.

Action

We make ‘thoughtfully fast’ decisions, act quickly, cut through red tape, deliver progress over perfection, and take ownership and accountability.

Team Spirit

We put the team ahead of ourselves, foster trust and respect, collaborate with passion, despise toxic politics, value our differences, and celebrate together.

Innovation

We challenge the status quo, experiment thoughtfully, and are novel and brilliant in what we create.

Excellence

We love to win, but we hate losing even more. We aspire to be the best and take pride in our work. When we fall short, we own it and come back stronger.

Site Reliability Engineer

As a Site Reliability Engineer within our Operations Engineering Center, you'll ensure the reliability, scalability, and performance of Billtrust's infrastructure that powers mission-critical order-to-cash operations. You'll participate in our follow-the-sun SRE coverage across time zones. You'll respond to incidents, implement monitoring and alerting strategies, and engineer autonomous incident response systems through agentic runbooks and intelligent triage. Your work will directly impact billions of dollars in transactions processed through our platform while pioneering AI-driven operational excellence.

Key Responsibilities

Respond to incidents, perform root cause analysis, and lead post-mortem discussions

Implement and maintain comprehensive monitoring, alerting, and observability across infrastructure

Establish and maintain SLO frameworks, tracking and improving reliability metrics

Engineer autonomous alert triage agents and agentic runbooks for incident response

Design and build intelligent incident correlation engines using AI/ML techniques

Develop and maintain infrastructure automation, CI/CD pipelines, and deployment procedures

Manage Kubernetes clusters, container orchestration, and cloud platform resources (AWS)

Lead toil reduction initiatives through automation, focusing on high-impact pain points

Collaborate with platform and product teams on infrastructure requirements and capacity planning

Required Qualifications

Experience & Technical Background

5+ years of hands-on experience in Site Reliability Engineering or infrastructure operations

Strong proficiency with Linux/Unix systems administration and shell scripting

Experience with cloud platforms (AWS preferred, Azure or GCP acceptable)

Hands-on Kubernetes and container orchestration experience

Demonstrated expertise in incident response, troubleshooting, and post-mortem analysis

Strong background with monitoring tools (Datadog, Prometheus, Grafana, PagerDuty)

Experience with infrastructure automation and infrastructure-as-code tools (Terraform)

Proficiency with at least one programming/scripting language (Python, Go, Bash preferred)

Proficiency with Claude Code, GitHub Copilot, or similar AI coding assistants

Soft Skills & Attributes

Excellent communication skills, particularly during high-stress incident situations

Problem-solving mindset with a focus on automated solutions over manual workarounds

Reliability-first mentality with attention to detail and systems thinking

Ability to thrive in a distributed, follow-the-sun team environment

Comfort with on-call responsibilities and 24x7 operational commitment
