Project Role : Custom Software Engineer
Project Role Description : Develop custom software solutions to design, code, and enhance components across systems or applications. Use modern frameworks and agile practices to deliver scalable, high-performing solutions tailored to specific business needs.
Must have skills : Site Reliability Engineering
Good to have skills : NA
Minimum 5 year(s) of experience is required
Educational Qualification : 15 years full time education
Summary:
We are seeking a Full-Stack Site Reliability Engineer (SRE) to design, build, and operate highly reliable, cloud-native services running primarily on AWS and Azure. This is an engineering-first role combining software development, distributed systems expertise, and operational ownership.
SREs are responsible for how systems behave in production. You will work across application code, infrastructure, deployment pipelines, and observability stacks to ensure NA Returns platforms meet strict availability, performance, scalability, and auditability expectations.
Roles & Responsibilities:
Own production reliability for NA Returns services across AWS and Azure environments
Define, track, and improve SLIs, SLOs, and error budgets for compliance-critical workflows
Design and implement automation to eliminate manual operational toil
Build and maintain observability: metrics, logs, traces, alerts, and dashboards
Conduct blameless postmortems and drive systemic remediation
Partner with engineering and product teams to influence reliability-focused system design
Perform capacity planning, performance tuning, and resilience testing
Ensure deployment safety via CI/CD guardrails, progressive rollouts, and automated rollback
Support auditability, traceability, and controlled change management required for regulated systems
Professional & Technical Skills:
Write production-quality code (Go, C#, Java, Python, or similar)
Build internal tooling, automation frameworks, and self-service platforms
Review application and infrastructure designs for operability and failure modes
Treat infrastructure and operations as software problems
Required Qualifications:
Bachelor's degree in Computer Science or equivalent practical experience
Strong software engineering background with distributed systems experience
Hands-on experience with AWS and/or Azure production systems
Strong Linux, networking, and systems troubleshooting skills
Experience with Infrastructure as Code (Terraform, ARM, CloudFormation)
Experience supporting production systems with on-call responsibility
Preferred Qualifications:
Experience operating compliance-critical or financial systems
Kubernetes / container orchestration experience
Experience with observability platforms (Prometheus, Grafana, CloudWatch, Azure Monitor)
Experience defining SLOs for business-critical workflows
Strong incident leadership and cross-team communication skills
Additional Information:
You will be working with a trusted tax technology leader committed to delivering reliable and innovative solutions.