About Us
Visa is a world leader in payments technology, facilitating transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories. We are dedicated to uplifting everyone, everywhere by being the best way to pay and be paid.
At Visa, you'll have the opportunity to create impact at scale — tackling meaningful challenges, growing your skills and seeing your contributions impact lives around the world.
Join Visa and do work that matters – to you, to your community, and to the world. Progress starts with you.
Job Description
The desired candidate will join the Site Reliability Engineering team within the Platform Products Technology organization. This role is focused on improving the reliability, observability, and operational efficiency of production platforms by building automation, AI-enabled tools, monitoring capabilities, and self-service solutions.
The candidate will be responsible for supporting production applications, investigating and troubleshooting issues, enabling client launches, resolving client-reported problems, and contributing bug fixes for production defects. A key focus of this role will be to leverage AI, machine learning-assisted workflows, automation frameworks, and data-driven insights to reduce manual effort, improve incident response, and enhance system reliability.
This position is ideal for a software engineer with a strong engineering mindset who enjoys solving real-world reliability challenges at scale, building impactful automation, applying AI to production operations, and working on systems that directly improve customer experience and platform availability.
Key Responsibilities:
Develop a strong understanding of platform products, APIs, end-to-end transaction flows, product architecture, system dependencies, and infrastructure components.
Build AI-assisted tools and automation for proactive detection of anomalies, incidents, performance issues, reliability risks, recurring failures, and client-impacting conditions.
Develop intelligent monitoring and observability solutions to improve visibility into system health, application behavior, transaction flows, logs, metrics, alerts, and operational trends.
Create automation frameworks and self-service tools to reduce manual operational tasks, accelerate troubleshooting, improve governance, and support faster incident resolution.
Support production applications on a day-to-day basis, including monitoring application health, handling client escalations, supporting business operations tasks, troubleshooting client-reported issues, and managing incidents.
Work closely with support teams during production incidents and issues, helping with triage, technical investigation, root cause analysis, mitigation, and timely communication to stakeholders.
Provide code-level bug fixes for production issues and contribute to application stability, reliability, performance, and resilience improvements.
Support client launches and production readiness activities, including feature understanding, log and monitoring improvements, operational readiness reviews, and early identification of gaps or risks.
Partner with engineering teams and stakeholders to improve daily operations, strengthen production support processes, and drive continuous improvement in reliability practices.
Analyze production trends and recurring issues to recommend improvements in architecture, application design, monitoring, alerting, documentation, and operational processes.
Work with minimal supervision to complete day-to-day engineering and operational activities with strong ownership, quality, and accuracy.
Visa requires at least 3 days in office per week; expectations for these days will be confirmed by your Hiring Manager.
Qualifications
Basic Qualifications:
Bachelor's degree, OR 3+ years of relevant work experience
Preferred Qualifications:
Bachelor’s degree, or 2+ years of relevant work experience in software engineering, production engineering, site reliability engineering, application support, or a related technical role.
2+ years of experience supporting or developing applications in a large-scale, highly available production environment.
2+ years of experience with Java, REST APIs, SQL/NoSQL databases, automation, debugging, and production operations.
Strong understanding of object-oriented programming, software engineering principles, and system design fundamentals.
Experience developing tools, scripts, utilities, or automation to improve operational efficiency.
Experience with production incident management, root cause analysis, change management, and problem management.
Experience with observability and log analysis tools such as Grafana, Splunk, Prometheus, ELK, Datadog, Dynatrace, AppDynamics, or similar.
Experience working in Agile, DevOps, or SRE-oriented engineering environments.
Strong communication, collaboration, documentation, and stakeholder management skills.
Visa is an EEO Employer
Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.