As a Senior Engineer focused on Performance and Cost Observability, you will take a lead role in architecting, migrating, and optimizing our end-to-end monitoring ecosystems. At Shipt, we believe technology lacks significance without a human touch, and that extends to how we empower our engineering teams. With your deep technical expertise, you will guide teams in delivering reliable, resilient, and cost-effective distributed systems that directly support our mission to spark connections and serve our communities. You will be responsible for driving critical technical discussions around system health and financial efficiency, mentoring junior engineers, and collaborating with cross-functional stakeholders to ensure our infrastructure scales sustainably. The role involves tackling high-impact technical challenges—from managing high-cardinality telemetry data to building transparent cost models—shaping both the current observability landscape and the long-term infrastructure strategy of the company.
Architect and Lead: Design, migrate, and optimize end-to-end monitoring and observability solutions that keep our large-scale distributed systems reliable, resilient, and cost-effective.
Drive Reliability Standards: Lead the definition, implementation, and standardization of Service Level Objectives (SLOs) and Service Level Indicators (SLIs) across all engineering teams, fostering a culture of measurable reliability.
Manage Complex Telemetry: Drive the strategy for high-cardinality data management and distributed tracing using standards such as OpenTelemetry, delivering deep system insights without unchecked cost sprawl.
Optimize Financial Efficiency: Implement and refine critical performance and FinOps observability metrics. You will actively track and optimize revenue per unit of infrastructure cost, cost per deployed service, and service utilization rates.
Collaborate Strategically: Work closely with Finance, Infrastructure, and Engineering Leadership to gather requirements, clarify efficiency objectives, and build transparent showback models that report engineering costs to the business.
Mentor and Empower: Provide guidance and mentorship to junior and mid-level engineers, helping them master performance optimization, FinOps observability practices, and effective monitoring strategies.
Scale for the Future: Collaborate with Principal and Distinguished Engineers to design scalable observability architectures that align with Shipt’s long-term business growth and infrastructure sustainability.
Resolve and Innovate: Analyze complex performance bottlenecks and cost inefficiencies, propose innovative solutions, and take ownership of their resolution in production environments.
Bachelor’s degree in Computer Science, Software Engineering, or a related field (or equivalent practical experience).
4+ years of professional software development and systems engineering experience, with a heavy emphasis on Observability, Site Reliability Engineering (SRE), or FinOps.
Expertise in one or more programming languages such as Golang, Java, Python, or C++.
Deep technical knowledge of modern observability frameworks, specifically OpenTelemetry, and proven experience managing high-cardinality telemetry data.
Demonstrated experience in defining, measuring, and operationalizing SLOs and SLIs within large-scale microservices or distributed systems.
Strong understanding of performance and FinOps observability practices, with the ability to measure infrastructure efficiency and implement engineering cost showback models.
Excellent analytical skills, with the ability to identify the root causes of complex performance issues or cloud cost anomalies and develop effective, automated solutions.
Strong leadership, communication, and collaboration skills, with the ability to work effectively across technical teams and align engineering efforts with Finance and business stakeholders.