Job purpose
We are seeking a Site Reliability Engineer to help build and support scalable, highly reliable software systems for our production investment pipeline. Successful candidates show an aptitude for learning how to improve system availability, latency, performance, efficiency, and capacity, while also taking part in change management, monitoring, emergency response, and capacity planning processes. This role will collaborate closely with developers to design scalable systems, improve release processes, and reduce the operational burden on development teams. Site reliability engineers at the firm must be able to adapt to rapid changes in business and technological requirements and to operate distributed systems efficiently.
Key responsibilities
- Contribute to platform reliability by monitoring performance and setting up relevant metrics
- Assist in managing CI/CD procedures and promote DevOps best practices through standardized release processes
- Help containerize and scale services to fit business needs
- Assist in monitoring and managing hardware estate as well as file systems
- Apply industry-standard security practices
- Work daily with cutting-edge LLM tools and frameworks
- Evaluate and adopt new open-source technologies while also leveraging mature solutions, with support from the team
- Be an active member of an agile engineering team, continuously looking for ways to improve team performance
Key competencies
- Strong experience in Python programming, with a willingness to deepen understanding of Python best practices, idioms, and limitations
- Hands-on experience in Linux system administration, with confidence in standard command-line tools and environments
- Working knowledge of containerization and orchestration tools such as Docker and Kubernetes; familiarity with Helm is a plus
- Exposure to monitoring and observability tools like Grafana and Prometheus
- Proficiency in version control systems (Git) and solid understanding of CI/CD concepts and pipelines
- Basic knowledge of databases and messaging technologies such as SQL, Redis, and Kafka
- Strong interest in software engineering principles and running real-world applications at scale
- Experience with Roo Code, Claude, or other LLM tools is beneficial; prior experience in financial services is not required
- Ability to work in a collaborative, team-oriented culture
- Good problem-solving abilities and sound judgment
- Good verbal and written communication, critical thinking, and strong attention to detail
- Experience managing client stakeholders