JOB DESCRIPTION
Senior Performance Engineer
Vision AI Platform, Public Sector
Location
Global (remote); US business hours overlap required
Reporting to
Assurance Lead / Assurance Director
Team
Globally distributed engineering team
Industry
Artificial Intelligence, Edge Computing, Public Sector
Employment
Contract
About the Role
Our Vision AI platform gives US public sector clients (federal agencies, smart-city operators, defense contractors, and critical infrastructure teams) a real-time window into their physical world. Think live sensor dashboards, geospatial overlays, AI inference result streams, and operational command interfaces used by people who cannot afford a slow or confusing UI.
We are seeking a specialized AI Performance Engineer (Consultant) to drive GPU acceleration, CUDA optimization, and distributed AI workload performance for the Vision AI platform.
This is a hands-on performance engineering role focused on optimizing deep learning inference, GPU/CPU utilization, distributed orchestration, and capacity planning across city-scale AI deployments.
The consultant will work closely with AI, DevOps, and Infrastructure teams to improve latency, throughput, and overall system efficiency for production AI workloads.
Key Responsibilities
- Profile and optimize large-scale AI training and inference workloads (transformers, multimodal, diffusion, recommender systems) across multi-node, multi-GPU clusters.
- Build tools and frameworks to detect bottlenecks in compute, memory, interconnects, and communication libraries, and deliver optimizations that maximize scaling efficiency.
- Develop, maintain, and recommend benchmarks for AI training and inference workloads.
- Partner with framework teams (PyTorch, TensorFlow) to upstream performance improvements and enable better scaling APIs.
- Collaborate across engineering organizations to improve how efficiently we use hardware, software, and infrastructure.
- Proactively monitor fleet-wide utilization, analyze known inefficiency patterns, discover new ones, and deliver scalable solutions that address them.
Required Qualifications
- 5+ years in AI/ML performance engineering, HPC, or large-scale inference systems
- BS or similar background in Computer Science or related area (or equivalent experience)
- Strong understanding of, and hands-on experience with, modern ML techniques and tools
- Strong hands-on CUDA programming and optimization experience
- Deep understanding of GPU architecture and memory hierarchy
- Experience optimizing PyTorch and/or TensorFlow inference
- Hands-on experience with NVIDIA Triton Inference Server, Ray, and Kubernetes GPU scheduling
- Experience with RAPIDS and GPU-accelerated data pipelines
- Experience with benchmarking methodologies, performance analysis and profiling tools (e.g. Nsight), and performance monitoring tools
- Strong track record of optimizing large-scale AI systems
Nice to Have
- Neural network architecture optimization experience
- Deep TensorRT optimization expertise
- Video analytics or real-time inference systems experience
- Experience operating large-scale GPU clusters
- Experience with WebAssembly (WASM) for performance-critical frontend computation
- Advanced Linux, container (e.g. Docker), and GitHub skills
Job Type: Contractual / Temporary
Contract length: 12 months
Work Location: Remote