About Persistent
We are an AI-led, platform-driven Digital Engineering and Enterprise Modernization partner, combining deep technical expertise and industry experience to help our clients anticipate what’s next. Our offerings and proven solutions create a unique competitive advantage for our clients by giving them the power to see beyond and rise above. We work with many industry-leading organizations across the world, including 20 Fortune 50 companies and 4 of the 5 top banks in both the US and India, and numerous innovators across the healthcare ecosystem.
Our disruptor’s mindset, commitment to client success, and agility to thrive in a dynamic environment have enabled us to sustain our growth momentum. Persistent has been recognized across top industry platforms for innovation, leadership, and inclusion. We reported $1,654.4M in FY26 revenue, with 17.4% Y-o-Y growth. We have delivered 24 sequential quarters of growth, with $436.0M in Q4 FY26 revenue, up 3.2% Q-o-Q and 16.2% Y-o-Y. Our 27,500+ global team members, located in 18 countries, have been instrumental in helping market leaders transform their industries. We have been recognized as the Fastest Growing IT Services Brand Globally in the 2026 Brand Finance IT Services 25 Report. We were named a Leader in the Everest Group Private Equity (PE) Services PEAK Matrix® Assessment 2026 and the Software Product Engineering PEAK Matrix® Assessment 2026.
About Position:
You will be the team's primary authority on NVIDIA's inference ecosystem: NIM (NVIDIA Inference Microservices), Triton Inference Server, TensorRT, and the BioNeMo platform. Your core mission is to take structural biology AI models, whether NIM-ready or research-grade Python scripts, and turn them into production-quality, API-accessible inference services.
Critical Requirement: Several target models (LigandMPNN, Boltz, custom AlphaFold2 variants) are not yet available as official NVIDIA NIM services. This role requires hands-on ability to build NIM-compliant containers from scratch and configure Triton model repositories for models that currently only have CLI or notebook interfaces.
Role: NVIDIA Engineer
- Location: All Persistent Locations
- Experience: 4-7 Years
- Job Type: Full Time Employment
What You'll Do:
- NIM Service Deployment
  - Deploy and configure NVIDIA NIM containers for bio models (AlphaFold2-Multimer, ESMFold, ProteinMPNN) on the GPU cluster
  - Manage NIM service lifecycle: versioning, health checks, rolling updates, rollback strategies
  - Tune NIM deployment parameters: instance count, GPU assignment, concurrency settings, request queuing
  - Integrate deployed NIM endpoints with upstream orchestration (SLURM, Nextflow, REST clients)
- Custom NIM Packaging (Primary Focus)
  - Analyse non-NIM models (LigandMPNN, Boltz, RFDiffusion, etc.) and design their Triton serving architecture
  - Write Triton model configs (config.pbtxt): input/output tensors, batching policy, backend selection (PyTorch, Python, ONNX, TensorRT)
  - Build NIM-spec Docker images: base layers, model weights, dependency pinning, health endpoint, OpenAPI schema
  - Implement ensemble pipelines in Triton for multi-stage workflows (MSA search → folding → scoring)
  - Export models to ONNX or TensorRT where inference optimization is feasible; document tradeoffs
  - Test packaged services against reference outputs from original model codebases to validate correctness
- NVIDIA Ecosystem & Optimization
  - Work with NGC private registry: push/pull images, manage model cards, handle credential scoping
  - Apply TensorRT optimization, FP16/INT8 quantization where applicable for throughput gains
  - Profile GPU memory footprints and latency of each packaged model; document per-GPU requirements
  - Stay current with NVIDIA BioNeMo updates, NIM API spec changes, and new bio model releases
  - Evaluate new models from the research community (CASP, bioRxiv) for NIM packaging feasibility
- Collaboration & Documentation
  - Partner with the MLOps Engineer to ensure packaged services deploy cleanly on cluster
  - Partner with the Computational Biologist to understand model I/O contracts and validation criteria
  - Write and maintain NIM packaging runbooks, Triton config templates, and container build guides
  - Define API schemas (OpenAPI/gRPC proto) for each service so downstream teams can integrate reliably
Expertise You'll Bring:
- NVIDIA NIM
  - Direct hands-on experience deploying NVIDIA NIM containers (not just awareness; actual production use)
  - Thorough understanding of NIM container specifications:
    - Health endpoints
    - Model directory layout
    - Environment variables
  - Experience with:
    - NVIDIA NGC catalog
    - Private registry
    - API key management
  - Familiarity with NVIDIA BioNeMo (advantage):
    - ESMFold NIM
    - ProteinMPNN NIM
- Triton Inference Server
  - Writing model repository configurations (config.pbtxt) for multiple backends:
    - PyTorch
    - Python
    - ONNX
    - TensorRT
  - Building Triton ensemble pipelines for multi-step inference workflows
  - Experience with:
    - Dynamic batching
    - Sequence batching
    - Model instance configuration
  - Using Triton client libraries (tritonclient) in Python for:
    - Testing
    - Benchmarking
- Model Optimization
  - Hands-on with TensorRT:
    - Building engines from ONNX
    - Precision modes (FP32 / FP16 / INT8)
    - Profiling
  - ONNX export from:
    - PyTorch
    - JAX models (handling dynamic shapes)
  - GPU memory profiling using:
    - nvidia-smi
    - Nsight Systems
    - torch.cuda.memory_summary
  - Understanding transformer inference patterns:
    - Attention caching
    - Batching strategies
- Bio Models (Preferred)
  - Practical experience running:
    - AlphaFold2
    - AlphaFold-Multimer (end-to-end, not just API usage)
  - Understanding of LigandMPNN:
    - Architecture
    - Input/output tensors (protein graph, ligand context)
  - Awareness of:
    - Boltz-1 (MIT, 2024)
    - Differences vs AlphaFold3 (serving requirements)
  - Familiarity with:
    - RoseTTAFold2
    - ESMFold
    - RFDiffusion
- Programming
  - Advanced Python:
    - Async programming
    - Packaging
    - CLI development (click, argparse)
    - FastAPI / gRPC wrappers
  - Docker expertise:
    - Multi-stage builds
    - Layer optimisation
    - CUDA base image selection
  - Bash scripting:
    - Container build automation
    - CI pipelines
- Experience with protein language model embeddings (ESM-2, ESM-3) as model inputs
- Kubernetes / Helm experience for hybrid HPC + cloud NIM deployments
- Published benchmarks or blog posts on model serving optimization
- Experience with Run:ai (workloads, projects, quotas, fractional GPU).
- NVIDIA AI Enterprise licensed-stack experience.
- NVIDIA Dynamo or disaggregated inference experience.
- NeMo Guardrails / NIM safety filters for any LLM-adjacent endpoints.
- Slurm + Pyxis/Enroot experience for HPC-style NIM execution alongside Kubernetes.
Benefits:
- Competitive salary and benefits package
- Culture focused on talent development with quarterly growth opportunities and company-sponsored higher education and certifications
- Opportunity to work with cutting-edge technologies
- Employee engagement initiatives such as project parties, flexible work hours, and Long Service awards
- Annual health check-ups
- Insurance coverage: group term life, personal accident, and Mediclaim hospitalization for self, spouse, two children, and parents
Values-Driven, People-Centric & Inclusive Work Environment:
Persistent is dedicated to fostering diversity and inclusion in the workplace. We invite applications from all qualified individuals, including those with disabilities, and regardless of gender or gender preference. We welcome diverse candidates from all backgrounds.
- We support hybrid work and flexible hours to fit diverse lifestyles.
- Our office is accessibility-friendly, with ergonomic setups and assistive technologies to support employees with physical disabilities.
- If you are a person with disabilities and have specific requirements, please inform us during the application process or at any time during your employment.
Let’s unleash your full potential at Persistent - persistent.com/careers
“Persistent is an Equal Opportunity Employer and prohibits discrimination and harassment of any kind.”