Nvidia Engineer

Persistent - Pune, Maharashtra

Job details

Full-time
4 hours ago

Benefits

Flexible schedule

Qualifications

Torch
Law
Kubernetes
Build automation
Master's degree
Bash (Unix shell)
Docker
Continuous integration
REST
Scripting
APIs
AI
gRPC
Python
Shell Scripting

Full job description

About Persistent

We are an AI-led, platform-driven Digital Engineering and Enterprise Modernization partner, combining deep technical expertise and industry experience to help our clients anticipate what’s next. Our offerings and proven solutions create a unique competitive advantage for our clients by giving them the power to see beyond and rise above. We work with many industry-leading organizations across the world, including 20 Fortune 50 companies and 4 of the 5 top banks in both the US and India, and numerous innovators across the healthcare ecosystem.

Our disruptor’s mindset, commitment to client success, and agility to thrive in a dynamic environment have enabled us to sustain our growth momentum. Persistent has been recognized across top industry platforms for innovation, leadership, and inclusion. We reported $1,654.4M in FY26 revenue, with 17.4% Y-o-Y growth. We have delivered 24 sequential quarters of growth, with $436.0M in Q4 FY26 revenue, up 3.2% Q-o-Q and 16.2% Y-o-Y. Our 27,500+ global team members, located in 18 countries, have been instrumental in helping market leaders transform their industries. We have been recognized as the Fastest Growing IT Services Brand Globally in the 2026 Brand Finance IT Services 25 Report. We were also named a Leader in the Everest Group Private Equity (PE) Services PEAK Matrix® Assessment 2026 and the Software Product Engineering PEAK Matrix® Assessment 2026.

About Position:

You will be the team's primary authority on NVIDIA's inference ecosystem: NIM (NVIDIA Inference Microservices), Triton Inference Server, TensorRT, and the BioNeMo platform. Your core mission is to take structural biology AI models, whether NIM-ready or research-grade Python scripts, and turn them into production-quality, API-accessible inference services.

Critical Requirement: Several target models (LigandMPNN, Boltz, custom AlphaFold2 variants) are not yet available as official NVIDIA NIM services. This role requires hands-on ability to build NIM-compliant containers from scratch and configure Triton model repositories for models that currently only have CLI or notebook interfaces.

Role: Nvidia Engineer
Location: All Persistent Locations
Experience: 4-7 Years
Job Type: Full Time Employment

What You'll Do:

NIM Service Deployment
Deploy and configure NVIDIA NIM containers for bio models (AlphaFold2-Multimer, ESMFold, ProteinMPNN) on the GPU cluster
Manage NIM service lifecycle: versioning, health checks, rolling updates, rollback strategies
Tune NIM deployment parameters: instance count, GPU assignment, concurrency settings, request queuing
Integrate deployed NIM endpoints with upstream orchestration (SLURM, Nextflow, REST clients)
Custom NIM Packaging (Primary Focus)
Analyse non-NIM models (LigandMPNN, Boltz, RFDiffusion, etc.) and design their Triton serving architecture
Write Triton model configs (config.pbtxt): input/output tensors, batching policy, backend selection (PyTorch, Python, ONNX, TensorRT)
Build NIM-spec Docker images: base layers, model weights, dependency pinning, health endpoint, OpenAPI schema
scoring)
Export models to ONNX or TensorRT where inference optimization is feasible; document tradeoffs
Test packaged services against reference outputs from original model codebases to validate correctness
NVIDIA Ecosystem & Optimization
Work with NGC private registry: push/pull images, manage model cards, handle credential scoping
Apply TensorRT optimization, FP16/INT8 quantization where applicable for throughput gains
Profile GPU memory footprints and latency of each packaged model; document per-GPU requirements
Stay current with NVIDIA BioNeMo updates, NIM API spec changes, and new bio model releases
Evaluate new models from the research community (CASP, bioRxiv) for NIM packaging feasibility
Collaboration & Documentation
Partner with the MLOps Engineer to ensure packaged services deploy cleanly on cluster
Partner with the Computational Biologist to understand model I/O contracts and validation criteria
Write and maintain NIM packaging runbooks, Triton config templates, and container build guides
Define API schemas (OpenAPI/gRPC proto) for each service so downstream teams can integrate reliably
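
For illustration only, the Triton model config work described above might look like the following minimal Python sketch, which renders a config.pbtxt from a small description of a model. The model name, tensor names, and shapes are hypothetical placeholders, not taken from any actual LigandMPNN or Boltz packaging. The field names (backend, max_batch_size, input, output, instance_group, dynamic_batching) follow Triton's model configuration format, but a real config would likely need more settings than shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tensor:
    name: str
    data_type: str      # Triton type name, e.g. "TYPE_FP32" or "TYPE_INT64"
    dims: List[int]     # shape without the batch dimension; -1 means variable


@dataclass
class ModelConfig:
    name: str
    backend: str                  # e.g. "python", "pytorch", "onnxruntime", "tensorrt"
    max_batch_size: int
    inputs: List[Tensor]
    outputs: List[Tensor]
    instance_count: int = 1
    max_queue_delay_us: int = 0   # in this sketch, 0 means "omit dynamic_batching"


def _render_tensors(key, tensors):
    """Render an `input [...]` or `output [...]` section."""
    entries = []
    for t in tensors:
        dims = ", ".join(str(d) for d in t.dims)
        entries.append(
            f'  {{\n    name: "{t.name}"\n'
            f"    data_type: {t.data_type}\n"
            f"    dims: [ {dims} ]\n  }}"
        )
    return f"{key} [\n" + ",\n".join(entries) + "\n]"


def render_config(cfg):
    """Return the text of a config.pbtxt for the given model description."""
    parts = [
        f'name: "{cfg.name}"',
        f'backend: "{cfg.backend}"',
        f"max_batch_size: {cfg.max_batch_size}",
        _render_tensors("input", cfg.inputs),
        _render_tensors("output", cfg.outputs),
        "instance_group [\n  {\n"
        f"    count: {cfg.instance_count}\n"
        "    kind: KIND_GPU\n  }\n]",
    ]
    if cfg.max_queue_delay_us > 0:
        parts.append(
            "dynamic_batching {\n"
            f"  max_queue_delay_microseconds: {cfg.max_queue_delay_us}\n"
            "}"
        )
    return "\n".join(parts) + "\n"


# Hypothetical example: names and shapes are placeholders for illustration.
example = ModelConfig(
    name="example_bio_model",
    backend="python",
    max_batch_size=4,
    inputs=[Tensor("coords", "TYPE_FP32", [-1, 3])],
    outputs=[Tensor("logits", "TYPE_FP32", [-1, 21])],
    instance_count=2,
    max_queue_delay_us=500,
)
print(render_config(example))
```

Generating configs from a template like this is one way to keep batching and instance settings consistent across packaged models; hand-written config.pbtxt files are equally valid.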

Expertise You'll Bring:

NVIDIA NIM
Direct hands-on experience deploying NVIDIA NIM containers (not just awareness; actual production use)
Thorough understanding of NIM container specifications:
Health endpoints
Model directory layout
Environment variables
Experience with:
NVIDIA NGC catalog
Private registry
API key management
Familiarity with NVIDIA BioNeMo (advantage):
ESMFold NIM
ProteinMPNN NIM
Triton Inference Server
Writing model repository configurations (config.pbtxt) for multiple backends:
PyTorch
Python
ONNX
TensorRT
Building Triton ensemble pipelines for multi-step inference workflows
Experience with:
Dynamic batching
Sequence batching
Model instance configuration
Using Triton client libraries (tritonclient) in Python for:
Testing
Benchmarking
Model Optimization
Hands-on with TensorRT:
Building engines from ONNX
Precision modes (FP32 / FP16 / INT8)
Profiling
ONNX export from:
PyTorch
JAX models (handling dynamic shapes)
GPU memory profiling using:
nvidia-smi
Nsight Systems
torch.cuda.memory_summary
Understanding transformer inference patterns:
Attention caching
Batching strategies
Bio Models (Preferred)
Practical experience running:
AlphaFold2
AlphaFold-Multimer (end-to-end, not just API usage)
Understanding of LigandMPNN:
Architecture
Input/output tensors (protein graph, ligand context)
Awareness of:
Boltz-1 (MIT, 2024)
Differences vs AlphaFold3 (serving requirements)
Familiarity with:
RoseTTAFold2
ESMFold
RFDiffusion
Programming
Advanced Python:
Async programming
Packaging
CLI development (click, argparse)
FastAPI / gRPC wrappers
Docker expertise:
Multi-stage builds
Layer optimisation
CUDA base image selection
Bash scripting:
Container build automation
CI pipelines
Experience with protein language model embeddings (ESM-2, ESM-3) as model inputs
Kubernetes / Helm experience for hybrid HPC + cloud NIM deployments
Published benchmarks or blog posts on model serving optimization
Experience with Run:ai (workloads, projects, quotas, fractional GPU).
NVIDIA AI Enterprise licensed-stack experience.
NVIDIA Dynamo or disaggregated inference experience.
NeMo Guardrails / NIM safety filters for any LLM-adjacent endpoints.
Slurm + Pyxis/Enroot experience for HPC-style NIM execution alongside Kubernetes.
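
As a rough illustration of the testing and benchmarking skills listed above, the sketch below times repeated calls to an inference function and reports latency statistics. It uses only the Python standard library. The fake_infer function is a hypothetical stand-in for a real client call (for example, one made through tritonclient) and is an assumption for this example, not part of any NVIDIA API.

```python
import statistics
import time


def benchmark(infer, payload, warmup=3, runs=20):
    """Time `infer(payload)` and return latency statistics in milliseconds."""
    for _ in range(warmup):
        infer(payload)  # warm-up calls are executed but not timed

    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(payload)
        latencies.append((time.perf_counter() - start) * 1000.0)

    latencies.sort()
    n = len(latencies)
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    p95_index = (95 * n + 99) // 100 - 1
    return {
        "runs": n,
        "mean_ms": statistics.fmean(latencies),
        "p50_ms": statistics.median(latencies),
        "p95_ms": latencies[p95_index],
        "max_ms": latencies[-1],
    }


def fake_infer(payload):
    # Hypothetical stand-in for a real inference request.
    time.sleep(0.001)
    return len(payload)


stats = benchmark(fake_infer, "MKTAYIAK", warmup=1, runs=10)
print(stats)
```

In practice the timed callable would wrap a request to the deployed service, and the results would be recorded per model and per GPU type.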

Benefits:

Competitive salary and benefits package
Culture focused on talent development with quarterly growth opportunities and company-sponsored higher education and certifications
Opportunity to work with cutting-edge technologies
Employee engagement initiatives such as project parties, flexible work hours, and Long Service awards
Annual health check-ups
Insurance coverage: group term life, personal accident, and Mediclaim hospitalization for self, spouse, two children, and parents

Values-Driven, People-Centric & Inclusive Work Environment:

Persistent is dedicated to fostering diversity and inclusion in the workplace. We invite applications from all qualified individuals, including those with disabilities, and regardless of gender or gender preference. We welcome diverse candidates from all backgrounds.

We support hybrid work and flexible hours to fit diverse lifestyles.
Our office is accessibility-friendly, with ergonomic setups and assistive technologies to support employees with physical disabilities.
If you are a person with disabilities and have specific requirements, please inform us during the application process or at any time during your employment.

Let’s unleash your full potential at Persistent - persistent.com/careers

“Persistent is an Equal Opportunity Employer and prohibits discrimination and harassment of any kind.”

Open Positions

Skills Required: AI, GPU Acceleration, Kubernetes
Location: Pune
Desirable Skills: AI, GPU Acceleration, Kubernetes, ML Ops, OpenAPI, Microservices, Containers, Nextflow, PyTorch, ONNX, Docker, Scripting, LLM
Job Code: 178476
