HPC Systems Engineer (Remote)

Nexifyr Consulting Pvt Ltd
Bengaluru, Karnataka

Job details

Full-time

Qualifications

NFS
Bash (Unix shell)
AWS
Scripting
Linux
Python

Full job description

What we do

Our platforms serve as the foundation for digital R&D transformation across industries – helping teams innovate faster, collaborate securely, and operate efficiently across clouds.

HPC Systems Engineer: Job Description (3–7 years of HPC experience)

Primary Responsibilities

Diagnose and resolve HPC issues (HPC applications, scheduler, storage).

Analyze job failures, performance bottlenecks, and system logs.

Manage and optimize job schedulers such as SLURM and PBS.

Perform cluster health checks and proactive monitoring.

Install and support HPC applications and user environments.

Troubleshoot networking issues (InfiniBand, Ethernet).

Identify recurring issues and implement permanent fixes.

Collaborate with L3 support on complex technical issues.

Automate routine operational tasks using scripts.

Update and improve standard operating procedures (SOPs), runbooks, and documentation.

Required Skills

Deep understanding of HPC architecture

Strong Linux administration skills

Experience with job schedulers (SLURM, PBS)

Knowledge of parallel computing (MPI, OpenMP)

Storage systems (Lustre, NFS, GPFS)

Networking (InfiniBand preferred)

Scripting (Python, Bash)

Knowledge of AWS ParallelCluster and AWS Parallel Computing Service (AWS PCS) is an advantage
