Job Purpose
The Cloud System Engineer will function as a core infrastructure specialist within the cloud engineering team, integrating compute, storage, virtualization, and backup into multi-tenant cloud platforms while enabling automation, orchestration, and service delivery through IaaS frameworks.
Role Description
- Deploy, configure, and manage enterprise-grade servers (Dell, HPE, Supermicro GPU/CPU platforms)
- Perform OS provisioning (Linux/Windows) and lifecycle management
- Monitor system performance and optimize CPU, memory, and I/O utilization
- Implement firmware upgrades, patching, and hardware health monitoring
- Support workload sizing and capacity planning for cloud tenants
- Manage and administer storage platforms (block, file, and object; e.g., PowerScale, SAN/NAS, SDS)
- Configure storage pools, volumes, quotas, and replication policies
- Ensure optimal performance through tiering, caching, and load balancing
- Implement data lifecycle management and storage efficiency features (deduplication, compression)
- Monitor storage health, latency, and throughput KPIs
- Deploy and manage hypervisors (VMware, KVM, Hyper-V)
- Manage VM lifecycle: provisioning, cloning, migration, decommissioning
- Configure HA, DRS, clustering, and resource scheduling
- Optimize hypervisor performance and troubleshoot VM-related issues
- Support multi-tenant virtualization environments and cloud orchestration platforms
- Design and manage backup solutions (e.g., Commvault, Veeam)
- Configure backup policies for VMs, databases, file systems, and applications
- Monitor backup jobs, ensure SLA compliance, and troubleshoot failures
- Perform restore operations (file-level, VM-level, cross-platform)
- Implement DR strategies including replication, snapshot management, and failover testing
- Integrate compute, storage, and virtualization into cloud orchestration platforms
- Support IaaS platform provisioning and service catalog enablement
- Automate provisioning using scripts and tools (e.g., Ansible, Terraform)
- Ensure API-driven integration for cloud service consumption
- Monitor infrastructure using enterprise tools (alerts, logs, dashboards)
- Perform root cause analysis (RCA) for incidents and failures
- Participate in change management and release processes
- Ensure adherence to SLA, uptime, and service availability targets
- Implement security hardening for the OS, hypervisor, and storage access layers
- Ensure compliance with enterprise security policies
- Manage access control (RBAC, IAM integration)
- Support audit readiness and regulatory compliance
- Maintain low-level and high-level designs (LLDs, HLDs), SOPs, and runbooks
- Prepare reports on capacity, utilization, incidents, and availability
- Contribute to service improvement and optimization initiatives
Experience & Educational Requirements
EDUCATIONAL QUALIFICATIONS:
BE/B.Tech or an equivalent degree in Computer Science or Electronics & Communication
RELEVANT EXPERIENCE:
- Experience: 5–12 years in data center / cloud infrastructure
- Cloud Infrastructure Expertise: Proven track record in the design, operation, and maintenance of public/private cloud platforms, including service migration from on-premises environments to the cloud or between enterprise data centers.
- Containers & Orchestration: Proficiency with containerization and container orchestration technologies (Docker, Kubernetes).
- Private Cloud Technologies: Hands-on experience with OpenStack, VMware, Nutanix, and Kubernetes-based private cloud platforms, as well as SaaS integration.
- Multi-Cloud & Hybrid Cloud: Skilled in the design, deployment, and maintenance of hybrid/multi-cloud architectures with seamless connectivity and consistent governance.
- Cloud Migration & Consulting: Strong expertise in cloud migration strategy, execution, and advisory consulting for enterprise clients.
- Automation & DevOps: Proficient in Infrastructure as Code (Terraform, Ansible, etc.), CI/CD pipelines, and automation practices for cloud deployments.
Preferred Skills
- Servers: Dell PowerEdge, HPE ProLiant, GPU servers
- Storage: SAN/NAS/Object, PowerScale, SDS platforms
- Hypervisors: VMware ESXi, KVM, Hyper-V
- Backup: Commvault, Veeam, or other enterprise backup tools
- OS: Linux (RHEL/CentOS/Ubuntu), Windows Server
- Networking fundamentals (VLANs, IP addressing, load balancing)
- Cloud platforms (OpenStack, VMware Cloud, Azure Stack, or similar environments)
- Automation (Ansible, Python, PowerShell)
- Working knowledge of containers and Kubernetes
- DR orchestration and multi-site replication
- Strong troubleshooting and analytical skills
- Ability to work in a 24x7 support and operations model