Role Title: Distributed Cache & DevOps Engineer
Must Have: Hazelcast, Redis / Redis Cluster, Apache Ignite
Role Summary
We are looking for a technically strong Distributed Cache Support Engineer to provide operational, technical, and production support for distributed caching solutions. The ideal candidate should have a strong Java background, prior experience with enterprise caching or in-memory data grid platforms, and hands-on exposure to DevOps and Kubernetes-based environments.
This role requires a hybrid profile combining Java backend engineering, distributed cache troubleshooting, platform operations, performance analysis, and production support.
Key Responsibilities
Cache Platform Support
- Provide L2/L3 support for distributed cache clusters and related application integrations.
- Monitor cluster health, member status, partition distribution, memory utilization, latency, and throughput.
- Support cache configuration, including distributed maps, Near Cache, eviction policies, TTL, backup count, serialization, and cluster discovery.
- Troubleshoot cache-related incidents such as high latency, memory pressure, node restarts, split-brain scenarios, data inconsistency, and degraded performance.
- Assist in capacity planning, performance tuning, and operational improvements for distributed cache environments.
- Coordinate with vendor support teams for product-level issues, patches, upgrades, and escalations.
Java / Application Support
- Analyze Java application behavior related to integration with the distributed cache platform.
- Troubleshoot JVM-level issues including heap usage, garbage collection, thread dumps, memory leaks, and serialization overhead.
- Work with application teams to identify cache misuse, inefficient access patterns, and performance bottlenecks.
- Support Spring Boot / Java microservices that interact with the distributed cache platform.
- Review and validate application-side configurations and integration patterns.
DevOps / Kubernetes Operations
- Support distributed cache platform deployments running on Kubernetes or containerized environments.
- Work with Kubernetes objects such as pods, services, namespaces, configmaps, secrets, deployments, and stateful workloads.
- Analyze pod restarts, resource limits, liveness/readiness probe failures, service discovery issues, and container logs.
- Support configuration management and deployment activities through CI/CD pipelines.
- Assist with TLS/mTLS certificate-related troubleshooting where applicable.
- Work with infrastructure and platform teams on network, DNS, storage, compute, and security-related issues.
Monitoring, Incident & RCA Management
- Monitor platform and application metrics using tools such as AppDynamics, Splunk, Prometheus, Grafana, ELK, or similar.
- Participate in incident management, troubleshooting calls, war-room support, and issue triage.
- Prepare root cause analysis reports for production incidents.
- Recommend preventive actions, operational improvements, and automation opportunities.
- Maintain runbooks, SOPs, known-error documents, and support knowledge base articles.
Required Skills & Experience
Mandatory Skills
- Strong hands-on experience in Java backend development or Java platform support.
- Good understanding of JVM internals, memory management, garbage collection, thread dumps, and heap analysis.
- Prior experience with distributed caching or in-memory data grid solutions.
- Hands-on experience with at least one of the following:
  - Hazelcast
  - Redis / Redis Cluster
  - Apache Ignite
- Experience supporting applications in production or near-production environments.
- Working knowledge of Kubernetes, containers, Linux, and basic networking.
- Ability to analyze logs, metrics, alerts, and application behavior during incidents.
- Strong troubleshooting, communication, and documentation skills.
Preferred Skills
- Direct hands-on experience with a distributed cache platform and an understanding of:
  - Distributed maps
  - Near Cache
  - Eviction and expiry policies
  - Partitioning
  - Backup/replication
  - Split-brain protection
  - Serialization
  - Cluster discovery
- Experience with Spring Boot and microservices architecture.
- Experience with CI/CD tools such as Jenkins, GitLab CI, Azure DevOps, or similar.
- Exposure to Anthos, OpenShift, or enterprise Kubernetes platforms.
- Experience with AppDynamics, Splunk, Prometheus, Grafana, ELK, or similar observability tools.
- Knowledge of TLS/mTLS, certificates, service mesh, and secure service communication.
- Experience working in banking, telecom, or other mission-critical enterprise environments.
Minimum Qualification
- Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or equivalent experience.
- 5+ years of experience in Java backend development, platform engineering, or production support.
- 2+ years of experience with caching, distributed systems, Kubernetes, or DevOps-related operations.
- Prior production support experience in enterprise environments.