Data Scientist - Paytm Large Model (PLM)
About the Role
We're seeking Big & Deep Learners to join our pioneering team developing the Paytm Large Model (PLM), a foundation model for digital payment intelligence.
You'll be at the forefront of:
- Building and tuning large transformer architectures on large-scale data from payments and other activities (onboarding, login, order, invest, …)
- Building an encoder to create embeddings that capture the DNA of payment ecosystems
- Building a decoder or autoregressive generative model for behavioural prediction and detection
- Doing whatever it takes to enrich and use the model for downstream applications: IFT/DPO/RPO….
Key Responsibilities
- Design and develop large-scale foundation models that learn from payment transactions, user behavior, merchant patterns, etc.
- Create a sophisticated encoder architecture to generate embeddings that capture nuanced relationships in financial data
- Build a decoder-only model as well for generative downstream tasks
- Build and optimize training pipelines for processing billions of payment transactions
- Develop evaluation frameworks for measuring model performance across multiple financial use cases
- Implement fine-tuning mechanisms for model adaptation across different payment environments
- Collaborate with the MLOps team on production deployment and scaling
- Research and implement the latest advances in foundation models and their applications to financial data
- Make the model wiser every day by enriching it through state-of-the-art IFT, DPO, PPO ..
Required Qualifications
- Solid grounding in the three pillars of learning from data: optimization, statistics, and linear algebra
- Ph.D. or Master's degree in Deep Learning or a related field
- Strong track record in building and deploying foundation models, preferably on non-text/image data such as time series, temporal point processes or, more generally, numerical vector sequences
- Deep understanding of transformer architectures and their variations
- Expertise in PyTorch or TensorFlow for large-scale model development
- Experience with distributed training systems and optimization techniques
- Strong programming skills in Python and ML deployment frameworks
- Strong command of data structures and algorithms
- Understanding of payment systems and financial data structures
Preferred Experience
- Previous work with financial or transaction data
- Experience with large encoder/decoder models and representation learning
- Knowledge of privacy-preserving ML techniques
- Background in self-supervised learning approaches
- Familiarity with MLOps and production deployment of large models
- Experience with model compression and optimization techniques
- Working knowledge of knowledge graph representation learning using Graph Neural Networks (GraphSAGE/GAT/Graph Transformer ..) is a plus
Technical Skills
- Languages: Python, PySpark, SQL
- Frameworks: PyTorch/TensorFlow, Hugging Face
- Infrastructure: Cloud platforms (AWS/GCP), distributed training systems
- Tools: Git, Docker, ML experiment tracking platforms
- Processing: Spark, distributed computing
Soft Skills
- Strong research orientation with practical implementation skills
- Ability to translate complex technical concepts to business stakeholders
- Excellence in technical writing and documentation
- Collaborative mindset for cross-functional team projects
What We Offer
- Opportunity to work on cutting-edge foundation models in fintech
- High-impact role shaping the future of payment intelligence
- Access to large-scale computing resources and datasets
- Collaborative research and development environment
- Competitive compensation and benefits
- Professional development and conference opportunities
Required Mathematical Background
Linear Algebra
- Vector spaces and subspaces
- Eigenvalues and eigenvectors
- Matrix factorization techniques
- High-dimensional linear transformations
- Computational complexity of matrix operations
Optimization
- Convex optimization theory
- Gradient descent and its variants
- Stochastic optimization methods
- Constrained optimization
- Loss function design and optimization
- Learning rate scheduling strategies
Statistics & Probability
- Probability distributions and their properties
- Statistical inference and hypothesis testing
- Bayesian statistics and modeling
- Time series analysis
- Sampling techniques
- Information theory concepts
Deep Learning Foundations
- Backpropagation and computational graphs
- Neural network architectures
- Regularization techniques
- Normalization methods
- Loss functions and their gradients
- Model calibration and uncertainty
Required Transformer Architecture Knowledge
Attention Mechanisms
- Self-attention: theory and implementation
- Scaled dot-product attention
- Key-Query-Value concept and computations
- Attention masking techniques
- Efficient attention variants
- Sparse attention mechanisms
Multi-Head Attention
- Parallel attention head design
- Head projection techniques
- Cross-attention mechanisms
- Head pruning and importance scoring
- Multi-query attention
- Grouped-query attention
Positional Encoding
- Absolute positional encodings
- Relative positional encodings
- Rotary positional encoding (RoPE)
- ALiBi position encoding
- T5-style relative position biases
- Implementation of various positional encoding schemes
Tokenization & Embedding
- Subword tokenization algorithms
- Byte-Pair Encoding (BPE)
- WordPiece and SentencePiece
- Numerical feature tokenization
- Categorical encoding strategies
- Embedding table design and optimization
Architecture Specifics
- Transformer blocks and their variants
- Feed-forward network design
- Layer normalization approaches
- Residual connections
- Activation functions
- Architecture search and scaling laws
Specialized Knowledge for Financial Data
- Time-aware attention mechanisms
- Handling numerical features in transformers
- Variable-length sequence processing
- Missing value handling
- Hierarchical attention for transaction data
- Multi-modal fusion techniques
Technical Skills
Implementation
- Vectorized implementation of attention
- Efficient batch processing
- Memory optimization techniques
- CUDA optimization fundamentals
- Distributed training patterns
- Model parallelism strategies
Frameworks & Tools
- PyTorch/TensorFlow proficiency
- Hugging Face Transformers library
- Distributed training frameworks
- GPU programming basics
- ML experiment tracking platforms
- Version control for ML models