Kintan Saha

I am currently a third-year undergraduate (rising junior) pursuing a B.Tech in Mathematics and Computing at the Indian Institute of Science (IISc), Bengaluru. My GPA through the end of semester 5 is 9.4/10.0, and I am within the top 5% of my cohort.

My research interests lie in Reinforcement Learning, with a focus on Stochastic Approximation methods and establishing theoretical guarantees for learning algorithms, as well as in theoretical aspects of generative models, particularly Diffusion Models. I am also interested in 3D Computer Vision, especially 3D Scene Reconstruction and Novel View Synthesis.

In addition, I have contributed to peer-reviewed publications submitted to top-tier venues and bring a strong theoretical grounding, supported by advanced coursework and rigorous, application-driven research projects.

You can view or download my full resume here: View / Download Resume


research experience


Nov, 25 - Ongoing

Stochastic Analysis on Graphons

Advisor
Lab
Research Area
Theoretical Machine Learning
Description
Studying gradient-flow analysis and convergence properties on the space of graphons.

Nov, 25 - Ongoing

Finite-Time Guarantees for Multi-Timescale Stochastic Approximation and Hierarchical Reinforcement Learning

Advisor
Lab
Research Area
Theoretical Machine Learning
Description
Developing finite-time analysis of multi-timescale stochastic approximation, with applications to hierarchical reinforcement learning.

May, 25 - Sep, 25

Reliable Critics: Monotonic Improvement and Convergence Guarantees for Reinforcement Learning

Advisor
Lab
Research Area
Reinforcement Learning
Description
This project aimed to further enhance the Reliable Policy Iteration (RPI) framework so that it can augment multiple SOTA algorithms such as PPO, TD3, and DDPG, and to test it on diverse environments such as Atari, MuJoCo, and MiniGrid. The goal was to establish new SOTA results on these environments using the RPI-augmented algorithms.
Role
Designed a novel loss function incorporating the RPI framework, usable as a plug-and-play substitute in SOTA deep RL algorithms such as PPO, TD3, and DDPG. Also ran extensive experiments on the extremely sparse-reward MiniGrid environment and performed ablation studies to establish new baselines on MiniGrid.
Technical Stack
PyTorch, Stable-Baselines3, WandB (for logging and hyperparameter sweeps), Matplotlib, Seaborn
Outcome
Paper accepted for publication at the Indian Control Conference 2025

May, 25 - July, 25

Feed Forward Deblurring in 3DGS

Advisor
Lab
Research Area
Computer Vision
Description
This project aims to create a generalisable, scene-agnostic deblurring framework that can be integrated into 3DGS foundation models.
Role
Developed a deblurring framework that can be readily plugged into SOTA 3DGS foundation models such as NoPoSplat and DUSt3R to enable feed-forward scene deblurring. Current methods for scene deblurring within the 3DGS framework are scene-specific; we developed a scene-agnostic framework.
Technical Stack
PyTorch, Hydra (config management), Blender, PyTorch Lightning, Weights and Biases

Jan, 25 - April, 25

Towards Uncertainty-aware Alignment

Advisor
Lab
Research Area
Reinforcement Learning
Description
Developed an alignment framework with uncertainty quantification for preference-based RL. This framework was extended to LLM alignment by modifying PPO (Proximal Policy Optimization) to account for uncertainty in the reward estimates of the reward models used in the RLHF pipeline.
Role
Experimentally verified the LLM alignment framework by modifying the RLHF pipeline to include our novel uncertainty estimation framework. The framework was tested on LLMs of multiple sizes (GPT-2, Qwen2.5, Mistral-7B) and multiple reward models, including custom ensemble reward models and prompted reward models such as Gemini 2.0 and DeepSeek-V3.
Technical Stack
HuggingFace Transformers, HuggingFace TRL (Transformer Reinforcement Learning), Weights and Biases
Outcome
Paper published. Preprint available.

Jan, 24 - June, 24

HinglishEval: Evaluating the Effectiveness of Code-generation Models on Hinglish Prompts

Advisor
Lab
Research Area
Computer Science Education
Description
This project aimed to evaluate code-generation LLMs on Hinglish prompts obtained from a translated HumanEval dataset. The end goal is to evaluate the effectiveness of such code-gen LLMs in CS101 courses in the Indian context.
Role
Helped translate the HumanEval dataset to Hinglish and evaluate multiple code-generation LLMs such as GPT-4, Gemma, Phi-3, PolyCoder, and StarCoder. The evaluation criteria used were pass@k and Item Response Theory (IRT).
Technical Stack
HuggingFace Transformers, OpenAI API, Matplotlib
Source Code
Outcome
Paper accepted for publication at the Indian Control Conference 2025
For details, please refer to the projects section

research publications

  1. S.R. Eshwar | Kintan Saha | Aniruddha Mukherjee | Krishna Agarwal | Gugan Thoppe | Aditya Gopalan | Gal Dalal
    Dec 2025
  2. Debangshu Banerjee | Kintan Saha | Aditya Gopalan
    Jul 2025
  3. Mrigank Pawagi | Anirudh Gupta | Siddharth Reddy Rolla | Kintan Saha
    Mar 2025

skills

My core technical expertise spans:

Languages and Frameworks
Python, Shell scripting, Conda/Miniconda
Deep Learning & ML Engineering
PyTorch, HuggingFace Transformers and TRL, Weights & Biases (W&B), Hydra, OpenCV
Deep Reinforcement Learning
Stable-Baselines3, OpenAI Gym; Environments: Atari, MuJoCo, MiniGrid
3D Vision and Generative Models
NeRF, 3D Gaussian Splatting (3DGS), COLMAP, Diffusion and Flow-Based Models
Data Analysis and Visualization
Matplotlib, Pandas, NumPy
For details, please refer to the skills section

presentations


volunteering activity

Notable volunteering activities:
  • I am a senior core committee member of Databased, the IISc UG computer science club.
  • I am a co-convener of Rhythmica, the IISc music club. For details on my musical journey, please refer to the music section.

contacts