Kintan Saha


I am currently a third-year undergraduate (rising junior) pursuing a B.Tech in Mathematics and Computing at the Indian Institute of Science (IISc), Bengaluru. My current GPA (as of the end of semester 4) is 9.3/10.0.

My research interests lie in Reinforcement Learning, with a focus on Stochastic Approximation methods and establishing theoretical guarantees for learning algorithms, as well as in theoretical aspects of generative models, particularly Diffusion Models. I am also interested in 3D Computer Vision, especially 3D Scene Reconstruction and Novel View Synthesis.

In addition, I have contributed to publications submitted to top-tier peer-reviewed venues, and I bring a strong theoretical grounding supported by advanced coursework and rigorous, application-driven research projects.

You can view or download my full resume here: View / Download Resume


skills

My core technical expertise spans:

Languages and Frameworks
Python, Shell scripting, Conda/Miniconda
Deep Learning & ML Engineering
PyTorch, HuggingFace Transformers and TRL, Weights & Biases (W&B), Hydra, OpenCV
Deep Reinforcement Learning
Stable-Baselines3, OpenAI Gym environments: Atari, MuJoCo, MiniGrid
3D Vision and Generative Models
NeRF, 3D Gaussian Splatting (3DGS), COLMAP, Diffusion and Flow-Based Models
Data Analysis and Visualization
Matplotlib, Pandas, NumPy
For details, please refer to the skills section.


research experience


May 2025 – Ongoing

Reliable Critics: Monotonic Improvement and Convergence Guarantees for Reinforcement Learning

Advisor
Lab
Description
This project aims to extend the Reliable Policy Iteration (RPI) framework to augment multiple SOTA algorithms such as PPO, TD3, and DDPG, and to test them on diverse environments such as Atari, MuJoCo, and MiniGrid. The goal is to establish new SOTA results on these environments using the RPI-augmented algorithms.
Role
Designed a novel loss function incorporating the RPI framework to serve as a plug-and-play substitute in SOTA Deep RL algorithms such as PPO, TD3, and DDPG. Also ran extensive experiments on the extremely sparse-reward MiniGrid environments and conducted ablation studies to establish new baselines on MiniGrid.
Technical Stack
PyTorch, Stable-Baselines3, WandB (for logging and hyperparameter sweeps), Matplotlib, Seaborn

May 2025 – Ongoing

Feed Forward Deblurring in 3DGS

Advisor
Lab
Description
This project aims to create a generalisable, scene-agnostic deblurring framework that can be integrated into 3DGS foundation models.
Role
Developing a deblurring framework that can be readily plugged into SOTA 3DGS foundation models such as NoPoSplat and Dust3R to enable deblurring of scenes in a feed-forward fashion. Current methods tackling scene deblurring within the 3DGS framework are scene-specific; we aim to develop a scene-agnostic framework.
Technical Stack
PyTorch, Hydra (config management), Blender, PyTorch Lightning, Weights and Biases

Jan 2025 – Apr 2025

Towards Uncertainty-aware Alignment

Advisor
Lab
Description
Developed an alignment framework with uncertainty quantification for Preference-based RL. This framework was extended to LLM alignment by modifying PPO (Proximal Policy Optimization) to account for uncertainty in the reward estimates of the reward models used in the RLHF pipeline.
Role
Experimentally verified the LLM alignment framework by modifying the RLHF pipeline to include our novel uncertainty estimation framework. The framework was tested on LLMs of multiple sizes (GPT-2, Qwen2.5, Mistral-7B) and with multiple reward models, including custom ensemble reward models and prompted reward models such as Gemini 2.0 and DeepSeek-V3.
Technical Stack
HuggingFace Transformers, HuggingFace Transformer Reinforcement Learning (TRL), Weights and Biases
Status
Submitted for review at NeurIPS 2025
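To illustrate the general idea behind ensemble-based reward uncertainty (a simplified sketch, not our exact formulation; the function name and the penalty weight `beta` are hypothetical), disagreement across an ensemble of reward models can be folded into the PPO reward as a pessimistic mean-minus-deviation score:

```python
import numpy as np

def uncertainty_penalised_reward(rewards: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """rewards: (ensemble_size, batch) array of reward-model scores for a batch
    of responses. Returns a conservative per-response reward: the ensemble mean
    minus beta times the ensemble standard deviation, so the policy update is
    pessimistic wherever the reward models disagree."""
    return rewards.mean(axis=0) - beta * rewards.std(axis=0, ddof=1)

# Two reward models agree on the first response, disagree on the second,
# so the second response's reward is penalised.
scores = np.array([[1.0, 1.0],
                   [1.0, 3.0]])
penalised = uncertainty_penalised_reward(scores, beta=1.0)
```

In an actual RLHF loop this score would replace the raw reward-model output fed to the PPO trainer; the choice of `beta` trades off exploiting high-reward regions against distrusting uncertain ones.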

Jan 2024 – Jun 2024

HinglishEval: Evaluating the Effectiveness of Code-generation Models on Hinglish Prompts

Advisor
Lab
Description
This project aimed to evaluate code-generation LLMs on Hinglish prompts obtained from a translated HumanEval dataset. The end goal was to assess the effectiveness of such code-generation LLMs in CS101 courses in the Indian context.
Role
Helped translate the HumanEval dataset to Hinglish and evaluate multiple code-generation LLMs, including GPT-4, Gemma, Phi-3, PolyCoder, and StarCoder. The evaluation criteria used were pass@k and Item Response Theory (IRT).
Technical Stack
HuggingFace Transformers, OpenAI API, Matplotlib
Source Code
Status
For details, please refer to the projects section.
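For reference, pass@k is typically computed with the standard unbiased estimator from the original HumanEval evaluation: given n generated samples per problem of which c pass the tests, pass@k = 1 − C(n−c, k)/C(n, k). A minimal sketch:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples, drawn without replacement from n generations of which c are
    correct, passes the unit tests."""
    if n - c < k:
        # Fewer incorrect samples than k: every draw of k must contain a pass.
        return 1.0
    # 1 - C(n-c, k) / C(n, k), computed as a stable running product.
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))

# Example: 10 samples, 3 correct -> pass@1 is simply the pass rate 0.3.
p = pass_at_k(10, 3, 1)
```

Per-problem estimates are then averaged over the benchmark to report a single pass@k score.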

research publications

  1. S.R. Eshwar | Kintan Saha | Aniruddha Mukherjee | Krishna Agarwal | Gugan Thoppe | Aditya Gopalan | Gal Dalal
    Jul 2025
  2. Debangshu Banerjee | Kintan Saha | Aditya Gopalan
    Apr 2025
  3. Mrigank Pawagi | Anirudh Gupta | Siddharth Reddy Rolla | Kintan Saha
    Dec 2024

presentations


volunteering activity

Notable volunteering activities:
  • I am a senior core committee member of Databased, the IISc UG computer science club.
  • I am a co-convener of Rhythmica, the IISc music club. For details on my musical journey, please refer to the music section.

contacts