Kintan Saha


I am currently a third-year undergraduate (rising junior) pursuing a B.Tech in Mathematics and Computing at the Indian Institute of Science (IISc), Bengaluru. My current GPA (as of the end of semester 4) is 9.3/10.0.

My research interests lie in Reinforcement Learning, with a focus on Stochastic Approximation methods and establishing theoretical guarantees for learning algorithms, as well as in theoretical aspects of generative models, particularly Diffusion Models. I am also interested in 3D Computer Vision, especially 3D Scene Reconstruction and Novel View Synthesis.

In addition, I have contributed to publications submitted to top-tier peer-reviewed venues, and I bring a strong theoretical grounding supported by advanced coursework and rigorous, application-driven research projects.

You can view or download my full resume here: View / Download Resume


skills

My core technical expertise spans:

Languages and Frameworks
Python, Shell scripting, Conda/Miniconda
Deep Learning & ML Engineering
PyTorch, HuggingFace Transformers and TRL, Weights & Biases (W&B), Hydra, OpenCV
Deep Reinforcement Learning
Stable-Baselines3, OpenAI Gym environments: Atari, MuJoCo, MiniGrid
3D Vision and Generative Models
NeRF, 3D Gaussian Splatting (3DGS), COLMAP, Diffusion and Flow-Based Models
Data Analysis and Visualization
Matplotlib, Pandas, NumPy
For details, please refer to the skills section.


research experience


May 2025 – Ongoing

Reliable Critics: Monotonic Improvement and Convergence Guarantees for Reinforcement Learning

Advisor
Lab
Description
This project aims to extend the Reliable Policy Iteration (RPI) framework to augment multiple SOTA algorithms such as PPO, TD3, and DDPG, and to test them on diverse environments such as Atari, MuJoCo, and MiniGrid. The goal is to establish new SOTA results on these environments using the RPI-augmented algorithms.
Role
Designed a novel loss function incorporating the RPI framework to serve as a plug-and-play substitute in SOTA Deep RL algorithms such as PPO, TD3, and DDPG. Also ran extensive experiments on the extremely sparse-reward MiniGrid environments and conducted ablation studies to establish new baselines on MiniGrid.
Technical Stack
PyTorch, Stable-Baselines3, WandB (for logging and hyperparameter sweeps), Matplotlib, Seaborn

May 2025 – Ongoing

Feed Forward Deblurring in 3DGS

Advisor
Lab
Description
This project aims to create a generalisable, scene-agnostic deblurring framework that can be integrated into 3DGS foundation models.
Role
Developing a deblurring framework that can be readily plugged into SOTA 3DGS foundation models such as NoPoSplat and Dust3R to enable deblurring of scenes in a feed-forward fashion. Current methods tackling scene deblurring within the 3DGS framework are scene-specific; we aim to develop a scene-agnostic framework.
Technical Stack
PyTorch, Hydra (config management), Blender, PyTorch Lightning, Weights and Biases

Jan 2025 – Apr 2025

Towards Uncertainty-aware Alignment

Advisor
Lab
Description
Developed an alignment framework with uncertainty quantification for Preference-based RL. This framework was extended to LLM alignment by modifying PPO (Proximal Policy Optimization) to account for uncertainty in the reward estimates of the reward models used in the RLHF pipeline.
Role
Experimentally verified the LLM alignment framework by modifying the RLHF pipeline to include our novel uncertainty estimation framework. The framework was tested on LLMs of multiple sizes (GPT-2, Qwen2.5, Mistral-7B) and with multiple reward models, including custom ensemble reward models and prompted reward models such as Gemini 2.0 and DeepSeek-V3.
Technical Stack
HuggingFace Transformers, HuggingFace Transformer Reinforcement Learning (TRL), Weights and Biases
Status
Submitted for review at NeurIPS 2025
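To illustrate the general idea behind ensemble-based reward uncertainty (a simplified sketch, not our exact formulation; the function name and the penalty weight `beta` are hypothetical), disagreement across an ensemble of reward models can be folded into the PPO reward as a pessimistic mean-minus-deviation score:

```python
import numpy as np

def uncertainty_penalised_reward(rewards: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """rewards: (ensemble_size, batch) array of reward-model scores for a batch
    of responses. Returns a conservative per-response reward: the ensemble mean
    minus beta times the ensemble standard deviation, so the policy update is
    pessimistic wherever the reward models disagree."""
    return rewards.mean(axis=0) - beta * rewards.std(axis=0, ddof=1)

# Two reward models agree on the first response, disagree on the second,
# so the second response's reward is penalised.
scores = np.array([[1.0, 1.0],
                   [1.0, 3.0]])
penalised = uncertainty_penalised_reward(scores, beta=1.0)
```

In an actual RLHF loop this score would replace the raw reward-model output fed to the PPO trainer; the choice of `beta` trades off exploiting high-reward regions against distrusting uncertain ones.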

Jan 2024 – Jun 2024

HinglishEval: Evaluating the Effectiveness of Code-generation Models on Hinglish Prompts

Advisor
Lab
Description
This project aimed to evaluate code-generation LLMs on Hinglish prompts obtained from a translated HumanEval dataset. The end goal was to assess the effectiveness of such code-generation LLMs in CS101 courses in the Indian context.
Role
Helped translate the HumanEval dataset to Hinglish and evaluate multiple code-generation LLMs, including GPT-4, Gemma, Phi-3, PolyCoder, and StarCoder. The evaluation criteria used were pass@k and Item Response Theory (IRT).
Technical Stack
HuggingFace Transformers, OpenAI API, Matplotlib
Source Code
Status
For details, please refer to the projects section.
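For reference, pass@k is typically computed with the standard unbiased estimator from the original HumanEval evaluation: given n generated samples per problem of which c pass the tests, pass@k = 1 − C(n−c, k)/C(n, k). A minimal sketch:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples, drawn without replacement from n generations of which c are
    correct, passes the unit tests."""
    if n - c < k:
        # Fewer incorrect samples than k: every draw of k must contain a pass.
        return 1.0
    # 1 - C(n-c, k) / C(n, k), computed as a stable running product.
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))

# Example: 10 samples, 3 correct -> pass@1 is simply the pass rate 0.3.
p = pass_at_k(10, 3, 1)
```

Per-problem estimates are then averaged over the benchmark to report a single pass@k score.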

research publications

  1. S.R. Eshwar | Kintan Saha | Aniruddha Mukherjee | Krishna Agarwal | Gugan Thoppe | Aditya Gopalan | Gal Dalal
    Jul 2025
  2. Debangshu Banerjee | Kintan Saha | Aditya Gopalan
    Apr 2025
  3. Mrigank Pawagi | Anirudh Gupta | Siddharth Reddy Rolla | Kintan Saha
    Dec 2024

presentations


volunteering activity

Notable volunteering activities:
  • I am a senior core committee member of Databased, the IISc UG computer science club.
  • I am a co-convener of Rhythmica, the IISc music club. For details on my musical journey, please refer to the music section.

contacts