I am a visiting research scholar at the Georgia Institute of Technology, supervised by Devi Parikh and Dhruv Batra, working closely with Stefan Lee and Peter Anderson. I am primarily interested in deep learning, with applications in computer vision and natural language processing. Before my current position, I completed my undergraduate degree at the Indian Institute of Technology Roorkee, with a major in Electrical Engineering and a minor in Computer Science.

During my undergraduate studies, I was selected twice as a Google Summer of Code student, with the TARDIS Foundation (2016) and the OpenCV Foundation (2017). Alongside my second Google Summer of Code, I was a software engineering intern at Goldman Sachs, Bangalore. I was also an active member of the Mobile Development Group IITR, and co-founded a reading group, now named the Vision and Language Group, to encourage students to read academic literature.

Feel free to say hi: kdexd at gatech dot edu


[Apr 2019] Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering accepted to ICML 2019!
[Feb 2019] Paper out on arXiv: Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering.
[Jan 2019] New version of the Visual Dialog Challenge starter code released!
[Jan 2019] Organizing the second Visual Dialog challenge, results to be announced at CVPR 2019!
[Dec 2018] Paper out on arXiv: nocaps: novel object captioning at scale.
[Oct 2018] Visiting the annual Google Summer of Code mentor summit (Google Mountain View).
[Jul 2018] Co-organizing the first Visual Dialog challenge, results to be announced at ECCV 2018!
[Jun 2018] Joining Georgia Tech as a visiting research scholar under the supervision of Devi Parikh and Dhruv Batra.
[May 2018] Graduated from the Indian Institute of Technology Roorkee with a major in Electrical Engineering.


Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering
Ramakrishna Vedantam, Karan Desai, Stefan Lee, Marcus Rohrbach, Dhruv Batra, Devi Parikh
probnmn paper

nocaps: novel object captioning at scale
Harsh Agrawal*, Karan Desai*, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson
nocaps paper

Open Source

Starter code for the Visual Dialog Challenge. Built using PyTorch v1.0, with out-of-the-box support for CUDA 9 and cuDNN 7. Provides a simple implementation of a Late Fusion encoder and a Discriminative decoder. Comes with efficient scripts for data preprocessing, image feature extraction, training, and evaluation, along with support for generating a submission file for the challenge.
Implementation of the paper "Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog" by Kottur et al. (EMNLP 2017), using PyTorch and ParlAI.
Implementation of Neural Turing Machines, introduced in Graves et al. (2014), using PyTorch. Supports training and evaluation on four out of six tasks described in the paper.
Trianglify is a highly customizable library for generating beautiful triangle art views on Android. It uses the Delaunay triangulation algorithm under the hood.
Yolog wraps vanilla git log and displays the commit history complete with graph, timestamps, authors, and refs. Colors are configurable, and all standard git log commands work.

First Projects

First neural network built using NumPy (2015), a multi-layer perceptron classifier for MNIST. This repository stayed on the GitHub trending charts for almost a week, with 600+ stars.
First GitHub repository (2015), a browser-based game of Snake, built during my first year of undergraduate studies. The game still works on gh-pages!