Civil War Photo Sleuth
The American Civil War (1861–1865) was the first major conflict to be extensively photographed, with images widely displayed and sold in large quantities. About four million soldiers fought in the war, and most were photographed at least once. A century and a half later, thousands of these photographs survive, but most of the soldiers' identities have been lost.
We introduce Civil War Photo Sleuth, a web-based platform that helps users identify unknown soldiers in portraits from the American Civil War era. The system employs a novel person-identification pipeline that leverages the complementary strengths of crowdsourced human vision and face recognition algorithms.
- V. Mohanty, D. Thames, S. Mehta, and K. Luther. Photo Sleuth: Combining Human Expertise and Face Recognition to Identify Historical Portraits. ACM Conference on Intelligent User Interfaces (IUI 2019), Los Angeles, CA, USA, 2019. (25% acceptance rate)
- V. Mohanty, D. Thames, and K. Luther. Are 1,000 Features Worth A Picture? Combining Crowdsourcing and Face Recognition to Identify Civil War Soldiers. AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2018), Zurich, Switzerland, 2018. [POSTER LINK] (BEST POSTER/DEMO AWARD)
- V. Mohanty, D. Thames, and K. Luther. Photo Sleuth: Combining Collective Intelligence and Computer Vision to Identify Historical Portraits. ACM Conference on Collective Intelligence (CI 2018), Zurich, Switzerland, 2018. (32% acceptance rate for oral presentations)
- GRAND PRIZE WINNER (USD 25,000) of the Microsoft Cloud AI Challenge 2018
- News coverage on Slate [link]
- Public launch of the website www.civilwarphotosleuth.com at the National Archives Building in Washington, DC [Related Article]
- Initial Launch of the website www.civilwarphotosleuth.com at the 45th Civil War Artifact and Collectibles Show in Gettysburg, PA
This is a semester-long project for the "User Interface Software" course I am currently taking. The web application is aimed at online crowd workers and users who gather additional photos of identified Civil War soldiers from online sources; each identified soldier in the database has a prior photo against which user-uploaded photos can be compared. The application aims to enhance existing online Civil War databases and communities such as Civil War Photo Sleuth and the American Civil War Database. Development proceeded through wireframe iterations, high-fidelity mockups, UX inspection, and prototyping. A demo of the prototype can be seen here.
Website for VT Junoon: Bollywood Dance Group of Virginia Tech
This was a semester-long project done as part of the "Usability Engineering" course. As part of a team, we proposed a one-stop solution to VT Junoon (Bollywood dance group of Virginia Tech) in the form of a web portal for handling their internal operations and also as an interface for the general public to access information about the group. This required conducting interviews with the group for data elicitation about the team organization and how they conduct their operations. Wireframes and high-fidelity mockups were developed. Subsequently, a prototype was developed and a UX evaluation was conducted through focus groups and user interviews.
To assist the group’s internal operations, the proposed platform would serve as an interface for group interactions, with administrators able to add members to chat channels. Website administrators would have access to post updates, create and maintain fundraising campaigns, and update event information on the calendar. The website would also support administrators in moderating comments, initiating discussions, and collecting feedback from the general public. For the general public, the website would serve as one-stop access to information about VT Junoon, a way to connect and interact with the dance team members, and a place to show their support. Visitors could sign up for notifications about upcoming events and track past and upcoming events through the calendar on the website.
Identifying Artist from Artwork
This was submitted as the term project for the "Computer Vision" course. For a non-expert in paintings and related art forms, it is very difficult to identify the artist behind a work merely by looking at the style of the painting, the brush strokes, or a sculpture. Usually, one has to resort to contextual information in the painting, or meta-tags about it, to guess who the artist is. This project is an attempt to build a system that learns an artist's style from his/her paintings without using any contextual information, such as a lady in black with a mysterious smile, or image meta-tags, such as "Mona Lisa"/"Louvre"/"16th Century".
The project, in its final form, would have been a tool that takes an artist's name as input and retrieves all images most likely to have been made by that artist. An extension would have been to transfer the artist's style to any painting or photograph. The "Neural Style Transfer" method was used to extract style features from paintings, and weak classifiers, such as SVMs and Mixtures of Gaussians, were trained on them to check for any pattern leading to successful identification. The results for the artist identification task, however, were not as good as one would hope. For identifying painting styles such as Impressionism or abstract art, the classifiers showed some improvement.
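As an illustration, the style representation at the heart of Neural Style Transfer is the Gram matrix of a convolutional feature map; a minimal NumPy sketch (the actual project extracted feature maps from a pre-trained network, which is not shown here):

```python
import numpy as np

def gram_matrix(feature_map):
    """Style representation used in Neural Style Transfer: the Gram matrix
    of channel activations, capturing which feature channels co-occur.
    feature_map: array of shape (C, H, W) from a convolutional layer."""
    c, h, w = feature_map.shape
    f = feature_map.reshape(c, h * w)
    return (f @ f.T) / (c * h * w)  # normalized C x C matrix
```

The flattened Gram matrices then serve as fixed-length style features that can be fed to a weak classifier such as an SVM.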
Check the project website here.
Recipe Generation Using Cuisine and Taste Cross Validation
This was submitted as the final term project for the "Advanced Machine Learning" course. The project aimed to generate cuisine-specific recipe recommendations based on a user’s taste preferences, learned from his/her previous cuisine choices. Imagine a person from Germany who is keen on trying Chinese food for the first time but fails to enjoy the experience because the dishes are too spicy or sour. This project attempts to generate a set of ingredients that are local to the target cuisine (Chinese, in this case) and would be most preferred by the person based on his/her prior food preferences (in this case, European cuisine).
The proposed four-fold classifier tries to 1) learn the probabilities of taste categories for every ingredient in the dataset, 2) learn the individual probability of each ingredient belonging to a certain cuisine, 3) generate the set of ingredients that are local to the user-defined target cuisine, and are most similar to the user-provided food preferences, and 4) generate a recipe that best fits the generated set of ingredients.
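The first three steps can be sketched as follows (a toy illustration with hypothetical ingredient names and hand-picked probabilities; the real project learned these distributions from a recipe corpus):

```python
def taste_profile(preferred, taste_probs):
    """Step 1: average the taste-category probabilities over the
    ingredients the user already likes."""
    categories = next(iter(taste_probs.values())).keys()
    return {c: sum(taste_probs[i][c] for i in preferred) / len(preferred)
            for c in categories}

def rank_cuisine_ingredients(cuisine, preferred, taste_probs, cuisine_probs):
    """Steps 2-3: among ingredients likely to belong to the target cuisine,
    rank by similarity of taste profile to the user's preferences."""
    user = taste_profile(preferred, taste_probs)
    def similarity(ing):
        # negative squared distance between taste profiles
        return -sum((taste_probs[ing][c] - user[c]) ** 2 for c in user)
    local = [i for i, p in cuisine_probs.items() if p.get(cuisine, 0) > 0.5]
    return sorted(local, key=similarity, reverse=True)
```

Step 4, assembling a recipe from the ranked ingredient set, would sit on top of this ranking.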
The detailed report can be found here.
Autonomous Ground Vehicle
I was associated with the Autonomous Ground Vehicle (AGV) research group at IIT Kharagpur from 2013 to 2017, where I was part of the teams that participated in the annual Intelligent Ground Vehicle Competition (IGVC), held at Oakland University, Michigan, USA, in 2014, 2015, and 2016. Over the years, I worked on a multitude of problems, ranging from designing circuitry for low-cost encoders and motor control to robot localization and mapping, visual odometry, and obstacle detection.
I initiated the use of Probabilistic Robotics concepts for improving the robot's navigation module, employing different techniques for filtering sensor noise, fusing sensor data, and tweaking Robot Operating System (ROS) packages to accurately estimate the robot's position. I also collaborated on traditional Computer Vision problems such as traffic sign detection and obstacle detection.
Select Projects at AGV
1. Deep Learning for Monocular Visual Odometry
This work addresses the problem of monocular visual odometry using a Deep Learning-based framework instead of the conventional 'feature detection and tracking' approaches. Several experiments were performed to understand how a known/unknown environment, conventional trackable features, and pre-trained activations tuned for object classification influence the network's ability to accurately estimate the motion trajectory of the camera (or the vehicle). Based on these observations, a Convolutional Neural Network architecture was proposed that is best suited for estimating the object's pose under known environment conditions.
The arXiv preprint can be found here.
2. Visual Odometry for Android
This project also addresses monocular visual odometry, but with the processing done on smartphones using low-resolution images. An Android application was built to process the image feed and compute the user's trajectory using FAST (Features from Accelerated Segment Test) features. The app also let the user capture a panorama image at any desired location. Combining the user's trajectory with the panorama images was an attempt at creating a virtual walkthrough app similar to Google Street View, with all processing done on the smartphone itself.
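The trajectory-integration step at the core of any such visual odometry pipeline can be sketched as follows (a simplified NumPy illustration; recovering the per-frame relative motion from matched FAST features, e.g. via the essential matrix, is assumed to happen upstream):

```python
import numpy as np

def accumulate_trajectory(relative_motions, scale=1.0):
    """Chain per-frame relative poses (R_rel, t_rel) into camera positions
    in the world frame. Monocular VO only recovers translation up to an
    unknown scale, hence the explicit scale factor."""
    R = np.eye(3)        # accumulated orientation
    t = np.zeros(3)      # accumulated position
    positions = [t.copy()]
    for R_rel, t_rel in relative_motions:
        t = t + scale * (R @ t_rel)   # step in the current heading
        R = R @ R_rel                 # update the heading
        positions.append(t.copy())
    return np.array(positions)
```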
A rough working of the application can be seen in the video (right).
Depth Estimation from Monocular Images
This project was done as part of a summer internship at the Autonomous Robotics and Perception Group, University of Colorado Boulder, under the guidance of Prof. Christoffer Heckman. Images captured by monocular cameras are devoid of depth, a key piece of information for mapping the environment in autonomous robotics. This project aimed to generate a depth map for a particular object class (doors, in this project) from a monocular color image input.
An object detection pipeline based on Faster R-CNN (Ren et al.) detects the object of interest in the image and passes the corresponding depth map of that region to train a depth regression network. A Kinect was used to gather RGB-D training images from indoor environments. The idea was to use the standardized dimensions of these objects (e.g., door dimensions) to relate them to the door detected in the image, refine this relation over multiple iterations, and use it to infer an accurate depth map of the object. This would serve as a scale prior for a Monocular SLAM (Simultaneous Localization and Mapping) pipeline to improve pose estimation. A pivot was later proposed: learning 2D object sizes from 3D models.
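The scale-prior idea rests on the pinhole camera relation between an object's real size and its apparent size in pixels; a minimal sketch (the numbers in the usage note are hypothetical):

```python
def depth_from_known_height(focal_px, real_height_m, bbox_height_px):
    """Pinhole model: bbox_height_px = focal_px * real_height_m / depth,
    so an object of standardized height (e.g. a door) detected in the
    image yields a metric depth estimate."""
    return focal_px * real_height_m / bbox_height_px
```

For example, a 2.0 m door that spans 250 px in an image from a camera with a 500 px focal length would be estimated at 4.0 m away.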
Equipment Detection in Operating Room
This project was done as part of a summer internship at the Computational Modelling and Analysis of Medical Activities (CAMMA) research group, University of Strasbourg, France, under the guidance of Prof. Nicolas Padoy. The XAware project at the University of Strasbourg aims at developing a global X-ray monitoring system that can provide radiation risk information to clinicians and staff working on interventional radiology procedures. To accurately simulate the X-ray radiation, the 3D positions of the equipment in the operating room have to be considered. However, equipment detection in the operating room is a big challenge because of the many occlusions caused by moving staff.
An object detection pipeline based on LINEMOD, a multimodal template matching approach, was built for detecting equipment in an operating room and estimating its pose. The aim of this project was to evaluate the equipment detection approach and help determine the optimal parameters for improving detection. This required building a ground-truth dataset and an evaluation pipeline that considered different template parameters.
A link to the complete report can be found here.
T.R.A.N.S.I.T: Valeo Innovation Challenge 2016
Valeo organizes an annual innovation challenge for solutions that make cars more intelligent, intuitive, environment-friendly, and fun. T.R.A.N.S.I.T (Traffic Regulation using Automated Networks for Safe and Intelligent Transport) was the submission to the Valeo Innovation Challenge 2016 and was adjudged one of the top 24 solutions (a semi-finalist entry) among 1,400 submissions from 90 countries. The proposal was an algorithm for building a fully automated zone in a city, with seamless traffic flow that directs vehicles along minimum-time paths. The solution was awarded EUR 5,000 to develop a proof-of-concept model.
The submission can be viewed on the right and downloaded from here.
Emotion Analysis from Facial Expressions
This was part of the term project for the "Machine Intelligence and Expert Systems" course. Human emotions typically manifest on the face as visible changes in muscle positions and in the shape of the lips, eyebrows, chin, and nose. These features can then be used to predict a person's emotion.
Using a combination of Haar classifiers and Active Shape Models resulted in a wide range of feature points spread across the face, tracing the eyes, nose, mouth, and face circumference. These low-level features were used to construct 20 unique high-level features, such as chin circumference, eyebrow-centroid distance, and mouth width. A Naive Bayes classifier was trained on the JAFFE database to classify emotions as "Happy", "Angry", "Sad", or "Surprise".
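The classification step can be sketched with a minimal Gaussian Naive Bayes over such high-level features (the feature values in the usage note are toy numbers, not the JAFFE data, and uniform class priors are assumed for simplicity):

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian Naive Bayes: per-class mean and variance for each
    feature, prediction by maximum log-likelihood (uniform priors)."""

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.classes = sorted(set(y))
        self.mu = {c: X[y == c].mean(axis=0) for c in self.classes}
        # small constant avoids division by zero for constant features
        self.var = {c: X[y == c].var(axis=0) + 1e-9 for c in self.classes}
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        def loglik(c):
            return -0.5 * np.sum(np.log(2 * np.pi * self.var[c])
                                 + (X - self.mu[c]) ** 2 / self.var[c],
                                 axis=1)
        scores = np.stack([loglik(c) for c in self.classes], axis=1)
        return [self.classes[i] for i in scores.argmax(axis=1)]
```

Each row of `X` would hold the 20 high-level features (mouth width, eyebrow-centroid distance, and so on) for one face.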
The detailed report can be found here.
Movie script analysis for determining success
This was the term project for the “Speech and Natural Language Processing” course. Which traits in a movie script determine the success of the movie? This project was an attempt to uncover such success-determining features in movie scripts.
945 movie scripts were analyzed. The scripts were broken down into scenes, and sentiment analysis was used to correlate the emotions expressed in them with the movies' success. Individual scene themes were correlated with the overall theme of each movie. A character interaction network was built to understand how characters interacted with one another over the course of a movie, and a topic extractor modeled the overlap of keywords between each scene and the movie as a whole. These features were fed into a Mixture of Gaussians model to cluster similar scripts.
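The character interaction network can be sketched as a weighted co-occurrence graph over scenes (a simplified illustration; the character names in the usage note are placeholders):

```python
from collections import Counter
from itertools import combinations

def interaction_network(scenes):
    """Build a weighted character interaction network: each edge counts
    the number of scenes in which two characters appear together.
    scenes: iterable of sets of character names, one set per scene."""
    edges = Counter()
    for characters in scenes:
        for a, b in combinations(sorted(characters), 2):
            edges[(a, b)] += 1
    return edges
```

How these edge weights evolve across the script then describes how character interactions develop over the course of the movie.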
The report can be found here.
Imposter Detection using Keystroke Dynamics
This was part of the term project for the "Machine Intelligence and Expert Systems" course. The project attempted to distinguish people by their typing rhythms, much as handwriting can be used to identify the author of a written text. A keylogger was used to generate a dataset of over 10,000 keystrokes per user (5 users in this case). The latency between consecutive keystrokes, the hold time of each keystroke, characters per minute, and character frequencies were computed from the keylogger data. This low-level information was developed into a set of unique high-level features, based on the keyboard layout, that represented a user's typing pattern. These characteristic features were then fed into a Mixture of Gaussians model to classify the users.
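The low-level timing features can be sketched as follows (a simplified illustration assuming the keylogger yields press/release timestamps in seconds; the event format is hypothetical):

```python
def keystroke_features(events):
    """Derive timing features from keylogger events.
    events: list of (key, press_time, release_time) in seconds,
    ordered by press time."""
    holds = [release - press for _, press, release in events]
    latencies = [events[i + 1][1] - events[i][1]
                 for i in range(len(events) - 1)]
    duration_min = (events[-1][1] - events[0][1]) / 60.0 or 1e-9
    return {
        "mean_hold": sum(holds) / len(holds),
        "mean_latency": sum(latencies) / len(latencies) if latencies else 0.0,
        "chars_per_minute": len(events) / duration_min,
    }
```

Per-user vectors of such features are what a Mixture of Gaussians can then cluster and classify.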
The detailed report can be found here.