Research Statement

Since 2007, I have been engaged in a broad range of academic and industrial work on machine intelligence, especially in Computer Vision & Machine Learning (CVML) and Multimedia & Natural Language Processing (MMNLP). This long-term interest stems from my strong desire to build towards “machines that think”. My general goal is to contribute effectively to ongoing research in machine intelligence. During my PhD, I aim to make a significant impact on our community, much as latent-SVM opened the door for a wide range of machine learning applications, or as PhotoSynth became a vital product born of Computer Vision and Multimedia research. Hence, I present a spotlight of my experiences in CVML and MMNLP:

Regarding my CVML experience, I have published papers in top Computer Vision venues (e.g., ICCV, ICIP, CVPR). I have gained experience in three sub-areas of CVML: “Object Recognition and Zero-Shot Learning”, “Structured Regression”, and “Video-Surveillance Systems”. (1) In “Object Recognition and Zero-Shot Learning”, I published an ICCV13 paper, titled “Write a Classifier: Zero Shot Learning Using Purely Textual Descriptions”, that presents a solution to a novel problem: using purely textual descriptions of unseen visual categories to predict their corresponding visual classifiers. My solution learns a heterogeneous domain adaptation function that predicts a visual classifier from a textual description. More recently, I developed a kernel-classifier predictor of unseen classes, which was submitted to CVPR14. While the two aforementioned projects focus on zero-shot learning, I am currently working on a geometry-preserving kernel for object recognition. (2) In “Structured Regression”, I worked on structured regression applied to Computer Vision problems, work that concluded with two submissions, to ICML14 and CVPR14 respectively. (3) In the direction of Video-Surveillance Systems, I have a publication, titled “MultiClass Object Classification in Video Surveillance Systems - Experimental Study”, which was orally presented at the SISM workshop at CVPR13 (demo). In addition, my MSc thesis topic was “High Performance Activity Monitoring for Scenes including Multi-Agents”, for which I developed a GPU framework for teamwork activity recognition in video-surveillance systems in 2010. As a mentor of Bachelor projects, I designed and participated in the implementation of three video processing projects: “Gait Analysis for Human Identification (GAHI)” (demo) in 2009, “Intelligent Presentation Guru (IPG)” (demo) in 2010, and “On-Chip Action Recognition System” in 2010.
These three projects received ITIDA awards (link). 
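The idea of predicting a visual classifier directly from a textual description can be illustrated with a minimal sketch. The sketch below is not the published method; it only shows the general pattern under simplifying assumptions: class-level text embeddings and the weights of linear visual classifiers for seen classes are given as fixed-length vectors, and a linear map between the two spaces is fitted by ridge regression, then applied to the description of an unseen class. All dimensions and data are synthetic.

```python
import numpy as np

# Illustrative zero-shot sketch (not the paper's model): learn a linear
# map W from class-level text embeddings T (k x d_t) to visual classifier
# weights C (k x d_v) over seen classes, then predict a classifier for an
# unseen class from its textual description alone.
rng = np.random.default_rng(0)
k, d_t, d_v = 10, 50, 30            # seen classes, text dim, visual dim
T = rng.standard_normal((k, d_t))   # text embeddings of seen classes
C = rng.standard_normal((k, d_v))   # linear classifiers of seen classes

lam = 1e-2  # ridge regularizer keeps the normal equations well-posed
W = np.linalg.solve(T.T @ T + lam * np.eye(d_t), T.T @ C)  # (d_t x d_v)

t_unseen = rng.standard_normal(d_t)  # embedding of an unseen description
c_unseen = t_unseen @ W              # predicted visual classifier weights
x_test = rng.standard_normal(d_v)    # a test image feature vector
score = c_unseen @ x_test            # classification score for the image
```

A kernelized variant, as in the CVPR14 submission mentioned above, would replace the linear map with one operating in a reproducing kernel Hilbert space; this sketch stays linear for brevity.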

Regarding my MMNLP experience, I invented the concept of Multi-level (ML) MindMaps, a method to jointly visualize and summarize textual information. The visualization is achieved pictorially across multiple levels using semantic information (i.e., an ontology), while the summarization is achieved by the information at the highest levels, since these represent the abstract content of the text. In contrast to prior work, the ML-MindMap representation gives the user meaningful control over the direction of details to explore, starting from the root level. This work resulted in the "English2MindMap" paper, which proposes the first automation of this concept and was published at the International Symposium on Multimedia, Dec 2012; a journal version was submitted to Information Processing & Management. This work also has a US patent pending. (project website)
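The core data-structure idea behind a multi-level MindMap can be sketched simply. The example below is purely illustrative and not the published system: it assumes the MindMap is a concept tree whose upper levels carry the more abstract information, so a summary at a given granularity is obtained by truncating the tree at a chosen depth. The node labels are invented for the example.

```python
# Illustrative sketch (not the English2MindMap system): a MindMap as a
# nested dict; upper levels hold abstract concepts, deeper levels details.
mindmap = {
    "Climate change": {
        "Causes": {"Emissions": {}, "Deforestation": {}},
        "Effects": {"Sea level rise": {}, "Heat waves": {}},
    }
}

def summarize(tree, max_depth, depth=0):
    """Collect concept labels at or above max_depth (root is depth 0)."""
    labels = []
    if depth > max_depth:
        return labels
    for label, children in tree.items():
        labels.append(label)
        labels.extend(summarize(children, max_depth, depth + 1))
    return labels

print(summarize(mindmap, 1))  # root plus its first-level branches
```

Increasing `max_depth` progressively reveals finer detail along whichever branch interests the user, which mirrors the root-down exploration described above.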