Research Statement

Since 2007, I have been involved in extensive academic and industrial work focused on machine intelligence, especially Computer Vision & Machine Learning (CVML) and Multimedia & Natural Language Processing (MMNLP). This long-term interest emerges from my strong desire to build towards "machines that think". My general goal is to contribute effectively to ongoing research in machine intelligence. During my PhD, I aim to make a significant impact on our community, much as latent-SVM opened the door for a wide range of machine learning applications, or as PhotoSynth became a vital product arising from Computer Vision and Multimedia research. Hence, I present a spotlight of my experiences in CVML and MMNLP:

For my CVML experiences, I have published several papers in top Computer Vision venues (e.g., ICCV, ICIP, CVPR). I have gained experience in three sub-areas of CVML: "Object Recognition and Zero-Shot Learning", "Structured Regression", and "Video-Surveillance Systems". (1) In "Object Recognition and Zero-Shot Learning", I published an ICCV 2013 paper, titled "Write a Classifier: Zero-Shot Learning Using Purely Textual Descriptions", that presents a solution to a novel problem: using purely textual descriptions of unseen visual categories to predict their corresponding visual classifiers. My solution involves learning a heterogeneous domain adaptation function that predicts a visual classifier from a textual description. More recently, I developed a kernel-classifier predictor for unseen classes, which was submitted to CVPR 2014. While the two aforementioned projects focus on zero-shot learning, I am currently working on a geometry-preserving kernel for object recognition. (2) In "Structured Regression", I worked on structured regression applied to Computer Vision problems, concluded by two submissions to ICML 2014 and CVPR 2014, respectively. (3) In the direction of video-surveillance systems, I have a publication, titled "MultiClass Object Classification in Video Surveillance Systems - Experimental Study", which was orally presented at the SISM workshop at CVPR 2013 (demo). In addition, my MSc thesis topic was "High Performance Activity Monitoring for Scenes including Multi-Agents", in which I developed a GPU framework for teamwork activity recognition in video-surveillance systems in 2010. As a Bachelor projects' mentor, I designed and participated in the implementation of three video-processing projects: "Gait Analysis for Human Identification (GAHI)" (demo) in 2009, "Intelligent Presentation Guru (IPG)" (demo) in 2010, and "On-Chip Action Recognition System" in 2010. These three projects received ITIDA awards (link).
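The zero-shot idea above can be pictured as a learned mapping from a text-feature space into classifier-weight space. The following is only a minimal sketch of that intuition; the feature dimensions, random data, and ridge-regression transfer function are illustrative assumptions, not the method published in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: d_t-dim textual features per class, d_v-dim visual features.
d_t, d_v, n_seen = 20, 30, 15

# For each seen class, assume we have a trained linear visual classifier w_c
# (one row of W) and a textual-description feature t_c (one row of T).
T = rng.normal(size=(n_seen, d_t))   # text features of seen classes
W = rng.normal(size=(n_seen, d_v))   # corresponding visual classifier weights

# Learn a linear transfer function M (text space -> classifier space)
# via ridge regression:  min_M ||T M - W||^2 + lam ||M||^2
lam = 0.1
M = np.linalg.solve(T.T @ T + lam * np.eye(d_t), T.T @ W)

# Zero-shot step: predict a visual classifier for an unseen class
# directly from its textual description, with no visual training data.
t_unseen = rng.normal(size=d_t)
w_unseen = t_unseen @ M              # predicted classifier weights

# The predicted classifier can then score visual features of test images.
x_image = rng.normal(size=d_v)
score = float(w_unseen @ x_image)
```

The key point the sketch captures is that the transfer function is fit once on seen classes, after which classifiers for unseen categories cost only a matrix-vector product.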

For my MMNLP experiences, I invented the concept of Multi-level (ML) MindMaps, defined as a method to jointly visualize and summarize textual information. The visualization is achieved pictorially across multiple levels using semantic information (i.e., an ontology), while the summarization is achieved by the information in the highest levels, as they represent the abstract information in the text. In contrast to prior work, the ML-MindMap representation gives the user meaningful control over the direction of details they might be interested in, starting from the root level. This work resulted in the "English2MindMap" paper, which proposes the first automation of this concept and was published at the International Symposium on Multimedia, Dec 2012; a journal version was submitted to Information Processing & Management. This work also has a US patent pending. (project website)
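As a rough illustration of the multi-level idea, a mind map can be modeled as a tree whose upper levels serve as the abstract summary. The toy ontology, node names, and depth-truncation rule below are hypothetical, not the published English2MindMap pipeline:

```python
# Hypothetical multi-level mind map represented as a nested dict (a tree).
# Summarization = keeping only the top max_depth levels of the tree.
mindmap = {
    "Vacation": {
        "Destination": {"Paris": {}, "Rome": {}},
        "Budget": {"Flights": {}, "Hotels": {}},
    }
}

def summarize(tree, max_depth, depth=0):
    """Return the tree truncated at max_depth levels (the abstract summary)."""
    if depth >= max_depth:
        return {}
    return {node: summarize(children, max_depth, depth + 1)
            for node, children in tree.items()}

# A level-1 summary keeps only the most abstract concept;
# a level-2 summary expands one more level of detail.
level1 = summarize(mindmap, 1)
level2 = summarize(mindmap, 2)
```

Because detail is organized by depth, a reader can expand the map level by level from the root, which is the "meaningful control" the representation provides.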