Research Statement: Imagination Inspired Computer Vision

Imagination is one of the key properties of human intelligence: it enables us not only to generate creative products like art and music but also to understand the visual world. My research focuses on developing imagination-inspired techniques that empower AI machines to see (computer vision) or to create (e.g., fashion and art): “Imagine to See” and “Imagine to Create”.

Imagine to See. There are over 10,000 living bird species, yet most computer vision datasets of birds cover only 200-500 categories. For most of these categories (long-tail classes), few training images are available, so the number of training images per category follows a Zipf distribution [25]. How could imagination help us understand visual classes with zero or few examples? Many people may not know what a “Parakeet Auklet” is, but they can imagine it from a language description such as “the Parakeet Auklet is a bird that has an orange bill, dark above and white below.” Given this description, an average person can pick out the relevant bird among many others, because we can imagine the “Parakeet Auklet” class from language alone. I have worked on setting up imagination-inspired tasks to study visual recognition of unseen and long-tail classes guided by language. I proposed the “Write a Classifier” task at ICCV 2013 and have been pioneering it since [8,13,16,17,44]; it was recently recognized at the United Nations conference on biodiversity [45]. More recently, expanding on the “Sherlock” model and dataset I proposed at AAAI 2017 [20,21,46], I have been developing methods with a deeper understanding and the capacity to learn an ever-growing set of concepts [23,24] (covered by the media); see the figure (left). Learning millions of these concepts not only improves relevant-content experiences on platforms like Google and Facebook but also brings us closer to helpful robots that, like us, continually learn about the visual world.
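To make the “Imagine to See” direction concrete, below is a minimal, hedged sketch of the core idea behind the “Write a Classifier” task: a network maps a text-description embedding to the weights of a linear visual classifier, is trained on seen classes, and is then applied zero-shot to descriptions of unseen classes. The dimensions, module names, and toy training loop are assumptions for illustration only, not the formulation published in [8,13].

```python
# A minimal sketch of the "Write a Classifier" idea: predict the weights of a
# linear visual classifier directly from a text-description embedding, so that
# an unseen class (e.g., "Parakeet Auklet") can be recognized from language alone.
# All dimensions, module names, and the toy training loop are illustrative
# assumptions, not the exact formulation of [8,13].
import torch
import torch.nn as nn
import torch.nn.functional as F

TEXT_DIM, VISUAL_DIM, NUM_SEEN = 300, 2048, 10   # assumed sizes

class ClassifierWriter(nn.Module):
    """Maps a class-description embedding to classifier weights w(t)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(TEXT_DIM, 512), nn.ReLU(),
            nn.Linear(512, VISUAL_DIM),
        )

    def forward(self, text_emb):           # (num_classes, TEXT_DIM)
        return self.net(text_emb)          # (num_classes, VISUAL_DIM)

def class_scores(writer, image_feats, text_embs):
    """Score images against classes whose classifiers are 'written' from text."""
    W = writer(text_embs)                  # one weight vector per class
    return image_feats @ W.t()             # (batch, num_classes) logits

# Train on seen classes with a standard cross-entropy loss over the predicted classifiers.
writer = ClassifierWriter()
opt = torch.optim.Adam(writer.parameters(), lr=1e-4)
image_feats = torch.randn(32, VISUAL_DIM)            # placeholder CNN features
labels = torch.randint(0, NUM_SEEN, (32,))           # seen-class labels
seen_text = torch.randn(NUM_SEEN, TEXT_DIM)          # seen-class description embeddings
opt.zero_grad()
loss = F.cross_entropy(class_scores(writer, image_feats, seen_text), labels)
loss.backward()
opt.step()

# Zero-shot inference: embed the unseen class's description and reuse the same scorer.
unseen_text = torch.randn(1, TEXT_DIM)               # e.g., the "Parakeet Auklet" description
zero_shot_scores = class_scores(writer, image_feats, unseen_text)
```

In the actual work, the text embedding would come from an encoder over the unstructured class description (e.g., an encyclopedia-style article) rather than the random placeholders used above.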
Imagine to Create. Imagination has been the source of novel ideas that enable humanity to progress at an ever-faster rate. Creative AI is a relatively understudied direction in machine learning whose goal is for machines to generate original items with realistic, aesthetic, and/or thoughtful attributes, usually in artistic contexts. In the short term, Creative AI has high potential to speed up the rate at which we generate creative products such as paintings, music, and animations, serving as a source of inspiration. As I detail later, I have worked on modeling Creative AI to produce art [18] and fashion [19]; see the figure (right). Our pioneering creativity work attracted attention from the scientific community, media, and industry. One of the exciting results in [19] is that our model created new pants with additional arm sleeves, a design that does not exist in the dataset. Surprisingly, professional fashion designers found this design inspirational for creating new pants, showing how machine creativity may impact the fashion industry. I am also excited about future exploration of Creative AI for producing 3D models, videos, and animation. It may also help imagine unseen, potentially dangerous situations to make self-driving cars more reliable.
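Below is a minimal, hedged sketch of a creative-adversarial generator objective in the spirit of CAN [18]: the generator is rewarded for producing images that look like art to the discriminator while remaining ambiguous with respect to known style classes. The networks, shapes, and the particular uniform-target ambiguity loss are illustrative assumptions rather than the exact losses used in [18] or [19].

```python
# A minimal sketch inspired by the creative-adversarial objective of CAN [18]:
# the generator is pushed to produce images the discriminator judges as art
# (adversarial "realness") while remaining hard to assign to any known style
# class (style ambiguity). Networks, shapes, and the uniform-target ambiguity
# loss below are illustrative assumptions, not the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_STYLES, LATENT, IMG = 25, 100, 784   # assumed style count / latent size / image size

# Placeholder networks; real models would be convolutional (e.g., DCGAN-like).
G = nn.Sequential(nn.Linear(LATENT, IMG), nn.Tanh())        # generator
D_real = nn.Sequential(nn.Linear(IMG, 1))                    # real/fake head
D_style = nn.Sequential(nn.Linear(IMG, NUM_STYLES))          # style-classification head

z = torch.randn(16, LATENT)
fake = G(z)

# Generator loss: (1) look like real art to the discriminator, and
# (2) be stylistically ambiguous, by pushing the predicted style distribution
# toward the uniform distribution over known style classes.
adv_loss = F.binary_cross_entropy_with_logits(D_real(fake), torch.ones(16, 1))
style_logp = F.log_softmax(D_style(fake), dim=1)
uniform = torch.full((16, NUM_STYLES), 1.0 / NUM_STYLES)
ambiguity_loss = F.kl_div(style_logp, uniform, reduction="batchmean")
g_loss = adv_loss + ambiguity_loss
g_loss.backward()
# (The discriminator is trained separately, as in a standard GAN [37], with an
# additional style-classification loss on real artworks; omitted here.)
```

The key departure from a standard GAN [37] is the ambiguity term, which discourages the generator from imitating any single established style and thus encourages deviation from style norms.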
Key Projects

For publications, please visit the Publications tab. Please note that this page is not up to date. I have published papers in the top computer vision conferences (e.g., ICCV, ICIP, CVPR), where I have gained experience in five main areas:
Zero-Shot Learning for Object Recognition and Multimedia Retrieval from Text and Attributes
Deep Learning
Structured Regression
Natural Language Processing and Multimedia
Video-Surveillance Systems
References
1. M. Elhoseiny, J. Liu, H. Cheng, H. Sawhney, A. Elgammal, “Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos”, AAAI, 2016 (acceptance rate 25.6%).
2. A. Bakry, T. El-Gaaly, M. Elhoseiny, A. Elgammal, “Joint Object Recognition and Pose Estimation using a Nonlinear View-invariant Latent Generative Model”, WACV, 2016 (algorithms track acceptance rate 30%).
3. M. Elhoseiny, A. Elgammal, “Overlapping Domain Cover for Scalable and Accurate Kernel Regression Machines”, BMVC (oral), 2015 (oral acceptance rate 7%).
4. M. Elhoseiny, S. Huang, A. Elgammal, “Weather Classification with Deep Convolutional Neural Networks”, ICIP, 2015.
5. M. Elhoseiny, A. Elgammal, “Generalized Twin Gaussian Processes using Sharma-Mittal Divergence”, Machine Learning Journal, 2015.
6. S. Huang, M. Elhoseiny, A. Elgammal, “Learning Hypergraph-regularized Attribute Predictors”, CVPR, 2015 (28.4%).
7. M. Elhoseiny, A. Elgammal, B. Saleh, “Tell and Predict: Kernel Classifier Prediction for Unseen Visual Classes from Unstructured Text Descriptions”, arXiv, 2015; presented at the CVPR 2015 and EMNLP 2015 Workshops on Language and Vision.
8. M. Elhoseiny, B. Saleh, A. Elgammal, “Write a Classifier: Zero-Shot Learning Using Purely Textual Descriptions”, ICCV, 2013 (27.8%).
9. M. Elhoseiny, B. Saleh, A. Elgammal, “Heterogeneous Domain Adaptation: Learning Visual Classifiers from Textual Description”, VisDA Workshop, ICCV, 2013 (presentation).
10. M. Elhoseiny, S. Cohen, W. Chang, B. Price, A. Elgammal, “Sherlock: Modeling Structured Knowledge in Images”, AAAI, 2017.
11. M. Elhoseiny, T. El-Gaaly, A. Bakry, A. Elgammal, “A Comparative Analysis and Study of Multiview Convolutional Neural Network Models for Joint Object Categorization and Pose Estimation”, ICML, 2016.
12. A. Bakry, M. Elhoseiny, T. El-Gaaly, A. Elgammal, “Digging Deep into the Layers of CNNs: In Search of How CNNs Achieve View Invariance”, ICLR, 2016.
13. M. Elhoseiny, A. Elgammal, B. Saleh, “Write a Classifier: Predicting Visual Classifiers from Unstructured Text Descriptions”, TPAMI, 2016.
14. M. Welling, “Are ML and Statistics Complementary?”, http://www.ics.uci.edu/~welling/publications/papers/WhyMLneedsStatistics.pdf, 2015.
15. S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, H. Lee, “Generative Adversarial Text to Image Synthesis”, ICML, 2016.
16. M. Elhoseiny*, Y. Zhu*, H. Zhang, A. Elgammal, “Link the Head to the ‘Peak’: Zero-Shot Learning from Noisy Text Descriptions at Part Precision”, CVPR, 2017.
17. Y. Zhu, M. Elhoseiny, B. Liu, A. Elgammal, “Imagine It for Me: Generative Adversarial Approach for Zero-Shot Learning from Noisy Texts”, CVPR, 2018.
18. A. Elgammal, B. Liu, M. Elhoseiny, M. Mazzone, “CAN: Creative Adversarial Networks, Generating ‘Art’ by Learning About Styles and Deviating from Style Norms”, International Conference on Computational Creativity, 2017.
19. O. Sbai, M. Elhoseiny, C. Couprie, Y. LeCun, “DesIGN: Design Inspiration from Generative Networks”, ECCV Workshops, 2018 (best paper award); JMLR submission, 2018.
20. M. Elhoseiny, S. Cohen, W. Chang, B. Price, A. Elgammal, “Automatic Annotation of Structured Facts in Images”, ACL Workshop on Vision and Language, 2016.
21. M. Elhoseiny, S. Cohen, W. Chang, B. Price, A. Elgammal, “Sherlock: Scalable Fact Learning in Images”, AAAI, 2017.
22. R. Aljundi, F. Babiloni, M. Elhoseiny, M. Rohrbach, T. Tuytelaars, “Memory Aware Synapses: Learning What (Not) to Forget”, ECCV, 2018.
23. M. Elhoseiny, F. Babiloni, R. Aljundi, M. Rohrbach, T. Tuytelaars, “An Evaluation of Large-Scale Lifelong Fact Learning”, ACCV, 2018.
24. A. Elgammal, B. Liu, D. Kim, M. Elhoseiny, M. Mazzone, “The Shape of Art History in the Eyes of the Machine”, AAAI, 2018.
25. R. Salakhutdinov, A. Torralba, J. Tenenbaum, “Learning to Share Visual Appearance for Multiclass Object Detection”, CVPR, 2011.
28. https://www.technologyreview.com/s/608195/machine-creativity-beats-some-modern-art/
30. C. Martindale, “The Clockwork Muse: The Predictability of Artistic Change”, Basic Books, 1990.
31. S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, D. Parikh, “VQA: Visual Question Answering”, ICCV, 2015.
32. A. Das, S. Kottur, K. Gupta, A. Singh, D. Yadav, J. M. Moura, D. Parikh, D. Batra, “Visual Dialog”, CVPR, 2017.
33. https://i.pinimg.com/originals/df/fe/a0/dffea08fdeda5fb3d4e241fc8d8cf538.jpg
34. J.-Y. Zhu, T. Park, P. Isola, A. A. Efros, “Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks”, ICCV, 2017.
35. https://sites.google.com/site/digihumanlab/products-services
36. https://publishingperspectives.com/2017/10/frankfurt-book-fair-arts-equals-tech-innovation/
37. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, Y. Bengio, “Generative Adversarial Nets”, NIPS, 2014.
38. J. Andreas, M. Rohrbach, T. Darrell, D. Klein, “Neural Module Networks”, CVPR, 2016.
39. R. Hu, J. Andreas, M. Rohrbach, T. Darrell, K. Saenko, “End-to-End Module Networks for Visual Question Answering”, ICCV, 2017.
40. J. Johnson, B. Hariharan, L. van der Maaten, J. Hoffman, L. Fei-Fei, C. L. Zitnick, R. Girshick, “Inferring and Executing Programs for Visual Reasoning”, ICCV, 2017.
41. A. Chaudhry, M. Ranzato, M. Rohrbach, M. Elhoseiny, “Efficient Lifelong Learning with A-GEM”, ICLR submission, 2019.
42. S. Ebrahimi, M. Elhoseiny, T. Darrell, M. Rohrbach, “Uncertainty-guided Lifelong Learning in Bayesian Networks”, ICLR submission, 2019.
43. M. Elfeki, C. Couprie, M. Elhoseiny, “Learning Diverse Generations using Determinantal Point Processes”, ICLR submission, 2019.
44. M. Elhoseiny, M. Elfeki, “Creativity Inspired Zero-Shot Learning”, CVPR 2019 submission.
45. https://www.facebook.com/UNBiodiversity/videos/2221370044851625
46. J. Zhang, Y. Kalantidis, M. Rohrbach, M. Paluri, A. Elgammal, M. Elhoseiny, “Large-Scale Visual Relationship Understanding”, AAAI, 2019.