Haoxuan You (有昊轩)


I am a Research Scientist at Apple AI/ML Foundation Models. I work on fundamental problems in vision-and-language, with an emphasis on scalable, unified, and generalizable models and methods.

I received my PhD in Computer Science from Columbia University, advised by Prof. Shih-Fu Chang and co-advised by Prof. Kai-Wei Chang from UCLA. Previously, I received a Bachelor's degree from Xidian University in 2018. I then spent a gap year working as a Research Assistant at Tsinghua University, advised by Prof. Yue Gao, during which I visited the MCL lab at the University of Southern California, advised by Prof. C.-C. Jay Kuo.

During my Ph.D. study, I was fortunate to intern at Microsoft Azure Cognitive Services Research (Mentor: Luowei Zhou), Google Research (Mentors: Jiahui Yu, Mandy Guo, and Jason Baldridge), and Apple AI/ML (Mentors: Liangliang Cao, Zhe Gan, and Yinfei Yang).

Opening: I am looking for a 2025 summer research intern focusing on Vision-Language Modeling/Multimodal LLMs at Apple AI/ML. Feel free to drop me an email.

CV Mail Scholar Github LinkedIn Twitter

Profile picture
📷 credit to: my wife Xiaohui

Selected Publications (Full List)

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
A significant and comprehensive upgrade of MM1
arXiv 2024
Project Page / Paper
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Haotian Zhang*, Haoxuan You*, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Wang, Shih-Fu Chang, Yinfei Yang
COLM 2024
Project Page / Paper
Ferret: Refer and Ground Anything Anywhere at Any Granularity
Haoxuan You*, Haotian Zhang*, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang
ICLR 2024, Spotlight (5% Acceptance Rate)
Project Page / Paper / Code
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
Haoxuan You, Mandy Guo, Zhecan Wang, Kai-Wei Chang, Jason Baldridge, Jiahui Yu
ICLR 2024
Project Page / Paper
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
Haoxuan You*, Rui Sun*, Zhecan Wang*, Long Chen, Gengyu Wang, Hammad Ayyubi, Kai-Wei Chang, Shih-Fu Chang
Empirical Methods in Natural Language Processing - Findings (EMNLP-Findings), 2023
Project Page / Paper / Code
Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding
Haoxuan You, Rui Sun*, Zhecan Wang*, Kai-Wei Chang, Shih-Fu Chang
Empirical Methods in Natural Language Processing - Findings (EMNLP-Findings), 2022
Project Page / Paper / Code
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
Haoxuan You*, Luowei Zhou*, Bin Xiao*, Noel Codella*, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan
Proc. of the European Conf. on Computer Vision (ECCV), 2022
Project Page / Paper / Code
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning
Zhecan Wang*, Haoxuan You*, Liunian Li, Alireza Zareian, Suji Park, Yiqing Liang, Kai-Wei Chang, Shih-Fu Chang
Proc. of the AAAI Conference on Artificial Intelligence (AAAI), 2022
Project Page / Paper
Rethinking network design and local geometry in point cloud: A simple residual MLP framework
Xu Ma, Can Qin, Haoxuan You, Haoxi Ran, Yun Fu
The International Conference on Learning Representations (ICLR), 2022
Project Page / Paper / Code
Unsupervised vision-and-language pre-training without parallel images and captions
Liunian Li, Haoxuan You*, Zhecan Wang*, Alireza Zareian, Shih-Fu Chang, Kai-Wei Chang
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Project Page / Paper / Code
Learning Visual Commonsense for Robust Scene Graph Generation
Alireza Zareian*, Zhecan Wang*, Haoxuan You*, Shih-Fu Chang
Proc. of the European Conf. on Computer Vision (ECCV), 2020
Project Page / Paper / Code
PointHop: An Explainable Machine Learning Method for Point Cloud Classification
Min Zhang, Haoxuan You*, Pranav Kadam, Shan Liu, C-C Kuo (*Corresponding Author)
IEEE Transactions on Multimedia, 2020
Project Page / Paper / Code
PointDAN: A Multi-Scale 3D Domain Adaption Network for Point Cloud Representation
Can Qin*, Haoxuan You*, Lichen Wang, C-C Kuo, Yun Fu
Advances in Neural Information Processing Systems (NeurIPS), 2019
Project Page / Paper / Code
PVRNet: Point-View Relation Neural Network for 3D Shape Recognition
Haoxuan You, Yifan Feng, Xibin Zhao, Changqing Zou, Rongrong Ji, Yue Gao
Proc. of the AAAI Conference on Artificial Intelligence (AAAI), 2019
Project Page / Paper / Code
Hypergraph Neural Networks
Yifan Feng, Haoxuan You, Zizhao Zhang, Rongrong Ji, Yue Gao
Proc. of the AAAI Conference on Artificial Intelligence (AAAI), 2019
Project Page / Paper / Code
PVNet: A Joint Convolutional Network of Point Cloud and Multi-View for 3D Shape Recognition
Haoxuan You, Yifan Feng, Rongrong Ji, Yue Gao
Proc. of the ACM International Conference on Multimedia (ACM-MM), 2018
Project Page / Paper

Website template borrowed from Michael Niemeyer.