Publications:

*: My advisee; #: visiting student (work done/started at OSU); ‡: equal contribution; also see discussions on Twitter.

Refereed Publications:

  • Xiang Deng*, Yu Gu, Boyuan Zheng, Shijie Chen, Samuel Stevens, Boshi Wang*, Huan Sun, Yu Su, “Mind2Web: Towards a Generalist Agent for the Web,” The Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS'23, Spotlight) [paper, project website]
  • Chang-You Tai*‡, Ziru Chen*‡, Tianshu Zhang*, Xiang Deng*, Huan Sun, “Exploring Chain-of-Thought Style Prompting for Text-to-SQL,” The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP'23, long paper) [paper]
  • Kai Zhang‡, Lingbo Mo*‡, Wenhu Chen, Huan Sun, Yu Su, “MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing,” The Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS'23, poster) [paper, project website]
  • Xiang Yue*, Boshi Wang*, Kai Zhang, Ziru Chen*, Yu Su, Huan Sun, “Automatic Evaluation of Attribution by Large Language Models,” Findings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023 (Findings of EMNLP'23, long paper) [paper, code, data]
  • Boshi Wang*, Xiang Yue*, Huan Sun, “Can ChatGPT Defend the Truth? Automatic Dialectical Evaluation Elicits LLMs’ Deficiencies in Reasoning,” Findings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023 (Findings of EMNLP'23, long paper) [paper]
  • Shijie Chen, Ziru Chen*, Huan Sun, Yu Su, “Error Detection for Text-to-SQL Semantic Parsing,” Findings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023 (Findings of EMNLP'23, long paper) [paper]
  • Boshi Wang*, Sewon Min, Xiang Deng*, Jiaming Shen, You Wu, Luke Zettlemoyer, Huan Sun, “Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters,” The Annual Conference of the Association for Computational Linguistics, 2023 (ACL 2023, long paper). Honorable Mention for Best Paper Awards. [paper, code]
  • Xiang Yue*, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Robert Sim, “Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe,” The Annual Conference of the Association for Computational Linguistics, 2023 (ACL 2023, long paper). Honorable Mention for Best Paper Awards. [paper]
  • Bernal Jimenez Gutierrez, Huan Sun, Yu Su, “Biomedical Language Models are Robust to Sub-optimal Tokenization,” The 22nd BioNLP Workshop at ACL, 2023 (BioNLP 2023, long paper). [paper, code]
  • Tianshu Zhang*, Changchang Liu, Wei-Han Lee, Yu Su, Huan Sun, “Federated Learning for Semantic Parsing: Task Formulation, Evaluation Setup, New Algorithms,” The Annual Conference of the Association for Computational Linguistics, 2023 (ACL 2023, long paper). [paper, code]
  • Ziru Chen*, Shijie Chen, Michael White, Raymond Mooney, Ali Payani, Jayanth Srinivasa, Yu Su, Huan Sun, “Text-to-SQL Error Correction with Language Models of Code,” The Annual Conference of the Association for Computational Linguistics, 2023 (ACL 2023, short paper). [paper, code]
  • Zhen Wang*, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim, “Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning,” The Eleventh International Conference on Learning Representations (ICLR 2023). [paper, code]
  • Boshi Wang*, Xiang Deng*, Huan Sun, “Iteratively Prompt Pre-trained Language Models for Chain of Thought,” The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022, long paper). [paper, code]
  • Bernal Jiménez Gutiérrez, Nikolas McNeal, Clay Washington, You Chen, Lang Li, Huan Sun, Yu Su, “Thinking about GPT-3 In-Context Learning for Biomedical IE? Think Again,” Findings of the 2022 Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP 2022, long paper). [paper, code]
  • Shijie Chen, Ziru Chen*, Huan Sun, Yu Su, “Error Detection for Interactive Text-to-SQL Semantic Parsing,” The 2nd Workshop on Interactive Learning for Natural Language Processing at NeurIPS, 2022 (InterNLP'22). [paper]
  • Shijie Chen‡, Ziru Chen‡, Xiang Deng†, Ashley Lewis†, Lingbo Mo†, Samuel Stevens†, Zhen Wang†, Xiang Yue†, Tianshu Zhang†, Yu Su, Huan Sun, “Bootstrapping a User-Centered Task-Oriented Dialogue System,” 1st Proceedings of Alexa Prize TaskBot (Alexa Prize 2021, report finished in 2022; Third Place Winner with $50K prize for students). [paper, project website] (‡: Team Lead; †: Equal Contribution)
  • Xiang Yue*, Ziyu Yao, Huan Sun, “Synthetic Question Value Estimation for Domain Adaptation of Question Answering,” The 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022, long). [paper, code]
  • Lingbo Mo*, Ashley Lewis, Huan Sun, Michael White, “Towards Transparent Interactive Semantic Parsing via Step-by-Step Correction,” Findings of the 60th Annual Meeting of the Association for Computational Linguistics (Findings of ACL 2022, long). [paper, data&code] (A shortened version was also accepted by the 2nd Workshop on Interactive Learning for Natural Language Processing (InterNLP 2022) co-located with NeurIPS 2022.)
  • Xiang Deng*, Prashant Shiralkar, Colin Lockard, Binxuan Huang, Huan Sun, “DOM-LM: Learning Generalizable Representations for HTML Documents,” arXiv 2022.
  • Xiang Yue*, Xinliang Frederick Zhang*, Ziyu Yao*, Simon Lin, Huan Sun, “CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering,” 2021 IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM 2021, long paper). Best Paper Award. [paper, code]
  • Xiang Deng*, Yu Su, Alyssa Lees, You Wu, Cong Yu, Huan Sun, “ReasonBERT: Pre-trained to Reason with Distant Supervision,” The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021, long paper). [paper, code]
  • Xinliang Frederick Zhang*, Heming Sun*, Xiang Yue*, Simon Lin, Huan Sun, “COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval,” The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021, short paper). [paper, code]
  • Boyuan Pan#, Yazheng Yang, Cai Deng, Huan Sun, “TopNet: Learning from Neural Topic Model to Generate Long Stories,” The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD 2021, research track, acceptance rate: ~15.4%). [paper, code]
  • Xiang Yue*, Minxin Du, Tianhao Wang, Yaliang Li, Huan Sun, Sherman S. M. Chow, “Differential Privacy for Text Analytics via Natural Text Sanitization,” Findings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Findings of ACL-IJCNLP 2021, long). [paper, code]
  • Xiang Deng*, Ahmed Hassan Awadallah, Christopher Meek, Oleksandr Polozov, Huan Sun, Matthew Richardson, “Structure-Grounded Pretraining for Text-to-SQL,” The 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2021, long). [paper, code]
  • Ziyu Yao*, Frank F. Xu, Pengcheng Yin, Huan Sun, Graham Neubig, “Learning Structural Edits via Incremental Tree Transformations,” The Ninth International Conference on Learning Representations 2021 (ICLR'21). [paper, code]
  • Xiang Deng*, Huan Sun, Alyssa Lees, You Wu, Cong Yu, “TURL: Table Understanding through Representation Learning,” 47th International Conference on Very Large Data Bases (VLDB'21). Selected for the 2022 ACM SIGMOD Research Highlight Award. [paper, code, an earlier version on arXiv]
  • Zhen Wang*, Bo Zong, Huan Sun, “Modeling Context Pair Interaction for Pairwise Tasks on Graphs,” The 14th International Conference on Web Search and Data Mining (WSDM'21, acceptance rate: ~18.6%) [paper, code]
  • Kaushik Mani*, Xiang Yue*, Bernal Jimenez Gutierrez*, Yungui Huang, Simon Lin, and Huan Sun, “Clinical Phrase Mining with Language Models,” IEEE International Conference on Bioinformatics and Biomedicine 2020 (BIBM'20, short). [paper, code, a longer version] (The first two authors contributed equally.)
  • Xiang Yue*, Xinliang (Frederick) Zhang*, Ziyu Yao*, Simon Lin, and Huan Sun, “CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering,” arXiv, 2020. [paper, code] (The first two authors contributed equally.)
  • Ziyu Yao*, Yiqi Tang, Wen-tau Yih, Huan Sun, Yu Su, “An Imitation Game for Learning Semantic Parsers from User Interaction,” 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP'20, long). [paper, code, an earlier version on arXiv]
  • Bernhard Kratzwald#, Stefan Feuerriegel, Huan Sun, “Learning a Cost-Effective Annotation Policy for Question Answering,” 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP'20, long). [paper, code]
  • Jie Zhao*, Huan Sun, “Adversarial Training for Code Retrieval with Question-Description Relevance Regularization,” Findings of 2020 Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP'20, A new acceptance category, long). [paper, code] [Scores of this paper (reviewed with all papers to EMNLP'20): 4/4/4, with 4 being "Strong: I learned a lot from it. I would like to see it accepted" under a rating scale of 1-5 (5 being the highest)]
  • Zhen Wang*, Jennifer Lee, Simon Lin, Huan Sun, “Rationalizing Medical Relation Prediction from Corpus-level Statistics,” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL'20, long). [paper, code]
  • Xiang Yue*, Bernal Jimenez*, Huan Sun, “Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset,” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL'20, long). [paper, code]
  • Jiankai Sun, Jie Zhao*, Huan Sun, Srinivasan Parthasarathy, “EndCold: An End-to-End Framework for Cold Question Routing in Community Question Answering Services,” The 29th International Joint Conference on Artificial Intelligence (IJCAI'20). [paper]
  • Jie Zhao*, Xiang Deng*, Huan Sun, “Easy-to-Hard: Leveraging Simple Questions for Complex Question Generation,” arXiv, 2019. [paper, code]
  • Ziyu Yao*, Yu Su, Huan Sun, Wen-tau Yih, “Model-based Interactive Semantic Parsing: A Unified Formulation and A Text-to-SQL Case Study,” 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP'19, long). [paper, code]
  • Xiang Deng*, Huan Sun, “Leveraging 2-hop Distant Supervision from Table Entity Pairs for Relation Extraction,” 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP'19, long). [paper, code]
  • Xiang Yue*, Zhen Wang*, Jingong Huang*, Srinivasan Parthasarathy, Soheil Moosavinasab, Yungui Huang, Simon M. Lin, Wen Zhang, Ping Zhang, and Huan Sun, “Graph Embedding on Biomedical Networks: Methods, Applications, and Evaluations,” Bioinformatics, 2019 [paper, code]
  • Boyuan Pan#, Hao Li, Ziyu Yao*, Deng Cai, Huan Sun, “Reinforced Dynamic Reasoning for Conversational Question Generation,” Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL'19, long). [paper, code]
  • Jie Zhao*, Ziyu Guan, Huan Sun, “Riker: Mining Rich Keyword Representations for Interpretable Product Question Answering,” The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2019 (SIGKDD'19, research track, acceptance rate: ~14.2%, poster). [paper, code]
  • (SIGKDD'19: ~110 oral + ~60 poster presentations selected from ~1200 submissions)
  • Zhen Wang*, Xiang Yue*, Soheil Moosavinasab, Yungui Huang, Simon Lin and Huan Sun, “SurfCon: Synonym Discovery on Privacy-Aware Clinical Data,” The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2019 (SIGKDD'19, research track, acceptance rate: ~14.2%, oral). [paper, code]
  • Z. Yao*, J. Peddamail*, H. Sun, “CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning,” The Web Conference (former WWW Conference) 2019 (WWW'19, acceptance rate: 18%, Oral + Poster). [paper, code]
  • Z. Yao*, X. Li, J. Gao, B. Sadler, H. Sun, “Interactive Semantic Parsing for If-Then Recipes via Hierarchical Reinforcement Learning,” The AAAI Conference on Artificial Intelligence 2019 (AAAI’19, acceptance rate: 16.2%). [paper, code]
  • L. Chen, Z. Guan, W. Zhao, W. Zhao, X. Wang, Z. Zhao, H. Sun, “Answer Identification from Product Reviews for User Questions by Multi-task Attentive Networks,” The AAAI Conference on Artificial Intelligence 2019 (AAAI’19, acceptance rate: 16.2%). [paper]
  • W. Zhao, Z. Guan, Y. Huang, T. Xi, H. Sun, Z. Wang, X. He, “Discerning Influence Patterns with Beta-Poisson Factorization in Microblogging Environments,” Transactions on Knowledge and Data Engineering (TKDE 2019). [paper]
  • J. Peddamail*, Z. Yao*, Z. Wang*, H. Sun, “A Comprehensive Study of StaQC for Deep Code Summarization,” SIGKDD Deep Learning Day 2018. [paper, slides] (SPOTLIGHT)
  • Z. Yao*, D. S. Weld, W.P. Chen, H. Sun, “StaQC: A Systematically Mined Question-Code Dataset from Stack Overflow,” The Web Conference (former WWW Conference) 2018 (WWW'18, acceptance rate: 14.8%). [paper, code]
  • Y. Su, H. Liu, S. Yavuz, I. Gur, H. Sun, X. Yan, “Global Relation Embedding for Relation Extraction,” In Proc. of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018 (NAACL-HLT’18, long). [paper, code]
  • J. Zhao*, Y. Su, Z. Guan, H. Sun, “An End-to-End Deep Framework for Answer Triggering with a Novel Group-Level Objective,” Empirical Methods in Natural Language Processing 2017 (EMNLP'17). [paper, code]
  • Y. Li, N. Du, C. Liu, Y. Xie, W. Fan, Q. Li, J. Gao, H. Sun, “Reliable Medical Diagnosis from Crowdsourcing: Discover Trustworthy Answers from Non-Experts,” ACM Int. Conf. on Web Search and Data Mining 2017 (WSDM’17). [paper]
  • C. Liu, H. Sun, N. Du, S. Tan, H. Fei, W. Fan, T. Yang, H. Wu, Y. Li, C. Zhang, “Augmented LSTM Framework to Construct Medical Self-diagnosis Android,” IEEE Int. Conf. on Data Mining 2016 (ICDM’16). [paper]
  • Y. Su, H. Sun, B. Sadler, M. Srivatsa, I. Gur, Z. Yan, X. Yan, “On Generating Characteristic-rich Question Sets for QA Evaluation,” Empirical Methods in Natural Language Processing 2016 (EMNLP'16, long). [paper, appendix, New Question-Answer Set (with rich characteristics to train more advanced QA systems)]
  • H. Sun, H. Ma, X. He, W. Yih, Y. Su, X. Yan, “Table Cell Search for Question Answering,” The 25th Int. World Wide Web Conference (WWW'16). [paper]
  • Y. Li, S. Tan, H. Sun, J. Han, D. Roth, X. Yan, “Entity Disambiguation with Linkless Knowledge Bases,” The 25th Int. World Wide Web Conference (WWW'16). [paper]
  • F. Han, S. Tan, H. Sun, X. Yan, M. Srivatsa, D. Cai, “Distributed Representations of Expertise,” SIAM Int. Conf. on Data Mining 2016 (SDM'16). [paper]
  • H. Sun, H. Ma, W. Yih, C. Tsai, J. Liu, M. Chang, “Open Domain Question Answering via Semantic Enrichment,” The 24th Int. World Wide Web Conference (WWW'15, acceptance rate: 14.1%). [paper]
  • Y. Su, S. Yang, H. Sun, M. Srivatsa, S. Kase, M. Vanni, X. Yan, “Exploiting Relevance Feedback in Knowledge Graph Search,” Proc. of the 21st Int. Conf. on Knowledge Discovery and Data Mining (KDD’15, acceptance rate: 19.4%). [paper]
  • Z. Guan, S. Yang, H. Sun, M. Srivatsa, X. Yan, “Fine-Grained Knowledge Sharing in Collaborative Environments,” Transactions on Knowledge and Data Engineering (TKDE 2015). [paper]
  • H. Sun, M. Srivatsa, S. Tan, Y. Li, L. Kaplan, S. Tao, X. Yan, “Analyzing Expert Behaviors in Collaborative Networks,” Proc. of the 20th Int. Conf. on Knowledge Discovery and Data Mining (KDD'14, acceptance rate: 14.6%). [paper, slides, poster, Source Code]
  • S. Yang, Y. Wu, H. Sun, X. Yan, “Schemaless and Structureless Graph Querying,” Proc. of Int. Conf. on Very Large Data Bases (VLDB'14). [paper, poster]
  • S. Yang, Y. Xie, Y. Wu, T. Wu, H. Sun, J. Wu, X. Yan, “SLQ: A User-friendly Graph Querying System,” Proc. of Int. Conf. on Management of Data (SIGMOD'14, Demo Track).
  • H. Sun, M. Srivatsa, L. Kaplan, X. Yan, “Analyzing Expert Behaviors in Collaborative Networks,” International School and Conference on Network Science 2014 (NetSci'14)
  • N. Li, H. Sun, K. Chipman, J. George, X. Yan, “A Probabilistic Approach to Uncovering Attributed Graph Anomalies,” SIAM Int. Conf. on Data Mining 2014 (SDM'14, acceptance rate: 15.4%). [paper]
  • H. Sun, A. Morales, X. Yan, “Synthetic Review Spamming and Defense,” Proc. of the 19th Int. Conf. on Knowledge Discovery and Data Mining (KDD'13, acceptance rate: 17%). [paper, poster, Demo]
  • S. Tan, Y. Li, H. Sun, Z. Guan, X. Yan, J. Bu, C. Chen, X. He, “Interpreting the Public Sentiment Variations on Twitter,” Transactions on Knowledge and Data Engineering (TKDE 2014). [paper]
  • H. Sun, G. Miao, X. Yan, “Noise-Resistant Bicluster Recognition,” IEEE Int. Conf. on Data Mining 2013 (ICDM'13, Oral presentation, acceptance rate: 11.6%). [paper] [slides] [homepage] [A talk related to deep learning literature and techniques in this paper]
  • A. Morales, H. Sun, X. Yan, “Synthetic Review Spamming and Defense,” Proc. of the 22nd International World Wide Web Conference (WWW'13, Companion Volume).
  • H. Sun, G. Miao, X. Yan, “Noise-Resistant Bicluster Recognition,” The 17th Annual International Conference on Research in Computational Molecular Biology (RECOMB'13, Poster).

Dissertations I advised:

Tutorials:

  • J. Pujara, P. Szekely, H. Sun, M. Chen. “From Tables to Knowledge: Recent Advances in Table Understanding,” Tutorials of KDD'21 (co-presenter). [website][slides (Part III)]
  • F. Zhu, H. Sun, X. Yan. “Network Mining and Analysis for Social Applications,” Tutorials of KDD'14 (co-presenter). [slides]

Miscellaneous:

  • Spatial Continuity Constrained Robust PCA for Recovering Images with Continuous Corruption, Internship work during 01/2010~06/2010, supervised by Dr. Yi Ma at MSRA. Excellent Graduation Thesis Award of USTC (top 5%) in 2010
  • Rating prediction of Collaborative Filtering recommendation systems, Undergraduate Research Project during 06~09/2009, supervised by Prof. Nenghai Yu at USTC. Excellent Undergraduate Research Project Scholarship (University-wide top 20%) in 2009