Conferences/Journals:

*: My advisee; #: visiting student (work done/started at OSU). A few papers might appear under multiple topics.

On semantic parsing and NLP/ML for automated programming:

  • Xiang Deng*, Ahmed Hassan Awadallah, Christopher Meek, Oleksandr Polozov, Huan Sun, Matthew Richardson, “Structure-Grounded Pretraining for Text-to-SQL,” The 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2021). [paper, code, an earlier version]
  • Ziyu Yao*, Frank F. Xu, Pengcheng Yin, Huan Sun, Graham Neubig, “Learning Structural Edits via Incremental Tree Transformations,” The Ninth International Conference on Learning Representations 2021 (ICLR'21). [paper, code]
  • Ziyu Yao*, Yiqi Tang, Wen-tau Yih, Huan Sun, Yu Su, “An Imitation Game for Learning Semantic Parsers from User Interaction,” 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP'20). [paper, code, an earlier version on arXiv]
  • Jie Zhao*, Huan Sun, “Adversarial Training for Code Retrieval with Question-Description Relevance Regularization,” Findings of 2020 Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP'20, A new acceptance category). [paper, code] [Scores of this paper (reviewed with all papers to EMNLP'20): 4/4/4, with 4 being "Strong: I learned a lot from it. I would like to see it accepted" under a rating scale of 1-5 (5 being the highest)]
  • Ziyu Yao*, Yu Su, Huan Sun, Wen-tau Yih, “Model-based Interactive Semantic Parsing: A Unified Formulation and A Text-to-SQL Case Study,” 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP'19). [paper, code]
  • Z. Yao*, J. Peddamail*, H. Sun, “CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning,” The Web Conference (former WWW Conference) 2019 (WWW'19, acceptance rate: 18%, Oral + Poster). [paper, code]
  • Z. Yao*, X. Li, J. Gao, B. Sadler, H. Sun, “Interactive Semantic Parsing for If-Then Recipes via Hierarchical Reinforcement Learning,” The AAAI Conference on Artificial Intelligence 2019 (AAAI’19, acceptance rate: 16.2%). [paper, code]
  • Z. Yao*, D. S. Weld, W.P. Chen, H. Sun, “StaQC: A Systematically Mined Question-Code Dataset from Stack Overflow,” The Web Conference (former WWW Conference) 2018 (WWW'18, acceptance rate: 14.8%). [paper, code]
  • J. Peddamail*, Z. Yao*, Z. Wang*, H. Sun, “A Comprehensive Study of StaQC for Deep Code Summarization,” SIGKDD Deep Learning Day 2018. [paper, slides] (SPOTLIGHT)
On pre-training and representation learning on diverse data sources (e.g., text, tables, relational databases):
  • Xiang Deng*, Yu Su, Alyssa Lees, You Wu, Cong Yu, Huan Sun, “ReasonBERT: Pre-trained to Reason with Distant Supervision,” The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021, long paper). [paper, code]
  • Xiang Deng*, Ahmed Hassan Awadallah, Christopher Meek, Oleksandr Polozov, Huan Sun, Matthew Richardson, “Structure-Grounded Pretraining for Text-to-SQL,” The 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2021). [paper, code, an earlier version]
  • Xiang Deng*, Huan Sun, Alyssa Lees, You Wu, Cong Yu, “TURL: Table Understanding through Representation Learning,” 47th International Conference on Very Large Data Bases (VLDB'21). [paper, code, an earlier version on arXiv]
  • Xiang Deng*, Huan Sun, “Leveraging 2-hop Distant Supervision from Table Entity Pairs for Relation Extraction,” 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP'19). [paper, code]
On knowledge representation and reasoning in (textual) graphs, with emphasis on interpretability:
  • Zhen Wang*, Bo Zong, Huan Sun, “Modeling Context Pair Interaction for Pairwise Tasks on Graphs,” The 14th International Conference on Web Search and Data Mining (WSDM'21, acceptance rate: ~18.6%). [paper, code]
  • Zhen Wang*, Jennifer Lee, Simon Lin, Huan Sun, “Rationalizing Medical Relation Prediction from Corpus-level Statistics,” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL'20, long). [paper, code]
  • Zhen Wang*, Xiang Yue*, Soheil Moosavinasab, Yungui Huang, Simon Lin and Huan Sun, “SurfCon: Synonym Discovery on Privacy-Aware Clinical Data,” The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2019 (SIGKDD'19, research track, acceptance rate: ~14.2%, oral). [paper, code]
  • Y. Su, H. Liu, S. Yavuz, I. Gur, H. Sun, X. Yan, “Global Relation Embedding for Relation Extraction,” In Proc. of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018 (NAACL-HLT’18). [paper, code]
On question answering and reading comprehension, with applications to the clinical domain:
  • Xiang Yue*, Xinliang Frederick Zhang*, Ziyu Yao*, Simon Lin, Huan Sun, “CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering,” 2021 IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM 2021, long paper). Best Paper Award. [paper, code]
  • Xinliang Frederick Zhang*, Heming Sun*, Xiang Yue*, Simon Lin, Huan Sun, “COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval,” The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021, short paper). [paper, code]
  • Xiang Deng*, Yu Su, Alyssa Lees, You Wu, Cong Yu, Huan Sun, “ReasonBERT: Pre-trained to Reason with Distant Supervision,” The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021, long paper). [paper, code]
  • Bernhard Kratzwald#, Stefan Feuerriegel, Huan Sun, “Learning a Cost-Effective Annotation Policy for Question Answering,” 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP'20). [paper, code]
  • Xiang Yue*, Bernal Jimenez*, Huan Sun, “Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset,” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL'20, long). [paper, code]
  • Xiang Yue*, Xinliang (Frederick) Zhang*, Ziyu Yao*, Simon Lin, and Huan Sun, “CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering,” arXiv, 2020. [paper, code] (The first two authors contributed equally.)
  • Jiankai Sun, Jie Zhao*, Huan Sun, Srinivasan Parthasarathy, “EndCold: An End-to-End Framework for Cold Question Routing in Community Question Answering Services,” The 29th International Joint Conference on Artificial Intelligence (IJCAI'20). [paper]
  • Jie Zhao*, Xiang Deng*, Huan Sun, “Easy-to-Hard: Leveraging Simple Questions for Complex Question Generation,” arXiv, 2019. [paper, code]
  • Boyuan Pan#, Hao Li, Ziyu Yao*, Deng Cai, Huan Sun, “Reinforced Dynamic Reasoning for Conversational Question Generation,” Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL'19). [paper, code]
  • Jie Zhao*, Ziyu Guan, Huan Sun, “Riker: Mining Rich Keyword Representations for Interpretable Product Question Answering,” The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2019 (SIGKDD'19, research track, acceptance rate: ~14.2%, poster). [paper, code] (SIGKDD'19: ~110 oral + ~60 poster presentations selected from ~1200 submissions)
  • L. Chen, Z. Guan, W. Zhao, W. Zhao, X. Wang, Z. Zhao, H. Sun, “Answer Identification from Product Reviews for User Questions by Multi-task Attentive Networks,” The AAAI Conference on Artificial Intelligence 2019 (AAAI’19, acceptance rate: 16.2%). [paper]
  • J. Zhao*, Y. Su, Z. Guan, H. Sun, “An End-to-End Deep Framework for Answer Triggering with a Novel Group-Level Objective,” Empirical Methods in Natural Language Processing 2017 (EMNLP'17). [paper, code]
  • H. Sun, H. Ma, X. He, W. Yih, Y. Su, X. Yan, “Table Cell Search for Question Answering,” The 25th Int. World Wide Web Conference (WWW'16). [paper]
  • Y. Su, H. Sun, B. Sadler, M. Srivatsa, I. Gur, Z. Yan, X. Yan, “On Generating Characteristic-rich Question Sets for QA Evaluation,” Empirical Methods in Natural Language Processing 2016 (EMNLP'16). [paper, appendix, New Question-Answer Set (with rich characteristics to train more advanced QA systems)]
  • H. Sun, H. Ma, W. Yih, C. Tsai, J. Liu, M. Chang, “Open Domain Question Answering via Semantic Enrichment,” The 24th Int. World Wide Web Conference (WWW'15, acceptance rate: 14.1%). [paper]
  • S. Yang, Y. Wu, H. Sun, X. Yan, “Schemaless and Structureless Graph Querying,” Proc. of Int. Conf. on Very Large Data Bases (VLDB'14). [paper, poster]
  • S. Yang, Y. Xie, Y. Wu, T. Wu, H. Sun, J. Wu, X. Yan, “SLQ: A User-friendly Graph Querying System,” Proc. of Int. Conf. on Management of Data (SIGMOD'14, Demo Track).
On biomedical and clinical data analytics:
  • Xiang Yue*, Minxin Du, Tianhao Wang, Yaliang Li, Huan Sun, Sherman S. M. Chow, “Differential Privacy for Text Analytics via Natural Text Sanitization,” Findings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Findings of ACL-IJCNLP 2021, long). [paper, code]
  • Kaushik Mani*, Xiang Yue*, Bernal Jimenez Gutierrez*, Yungui Huang, Simon Lin, and Huan Sun, “Clinical Phrase Mining with Language Models,” IEEE International Conference on Bioinformatics and Biomedicine 2020 (BIBM'20, short). [paper, code, a longer version] (The first two authors contributed equally.)
  • Xiang Yue*, Zhen Wang*, Jingong Huang*, Srinivasan Parthasarathy, Soheil Moosavinasab, Yungui Huang, Simon M. Lin, Wen Zhang, Ping Zhang, and Huan Sun, “Graph Embedding on Biomedical Networks: Methods, Applications, and Evaluations,” Bioinformatics, 2019. [paper, code]
  • Xinliang Frederick Zhang*, Heming Sun*, Xiang Yue*, Simon Lin, Huan Sun, “COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval,” The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021, short paper). [paper, code]
  • Zhen Wang*, Jennifer Lee, Simon Lin, Huan Sun, “Rationalizing Medical Relation Prediction from Corpus-level Statistics,” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL'20, long). [paper, code]
  • Zhen Wang*, Xiang Yue*, Soheil Moosavinasab, Yungui Huang, Simon Lin and Huan Sun, “SurfCon: Synonym Discovery on Privacy-Aware Clinical Data,” The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2019 (SIGKDD'19, research track, acceptance rate: ~14.2%, oral). [paper, code]
  • Y. Li, N. Du, C. Liu, Y. Xie, W. Fan, Q. Li, J. Gao, H. Sun, “Reliable Medical Diagnosis from Crowdsourcing: Discover Trustworthy Answers from Non-Experts,” ACM Int. Conf. on Web Search and Data Mining 2017 (WSDM’17). [paper]
  • C. Liu, H. Sun, N. Du, S. Tan, H. Fei, W. Fan, T. Yang, H. Wu, Y. Li, C. Zhang, “Augmented LSTM Framework to Construct Medical Self-diagnosis Android,” IEEE Int. Conf. on Data Mining 2016 (ICDM’16). [paper]
On other topics in the area of NLP and data analytics (especially language generation, network analysis and text mining):
  • Boyuan Pan#, Yazheng Yang, Cai Deng, Huan Sun, “TopNet: Learning from Neural Topic Model to Generate Long Stories,” The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD 2021, research track, acceptance rate: ~15.4%). [paper, code]
  • W. Zhao, Z. Guan, Y. Huang, T. Xi, H. Sun, Z. Wang, X. He, “Discerning Influence Patterns with Beta-Poisson Factorization in Microblogging Environments,” Transactions on Knowledge and Data Engineering (TKDE 2019). [paper]
  • Y. Li, S. Tan, H. Sun, J. Han, D. Roth, X. Yan, “Entity Disambiguation with Linkless Knowledge Bases,” The 25th Int. World Wide Web Conference (WWW'16). [paper]
  • F. Han, S. Tan, H. Sun, X. Yan, M. Srivatsa, D. Cai, “Distributed Representations of Expertise,” SIAM Int. Conf. on Data Mining 2016 (SDM'16). [paper]
  • Y. Su, S. Yang, H. Sun, M. Srivatsa, S. Kase, M. Vanni, X. Yan, “Exploiting Relevance Feedback in Knowledge Graph Search,” Proc. of the 21st Int. Conf. on Knowledge Discovery and Data Mining (KDD’15, acceptance rate: 19.4%). [paper]
  • Z. Guan, S. Yang, H. Sun, M. Srivatsa, X. Yan, “Fine-Grained Knowledge Sharing in Collaborative Environments,” Transactions on Knowledge and Data Engineering (TKDE 2015). [paper]
  • H. Sun, M. Srivatsa, S. Tan, Y. Li, L. Kaplan, S. Tao, X. Yan, “Analyzing Expert Behaviors in Collaborative Networks,” Proc. of the 20th Int. Conf. on Knowledge Discovery and Data Mining (KDD'14, acceptance rate: 14.6%). [paper, slides, poster, Source Code]
  • H. Sun, M. Srivatsa, L. Kaplan, X. Yan, “Analyzing Expert Behaviors in Collaborative Networks,” International School and Conference on Network Science 2014 (NetSci'14).
  • N. Li, H. Sun, K. Chipman, J. George, X. Yan, “A Probabilistic Approach to Uncovering Attributed Graph Anomalies,” SIAM Int. Conf. on Data Mining 2014 (SDM'14, acceptance rate: 15.4%). [paper]
  • H. Sun, A. Morales, X. Yan, “Synthetic Review Spamming and Defense,” Proc. of the 19th Int. Conf. on Knowledge Discovery and Data Mining (KDD'13, acceptance rate: 17%). [paper, poster, Demo]
  • S. Tan, Y. Li, H. Sun, Z. Guan, X. Yan, J. Bu, C. Chen, X. He, “Interpreting the Public Sentiment Variations on Twitter,” Transactions on Knowledge and Data Engineering (TKDE 2014). [paper]
  • H. Sun, G. Miao, X. Yan, “Noise-Resistant Bicluster Recognition,” IEEE Int. Conf. on Data Mining 2013 (ICDM'13, Oral presentation, acceptance rate: 11.6%). [paper][slides][homepage] [A talk related to deep learning literature and techniques in this paper]
  • A. Morales, H. Sun, X. Yan, “Synthetic Review Spamming and Defense,” Proc. of the 22nd International World Wide Web Conference (WWW'13, Companion Volume).
  • H. Sun, G. Miao, X. Yan, “Noise-Resistant Bicluster Recognition,” The 17th Annual International Conference on Research in Computational Molecular Biology (RECOMB'13, Poster).

Tutorial:

  • J. Pujara, P. Szekely, H. Sun, M. Chen. “From Tables to Knowledge: Recent Advances in Table Understanding,” Tutorials of KDD'21 (co-presenter). [website][slides (Part III)]
  • F. Zhu, H. Sun, X. Yan. “Network Mining and Analysis for Social Applications,” Tutorials of KDD'14 (co-presenter). [slides]

Miscellaneous:

  • Spatial Continuity Constrained Robust PCA for Recovering Images with Continuous Corruption, Internship work during 01/2010~06/2010, supervised by Dr. Yi Ma at MSRA. Excellent Graduation Thesis Award of USTC (top 5%) in 2010
  • Rating prediction of Collaborative Filtering recommendation systems, Undergraduate Research Project during 06~09/2009, supervised by Prof. Nenghai Yu at USTC. Excellent Undergraduate Research Project Scholarship (University-wide top 20%) in 2009