TEXT ANALYTICS, HEALTH ANALYTICS

Raymond Ng, Professor of Computer Science, Canada Research Chair in Data Science and Analytics, Vancouver, Canada

Title: Mining and Summarizing Text Conversations

Summary:

With the ever-increasing popularity of Internet technologies and of communication devices such as smartphones and tablets, huge amounts of conversational data are generated on an hourly basis, and intelligent text analytics approaches can greatly benefit organizations and individuals. For example, managers can find the information exchanged in forum discussions crucial for decision making. Moreover, posts and comments about a product can help business owners improve it.

In this lecture, we first give an overview of important applications of mining text conversations, using sentiment summarization of product reviews as a case study. Then we examine three topics in this area: (i) topic modeling; (ii) natural language summarization; and (iii) extraction of rhetorical structure and relationships in text.

Syllabus: (5 hours)

  1. Text conversations and business intelligence
  2. Sentiment extraction and summarization as applications
  3. Topic modeling
  4. Extractive and abstractive summarization
  5. Rhetorical analysis
  6. Summary

Pre-requisites: Basic knowledge of machine learning and natural language processing is preferred but not required.

References:

  1. Shafiq Joty, Giuseppe Carenini and Raymond Ng. Topic Segmentation and Labeling in Asynchronous Conversations. Journal of Artificial Intelligence Research (JAIR), Vol. 47 (2013), pages 521-573.
  2. Shafiq Joty, Giuseppe Carenini, Gabriel Murray and Raymond Ng. Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2010), MIT, Massachusetts, USA.
  3. Yashar Mehdad, Giuseppe Carenini, Raymond Ng and Shafiq Joty. Towards Topic Labeling with Phrase Entailment and Aggregation. In Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), Atlanta, USA.
  4. Yashar Mehdad, Giuseppe Carenini and Raymond Ng. Abstractive Summarization of Spoken and Written Conversations based on Phrasal Queries. In Proceedings of the Association for Computational Linguistics (ACL 2014).
  5. Shafiq Joty, Giuseppe Carenini and Raymond Ng. A Novel Discriminative Framework for Sentence-Level Discourse Analysis. EMNLP 2012.
  6. Shafiq Joty, Giuseppe Carenini, Raymond Ng and Yashar Mehdad. Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis. ACL 2013.
  7. Shima Gerani, Yashar Mehdad, Giuseppe Carenini, Raymond Ng and Bita Nejat. Abstractive Summarization of Product Reviews Using Discourse Structure. EMNLP 2014.
  8. Kelsey Allen, Giuseppe Carenini and Raymond Ng. Detecting Disagreement in Conversations using Pseudo-Monologic Rhetorical Structure. EMNLP 2014.

Title: Big Data for Personalized Medicine

Summary:

Personalized medicine has been hailed as one of the main frontiers for medical research in this century. In the first half of this lecture, we will give an overview of our projects that use gene expression, proteomics, DNA and clinical features for biomarker discovery. In the second half, we will describe some of the challenges involved in biomarker discovery. One of these challenges is the lack of quality assessment tools for data generated by ever-evolving genomics platforms. We will conclude the talk with an overview of some of the techniques we have developed for data cleansing and pre-processing.

Syllabus: (2 hours)

  1. Overview of selected biomarker discovery applications: heart transplants and COPD
  2. Top challenges facing biomarker discovery
  3. Quality control tools for molecular data
  4. Graph construction for systems biology