My research interests are in Social Media Mining (see my recent textbook on the subject), Big Data Analytics, Machine Learning, Graph Mining, Social Network Analysis, and Social Computing. Most of my research revolves around analyzing social media and large-scale information networks.
My research lies at the intersection of data mining, machine learning, social sciences, and theory. A common pattern in my research is to collect and analyze large-scale data to glean actionable patterns. To establish the validity of such patterns, I often draw on theories from the social sciences, psychology, or anthropology, and I also develop and use advanced mathematical, statistical, and machine learning machinery. My research is supported by an NSF CAREER award.
For a sample of my work, see our recent WSDM'19/KDD'19 tutorial, "Fake News: Fundamental Theories, Detection Strategies and Challenges," our earlier ICDM tutorial, "Social Media Mining: Fundamental Issues and Challenges," and the list of current research directions below.
To mine across social media sites, we focus on two specific problems. The first is how user behavior varies across sites (e.g., how friendships on LinkedIn differ from friendships on Facebook). In addition to designing new techniques, we investigate ways to scale and adapt traditional models of user behavior, which were built for a single site, to multiple sites. For recent results on this question, see my papers in Information Fusion'16 and ICWSM'14 and this book chapter. The second is user behavior that can only be observed across sites. One example is our study of user migration across sites.
Our survey summarizes research on fake news. Our work includes detecting fake news using content or link (network) information, as well as detecting fake news early. For more information, see our KDD and WSDM tutorials on the topic.
My research has investigated ways to realistically analyze human behavior online by exploiting the information redundancies that user behavior generates. This methodology has been used to detect sarcasm on Twitter and to identify users across sites, among other applications. For more on the topic, see this article, this chapter, or our recent workshop on the topic. As a by-product, my research on human behavior modeling has had implications for information verification, privacy, and security.
Ground truth, in the data mining sense, is rarely available online. I recently started investigating this problem and have identified several ways to tackle it. For a succinct review of the topic, see my recent Communications of the ACM (CACM) paper on this issue.
I have studied how to use minimum information to identify users, detect malicious users, or recommend friends on social media sites with high accuracy. Because these methods use only minimum information, they scale easily to millions of users. Recently, I have been investigating the theoretical limits of using minimum information.
I have recently investigated the balance between privacy and mining user-generated content by connecting ideas from complexity theory (specifically Kolmogorov complexity), information theory, and statistical natural language processing. See this paper for some preliminary results.
My research has focused on (1) online methods to map areas impacted by natural disasters in real time [ICDM'15], (2) identifying relevant users who provide the most useful information during crises [HT 2014], and (3) systematic approaches to crowdsourcing user-generated content during disasters [CMOT'12].