Department of Communication PhD Student Workshop
Web Mining for Communication Research
April 22-25, 2014, Room M5064


Workshop Schedule

Topic | Speaker | Hands-on Tutorial (Tutor / Software) | Materials
Introduction to Web Mining | Jonathan Zhu |  | Slides
Web Data Collection | Hai Liang | Hai Liang / NodeXL, Visual Web Ripper (VWR), API, Scraping | Slides
Web Data Preprocessing | Zhenzhen Wang | Zhenzhen Wang / Xerox (online), SPSS, R, Python | Slides
Web Data Analysis | Chengjun Wang | Hai Liang / SPSS, NodeXL, R (optional) | Slides
Web Data Visualization | Jie Qin | Hexin Chen / Wordle, HTML5 Word Cloud, Voyant Tools, NodeXL, Google Fusion Tables, Google Charts | Slides
Social Issues in Web Mining | Marko Skoric |  | Slides
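The four hands-on sessions (collection, preprocessing, analysis, visualization) form one pipeline. The short Python sketch below is only an illustration of that flow, not the workshop's own material. It assumes a hypothetical API response already saved as a JSON string (the `posts` and `text` field names are invented). It also uses a plain regular-expression tokenizer and a tiny hand-picked stopword list in place of a full toolkit such as NLTK. The resulting term counts are the kind of input a word-cloud tool such as Wordle sizes words by.

```python
import json
import re
from collections import Counter

# Hypothetical API response; the "posts"/"text" field names are assumptions for illustration.
RAW_RESPONSE = """
{"posts": [
  {"id": 1, "text": "Web mining opens new data for communication research!"},
  {"id": 2, "text": "Mining web data: collection, preprocessing, analysis."},
  {"id": 3, "text": "Communication research meets web data."}
]}
"""

# Tiny illustrative stopword list (an assumption; real projects use a fuller list, e.g. NLTK's).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def collect(raw: str) -> list[str]:
    """Collection: parse the JSON payload and pull out the text field of each post."""
    payload = json.loads(raw)
    return [post["text"] for post in payload["posts"]]


def preprocess(text: str) -> list[str]:
    """Preprocessing: lowercase, keep alphabetic tokens only, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def word_frequencies(texts: list[str]) -> Counter:
    """Analysis: count each term across all documents."""
    counts = Counter()
    for text in texts:
        counts.update(preprocess(text))
    return counts


if __name__ == "__main__":
    freqs = word_frequencies(collect(RAW_RESPONSE))
    # Visualization input: the most frequent terms, which a word cloud would draw largest.
    for word, n in freqs.most_common(3):
        print(word, n)
```

Keeping each stage in its own function mirrors the session structure: any stage can be swapped for a dedicated tool (for example, NLTK tokenization or an R `tm` corpus) without touching the others.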


Participants

Name | Position
Marko Skoric | Faculty
Jonathan Zhu | Faculty
Crystal Jiang | Faculty
Fen Lin | Faculty
Fei Shen | Faculty
Chen Qiang | PhD student (HUST)
Cao Qian | PhD student (year 1)
Zeng Yuan | PhD student (year 1)
Xia Chuanli | PhD student (year 1)
Liu Bingjie | PhD student (year 1)
Madrid-Morales Dani | PhD student (year 1)
Scialpi Valentina | PhD student (year 1)
Li Boliang | PhD student (year 1)
Cao Bolin | PhD student (year 2)
Chen Hexin | PhD student (year 2)
Gong Wanqi | PhD student (year 2)
Chen Chujie | PhD student (year 3)
Jiang Yalong | PhD student (year 3)
Liu Na | PhD student (year 3)
Yan Jing | PhD student (year 4)
Ho Jeffrey | PhD student (year 4)
Qin Jie | PhD student (year 4)
Liang Hai | PhD student (year 4)
Wang Zhenzhen | PhD student (year 4)
Wang Chengjun | PhD student (year 4)
An Shanshan | Young scholar (Liaoning U)
Wu Ying | Young scholar (Shanghai Intl Studies U)
Deng Lifeng | Young scholar (Sun Yat-sen U)
Yang Man | Young scholar (Wuhan U)
Chen Yuanyuan | Young scholar (Hubei Economic U)
Wu Jing | Young scholar (Fudan U)
Huang Yueqin | Young scholar (Hubei U)
Tang Sihui | Visiting fellow (South China U of Tech)
Yang Huixiong | Visiting fellow (Fujian Normal U)


  1. Readings
    1.1. Basic
      1.1.1. C. Hanretty (2013). Scraping the web for arts and humanities. University of East Anglia.
      1.1.2. E. R. Tufte (2001/1983). The visual display of quantitative information. Graphics Press.
      1.1.3. I. Feinerer (2013). Text mining in R.
      1.1.4. B. Grün (2011). R package for topic models.
    1.2. Advanced
      1.2.1. C. Manning et al. (2008). Introduction to information retrieval. Cambridge University Press.
      1.2.2. M. A. Russell (2013). Mining the social web. O’Reilly.
    1.3. Examples
      1.3.1. Zhu et al. (2011). Social Science Computer Review, 29, 327-339. (sampling)
      1.3.2. Peng et al. (2013). New Media & Society, 15, 644-664. (science literature)
      1.3.3. Wang et al. (2013). Cyberpsychology, Behavior, and Social Networking, 16, 679-685. (Twitter)
      1.3.4. Xu et al. (2013). IEEE Transactions on Visualization and Computer Graphics, 19, 2012-2021. (visualization)
      1.3.5. Liang (2014). Social Science Computer Review. (online forums)
  2. Tools
    2.1. Data Collection
      2.1.1. NodeXL: Network overview, discovery and exploration for Excel
      2.1.2. Visual Web Ripper: Extract data from the Web
      2.1.3. Web APIs and JSON in Python
      2.1.4. Web scraping with Beautiful Soup
    2.2. Data Preprocessing
      2.2.1. SPSS
      2.2.2. Python NLTK
      2.2.3. R tm
      2.2.4. R e1071
      2.2.5. R topicmodels
    2.3. Data Analysis
      2.3.1. R & RStudio
      2.3.2. Network analysis with igraph package in R
      2.3.3. Survival analysis with survival package in R
      2.3.4. Sentiment analysis with sentiment package in R
      2.3.5. Naive Bayes analysis with e1071 package in R
      2.3.6. Text mining with RTextTools in R
      2.3.7. Decision trees with rpart and rattle in R
    2.4. Data Visualization
      2.4.1. Wordle
      2.4.2. HTML5 word cloud
      2.4.3. Voyant Tools
      2.4.4. NodeXL
      2.4.5. Google Fusion Tables
      2.4.6. Google Charts
  3. Online Videos
    3.1. Visualization
      3.1.1. Wordle
      3.1.2. Voyant Tools
      3.1.3. NodeXL
      3.1.4. Excel and mapping points with Google Fusion Tables
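Item 2.1.4 scrapes pages with Beautiful Soup. The minimal sketch below shows the same core idea without third-party dependencies, using Python's standard-library `html.parser` as a swapped-in stand-in: walk the parsed HTML and keep only the elements you need. Everything in it is hypothetical and chosen for illustration: the forum-style HTML snippet, the `post` class name and the `PostLinkExtractor` helper. A real scraper would first fetch the page (for example with `urllib.request`) and would usually rely on Beautiful Soup's more concise search methods.

```python
from html.parser import HTMLParser

# Hypothetical forum page; the "post" class and the page structure are invented for illustration.
SAMPLE_HTML = """
<html><body>
  <div class="post"><a href="/thread/1">First thread</a> <p>Hello forum</p></div>
  <div class="post"><a href="/thread/2">Second thread</a> <p>Web mining question</p></div>
  <div class="ad"><a href="/promo">Buy now</a></div>
</body></html>
"""


class PostLinkExtractor(HTMLParser):
    """Collect (href, link text) pairs for <a> tags nested inside <div class="post">."""

    def __init__(self) -> None:
        super().__init__()
        self.div_depth = 0        # nesting depth of <div> tags inside the current post (0 = outside)
        self.current_href = None  # href of the <a> tag currently open inside a post, if any
        self.links = []           # extracted (href, text) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.div_depth:
                self.div_depth += 1  # a nested div inside a post
            elif "post" in (attrs.get("class") or "").split():
                self.div_depth = 1   # entering a post container
        elif tag == "a" and self.div_depth:
            self.current_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "a":
            self.current_href = None

    def handle_data(self, data):
        # Only keep non-blank text that sits inside a link within a post.
        if self.current_href is not None and data.strip():
            self.links.append((self.current_href, data.strip()))


def extract_post_links(html: str) -> list[tuple[str, str]]:
    parser = PostLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for href, text in extract_post_links(SAMPLE_HTML):
        print(href, text)
```

The event-driven parser keeps track of where it is in the page itself, which is the bookkeeping that Beautiful Soup's tree-based search hides from you. The advertisement link is skipped because it sits outside any `post` container.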
