The main purpose of this paper is to show how social media can be used to gain psychological insights.
Unlike earlier papers, which rely on a pre-compiled word category list (e.g. LIWC),
it uses an open-vocabulary approach that allows the discovery of unanticipated language.
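To make the contrast concrete, here is a minimal sketch of closed-vocabulary versus open-vocabulary counting. The tiny `CLOSED_LEXICON`, the tokenizer, and the sample messages are made up for illustration. They are not the LIWC lexicon or the paper's data.

```python
import re
from collections import Counter

# Hypothetical stand-in for a pre-compiled category list (NOT the real LIWC lexicon).
CLOSED_LEXICON = {"positive_emotion": {"happy", "love", "great"}}


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def closed_vocab_counts(messages):
    """Count only tokens that belong to a predefined category."""
    counts = Counter()
    for msg in messages:
        for tok in tokenize(msg):
            for category, words in CLOSED_LEXICON.items():
                if tok in words:
                    counts[category] += 1
    return counts


def open_vocab_counts(messages, max_n=2):
    """Count every 1-gram up to max_n-gram that actually occurs in the data."""
    counts = Counter()
    for msg in messages:
        toks = tokenize(msg)
        for n in range(1, max_n + 1):
            for i in range(len(toks) - n + 1):
                counts[" ".join(toks[i:i + n])] += 1
    return counts


# Made-up example messages.
messages = ["So happy today", "sick of homework", "sick of this weather"]
closed = closed_vocab_counts(messages)
open_ = open_vocab_counts(messages)
```

The closed-vocabulary counter only sees what its list anticipates, so a recurring phrase such as "sick of" is invisible to it, while the open-vocabulary counter picks it up directly from the text.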
Data
- 75,000 Volunteers
- Facebook Status Updates
- Age
- Gender
- Personality (Through Standard Personality Questionnaire)
Architecture
- Linguistic Feature Extraction
  - N-Gram
    - Point-Wise Mutual Information
  - Topic
    - Probability of a person mentioning a topic (derived from LDA)
- Correlation Analysis
  - Least Squares Linear Regression
- Visualization
  - Differential Word Clouds
    - Word size represents correlation strength.
    - Color represents relative frequency.
  - Standardized Frequency Plot
    - Plot the word frequency against age
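
Two of the steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's implementation. `bigram_pmi` scores adjacent word pairs with point-wise mutual information, and `standardized_coefficient` fits a least squares regression on z-scored variables and returns the coefficient of one explanatory variable. The function names, the choice of which variable is the regression target, and the sample data are all hypothetical.

```python
import math
from collections import Counter

import numpy as np


def bigram_pmi(tokens):
    """PMI(w1, w2) = log(p(w1, w2) / (p(w1) * p(w2))) for each adjacent pair."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    return {
        (w1, w2): math.log(
            (c / n_bi) / ((unigrams[w1] / n_uni) * (unigrams[w2] / n_uni))
        )
        for (w1, w2), c in bigrams.items()
    }


def _zscore(x):
    """Standardize each column to mean 0 and standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)


def standardized_coefficient(explanatory, target, covariates=None):
    """Least squares fit of target on explanatory (plus optional covariates),
    all z-scored; returns the coefficient of the explanatory variable.
    Assumes no column is constant (a zero standard deviation would divide by zero).
    """
    columns = [_zscore(explanatory)]
    if covariates is not None:
        cov = _zscore(covariates)
        cov = cov.reshape(len(cov), -1)  # accept one covariate (1-D) or several (2-D)
        columns.extend(cov.T)
    columns.append(np.ones(len(columns[0])))  # intercept term
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, _zscore(target), rcond=None)
    return float(beta[0])


# Made-up data for illustration only.
tokens = "new york is big and new york is old".split()
pmi = bigram_pmi(tokens)

rng = np.random.default_rng(0)
age = rng.normal(size=200)
word_freq = rng.normal(size=200)
trait = 2.0 * word_freq + 0.5 * age
coef = standardized_coefficient(word_freq, trait, covariates=age)
```

Raw PMI tends to favor rare pairs, so in practice it is usually combined with a minimum frequency cutoff. Standardizing every variable before the fit puts the coefficients of different words and topics on the same scale, which makes them comparable as measures of correlation strength.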
Results
Most results confirm what is already known or obvious.
However, I think this method could still be useful for gaining insight into other kinds of datasets.