The main purpose of this paper is to show how social media can be used to gain psychological insights.

Unlike earlier papers that rely on a pre-compiled word category list (e.g. LIWC),
it uses an open-vocabulary approach that allows the discovery of unanticipated language.


  • 75,000 Volunteers
    • Facebook Status Update
    • Age
    • Gender
    • Personality (Through Standard Personality Questionnaire)


  1. Linguistic Feature Extraction
    • N-Gram
      • Point-Wise Mutual Information
    • Topic
      • Probability of a person mentioning a topic (Derived from LDA)
  2. Correlation analysis
    • Least Squares Linear Regression
  3. Visualization
    • Differential Word Clouds
      • Word size represents correlation strength.
      • Color represents relative frequency
    • Standardized Frequency Plot
      • Plot the word frequency against age
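
As a rough illustration of the first two steps above, here is a minimal Python sketch. It is not the paper's code: the function names `bigram_pmi` and `standardized_slope` are made up for this note, the PMI estimate uses simple relative frequencies, and the regression has a single predictor with no covariates, which is a simplification.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI} for every adjacent word pair in `tokens`.

    PMI = log( p(w1, w2) / (p(w1) * p(w2)) ), estimated from raw counts.
    A high value means the pair co-occurs more often than chance predicts.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log(p_pair / (p_w1 * p_w2))
    return scores


def standardized_slope(xs, ys):
    """Least-squares slope of y on x after z-scoring both variables.

    With one predictor this equals Pearson's r, so it can be read as a
    correlation strength between a feature's usage and an outcome.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no linear signal
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    status = "happy birthday to you happy birthday dear friend".split()
    print(bigram_pmi(status)[("happy", "birthday")])

    word_use = [0.00, 0.01, 0.02, 0.04]  # relative frequency per user
    trait = [2.1, 2.9, 3.2, 4.5]         # questionnaire score per user
    print(standardized_slope(word_use, trait))
```

The slope computed this way would be the kind of number a differential word cloud maps to word size, under the assumptions stated above.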


Most results confirm what is already known or obvious.
However, I think this method might still be useful for gaining insight into other kinds of datasets.
