[Paper] Toward Personality Insights from Language Exploration in Social Media

The main purpose of this paper is to show how social media can be used to gain psychological insights.

Unlike earlier work that relies on pre-compiled word category lists (e.g. LIWC),
it uses an open-vocabulary approach that allows the discovery of unanticipated language.


  • 75,000 Volunteers
    • Facebook status updates
    • Age
    • Gender
    • Personality (measured with a standard personality questionnaire)


  1. Linguistic Feature Extraction
    • N-Gram
      • Point-Wise Mutual Information
    • Topic
      • Probability of a person mentioning a topic (derived from LDA)
  2. Correlation analysis
    • Least Squares Linear Regression
  3. Visualization
    • Differential Word Clouds
      • Word size represents correlation strength
      • Color represents relative frequency
    • Standardized Frequency Plot
      • Plot the word frequency against age
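
Steps 1 and 2 above can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the paper's implementation: the function names `bigram_pmi` and `standardized_slope` are hypothetical, PMI is computed only for adjacent word pairs, and the regression uses a single predictor with no covariates. With both variables z-scored, the least-squares slope equals the Pearson correlation, which is why it can serve as a correlation strength.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI} for every adjacent word pair in `tokens`.

    PMI(w1, w2) = log( p(w1, w2) / (p(w1) * p(w2)) ).
    High values suggest the pair occurs together more often than chance.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_joint = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log(p_joint / (p_w1 * p_w2))
    return scores


def standardized_slope(x, y):
    """Least-squares slope of y on x after z-scoring both variables.

    x could be a feature's relative frequency per person and y an
    outcome such as a personality score (illustrative assumption).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("x and y must both vary")

    zx = [(v - mean_x) / sd_x for v in x]
    zy = [(v - mean_y) / sd_y for v in y]
    # Slope of the no-intercept fit zy ~ beta * zx.
    return sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)
```

In practice one would keep only n-grams whose PMI exceeds some threshold as features, then fit one regression per feature and outcome. The threshold and any control variables are choices these notes do not record.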


Most results confirm what is already known or obvious.
However, I think this method might still be useful for gaining insight into other kinds of datasets.
