[Paper] Deep Learning-Based Document Modeling for Personality Detection from Text

Category Tech

Practical Application of Personality Detection

  • Product and Service Recommendation (people with similar personalities may have similar preferences)
  • Mental Health Diagnosis
  • Forensics: Reduce the circle of suspects
  • Human Resources: One's suitability for certain jobs

Personality Theory Used in This Paper

Big Five Personality Traits

Basic Idea of the Method

  1. Feed the sentences of each essay through convolutional filters → a sentence model in the form of n-gram feature vectors
  2. Aggregate the vectors of a document's sentences and combine them with the Mairesse features to represent the document
  3. Classification: Feed the document vectors into a fully connected neural network

Overview of the Method

1. Preprocessing

  • Sentence Splitting
  • Data Cleaning
  • Unification (e.g. lowercase)
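
A minimal sketch of these preprocessing steps, assuming simple regex-based rules; the paper does not spell out its exact cleaning procedure, so the patterns below are illustrative:

```python
import re

def preprocess(document: str) -> list[str]:
    """Split a raw essay into cleaned, lowercased sentences."""
    # Naive sentence splitting on ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    cleaned = []
    for s in sentences:
        s = re.sub(r"[^A-Za-z0-9'\s]", " ", s)      # drop punctuation and symbols
        s = re.sub(r"\s+", " ", s).strip().lower()  # unify whitespace and case
        if s:
            cleaned.append(s)
    return cleaned

print(preprocess("I LOVE hiking!  It's great.  Right?"))
# ['i love hiking', "it's great", 'right']
```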

2. Document-level feature extraction

Mairesse baseline feature set (e.g. word count, average sentence length)
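
For illustration, the two example features named above could be computed as follows; this is not the paper's feature extractor, and the real Mairesse set is far larger (it also draws on resources such as LIWC and MRC):

```python
def document_features(sentences: list[str]) -> list[float]:
    """Word count and average sentence length for one document."""
    words_per_sentence = [len(s.split()) for s in sentences]
    word_count = sum(words_per_sentence)
    avg_sentence_len = word_count / max(len(sentences), 1)
    return [float(word_count), avg_sentence_len]

print(document_features(["i love hiking", "it's great"]))  # [5.0, 2.5]
```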

3. Filtering

Sentences without personality clues are dropped
(based on the NRC Word-Emotion Association Lexicon)
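
A sketch of this filtering step: a sentence survives only if it contains at least one emotion-bearing word. The tiny inline word set is a stand-in for the NRC lexicon, which has to be obtained separately:

```python
# Stand-in for the NRC Word-Emotion Association Lexicon.
EMOTION_WORDS = {"love", "hate", "happy", "sad", "afraid", "angry"}

def filter_sentences(sentences: list[str]) -> list[str]:
    """Keep only sentences with at least one lexicon word."""
    return [s for s in sentences
            if any(w in EMOTION_WORDS for w in s.split())]

print(filter_sentences(["i love hiking", "the sky is blue"]))
# ['i love hiking']
```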

4. Word-level feature extraction

  • word2vec
  • A sentence is a variable number of fixed-length word vectors; a document is a variable number of such sentences
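
A sketch of the word-level lookup using gensim's KeyedVectors with the pretrained GoogleNews vectors; mapping out-of-vocabulary words to zero vectors is an assumed convention, not necessarily the paper's:

```python
import numpy as np
from gensim.models import KeyedVectors

# Assumes the GoogleNews-vectors-negative300.bin file is available locally.
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def sentence_to_matrix(sentence: str) -> np.ndarray:
    """Map a sentence to a (num_words, 300) matrix of word vectors."""
    vecs = [w2v[w] if w in w2v else np.zeros(300, dtype=np.float32)
            for w in sentence.split()]
    return np.stack(vecs) if vecs else np.zeros((0, 300), dtype=np.float32)
```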

5. Classification

Deep CNN (Convolutional Neural Network)

  • Input
    • Words: Fixed-length feature vector using word2vec
    • Sentences: Variable number of word vectors
  • Process
    • Each sentence's word vectors are reduced to a fixed-length sentence vector
    • Document: Variable number of such fixed-length sentence vectors
    • The sentence vectors are then reduced to a fixed-length document vector
    • This document vector is then concatenated with the document-level features
  • Predict
    • Yes / No (a separate binary classifier is trained for each of the five personality traits)

Network Architecture in Detail

Main Steps (7 Layers)

Word Vectorization

  • Layer 1: Input
    • \(\mathbb{R}^{D \times S \times W \times E}\): D documents, S sentences per document, W words per sentence, E embedding dimensions
    • Use Google's pretrained word2vec
    • In the implementation, all documents contain the same number of sentences
      and all sentences the same number of words: shorter documents are padded
      with dummy sentences, and shorter sentences with dummy words.
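
A sketch of this padding, producing the fixed-shape input of layer 1; using zero vectors as the dummy words and sentences is an assumed concrete choice:

```python
import numpy as np

def pad_document(sent_matrices: list[np.ndarray],
                 S: int, W: int, E: int = 300) -> np.ndarray:
    """Pad/truncate one document to a fixed (S, W, E) array.

    Stacking D such arrays yields the R^(D x S x W x E) input tensor.
    """
    doc = np.zeros((S, W, E), dtype=np.float32)  # dummy entries stay zero
    for i, m in enumerate(sent_matrices[:S]):
        n = min(len(m), W)
        doc[i, :n] = m[:n]
    return doc
```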

Sentence Vectorization

  • Layer 2: Convolution
    • 3 convolutional filters: unigram, bigram, trigram
  • Layer 3: Max Pooling (sketched together with layer 4 below)

Document Vectorization

  • Layer 4: 1-max pooling
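
A PyTorch sketch of layers 2–4: unigram/bigram/trigram convolutions over each sentence's word vectors, max pooling over word positions to form sentence vectors, then 1-max pooling over sentences to form the document vector. The filter count (200 per n-gram size) and the padding scheme are illustrative assumptions, not taken from the paper:

```python
import torch
import torch.nn as nn

class SentenceDocEncoder(nn.Module):
    def __init__(self, emb_dim: int = 300, n_filters: int = 200):
        super().__init__()
        # Layer 2: one Conv1d per n-gram size (unigram, bigram, trigram).
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, kernel_size=k, padding=k - 1)
            for k in (1, 2, 3))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (D, S, W, E) = documents x sentences x words x embedding dims.
        D, S, W, E = x.shape
        words = x.reshape(D * S, W, E).transpose(1, 2)     # (D*S, E, W)
        # Layer 3: max-pool each n-gram feature map over word positions,
        # then concatenate the three n-gram feature vectors per sentence.
        sent = torch.cat([conv(words).max(dim=2).values
                          for conv in self.convs], dim=1)  # (D*S, 3*n_filters)
        # Layer 4: 1-max pooling over a document's sentence vectors.
        return sent.reshape(D, S, -1).max(dim=1).values    # (D, 3*n_filters)

doc_vec = SentenceDocEncoder()(torch.randn(2, 10, 20, 300))
print(doc_vec.shape)  # torch.Size([2, 600])
```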

Classification (Yes/No)

  • Layer 5: Linear with sigmoid activation (ReLU and tanh performed worse)
  • Layer 6: Fully connected layer of size 200
  • Layer 7: Two-neuron (yes/no) softmax output
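
A sketch of the classification head, reading layers 5–7 as a size-200 fully connected sigmoid layer over the concatenated features followed by a two-neuron softmax; the input sizes (600 from the encoder sketch above, 84 Mairesse features) are assumptions carried over from the earlier sketches:

```python
import torch
import torch.nn as nn

class TraitClassifier(nn.Module):
    def __init__(self, doc_dim: int = 600, feat_dim: int = 84):
        super().__init__()
        self.fc = nn.Linear(doc_dim + feat_dim, 200)  # layers 5-6
        self.out = nn.Linear(200, 2)                  # layer 7: yes/no

    def forward(self, doc_vec: torch.Tensor,
                doc_feats: torch.Tensor) -> torch.Tensor:
        # Concatenate the CNN document vector with document-level features.
        h = torch.cat([doc_vec, doc_feats], dim=1)
        h = torch.sigmoid(self.fc(h))                 # sigmoid activation
        # Log-probabilities, so the output pairs with an NLL objective.
        return torch.log_softmax(self.out(h), dim=1)
```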

Training

Objective Function: Negative Log Likelihood
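
A minimal training-step sketch pairing the log-softmax outputs with PyTorch's NLLLoss (negative log likelihood), reusing the SentenceDocEncoder and TraitClassifier sketches above; one such classifier is trained per trait. The Adam optimizer and the dummy batch shapes are illustrative choices, not the paper's settings:

```python
import torch

encoder, head = SentenceDocEncoder(), TraitClassifier()
criterion = torch.nn.NLLLoss()  # negative log likelihood over log-probs
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(head.parameters()))

x = torch.randn(2, 10, 20, 300)  # dummy batch: (D, S, W, E)
feats = torch.randn(2, 84)       # dummy document-level features
labels = torch.tensor([0, 1])    # yes/no labels for one trait

optimizer.zero_grad()
loss = criterion(head(encoder(x), feats), labels)
loss.backward()
optimizer.step()
print(float(loss))
```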