[Paper] Deep Learning-Based Document Modeling for Personality Detection from Text

Category Tech

Practical Application of Personality Detection

  • Product and Service Recommendation (people with similar personalities may have similar preferences)
  • Mental Health Diagnosis
  • Forensics: Narrowing down the circle of suspects
  • Human Resources: Assessing one's suitability for certain jobs

Personality Theory Used in This Paper

Big Five Personality Traits

Basic Idea of the Method

  1. Feed sentences from essays to convolution filters → Sentence model in the form of n-gram feature vectors
  2. Aggregate the vectors of a document's sentences and combine them with Mairesse features to represent the document
  3. Classification: Feed the document vectors into a fully connected neural network

Overview of the Method

1. Preprocessing

  • Sentence Splitting
  • Data Cleaning
  • Unification (e.g. lowercase)
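
A minimal preprocessing sketch, assuming NLTK's `sent_tokenize` for sentence splitting and a simple regex for cleaning (the paper does not name specific tools):

```python
import re
from nltk.tokenize import sent_tokenize  # assumed splitter; the paper names no tool

def preprocess(document: str) -> list[list[str]]:
    """Split a document into sentences, clean each one, and lowercase it."""
    cleaned = []
    for sent in sent_tokenize(document):           # sentence splitting
        sent = re.sub(r"[^A-Za-z' ]+", " ", sent)  # data cleaning: keep letters only
        tokens = sent.lower().split()              # unification: lowercase + tokenize
        if tokens:
            cleaned.append(tokens)
    return cleaned
```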

2. Document-level feature extraction

Mairesse baseline feature set (e.g. word count, average sentence length)
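
For illustration, a sketch computing the two features named above; the full Mairesse baseline set is much larger (it also covers LIWC/MRC-style statistics):

```python
def document_features(sentences: list[list[str]]) -> list[float]:
    """Two illustrative document-level features from the Mairesse baseline set."""
    word_count = sum(len(s) for s in sentences)
    avg_sentence_length = word_count / len(sentences) if sentences else 0.0
    return [float(word_count), avg_sentence_length]
```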

3. Filtering

Sentences without personality clues are dropped
(based on the NRC Word-Emotion Association Lexicon)
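
A sketch of this filtering step, assuming a sentence is retained when at least one of its words carries an emotion association in the NRC lexicon (the exact retention criterion is an assumption):

```python
def filter_sentences(sentences: list[list[str]],
                     emotion_words: set[str]) -> list[list[str]]:
    """Drop sentences with no personality clues: keep a sentence only if some
    word appears in the NRC Word-Emotion Association Lexicon (assumed criterion)."""
    return [s for s in sentences if any(tok in emotion_words for tok in s)]
```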

4. Word-level feature extraction

  • word2vec
  • A sentence is a variable number of fixed-length word feature vectors; a document is a variable number of sentences
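
A sketch of this lookup using gensim, assuming Google's pretrained 300-dimensional vectors and zero vectors for out-of-vocabulary words (the file name and unknown-word handling are assumptions):

```python
import numpy as np
from gensim.models import KeyedVectors

# File name assumed; Google's pretrained word2vec vectors are 300-dimensional.
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def embed_sentence(tokens: list[str]) -> np.ndarray:
    """Map each word to its fixed-length vector; unknown words become zeros."""
    return np.stack([w2v[t] if t in w2v else np.zeros(300, dtype=np.float32)
                     for t in tokens])
```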

5. Classification

Deep CNN (Convolutional Neural Network)

  • Input
    • Words: Fixed-length feature vector using word2vec
    • Sentences: Variable number of word vectors
  • Process
    • The word vectors of each sentence are reduced to a fixed-length sentence vector
    • Document: Variable number of such fixed-length sentence vectors
    • The sentence vectors are then reduced to a fixed-length document vector
    • This document vector is then concatenated with the document-level features
  • Predict
    • Yes / No (5 different personality traits are trained separately)
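
A shape walk-through of this reduction for one document, using plain max pooling as a stand-in for the convolution and pooling layers detailed below (all sizes illustrative; 84 is the Mairesse feature count):

```python
import torch

S, W, E = 12, 30, 300                  # sentences, words per sentence, embedding size
doc = torch.randn(S, W, E)             # padded word vectors for one document

sent_vecs = doc.max(dim=1).values      # words -> fixed-length vector per sentence: (S, E)
doc_vec = sent_vecs.max(dim=0).values  # sentences -> fixed-length document vector: (E,)
doc_vec = torch.cat([doc_vec, torch.randn(84)])  # append document-level (Mairesse) features
print(doc_vec.shape)                   # torch.Size([384])
```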

Network Architecture in Detail

Main Steps (7 Layers)

Word Vectorization

  • Layer 1: Input
    • Input tensor in \( \mathbb{R}^{D \times S \times W \times E} \) (documents × sentences × words × embedding dimension)
    • Use Google's pretrained word2vec
    • In the implementation, all documents are padded to the same number of sentences,
      and shorter sentences are padded with dummy words.
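
A padding sketch for the input layer; all-zero vectors for dummy words are an assumption:

```python
import numpy as np

def pad_document(sent_vectors: list[np.ndarray],
                 max_sents: int, max_words: int, emb: int = 300) -> np.ndarray:
    """Pad one document to a fixed (max_sents, max_words, emb) block.
    Dummy words/sentences are all-zero vectors (padding scheme assumed)."""
    out = np.zeros((max_sents, max_words, emb), dtype=np.float32)
    for i, sent in enumerate(sent_vectors[:max_sents]):
        n = min(len(sent), max_words)
        out[i, :n] = sent[:n]
    return out
```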

Sentence Vectorization

  • Layer 2: Convolution
    • 3 convolutional filters: unigram, bigram, trigram
  • Layer 3: Max Pooling
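
A sketch of layers 2–3 in PyTorch: unigram, bigram, and trigram convolutions over a sentence's word vectors, each max-pooled over word positions (the number of feature maps, 200, is illustrative):

```python
import torch
import torch.nn as nn

E, F = 300, 200                  # embedding size, feature maps per filter (assumed)
convs = nn.ModuleList([nn.Conv1d(E, F, kernel_size=n) for n in (1, 2, 3)])

def sentence_vectors(doc: torch.Tensor) -> torch.Tensor:
    """Layers 2-3: doc is (S, W, E) padded word vectors -> (S, 3*F) sentence vectors."""
    x = doc.transpose(1, 2)                           # (S, E, W) layout for Conv1d
    pooled = [c(x).max(dim=2).values for c in convs]  # max-pool each n-gram feature map
    return torch.cat(pooled, dim=1)                   # concat uni/bi/trigram features
```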

Document Vectorization

  • Layer 4: 1-max pooling
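
Layer 4 collapses the variable number of sentence vectors into one fixed-length document vector; a one-line sketch:

```python
import torch

def document_vector(sent_vecs: torch.Tensor) -> torch.Tensor:
    """Layer 4: 1-max pooling across sentences, (S, K) -> (K,)."""
    return sent_vecs.max(dim=0).values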

Classification (Yes/No)

  • Layer 5: Linear with sigmoid activation
  • Layers 6, 7
    • Fully connected layer of size 200
    • 2-neuron (yes/no) softmax output (ReLU and tanh perform worse)
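
A sketch of the classification layers; the input size assumes a 600-d document vector (3 filter sizes × 200 maps) plus 84 Mairesse features, and the exact split between layers 5–7 is an interpretation of these notes:

```python
import torch.nn as nn

head = nn.Sequential(
    nn.Linear(684, 200),    # fully connected layer of size 200 (684 = 600 + 84, assumed)
    nn.Sigmoid(),           # sigmoid activation (ReLU and tanh perform worse)
    nn.Linear(200, 2),      # 2-neuron yes/no output
    nn.LogSoftmax(dim=-1),  # log-probabilities, paired with the NLL objective below
)
```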

Training

Objective Function: Negative Log Likelihood

\( \mathcal{L} = -\sum_{i=1}^{N} \log p(y_i \mid d_i) \), summed over the \(N\) training documents \(d_i\) with yes/no labels \(y_i\)
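
A minimal training-step sketch; `head` carries over from the classification sketch above, and the optimizer choice and random stand-in data are assumptions, not the paper's exact setup:

```python
import torch
import torch.nn as nn

criterion = nn.NLLLoss()                         # negative log likelihood on log-probs
optimizer = torch.optim.Adam(head.parameters())  # optimizer choice is an assumption

doc_vec = torch.randn(1, 684)  # one document vector + Mairesse features (dummy data)
label = torch.tensor([1])      # 1 = trait present ("yes")

loss = criterion(head(doc_vec), label)  # negative log likelihood for this document
optimizer.zero_grad()
loss.backward()
optimizer.step()
```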