[Paper] Deep Learning-Based Document Modeling for Personality Detection from Text

Paper
Implementation: Personality-Detection
Data Set
- James Pennebaker and Laura King's stream-of-consciousness essay dataset
- NRC Word-Emotion Association Lexicon

Practical Application of Personality Detection

Product and Service Recommendation (people with similar personalities might have similar preferences)
Mental Health Diagnosis
Forensics: Narrow down the circle of suspects
Human Resources: Assess one's suitability for certain jobs

Personality Theory Used in This Paper

Big Five Personality Traits (Extraversion, Neuroticism, Agreeableness, Conscientiousness, Openness)

Basic Idea of the Method

Feed sentences from the essays to convolution filters → Sentence model in the form of n-gram feature vectors
Aggregate the vectors of a document's sentences and combine them with the Mairesse features to represent the document
Classification: Feed the document vectors into a fully connected neural network

Overview of the Method

1. Preprocessing

Sentence Splitting
Data Cleaning
Unification (e.g. lowercase)
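
The three preprocessing steps can be sketched as follows. This is a minimal illustration, not the authors' code: the regex-based sentence splitter, the set of characters kept during cleaning, and the function name `preprocess` are all assumptions.

```python
import re

def preprocess(essay):
    """Split an essay into cleaned, lowercased sentences (illustrative only)."""
    # Sentence splitting: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", essay.strip())
    cleaned = []
    for sentence in sentences:
        # Unification: lowercase everything.
        sentence = sentence.lower()
        # Data cleaning (assumed rule): keep letters, digits, apostrophes and spaces.
        sentence = re.sub(r"[^a-z0-9' ]+", " ", sentence)
        # Collapse the runs of whitespace left behind by the cleaning step.
        sentence = re.sub(r"\s+", " ", sentence).strip()
        if sentence:
            cleaned.append(sentence)
    return cleaned

print(preprocess("I feel GREAT today!  But I'm tired... Why?"))
```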

2. Document-level feature extraction

Mairesse baseline feature set (e.g. word count, average sentence length)
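
As a rough illustration of document-level features, the sketch below computes only the two examples named above, word count and average sentence length. The Mairesse baseline is a larger feature set; the function name `document_features` and the whitespace tokenization are assumptions.

```python
def document_features(sentences):
    """Return [word_count, average_sentence_length] for a list of sentences."""
    # Sentence length is measured in whitespace-separated tokens (an assumption).
    lengths = [len(sentence.split()) for sentence in sentences]
    word_count = sum(lengths)
    avg_sentence_length = word_count / len(lengths) if lengths else 0.0
    return [float(word_count), avg_sentence_length]

print(document_features(["i feel great today", "why"]))
```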

3. Filtering

Sentences without personality clues are dropped
(based on the NRC Word-Emotion Association Lexicon)
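
A lexicon-based filter of this kind might look like the sketch below. The word set `TOY_EMOTION_WORDS` is a hypothetical stand-in for the emotion-bearing entries of the NRC lexicon, and treating "at least one lexicon word" as the keep criterion is an assumption about how the filter is applied.

```python
# Hypothetical stand-in for the emotionally charged words in the NRC lexicon.
TOY_EMOTION_WORDS = {"happy", "afraid", "angry", "love", "tired"}

def filter_sentences(sentences, emotion_words=TOY_EMOTION_WORDS):
    """Keep only sentences that contain at least one emotionally charged word."""
    kept = []
    for sentence in sentences:
        if any(word in emotion_words for word in sentence.split()):
            kept.append(sentence)
    return kept

print(filter_sentences(["i love this class", "the room has four walls"]))
```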

4. Word-level feature extraction

word2vec
Variable number of fixed-length word feature vectors → Variable number of sentences → Document
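
The nesting of words, sentences, and documents can be illustrated with a toy lookup table. `TOY_EMBEDDINGS` is a hypothetical replacement for pretrained word2vec (whose Google News vectors have 300 dimensions), and mapping unknown words to a zero vector is an assumption made only for this sketch.

```python
E = 3  # toy embedding length; pretrained Google word2vec vectors have 300 dimensions

# Hypothetical embedding table standing in for pretrained word2vec.
TOY_EMBEDDINGS = {
    "i": [0.1, 0.0, 0.2],
    "love": [0.9, 0.3, 0.1],
    "music": [0.4, 0.8, 0.5],
}

def embed_sentence(sentence, embeddings=TOY_EMBEDDINGS, dim=E):
    """Map each word to a fixed-length vector; unknown words get zeros (assumed)."""
    return [embeddings.get(word, [0.0] * dim) for word in sentence.split()]

def embed_document(sentences):
    """A document is a variable-length list of variable-length sentence matrices."""
    return [embed_sentence(sentence) for sentence in sentences]

doc = embed_document(["i love music", "i love"])
print([len(sentence) for sentence in doc])  # number of word vectors per sentence
```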

5. Classification

Deep CNN (Convolutional Neural Network)

Input
- Words: Fixed-length feature vector using word2vec
- Sentences: Variable number of word vectors
Process
- Sentence: the word vectors of each sentence are reduced to one fixed-length sentence vector
- Document: a variable number of such fixed-length sentence vectors
- The sentence vectors are then reduced to a single fixed-length document vector
- This document vector is then concatenated with the document-level features
Predict
- Yes / No (a separate classifier is trained for each of the 5 personality traits)

Network Architecture in Detail

Main Steps (7 Layers)

Word Vectorization

Layer 1: Input
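- \(\mathbb{R}^{D \times S \times W \times E}\), where D is the number of documents, S the maximum number of sentences per document, W the maximum number of words per sentence, and E the word embedding length
- Use Google's pretrained word2vec
- In the implementation, all documents are forced to contain the same number of sentences:
  shorter documents are padded with dummy sentences, and shorter sentences with dummy words.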
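
Padding every document to the same number of sentences and every sentence to the same number of words can be sketched as below. Using an all-zero vector as the dummy word is an assumption, as is the helper name `pad_documents`; the result is a rectangular nested list of shape D x S x W x E.

```python
def pad_documents(docs, dim):
    """Pad embedded documents to a rectangular D x S x W x E nested list."""
    S = max(len(doc) for doc in docs)                    # max sentences per document
    W = max(len(sent) for doc in docs for sent in doc)   # max words per sentence
    dummy_word = [0.0] * dim                             # assumed all-zero dummy word
    padded = []
    for doc in docs:
        # Pad each sentence with dummy words up to W words.
        new_doc = [sent + [dummy_word] * (W - len(sent)) for sent in doc]
        # Pad the document with dummy sentences (made of dummy words) up to S sentences.
        new_doc += [[dummy_word] * W for _ in range(S - len(doc))]
        padded.append(new_doc)
    return padded

docs = [
    [[[1.0, 2.0]]],                               # 1 sentence, 1 word
    [[[1.0, 1.0], [2.0, 2.0]], [[3.0, 3.0]]],     # 2 sentences, 2 words and 1 word
]
padded = pad_documents(docs, dim=2)
print(len(padded), len(padded[0]), len(padded[0][0]), len(padded[0][0][0]))
```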

Sentence Vectorization

Layer 2: Convolution
- 3 convolutional filters: unigram, bigram, trigram
Layer 3: Max Pooling
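
The convolution and max-pooling steps for a single sentence can be sketched in plain Python. This is a simplified illustration with hand-picked weights rather than learned ones; the ReLU non-linearity, the zero bias, and the function names are assumptions, and a real model would use many filters per n-gram size.

```python
def conv_max_pool(sentence, filters, bias=0.0):
    """Apply n-gram filters to one sentence and max-pool each feature map.

    sentence: list of W word vectors, each of length E.
    filters:  list of filters; each filter is a list of n weight vectors of length E.
    Returns one value per filter: the largest activation over all n-gram positions.
    """
    pooled = []
    for filt in filters:
        n = len(filt)
        activations = []
        for start in range(len(sentence) - n + 1):
            window = sentence[start:start + n]
            # Dot product between the filter and n consecutive word vectors.
            total = sum(w * x
                        for f_row, x_row in zip(filt, window)
                        for w, x in zip(f_row, x_row))
            activations.append(max(0.0, total + bias))  # ReLU (assumed non-linearity)
        # Max pooling over positions; sentences shorter than n contribute 0.
        pooled.append(max(activations) if activations else 0.0)
    return pooled

def sentence_vector(sentence, filter_banks):
    """Concatenate pooled features from the unigram, bigram and trigram banks."""
    vector = []
    for filters in filter_banks:
        vector.extend(conv_max_pool(sentence, filters))
    return vector

sentence = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]          # W = 3 words, E = 2
unigram = [[[1.0, 1.0]]]                                  # one filter of size 1 x E
bigram = [[[1.0, 0.0], [0.0, 1.0]]]                       # one filter of size 2 x E
trigram = [[[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]]          # one filter of size 3 x E
print(sentence_vector(sentence, [unigram, bigram, trigram]))
```

Because each feature map is reduced to its maximum, the sentence vector has a fixed length (the total number of filters) regardless of how many words the sentence contains.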

Document Vectorization

Layer 4: 1-max pooling
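
One way to read the 1-max pooling step is as an element-wise maximum over all sentence vectors of a document, followed by concatenation with the document-level features. The sketch below assumes that reading; `document_vector` is a hypothetical helper name.

```python
def document_vector(sentence_vectors, doc_features):
    """1-max pool across sentences, then append document-level features."""
    # Element-wise maximum over all sentence vectors of the document.
    pooled = [max(column) for column in zip(*sentence_vectors)]
    # Concatenate with the document-level (Mairesse) features.
    return pooled + list(doc_features)

print(document_vector([[1.0, 5.0], [3.0, 2.0]], [10.0]))
```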

Classification: (Yes/No)

Layer 5: Linear with Sigmoid activation
Layer 6, 7
- Two-neuron (yes/no) softmax output (ReLU and tanh perform worse)
- Fully connected layer of size 200
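
The classification head, a sigmoid-activated fully connected layer followed by a two-neuron softmax, can be sketched as a forward pass. The weights here are toy values with a hidden size of 2 instead of 200, and treating output index 0 as "yes" is an assumption.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    top = max(scores)                               # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, bias, x):
    """Compute W x + b, with one weight row per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def classify(doc_vec, W_h, b_h, W_o, b_o):
    """Return [P(yes), P(no)] for one document vector (index order assumed)."""
    hidden = [sigmoid(z) for z in linear(W_h, b_h, doc_vec)]
    return softmax(linear(W_o, b_o, hidden))

probs = classify(
    doc_vec=[1.0, 2.0],
    W_h=[[0.0, 0.0], [0.0, 0.0]], b_h=[0.0, 0.0],   # toy hidden layer (size 2, not 200)
    W_o=[[1.0, 1.0], [0.0, 0.0]], b_o=[0.0, 0.0],   # two output neurons
)
print(probs)
```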

Training

Objective Function: Negative Log Likelihood
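
For a softmax output, the negative log likelihood is the negative log of the probability assigned to the true class. The sketch below averages it over examples; averaging rather than summing is an assumption, as is the function name.

```python
import math

def negative_log_likelihood(probs, labels):
    """Mean negative log likelihood.

    probs:  list of predicted class distributions, one per example.
    labels: list of true class indices, one per example.
    """
    total = sum(math.log(p[y]) for p, y in zip(probs, labels))
    return -total / len(labels)

# A maximally uncertain two-class prediction costs log(2) per example.
print(negative_log_likelihood([[0.5, 0.5]], [0]))
```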
