Enhancing Depression Detection using BERT Models Pre-trained on Reddit Corpora

Yuan Gao

Depression is a pervasive and severe mental health disorder affecting millions worldwide, and its often covert nature makes early detection challenging (World Health Organization, 2021). The proliferation of social media platforms, particularly Reddit, has created unprecedented opportunities for individuals to express their mental health concerns and seek support online (De Choudhury & De, 2014). This digital footprint provides a unique avenue for using natural language processing techniques to automatically identify users potentially suffering from depression, facilitating early intervention. This study builds upon the model architecture proposed by Chen et al. (2023), which uses BERT (Bidirectional Encoder Representations from Transformers) (Devlin et al., 2019) for feature extraction from individual user posts, followed by a Convolutional Neural Network (Krizhevsky, Sutskever, & Hinton, 2017) for user-level classification. While this approach has shown promise, we hypothesize that the pre-trained BERT model, typically trained on formal corpora such as books and Wikipedia (Devlin et al., 2019), may not optimally capture the nuanced language patterns prevalent in social media discourse. To address this potential limitation, we propose pre-training the BERT model on a large corpus of Reddit data before integrating it into the BERT+CNN architecture. This study aims to evaluate whether this Reddit-specific pre-training can enhance the model's performance in detecting depression through social media content analysis. We conducted extensive experiments comparing the performance of the original BERT+CNN model against our Reddit-pre-trained variant, analyzing accuracy, recall, F1 score, and validation loss. Our findings indicate a significant improvement in performance, with the Reddit-pre-trained model achieving a 2.1-point increase in F1 score compared to the baseline model.
This research contributes to the growing body of literature on digital mental health assessment and demonstrates the potential of domain-specific language model pre-training in improving the accuracy of depression detection in social media contexts. The implications of this study extend to both clinical practice and public health policy, offering insights into more effective, data-driven approaches for early mental health intervention strategies.
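
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, recall, and F1 are conventionally computed for a binary depressed / not-depressed classifier. It is not the author's evaluation code: the function name `binary_metrics` and the toy labels are hypothetical, and the abstract does not state how the paper's metrics were averaged or thresholded.

```python
def binary_metrics(y_true, y_pred):
    """Standard binary metrics, assuming label 1 means 'depressed' (an assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical user-level labels, for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

Because F1 combines precision and recall, a gain in F1 reflects a better balance between missed cases and false alarms, not only a higher overall hit rate.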

Comments: 16 Pages.

Submission history

[v1] 2025-04-06 03:47:21
