NLP model that predicts subreddit based on the title of a post
An NLP model that predicts subreddit based on the title of a post.
Play with it on HuggingFace Space
Post on r/MachineLearning
The model was trained using the titles of the top 1000 posts from the top 250 subreddits scraped using PRAW.
Dataset hosted on HuggingFace
pip install -r requirements.txt
Create a .env file containing your Reddit authentication info, like this:
ID = <YOUR_ID>
SECRET = <YOUR_SECRET>
AGENT = <YOUR_AGENT>
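As a rough illustration of how these three values might be consumed, here is a minimal sketch. The helper names `load_env` and `make_reddit` are hypothetical and not taken from the repo, and the mapping of ID, SECRET and AGENT onto PRAW's `client_id`, `client_secret` and `user_agent` arguments is an assumption based on their names.

```python
from pathlib import Path


def load_env(path=".env"):
    """Parse simple KEY = VALUE lines into a dict.

    Blank lines, lines starting with '#', and lines without '=' are skipped.
    (Hypothetical helper; the repo may use a library such as python-dotenv.)
    """
    creds = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        creds[key.strip()] = value.strip()
    return creds


def make_reddit(creds):
    """Build a PRAW client from the parsed credentials.

    praw is imported lazily so that load_env works without it installed.
    Assumption: ID/SECRET/AGENT correspond to PRAW's client_id,
    client_secret and user_agent.
    """
    import praw

    return praw.Reddit(
        client_id=creds["ID"],
        client_secret=creds["SECRET"],
        user_agent=creds["AGENT"],
    )
```

With a .env file in the working directory, `make_reddit(load_env())` would return a read-only Reddit client under these assumptions.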
python3 dataset.py <npage> <dfilename>
npage is the number of pages to scrape for top subreddits from redditlist.com (1 page => 125 subs), and dfilename is the CSV filename to save the dataset to.
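The scraping step can be sketched roughly as follows. This is not the code in dataset.py: the function names, the CSV column names (`title`, `subreddit`) and the choice of `time_filter="all"` are assumptions made for illustration. Fetching the subreddit names from redditlist.com is left out, since the page format is not described here.

```python
import csv

SUBS_PER_PAGE = 125  # stated above: 1 page => 125 subs, so 2 pages => 250 subs


def collect_titles(reddit, subreddit_names, limit=1000):
    """Yield (title, subreddit) pairs for the top posts of each subreddit.

    `reddit` is expected to be a praw.Reddit instance; only
    reddit.subreddit(name).top(...) and post.title are used.
    """
    for name in subreddit_names:
        # Assumption: "top" means top of all time.
        for post in reddit.subreddit(name).top(time_filter="all", limit=limit):
            yield post.title, name


def save_dataset(rows, dfilename):
    """Write the (title, subreddit) pairs to a CSV file with a header row."""
    with open(dfilename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "subreddit"])  # hypothetical column names
        writer.writerows(rows)
```

Because `collect_titles` is a generator, titles are written to disk as they arrive when its result is passed straight to `save_dataset`.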
DistilBERT, from HuggingFace Transformers, is fine-tuned on the dataset of post titles, each labelled with its subreddit.
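A minimal sketch of how such a classifier could be set up is shown below. The notebook is the authoritative source. The helper names are hypothetical, and the `distilbert-base-uncased` checkpoint is an assumption, since the text above only says DistilBERT.

```python
def build_label_maps(subreddits):
    """Map each distinct subreddit name to an integer class id and back."""
    labels = sorted(set(subreddits))
    label2id = {name: i for i, name in enumerate(labels)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


def load_distilbert(label2id, id2label, checkpoint="distilbert-base-uncased"):
    """Load a tokenizer and a DistilBERT model with a fresh classification head.

    transformers is imported lazily so build_label_maps works without it.
    Assumption: the checkpoint name; the source does not state which one was used.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(label2id),
        label2id=label2id,
        id2label=id2label,
    )
    return tokenizer, model
```

Storing `id2label` in the model config means predictions can be reported as subreddit names rather than bare class indices.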
For the steps used to build the model, check out the model notebook in the repo or open it in Colab.
Model hosted on HuggingFace
If you want to contribute code, simply create a pull request. If you have an idea, create an issue and the developers will look into it!