STriP Net: Semantic Similarity of Scientific Papers (S3P) Network
Do you read a lot of scientific papers? Have you ever wondered what the overarching themes in the papers you've read are, and how all the papers are semantically connected to one another? Look no further!
Leverage the power of NLP topic modeling, semantic similarity, and network analysis to study the themes and semantic relations within a corpus of research papers.
Generate a STriP Network on your own collection of research papers with just three lines of code!
Interactive plots to quickly identify research themes and the most important papers
This repo was hacked together over the New Year 2022 weekend. This is only the initial release, with lots of work planned.
Please leave a star to let me know that STriP Net has been helpful to you, so that I can dedicate more of my time to working on it.
Perhaps the most hassle-free option is to install stripnet from conda-forge with conda:
conda install -c conda-forge stripnet
If you want to install stripnet using pip, it is highly recommended to install it in a conda environment. First create a conda environment (named stripnet here) and activate it:
conda create -n stripnet python=3.8 jupyterlab -y
conda activate stripnet
Then install stripnet with pip:
pip install stripnet
The input to STriP Net is the title and abstract of each paper, joined into one text and separated by the [SEP] keyword. Please see the example below:
# Load some data
import pandas as pd
data = pd.read_csv('data.csv')
# Keep only title and abstract columns
data = data[['title', 'abstract']]
# Concat the title and abstract columns separated with [SEP] keyword
data['text'] = data['title'] + '[SEP]' + data['abstract']
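A detail worth knowing about the concatenation above: in pandas, if a paper has no abstract (NaN), the resulting text is NaN as well. Below is a minimal sketch of the same title + [SEP] + abstract preparation using only Python's standard csv module, which treats a missing abstract as an empty string instead. The helper name build_texts and this missing-value policy are assumptions for illustration, not part of STriP Net.

```python
import csv
import io
from typing import Iterable, List, Mapping, Optional


def build_texts(rows: Iterable[Mapping[str, Optional[str]]], sep: str = "[SEP]") -> List[str]:
    """Join each paper's title and abstract with the [SEP] keyword.

    Hypothetical helper: a missing or empty abstract is treated as "",
    so the paper is kept (the pandas version would produce NaN instead).
    """
    texts = []
    for row in rows:
        title = (row.get("title") or "").strip()
        abstract = (row.get("abstract") or "").strip()
        texts.append(title + sep + abstract)
    return texts


# An in-memory CSV standing in for data.csv; Paper B has no abstract
sample_csv = io.StringIO(
    "title,abstract\n"
    "Paper A,We study graphs.\n"
    "Paper B,\n"
)
texts = build_texts(csv.DictReader(sample_csv))
print(texts)
```

The resulting list can be passed to the pipeline in place of data['text'] if you prefer to avoid NaN entries.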
# Instantiate the StripNet
from stripnet import StripNet
stripnet = StripNet()
# Run the StripNet pipeline
stripnet.fit_transform(data['text'])
# Find the most important papers in the network
stripnet.most_important_docs()
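The README does not spell out here how importance is measured. One common way to rank papers in a similarity network is degree centrality, i.e. how many other papers each paper is connected to. The sketch below ranks papers that way from a plain edge list; rank_by_degree and the toy edges are hypothetical, and most_important_docs() may well use a different measure.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_by_degree(edges: Iterable[Tuple[str, str]], top_k: int = 3) -> List[Tuple[str, int]]:
    """Rank papers by degree (number of similarity edges touching each paper)."""
    degree: Counter = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Sort by degree (descending), then by name so that ties are deterministic
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top_k]


# Toy network: paper A is linked to three others, D to only one
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
print(rank_by_degree(edges))
```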
If the network looks too crowded, first check the threshold currently used by stripnet:
current_threshold = stripnet.threshold
print(current_threshold)
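To build intuition for what threshold controls, here is a minimal sketch of a similarity network, assuming (as the name suggests, but not confirmed here) that two papers are linked when the cosine similarity of their embeddings exceeds threshold. The toy 2-D vectors stand in for real text embeddings, and cosine and similarity_edges are hypothetical helpers. Under this assumption, raising the threshold can only remove links, so the network becomes sparser.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_edges(embeddings: Dict[str, Sequence[float]], threshold: float) -> List[Tuple[str, str]]:
    """Hypothetical: link every pair of papers whose cosine similarity exceeds threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(embeddings), 2)
        if cosine(embeddings[a], embeddings[b]) > threshold
    ]


# Toy 2-D "embeddings": A and B point in similar directions, C does not
embeddings = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0]}
print(similarity_edges(embeddings, threshold=0.5))
print(similarity_edges(embeddings, threshold=0.999))
```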
Then increase the threshold in steps of 0.05 and try again until you see a good-looking network. Remember that the maximum value of threshold is 1! If your threshold is already 0.95, try increasing it in steps of 0.01 instead.
stripnet.fit_transform(data['text'], threshold=current_threshold+0.05)
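The stepping rule for the threshold can be written as a small helper. next_threshold is a hypothetical name; it simply encodes the advice: step by 0.05, switch to steps of 0.01 once the threshold reaches 0.95, and never exceed the maximum of 1.

```python
def next_threshold(current: float) -> float:
    """Suggest the next threshold to try (hypothetical helper).

    Steps by 0.05 while below 0.95, then by 0.01, capped at 1.
    """
    step = 0.01 if current >= 0.95 else 0.05
    # round() removes floating-point noise such as 0.8500000000000001
    return min(round(current + step, 2), 1.0)


print(next_threshold(0.8))
print(next_threshold(0.95))
```

You could then pass threshold=next_threshold(stripnet.threshold) to stripnet.fit_transform and inspect the new network.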
Set min_topic_size to a value lower than the default of 10 until you get topics that look reasonable to you:
stripnet.fit_transform(data['text'], min_topic_size=5)
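As a simplified illustration of why a smaller min_topic_size can yield more topics: if a group of papers smaller than the minimum is not kept as its own topic, lowering the minimum lets more groups through. The sketch assumes each paper already has a candidate cluster label; it does not reproduce STriP Net's actual topic-modeling step, and count_topics is a hypothetical helper.

```python
from collections import Counter
from typing import List, Sequence


def count_topics(labels: Sequence[str], min_topic_size: int = 10) -> List[str]:
    """Return the cluster labels that have at least min_topic_size papers."""
    sizes = Counter(labels)
    return sorted(label for label, size in sizes.items() if size >= min_topic_size)


# 30 papers in three candidate groups of 12, 8 and 10 papers
labels = ["graphs"] * 12 + ["vision"] * 8 + ["nlp"] * 10
print(count_topics(labels))  # uses the default minimum of 10
print(count_topics(labels, min_topic_size=5))
```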
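To remove isolated nodes (papers that are not connected to any other paper), set remove_isolated_nodes=True:
stripnet.fit_transform(data['text'], remove_isolated_nodes=True)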
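An isolated node is a paper with no edge to any other paper, for example because none of its similarity scores clear the threshold. The sketch below shows the idea of dropping such papers from a plain node and edge list; remove_isolated is a hypothetical helper, not STriP Net's implementation.

```python
from typing import Iterable, List, Set, Tuple


def remove_isolated(nodes: Iterable[str], edges: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep only papers that appear in at least one similarity edge."""
    connected: Set[str] = set()
    for a, b in edges:
        connected.update((a, b))
    # Preserve the original order of the papers
    return [node for node in nodes if node in connected]


papers = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C")]
print(remove_isolated(papers, edges))
```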
These options can also be combined in a single call:
stripnet.fit_transform(data['text'], threshold=current_threshold+0.05, min_topic_size=5, remove_isolated_nodes=True)
I'm testing out the network on a variety of data to pick better default values. Do let me know if some specific values worked the best for you!
To cite STriP Net in your work, please use the following bibtex reference:
@software{marie_stephen_leo_2022_5823822,
author = {Marie Stephen Leo},
title = {STriP Net: Semantic Similarity of Scientific Papers (S3P) Network},
month = jan,
year = 2022,
publisher = {Zenodo},
version = {v0.0.5.zenodo},
doi = {10.5281/zenodo.5823822},
url = {https://doi.org/10.5281/zenodo.5823822}
}
STriP Net stands on the shoulders of giants and builds on several prior works, the most notable being:
If this work helped you in any way, please consider the following ways to give me feedback, so that I can spend more time on this project: