Using machine learning & deep learning to analyze the News
Information Retrieval and Text Mining project
https://www.youtube.com/watch?v=9PFZ0_C2Sxo&feature=share
https://docs.google.com/document/d/10-7H9bPJYQRMdOUdugDlWeifdpvoN9twXZGT-m1fhdc/edit?usp=sharing
https://docs.google.com/document/d/1I9SWihDkgXx1NCYCsY-0e_XDicAK346PqQu5wMaesd0/edit
https://docs.google.com/presentation/d/1lRDR40UfcLpdRUSnfMbi6eOsR_jjxFdOcKwa8HvxHh8/edit#slide=id.p
Outline:
Motivation: why do this? Fake news is rampant, it influences audiences, and it is used to steer public opinion during elections
Solution
GOAL
Dataset
Merged dataset combining the text and label columns of three datasets: https://drive.google.com/drive/u/2/folders/19CER5SrMU29n3UPAkQc2hPu3HA8vyqbc
Method
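Currently only the news content is considered
The bs category carries little meaning
Testing on Kaggle: https://www.kaggle.com/c/fake-news/submit
(to check how well the classifier (clf) and the regressor (reg) perform)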
https://www.kaggle.com/c/fake-news/data (title, author, text, true/false; crawled news articles) =>
https://github.com/KaiDMML/FakeNewsNet/tree/master/Data (news source, headline, image, body_text, publish_data, etc.; contains both real and fake news; crawled news)
https://www.kaggle.com/mrisdal/fake-news (uuid: unique identifier; ord_in_thread; author: author of story; published: date published; title: title of the story; text: text of story; language: data from webhose.io; crawled: date the story was archived; site_url: site URL from BS detector; country: data from webhose.io; domain_rank: data from webhose.io; thread_title; spam_score: data from webhose.io; main_img_url: image from story; replies_count: number of replies; participants_count: number of participants; likes: number of Facebook likes; comments: number of Facebook comments; shares: number of Facebook shares; type: type of website (label from BS detector)) https://github.com/bs-detector/bs-detector
https://github.com/GeorgeMcIntire/fake_real_news_dataset (CSV file containing thousands of articles tagged as either real or fake)
https://www.cs.ucsb.edu/~william/data/liar_dataset.zip (graded degrees of fake news; UCSB) (statement, speaker, context, label, src)
https://www.kaggle.com/jruvika/fake-news-detection (URLs, Headline, Body, Label (T/F))
https://www.kaggle.com/c/fake-news-pair-classification-challenge/data (fake news classification)
https://github.com/JasonKessler/fakeout (a complete project)
datasets: https://data.world/datasets/fake-news , https://github.com/sumeetkr/AwesomeFakeNews
preprocess ref: https://www.kaggle.com/rchitic17/fake-news , https://www.kaggle.com/michaleczuszek/fake-news-analysis
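A minimal sketch of how the merged text,label dataset could be built from sources that use different column names and label encodings. The notes do not say which three datasets were merged, so the source names, column mappings, and label encodings below are hypothetical placeholders and should be checked against each real file. Only the Python standard library is used, and small in-memory CSV strings stand in for the downloaded files.

```python
import csv
import io

# Hypothetical per-source schema (an assumption, not taken from the datasets):
# which column holds the article body, which holds the label, and how the raw
# label values map to 1 (fake) / 0 (real).
SOURCES = {
    "source_a": {"text": "text", "label": "label", "map": {"1": 1, "0": 0}},
    "source_b": {"text": "Body", "label": "Label", "map": {"0": 1, "1": 0}},
    "source_c": {"text": "text", "label": "label", "map": {"FAKE": 1, "REAL": 0}},
}

def normalize(rows, schema):
    """Yield (text, label) pairs, skipping empty text and unknown labels."""
    for row in rows:
        text = (row.get(schema["text"]) or "").strip()
        raw_label = (row.get(schema["label"]) or "").strip()
        label = schema["map"].get(raw_label)
        if text and label is not None:
            yield text, label

def merge(files):
    """files: dict of source name -> open file-like object with CSV content."""
    merged, seen = [], set()
    for name, handle in files.items():
        for text, label in normalize(csv.DictReader(handle), SOURCES[name]):
            if text not in seen:  # drop exact duplicate articles
                seen.add(text)
                merged.append({"text": text, "label": label})
    return merged

# Tiny in-memory stand-ins for the real CSV files.
demo_files = {
    "source_a": io.StringIO(
        "title,text,label\nt1,Aliens endorse candidate,1\nt2,,0\n"
    ),
    "source_b": io.StringIO(
        "URLs,Headline,Body,Label\nu,h,Council approves budget,1\n"
    ),
    "source_c": io.StringIO(
        "title,text,label\nt3,Aliens endorse candidate,FAKE\nt4,Rain expected,REAL\n"
    ),
}
merged = merge(demo_files)
```

Keeping the per-source mapping in one table makes it easy to add a fourth dataset later, and the duplicate check matters because crawled collections often share articles.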
Datasets for sentiment analysis are available online.[1][2]
The following is a list of a few open source sentiment analysis tools.
Open Source Dictionary or resources:
Direction: text classification or degree regression
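The two directions differ mainly in the target the model is trained on. As a rough sketch, the six graded ratings of the LIAR dataset listed above could be turned into either a binary class or a numeric degree. The cut-off between fake and real and the even spacing of the degrees are assumptions made for illustration, not choices stated in these notes.

```python
# LIAR's six truthfulness ratings, ordered from least to most truthful.
LIAR_LEVELS = [
    "pants-fire", "false", "barely-true", "half-true", "mostly-true", "true",
]

def to_class(label):
    """Classification target: 1 = fake, 0 = real.

    Assumption: everything below 'half-true' counts as fake.
    """
    return 1 if LIAR_LEVELS.index(label) < 3 else 0

def to_degree(label):
    """Regression target in [0, 1]: 1.0 = completely fake, 0.0 = completely true.

    Assumption: the six ratings are evenly spaced.
    """
    return 1 - LIAR_LEVELS.index(label) / (len(LIAR_LEVELS) - 1)

# Build both targets for a few example ratings.
examples = ["pants-fire", "half-true", "true"]
class_targets = [to_class(label) for label in examples]
degree_targets = [to_degree(label) for label in examples]
```

A classifier trained on the first target answers "fake or not", while a regressor trained on the second can express how fake a statement is, at the cost of relying on the assumed spacing between ratings.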
Text classification