A Curated List of Datasets and Usable Library Resources for NLP in Bahasa Indonesia
This repository provides links to useful datasets and other resources for NLP in Bahasa Indonesia.
Last Update: 15 Mar 2022
I have compiled a combined root word list from all of the repositories above.
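The merging step can be sketched as follows, assuming each source list has already been read into a Python list of strings (one word per line in the original files). The function name `merge_word_lists` and the sample words are hypothetical, and the normalization shown (trim, lowercase, de-duplicate, sort) may differ from how the combined list in this repository was actually produced.

```python
def merge_word_lists(*word_lists):
    """Combine several word lists into one sorted list without duplicates."""
    merged = set()
    for word_list in word_lists:
        for word in word_list:
            cleaned = word.strip().lower()
            if cleaned:  # skip blank entries
                merged.add(cleaned)
    return sorted(merged)

# Hypothetical inputs standing in for root word lists from different repositories
list_a = ["makan", "minum", "Tidur"]
list_b = ["tidur", "jalan", "makan "]

combined = merge_word_lists(list_a, list_b)
print(combined)  # ['jalan', 'makan', 'minum', 'tidur']
```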
I have compiled a combined slang word dictionary from all of the repositories above.
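A minimal sketch of how a combined slang dictionary might be built and then used for text normalization, assuming each source has been loaded as a Python `dict` mapping a slang term to its formal form. `merge_slang_dicts`, `normalize_slang` and the sample entries are hypothetical; letting the first source win on conflicting entries is an arbitrary choice for illustration.

```python
def merge_slang_dicts(*sources):
    """Merge slang -> formal dictionaries; the first source wins on conflicts."""
    merged = {}
    for source in sources:
        for slang, formal in source.items():
            merged.setdefault(slang.lower(), formal.lower())
    return merged

def normalize_slang(text, slang_dict):
    """Replace whitespace-separated slang tokens with their formal forms.

    Tokens with attached punctuation (e.g. "gak,") are left unchanged.
    """
    return " ".join(slang_dict.get(token, token) for token in text.lower().split())

# Hypothetical excerpts from two source dictionaries
source_a = {"yg": "yang", "gak": "tidak"}
source_b = {"sy": "saya", "gak": "enggak"}

slang_dict = merge_slang_dicts(source_a, source_b)
print(normalize_slang("Sy gak suka yg pedas", slang_dict))
# saya tidak suka yang pedas
```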
I have compiled a combined stop word list from all of the repositories above.
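Applying a stop word list is usually a simple set-membership filter over tokens. The sketch below assumes plain whitespace tokenization and uses a short, hypothetical excerpt in place of the full combined list; `remove_stopwords` is an illustrative name, not part of this repository.

```python
def remove_stopwords(tokens, stopwords):
    """Drop tokens that appear in the stop word list (case-insensitive)."""
    stop_set = {word.lower() for word in stopwords}
    return [token for token in tokens if token.lower() not in stop_set]

# Hypothetical excerpt of an Indonesian stop word list
STOPWORDS = ["yang", "di", "dan", "ke", "dari"]

tokens = "Saya pergi ke pasar dan membeli sayur di sana".split()
print(remove_stopwords(tokens, STOPWORDS))
# ['Saya', 'pergi', 'pasar', 'membeli', 'sayur', 'sana']
```

Converting the list to a set once keeps each lookup constant-time, which matters when filtering a large corpus.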
You can adapt this code to a Bahasa Indonesia corpus to perform spelling correction.
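As a rough illustration of what adapting a corpus-based corrector involves, here is a minimal sketch in the style of Peter Norvig's well-known spelling corrector: generate candidates one or two edits away, keep those found in the corpus, and pick the most frequent. This is not necessarily the code this section refers to. The tiny `CORPUS` string is a hypothetical placeholder; in practice you would load a large Indonesian text file instead.

```python
import re
from collections import Counter

# Hypothetical tiny Indonesian corpus; replace with a large Bahasa text in practice
CORPUS = """
saya makan nasi goreng di rumah. saya suka makan siang bersama teman.
teman saya tinggal di jakarta dan suka minum kopi.
"""

def words(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

WORD_COUNTS = Counter(words(CORPUS))
TOTAL = sum(WORD_COUNTS.values())
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def probability(word):
    """Relative frequency of a word in the corpus (0 for unknown words)."""
    return WORD_COUNTS[word] / TOTAL

def known(candidates):
    """Keep only candidates that appear in the corpus."""
    return {w for w in candidates if w in WORD_COUNTS}

def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    """All strings two edits away from word."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}

def correction(word):
    """Most frequent known candidate, preferring fewer edits; else the word itself."""
    candidates = (known([word]) or known(edits1(word))
                  or known(edits2(word)) or {word})
    return max(candidates, key=probability)

print(correction("makn"))  # makan
```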
Usage:
```python
import GetOldTweets3 as got

# Search criteria: hashtag, date range, location and language ("id" = Indonesian)
tweetCriteria = (
    got.manager.TweetCriteria()
    .setQuerySearch('#CoronaVirusIndonesia')
    .setSince("2020-01-01")
    .setUntil("2020-03-05")
    .setNear("Jakarta, Indonesia")
    .setLang("id")
)
tweets = got.manager.TweetManager.getTweets(tweetCriteria)

# Print the fields of each returned Tweet object
for tweet in tweets:
    print(tweet.username)
    print(tweet.text)
    print(tweet.date)
    print(tweet.to)
    print(tweet.retweets)
    print(tweet.favorites)
    print(tweet.mentions)
    print(tweet.hashtags)
    print(tweet.geo)
```
A step-by-step guide to using Tweepy. https://towardsdatascience.com/how-to-scrape-tweets-from-twitter-59287e20f0f1
Sign in to the Twitter Developer portal. https://developer.twitter.com/en
Full list of Tweet object fields. https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object
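When using Tweepy or the raw API, each result is a Tweet object as described in the data dictionary linked above. A minimal sketch of extracting the same fields that the GetOldTweets3 example prints, assuming the tweet is available as a parsed v1.1 JSON `dict` (Tweepy exposes this as the `_json` attribute of a `Status`). The function name `flatten_tweet` and the sample values are hypothetical.

```python
def flatten_tweet(tweet):
    """Pick commonly used fields out of a v1.1 Tweet object (parsed JSON dict)."""
    entities = tweet.get("entities", {})
    return {
        "id": tweet["id_str"],
        "username": tweet["user"]["screen_name"],
        # extended-mode responses carry the text in "full_text" instead of "text"
        "text": tweet.get("full_text", tweet.get("text", "")),
        "date": tweet["created_at"],
        "retweets": tweet.get("retweet_count", 0),
        "favorites": tweet.get("favorite_count", 0),
        "hashtags": ["#" + h["text"] for h in entities.get("hashtags", [])],
        "mentions": ["@" + m["screen_name"] for m in entities.get("user_mentions", [])],
        "geo": tweet.get("coordinates"),
    }

# Hypothetical sample tweet in the v1.1 JSON shape
sample = {
    "id_str": "123",
    "created_at": "Wed Mar 04 10:00:00 +0000 2020",
    "full_text": "Tetap di rumah #CoronaVirusIndonesia",
    "user": {"screen_name": "contoh_user"},
    "retweet_count": 5,
    "favorite_count": 12,
    "entities": {"hashtags": [{"text": "CoronaVirusIndonesia"}], "user_mentions": []},
    "coordinates": None,
}
print(flatten_tweet(sample)["hashtags"])  # ['#CoronaVirusIndonesia']
```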
Increasing Tweepy’s standard API search limit. https://bhaskarvk.github.io/2015/01/how-to-use-twitters-search-rest-api-most-effectively./
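One common way past the per-request cap of the standard search API (at most 100 results per call) is to page backwards with `max_id`: after each response, ask only for tweets with IDs lower than the oldest one already received. A minimal sketch, assuming a hypothetical `fetch_page(max_id, count)` callable that you would write as a thin wrapper around Tweepy's search call (`API.search_tweets` in Tweepy 4.x, `API.search` in 3.x), returning tweets as dicts with an integer `"id"`, newest first. This may not match the linked article's exact approach.

```python
def paginate_search(fetch_page, max_total, page_size=100):
    """Collect tweets newest-to-oldest by moving max_id below the oldest ID seen.

    fetch_page(max_id, count) must return a list of tweet dicts with an
    integer "id", newest first; max_id=None means "start from the newest".
    """
    collected = []
    max_id = None
    while len(collected) < max_total:
        count = min(page_size, max_total - len(collected))
        page = fetch_page(max_id, count)
        if not page:  # no older results left
            break
        collected.extend(page)
        # max_id is inclusive, so subtract 1 to avoid re-fetching the oldest tweet
        max_id = min(tweet["id"] for tweet in page) - 1
    return collected

# Hypothetical stand-in for the API: 250 tweets with IDs 250 down to 1
FAKE_TWEETS = [{"id": i} for i in range(250, 0, -1)]

def fake_fetch(max_id, count):
    eligible = [t for t in FAKE_TWEETS if max_id is None or t["id"] <= max_id]
    return eligible[:count]

print(len(paginate_search(fake_fetch, max_total=230)))  # 230
```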