Korean Easy Data Augmentation
Easy Data Augmentation for Korean
This project re-implements Easy Data Augmentation (EDA) and An Easier Data Augmentation (AEDA), both originally implemented for English, so that they work with Korean text.
This repository is tested on Python 3.7 - 3.9.
KoEDA can be installed using pip as follows:
$ pip install koeda
from koeda import EDA
eda = EDA(
morpheme_analyzer="Okt", alpha_sr=0.3, alpha_ri=0.3, alpha_rs=0.3, prob_rd=0.3
)
text = "아버지가 방에 들어가신다"
result = eda(text)
print(result)
# 아버지가 정실에 들어가신다
result = eda(text, p=(0.9, 0.9, 0.9, 0.9), repetition=2)
print(result)
# ['아버지가 객실 아빠 안방 방에 정실 들어가신다', '아버지가 탈의실 방 휴게실 에 안방 탈의실 들어가신다']
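To make the `alpha_rs` and `prob_rd` parameters more concrete, here is a minimal sketch of two of the four EDA operations, random swap and random deletion. It is not KoEDA's implementation: it splits on whitespace instead of using a morpheme analyzer, and the function names `random_swap` and `random_deletion` are hypothetical. Synonym replacement and random insertion are left out because they need a synonym dictionary.

```python
import random
from typing import List


def random_swap(tokens: List[str], alpha: float, rng: random.Random) -> List[str]:
    """Swap two randomly chosen positions, about alpha * len(tokens) times."""
    tokens = list(tokens)  # work on a copy so the caller's list is untouched
    if len(tokens) < 2:
        return tokens
    n_swaps = max(1, int(alpha * len(tokens)))
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens: List[str], prob: float, rng: random.Random) -> List[str]:
    """Drop each token independently with probability prob, keeping at least one."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= prob]
    # If every token was dropped, fall back to a single random token.
    return kept if kept else [rng.choice(tokens)]


rng = random.Random(0)  # seeded only to make the sketch reproducible
tokens = "아버지가 방에 들어가신다".split()  # whitespace split is an assumption
swapped = random_swap(tokens, alpha=0.3, rng=rng)
deleted = random_deletion(tokens, prob=0.3, rng=rng)
print(" ".join(swapped))
print(" ".join(deleted))
```

Larger values of `alpha` or `prob` change more of the sentence, which is why the second call above, with every probability set to 0.9, produces much noisier output than the first.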
from koeda import AEDA
aeda = AEDA(
morpheme_analyzer="Okt", punc_ratio=0.3, punctuations=[".", ",", "!", "?", ";", ":"]
)
text = "어머니가 집을 나가신다"
result = aeda(text)
print(result)
# 어머니가 ! 집을 , 나가신다
result = aeda(text, p=0.9, repetition=2)
print(result)
# ['! 어머니가 ! 집 ; 을 ? 나가신다', '. 어머니 ? 가 . 집 , 을 , 나가신다']
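The idea behind AEDA, inserting punctuation marks at random positions while leaving the original words and their order untouched, can be sketched in a few lines. The version below is an illustration under stated assumptions, not KoEDA's code: it tokenizes on whitespace rather than with a morpheme analyzer, and `insert_punctuation` is a hypothetical helper name. The `punc_ratio` and `punctuations` parameters mirror the ones shown above.

```python
import random
from typing import Optional, Sequence

PUNCTUATIONS = (".", ",", "!", "?", ";", ":")


def insert_punctuation(
    text: str,
    punc_ratio: float = 0.3,
    punctuations: Sequence[str] = PUNCTUATIONS,
    rng: Optional[random.Random] = None,
) -> str:
    """Insert random punctuation marks before randomly chosen tokens."""
    rng = rng or random.Random()
    tokens = text.split()  # whitespace split is an assumption of this sketch
    if not tokens:
        return text
    # Insert between 1 and punc_ratio * len(tokens) marks, never more than
    # the number of available positions.
    max_marks = min(len(tokens), max(1, int(punc_ratio * len(tokens))))
    n_marks = rng.randint(1, max_marks)
    positions = set(rng.sample(range(len(tokens)), n_marks))
    out = []
    for i, token in enumerate(tokens):
        if i in positions:
            out.append(rng.choice(punctuations))
        out.append(token)
    return " ".join(out)


augmented = insert_punctuation("어머니가 집을 나가신다", rng=random.Random(0))
print(augmented)
```

Removing the inserted marks always recovers the original token sequence, which is the property that distinguishes AEDA from the deletion and swap operations of EDA.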
There are two ways to import an augmenter.
The first is to use the full name.
from koeda import EasyDataAugmentation
The second is to use the abbreviation.
from koeda import EDA
augmenter = EDA(
morpheme_analyzer: str = None, # Default = "Okt"
alpha_sr: float = 0.1,
alpha_ri: float = 0.1,
alpha_rs: float = 0.1,
prob_rd: float = 0.1
)
result = augmenter(
data: Union[List[str], str],
p: List[float] = None, # Default = (0.1, 0.1, 0.1, 0.1)
repetition: int = 1
)
augmenter = AEDA(
morpheme_analyzer: str = None, # Default = "Okt"
punc_ratio: float = 0.3,
punctuations: List[str] = None # Default = ('.', ',', '!', '?', ';', ':')
)
result = augmenter(
data: Union[List[str], str],
p: float = None, # Default = 0.3
repetition: int = 1
)
augmenter = RD(
morpheme_analyzer: str = None,
)
augmenter = RI(
morpheme_analyzer: str = None,
stopword: bool = False
)
augmenter = SR(
morpheme_analyzer: str = None,
stopword: bool = False
)
augmenter = RS(
morpheme_analyzer: str = None,
)
result = augmenter(
data: Union[List[str], str],
p: float = 0.1,
repetition: int = 1
)
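Every augmenter call accepts `data` as either a single string or a list of strings, plus a `repetition` count. The usage examples show that a single string returns a string when `repetition` is 1 and a list of strings when it is 2. The sketch below illustrates one way such a dispatch could work. How KoEDA shapes the result for list input is an assumption here, and `apply_augmenter` and `reverse_tokens` are hypothetical names used only for illustration.

```python
from typing import Callable, List, Union


def apply_augmenter(
    augment_one: Callable[[str], str],
    data: Union[List[str], str],
    repetition: int = 1,
):
    """Apply augment_one to a string or to each string in a list."""

    def run(text: str):
        # One augmented string for repetition <= 1, otherwise a list of them.
        if repetition <= 1:
            return augment_one(text)
        return [augment_one(text) for _ in range(repetition)]

    if isinstance(data, str):
        return run(data)
    # Assumption: list input yields one result per input string.
    return [run(text) for text in data]


def reverse_tokens(text: str) -> str:
    """Stand-in augmenter used only to keep the sketch deterministic."""
    return " ".join(reversed(text.split()))


single = apply_augmenter(reverse_tokens, "아버지가 방에 들어가신다")
repeated = apply_augmenter(reverse_tokens, "아버지가 방에 들어가신다", repetition=2)
batch = apply_augmenter(reverse_tokens, ["집을 나가신다", "방에 들어가신다"])
print(single)
print(repeated)
print(batch)
```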
Easy Data Augmentation Paper
Easy Data Augmentation Repository
An Easier Data Augmentation Paper
An Easier Data Augmentation Repository
Korean WordNet