Simple sentence mining tool for language learning
Note: All chat rooms are bridged/mirrored. You won't miss out on anything by choosing one over the other. We recommend you use Matrix if possible.
Platform | Address | Notes |
---|---|---|
Matrix (Recommended) | #general:freelanguagetools.org | Requires a Matrix account on any homeserver. A list of homeservers can be found here |
Telegram | https://t.me/fltchat | |
Discord (proprietary) | https://discord.gg/DNSsTtHRxz | |
VocabSieve is a companion program for language learning with Anki. Its primary function is sentence mining, in which sentences containing vocabulary words are collected and added to Anki for long-term retention. It aims to help intermediate learners gain vocabulary efficiently by allowing card creation with minimal friction. Possible use cases include sentence mining from videos and texts, asynchronously from ereader highlights, and even completely automatically from books or subtitles. See the workflow page for more details.
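To make the idea of sentence mining concrete, here is a minimal sketch in Python. It is not VocabSieve's actual code: the function name `mine_sentences` and the naive punctuation-based sentence splitter are assumptions made purely for illustration.

```python
import re


def mine_sentences(text, target_words):
    """Return (word, sentence) pairs for sentences containing a target word.

    Hypothetical helper for illustration; not part of VocabSieve.
    """
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    targets = {w.lower() for w in target_words}
    results = []
    for sentence in sentences:
        # Lowercase word tokens, used only for matching.
        tokens = re.findall(r"\w+", sentence.lower())
        for token in tokens:
            if token in targets:
                results.append((token, sentence))
                break  # keep at most one card per sentence
    return results


text = "The cat sat quietly. It was a serene morning! Nobody spoke."
cards = mine_sentences(text, ["serene"])
print(cards)
```

Each resulting pair could then become the front and context of a flashcard. A real tool would also need proper sentence segmentation and handling of inflected forms.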
Lemmatization: inflected words are reduced to their dictionary form (e.g. books -> book, ran -> run). This works well for most European languages.
Manual (The text originally on the wiki, in this document, or in the blog post has since been moved there, with some updates.)
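As a rough illustration of the lemmatization idea (books -> book, ran -> run), the toy sketch below combines a lookup table for irregular forms with a few suffix rules. The table, the rules, and the name `toy_lemmatize` are all hypothetical and deliberately simplistic. The libraries credited in this document (simplemma and pymorphy3) use far more complete language data and are not represented by this code.

```python
# Hypothetical data for illustration only; real lemmatizers ship
# large per-language dictionaries instead of a handful of rules.
IRREGULAR = {"ran": "run", "went": "go", "mice": "mouse"}
SUFFIX_RULES = [("ies", "y"), ("s", "")]


def toy_lemmatize(word):
    """Return a guessed dictionary form for an English word."""
    lower = word.lower()
    # Irregular forms are resolved by direct lookup.
    if lower in IRREGULAR:
        return IRREGULAR[lower]
    # Otherwise apply the first matching suffix rule,
    # skipping words too short to safely strip.
    for suffix, replacement in SUFFIX_RULES:
        if lower.endswith(suffix) and len(lower) > len(suffix) + 1:
            return lower[: -len(suffix)] + replacement
    return lower


for form in ["books", "ran", "Stories", "run"]:
    print(form, "->", toy_lemmatize(form))
```

Looking up the lemma rather than the inflected form is what lets a dictionary query for "ran" find the entry for "run". The toy rules will mis-handle many words, which is why dedicated libraries are used in practice.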
Windows and Mac users: If you want to install this program, go to Releases and from the latest release, download the appropriate file for your operating system.
For a nightly build, please check the CI artifacts page. These are not considered ready for release and likely contain bugs. It is recommended to use the debug version to get more details when things go wrong.
To run from source:
python3 -m venv env
source env/bin/activate  # on Windows: env\Scripts\activate
pip install -r requirements.txt
python3 vocabsieve.py
For debugging purposes, set the environment variable VOCABSIEVE_DEBUG to any value. This creates a separate profile (settings and databases for records and dictionaries) so you can run tests without affecting your normal profile. Each distinct value of VOCABSIEVE_DEBUG gets its own profile; the value can be any number or string.
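The behaviour described above, one profile per distinct value of VOCABSIEVE_DEBUG, could be implemented along the following lines. This is only a sketch under stated assumptions: the function `profile_dir`, the base directory, and the `default` / `debug_<value>` folder names are hypothetical and do not reflect where VocabSieve actually stores its data.

```python
import os
from pathlib import Path


def profile_dir(base, env=None):
    """Pick a profile directory based on VOCABSIEVE_DEBUG.

    Hypothetical helper: the directory layout is an assumption,
    not VocabSieve's real storage scheme.
    """
    env = os.environ if env is None else env
    debug_value = env.get("VOCABSIEVE_DEBUG")
    if debug_value is None:
        # Variable not set: use the normal profile.
        return Path(base) / "default"
    # Variable set: each distinct value maps to its own directory.
    return Path(base) / f"debug_{debug_value}"


print(profile_dir("/tmp/app", {}))
print(profile_dir("/tmp/app", {"VOCABSIEVE_DEBUG": "1"}))
print(profile_dir("/tmp/app", {"VOCABSIEVE_DEBUG": "feature-x"}))
```

Passing the environment as a parameter keeps the function easy to test. On a Unix-like shell, the variable can be set for a single run by prefixing the command, for example `VOCABSIEVE_DEBUG=1 python3 vocabsieve.py`.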
Pull requests are welcome! If you want to implement a significant feature, be sure to first ask by creating an issue so that no effort is wasted on doing the same work twice.
This is currently beta software. You should not expect it to be completely bug-free, but you may expect that:
Tracking the master branch and only upgrading should usually not break things, but this is not guaranteed. You are expected to read commit messages and take proper precautions.
You are welcome to report bugs, suggest features/enhancements, or ask for clarifications by opening a GitHub issue.
If you appreciate this tool, consider making a donation to the Free Software Foundation or the Electronic Frontier Foundation to protect our digital future and defend our freedom. Do your part to refuse to pay for DRM'd content and devices.
The definitions provided by the program by default come from English Wiktionary, without which this program would never have been created. LingvaTranslate is used to obtain Google Translate results. Forvo scraping code is inspired by this repository. Lemmatization capabilities come from simplemma and pymorphy3.
App icon is made from icons by Freepik available on Flaticon.