Elasticsearch File System Crawler (FS Crawler)
Full Changelog: https://github.com/dadoonet/fscrawler/compare/fscrawler-2.8...fscrawler-2.9
a52f2ab6-086b-4285-a7a1-78ecdc6404ba vulnerability id" (thanks to @dadoonet)
a52f2ab6-086b-4285-a7a1-78ecdc6404ba vulnerability id (thanks to @dadoonet)
latest docker tag should be only the latest stable version (thanks to @dadoonet)
fs.ocr.enabled is always false (thanks to @ywjung)
@NickUfer, @cbb-colab, @cwperry, @dadoonet, @dependabot, @dependabot[bot], @dfbm, @mergify[bot], @sahin52 and @ywjung
The FSCrawler team is pleased to announce the FSCrawler 2.7 release!
FS Crawler offers a simple way to index binary files into Elasticsearch.
Download FSCrawler 2.7:
wget https://repo1.maven.org/maven2/fr/pilato/elasticsearch/crawler/fscrawler-es7/2.7/fscrawler-es7-2.7.zip
Start FS crawler with:
bin/fscrawler job_name
FS crawler reads a local settings file (by default ~/.fscrawler/{job_name}/_settings.json).
If the file does not exist, FS crawler will offer to create your first job.
$ bin/fscrawler job_name
18:28:58,174 WARN [f.p.e.c.f.FsCrawler] job [job_name] does not exist
18:28:58,177 INFO [f.p.e.c.f.FsCrawler] Do you want to create it (Y/N)?
y
18:29:05,711 INFO [f.p.e.c.f.FsCrawler] Settings have been created in [~/.fscrawler/job_name/_settings.json]. Please review and edit before relaunch
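Before relaunching, it can help to confirm where the settings file ended up. The following is a minimal sketch, assuming only the default ~/.fscrawler/{job_name}/_settings.json layout mentioned above; `settings_path` and `check_settings` are hypothetical helper names for this example and are not part of FSCrawler.

```shell
#!/bin/sh
# Hypothetical helper (not an FSCrawler command): print the default settings
# path for a job, following the ~/.fscrawler/{job_name}/_settings.json layout.
settings_path() {
    printf '%s\n' "$HOME/.fscrawler/$1/_settings.json"
}

# Hypothetical helper: report whether the job's settings file exists yet.
check_settings() {
    path=$(settings_path "$1")
    if [ -f "$path" ]; then
        echo "settings found: $path"
    else
        echo "no settings yet: $path"
    fi
}

check_settings job_name
```

If the file is reported as found, open it in an editor and review it before starting the job again.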
Create a directory named /tmp/es or c:\tmp\es, add some files you want to index in it and start again:
$ bin/fscrawler job_name
18:30:34,330 INFO [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
18:30:34,332 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler started in watch mode. It will run unless you stop it with CTRL+C.
18:30:34,682 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler started for [job_name] for [/tmp/es] every [15m]
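The run above assumes /tmp/es already holds at least one file. On a Unix-like system, preparing that directory might look like the sketch below; `CRAWL_DIR` and the sample file name are assumptions made for this example, not FSCrawler settings.

```shell
#!/bin/sh
# Directory crawled in the example above; CRAWL_DIR is a hypothetical
# variable used only in this sketch to allow overriding the location.
CRAWL_DIR="${CRAWL_DIR:-/tmp/es}"

# Create the directory if it is missing (no error if it already exists).
mkdir -p "$CRAWL_DIR"

# Drop in a small text file so the crawler has something to pick up.
printf 'Hello from FSCrawler\n' > "$CRAWL_DIR/sample.txt"

# List the directory contents to confirm the file is in place.
ls -l "$CRAWL_DIR"
```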
More details in the documentation.
file.content_type field on folders.
file.filename field on folders.
path_prefix option.
fs.pdf_ocr setting to fs.ocr.pdf_strategy.
Have fun! -FSCrawler team
.fscrawlerignore file is detected (#633) @dadoonet
hocr option for Tesseract-based OCR (#583) @dadoonet
store_source without indexing content (#544) @dadoonet
core module (#508) @dadoonet
@FredDut, @Jdecaudin, @Quix0r, @babadofar, @barts2108, @coder-sa, @ctamisier, @dadoonet, @edjeavons, @fgaujous, @gpcmol, @it20one, @kneubi, @shadiakiki1986, @soruly, @vakopian, @xcorail, Ajitpal Singh and Julien Decaudin