A scalable, mature and versatile web crawler based on Apache Storm
Apache StormCrawler is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
This is our first release after joining the ASF Incubator as a podling. It is a breaking release: the group IDs have been renamed and the elasticsearch module has been removed.
Full Changelog: https://github.com/apache/incubator-stormcrawler/compare/2.11...stormcrawler-3.0
This is a Pre-ASF release and did not undergo a formal review by the PMC.
Full Changelog: https://github.com/DigitalPebble/storm-crawler/compare/2.10...2.11
This is a Pre-ASF release and did not undergo a formal review by the PMC.
and a lot more!
Full Changelog: https://github.com/DigitalPebble/storm-crawler/compare/2.9...2.10
See https://digitalpebble.blogspot.com/2023/10/focus-on-protocol-improvements-in.html for more details on the protocol improvements.
This is a Pre-ASF release and did not undergo a formal review by the PMC.
Full Changelog: https://github.com/DigitalPebble/storm-crawler/compare/2.8...2.9
This is a Pre-ASF release and did not undergo a formal review by the PMC.
Full Changelog: https://github.com/DigitalPebble/storm-crawler/compare/2.7...2.8
This is a Pre-ASF release and did not undergo a formal review by the PMC.
Full Changelog: https://github.com/DigitalPebble/storm-crawler/compare/2.6...2.7
This is a Pre-ASF release and did not undergo a formal review by the PMC.
Full Changelog: https://github.com/DigitalPebble/storm-crawler/compare/storm-crawler-2.5...2.6
This is a Pre-ASF release and did not undergo a formal review by the PMC.
Full Changelog: https://github.com/DigitalPebble/storm-crawler/compare/2.4...storm-crawler-2.5
This is a Pre-ASF release and did not undergo a formal review by the PMC.
Upgrade to Apache Storm 2.4
Upgrade to Elasticsearch 7.17.2
Bugfix: setting "maxDepth": 0 in urlfilter.json prevents ES seed injection #959
Allow compatibility.mode for rest client to connect to ES8+ #962
Full Changelog: https://github.com/DigitalPebble/storm-crawler/compare/2.3...2.4
This is a Pre-ASF release and did not undergo a formal review by the PMC.
https://digitalpebble.blogspot.com/2022/03/whats-new-in-stormcrawler-23.html
Full Changelog: https://github.com/DigitalPebble/storm-crawler/compare/2.2...2.3