Storm Crawler Versions

A scalable, mature and versatile web crawler based on Apache Storm

stormcrawler-3.0

1 week ago

Disclaimer

Apache StormCrawler is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

Release Summary

This is our first release after joining the ASF Incubator as a podling. It is a breaking release: the group ids have been renamed and the elasticsearch module has been removed.

What's Changed

New Contributors

Full Changelog: https://github.com/apache/incubator-stormcrawler/compare/2.11...stormcrawler-3.0

2.11

4 months ago

Disclaimer

This is a Pre-ASF release and did not undergo a formal review by the PMC.

What's Changed

New Contributors

Full Changelog: https://github.com/DigitalPebble/storm-crawler/compare/2.10...2.11

2.10

6 months ago

Disclaimer

This is a Pre-ASF release and did not undergo a formal review by the PMC.

What's Changed

and a lot more!

Full Changelog: https://github.com/DigitalPebble/storm-crawler/compare/2.9...2.10

See https://digitalpebble.blogspot.com/2023/10/focus-on-protocol-improvements-in.html for more details on the protocol improvements.

2.9

8 months ago

Disclaimer

This is a Pre-ASF release and did not undergo a formal review by the PMC.

What's Changed

New Contributors

Full Changelog: https://github.com/DigitalPebble/storm-crawler/compare/2.8...2.9

2.8

1 year ago

Disclaimer

This is a Pre-ASF release and did not undergo a formal review by the PMC.

What's Changed

New Contributors

Full Changelog: https://github.com/DigitalPebble/storm-crawler/compare/2.7...2.8

2.7

1 year ago

Disclaimer

This is a Pre-ASF release and did not undergo a formal review by the PMC.

What's Changed

Full Changelog: https://github.com/DigitalPebble/storm-crawler/compare/2.6...2.7

2.6

1 year ago

Disclaimer

This is a Pre-ASF release and did not undergo a formal review by the PMC.

Highlights

Full Changelog: https://github.com/DigitalPebble/storm-crawler/compare/storm-crawler-2.5...2.6

storm-crawler-2.5

1 year ago

Disclaimer

This is a Pre-ASF release and did not undergo a formal review by the PMC.

In a nutshell

  • various dependency upgrades (JSoup, CrawlerCommons, Tika, Elasticsearch)
  • Java 11
  • bugfix: AggregationSpout sometimes does not release the IsInQuery boolean
  • various improvements to URLFrontier module

In more detail

New Contributors

Full Changelog: https://github.com/DigitalPebble/storm-crawler/compare/2.4...storm-crawler-2.5

2.4

2 years ago

Disclaimer

This is a Pre-ASF release and did not undergo a formal review by the PMC.

  • Upgrade to Apache Storm 2.4
  • Upgrade to Elasticsearch 7.17.2
  • bugfix: Setting "maxDepth": 0 in urlfilter.json prevents ES seed injection #959
  • Allow compatibility.mode for rest client to connect to ES8+ #962

Full Changelog: https://github.com/DigitalPebble/storm-crawler/compare/2.3...2.4

2.3

2 years ago

Disclaimer

This is a Pre-ASF release and did not undergo a formal review by the PMC.

https://digitalpebble.blogspot.com/2022/03/whats-new-in-stormcrawler-23.html

What's Changed

New Contributors

Full Changelog: https://github.com/DigitalPebble/storm-crawler/compare/2.2...2.3