Crawlab Versions

Distributed web crawler admin platform for spider management, regardless of language or framework.

v0.6.3

9 months ago

v0.6.2

10 months ago

Web Crawler Management Platform Crawlab v0.6.2 Official Release

Overview

Crawlab v0.6.2 is the latest iterative version of Crawlab v0.6.x, bringing a series of improvements, including bug fixes, feature enhancements, and enhanced functionality for environment variables.

Changelog

Bug Fixes

Feature Enhancements

Community

If you find Crawlab helpful for your daily development or your company, please consider starring it on GitHub. If you encounter any problems, feel free to raise them as issues on GitHub. You are also welcome to contribute to the development of Crawlab. To join the Crawlab technical discussion group, add the WeChat account tikazyq1; there you can discuss development, deployment and usage with other developers.

v0.6.1

1 year ago

What's Changed

New Contributors

Full Changelog: https://github.com/crawlab-team/crawlab/compare/v0.6.0...v0.6.1

v0.6.0-1

1 year ago

What's Changed

New Contributors

Full Changelog: https://github.com/crawlab-team/crawlab/compare/v0.6.0...v0.6.0-1

v0.6.0

1 year ago

Change Log (v0.6.0)

Overview

As a major release, v0.6.0 consists of a number of large changes that enhance the performance, scalability, robustness and usability of Crawlab. This beta version is theoretically more robust than older versions, mainly in task execution, file synchronization and node management, yet we still recommend that users run thorough tests with various samples.

Enhancements

Backend

  • File Synchronization. Migrated file sync from MongoDB GridFS to SeaweedFS for better stability and robustness.
  • Node Communication. Migrated node communication from Redis-based RPC to gRPC. Worker nodes indirectly interact with MongoDB by making gRPC calls to the master node.
  • Task Queue. Migrated task queue from Redis list to MongoDB collection to allow more flexibility (e.g. priority queue).
  • Logging. Migrated the log storage system to SeaweedFS to resolve performance issues in MongoDB.
  • SDK Integration. Migrated results data ingestion from native SDK to task handler side.
  • Task Related. Abstracted task-related logic into Task Scheduler, Task Handler and Task Runners to increase decoupling and improve scalability and maintainability.
  • Componentization. Introduced a DI (dependency injection) framework and componentized modules, services and sub-systems.
  • Plugin Framework. Crawlab Plugin Framework (CPF) has been released. See more info [here](https://docs.crawlab.cn/en/guide/plugin/).
  • Git Integration. Git integration is implemented as a built-in feature.
  • Scrapy Integration. Scrapy integration is implemented as a plugin [spider-assistant](https://docs.crawlab.cn/en/guide/plugin/plugin-spider-assistant).
  • Dependency Integration. Dependency integration is implemented as a plugin [dependency](https://docs.crawlab.cn/en/guide/plugin/plugin-dependency).
  • Notifications. Notifications feature is implemented as a plugin [notification](https://docs.crawlab.cn/en/guide/plugin/plugin-notification).
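
The Task Queue item above mentions that moving the queue out of a Redis list allows a priority queue. The sketch below illustrates only that ordering idea, using an in-memory heap as a stand-in for a queue collection. The class names, the `priority` field and the "lower number runs first" convention are assumptions made for illustration, not Crawlab's actual schema or code.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedTask:
    priority: int                        # assumed convention: lower number runs earlier
    seq: int                             # insertion counter, keeps FIFO order within a priority
    task_id: str = field(compare=False)  # hypothetical identifier, ignored when ordering


class PriorityTaskQueue:
    """In-memory stand-in for a task queue collection (illustrative only)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def enqueue(self, task_id, priority=5):
        heapq.heappush(self._heap, QueuedTask(priority, next(self._counter), task_id))

    def dequeue(self):
        """Return the next task id, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap).task_id


queue = PriorityTaskQueue()
queue.enqueue("crawl-news", priority=5)
queue.enqueue("crawl-urgent", priority=1)
queue.enqueue("crawl-blog", priority=5)
order = [queue.dequeue() for _ in range(3)]
print(order)  # ['crawl-urgent', 'crawl-news', 'crawl-blog']
```

A plain list-based queue can only hand tasks out in insertion order; storing a sortable priority next to each task is what makes the "urgent" task jump ahead here.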

Frontend

  • Vue 3. Migrated to latest version of frontend framework Vue 3 to support more advanced features such as composition API and TypeScript.
  • UI Framework. Built with the Vue 3-based UI framework Element-Plus, replacing Vue-Element-Admin, for more flexibility and functionality.
  • Advanced File Editor. Support more advanced file editor features including drag-and-drop copying/moving of files, renaming, deleting, file editing, code highlighting, nav tabs, etc.
  • Customizable Table. Support more advanced built-in operations such as column adjustment, batch operations, searching, filtering, sorting, etc.
  • Nav Tabs. Support multiple nav tabs for viewing different pages.
  • Batch Creation. Support batch creation of objects including spiders, projects, schedules, etc.
  • Detail Navigation. Sidebar navigation in detail pages.
  • Enhanced Dashboard. More stats charts in home page dashboard.

Miscellaneous

v0.6.0-beta.20211224

2 years ago

Change Log (v0.6.0-beta.20211224)

Overview

This is the third beta release for the next major version v0.6.0. With more features and optimizations coming in, the official release of v0.6.0 is approaching.

Enhancement

  • Internationalization. Support Chinese.
  • CLI Upload Spider. #1020
  • Official Plugins. Allow users to install official plugins on Crawlab web UI.
  • More Documentation. Added documentation for plugins and CLI.

Bug Fixes

TODOs

  • Associated Tasks. There will be main tasks and their sub-tasks if task mode is "all nodes" or "selected nodes".
  • Crontab Editor. Frontend component that visualizes crontab editing.
  • Results Deduplication.
  • Environment Variables.
  • Frontend Utility Enhancement. Advanced features such as saved table customization.
  • Log Auto Cleanup.
  • More Documentation.
  • E2E Tests.
  • Frontend Output File Size Optimization.

What Next

The next version could be the official release of v0.6.0, but this is not determined yet. More tests will be run against the current beta version to ensure robustness and production-ready deployment.

v0.6.0-beta.20211120

2 years ago

Change Log (v0.6.0-beta.20211120)

Overview

This is the second beta release for the next major version v0.6.0. With more features and optimizations coming in, the official release of v0.6.0 is approaching.

Enhancement

Backend

  • Plugin Framework. Crawlab Plugin Framework (CPF) has been released. See more info here.
  • Git Integration. Git integration is implemented as a built-in feature.
  • Scrapy Integration. Scrapy integration is implemented as a plugin spider-assistant.
  • Dependency Integration. Dependency integration is implemented as a plugin dependency.
  • Notifications. Notifications feature is implemented as a plugin notification.
  • Documentation Site. Set up documentation site.

Frontend

  • Bug Fixing.

TODOs

  • Associated Tasks. There will be main tasks and their sub-tasks if task mode is "all nodes" or "selected nodes".
  • Crontab Editor. Frontend component that visualizes crontab editing.
  • Results Deduplication.
  • Environment Variables.
  • Internationalization. Support Chinese.
  • Frontend Utility Enhancement. Advanced features such as saved table customization.
  • Log Auto Cleanup.
  • More Documentation.

What Next

The next version could be the official release of v0.6.0, but this is not determined yet. More tests will be run against the current beta version to ensure robustness and production-ready deployment.

v0.6.0-beta.20210803

2 years ago

Change Log (v0.6.0-beta.20210803)

Overview

This is the beta release for the next major version v0.6.0. It is recommended NOT to use it in production, as it is not fully tested and thus not stable enough. Furthermore, more features, including those not ready in this beta release (e.g. Git, Scrapy, Notification), are planned to be integrated into the live version in the form of plugins.

Enhancement

As a major release, v0.6 (including beta versions) consists of a number of large changes that enhance the performance, scalability, robustness and usability of Crawlab. This beta version is theoretically more robust than older versions, mainly in task execution, file synchronization and node management, yet we still recommend that users run thorough tests with various samples.

Backend

  • File Synchronization. Migrated file sync from MongoDB GridFS to SeaweedFS for better stability and robustness.
  • Node Communication. Migrated node communication from Redis-based RPC to gRPC. Worker nodes indirectly interact with MongoDB by making gRPC calls to the master node.
  • Task Queue. Migrated task queue from Redis list to MongoDB collection to allow more flexibility (e.g. priority queue).
  • Logging. Migrated the log storage system to SeaweedFS to resolve performance issues in MongoDB.
  • SDK Integration. Migrated results data ingestion from native SDK to task handler side.
  • Task Related. Abstracted task-related logic into Task Scheduler, Task Handler and Task Runners to increase decoupling and improve scalability and maintainability.
  • Componentization. Introduced a DI (dependency injection) framework and componentized modules, services and sub-systems.
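
As a rough illustration of the decoupling described in the last two items, the sketch below separates a scheduler, a handler and a runner, and passes each dependency in through the constructor (a simple form of dependency injection). It is written in Python for brevity; every class, method and the round-robin policy are hypothetical and do not reflect Crawlab's actual implementation.

```python
from typing import Callable, Dict, List


class TaskRunner:
    """Executes one task; here it just calls a Python function."""

    def __init__(self, command: Callable[[], str]):
        self.command = command

    def run(self) -> str:
        return self.command()


class TaskHandler:
    """Stands in for a node: builds a runner per task and collects results."""

    def __init__(self, runner_factory: Callable[[str], TaskRunner]):
        # The factory is injected, so a test can swap in a fake runner.
        self.runner_factory = runner_factory
        self.results: Dict[str, str] = {}

    def handle(self, task_id: str) -> None:
        self.results[task_id] = self.runner_factory(task_id).run()


class TaskScheduler:
    """Decides which handler receives each task (round-robin, as an assumption)."""

    def __init__(self, handlers: List[TaskHandler]):
        self.handlers = handlers
        self._next = 0

    def schedule(self, task_id: str) -> None:
        handler = self.handlers[self._next % len(self.handlers)]
        self._next += 1
        handler.handle(task_id)


def make_runner(task_id: str) -> TaskRunner:
    return TaskRunner(lambda: f"finished {task_id}")


handlers = [TaskHandler(make_runner), TaskHandler(make_runner)]
scheduler = TaskScheduler(handlers)
for task_id in ["t1", "t2", "t3"]:
    scheduler.schedule(task_id)

print(handlers[0].results)  # {'t1': 'finished t1', 't3': 'finished t3'}
print(handlers[1].results)  # {'t2': 'finished t2'}
```

Because the scheduler only knows about handlers and the handler only knows about a runner factory, each piece can be replaced or tested on its own.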

Frontend

  • Vue 3. Migrated to latest version of frontend framework Vue 3 to support more advanced features such as composition API and TypeScript.
  • UI Framework. Built with the Vue 3-based UI framework Element-Plus, replacing Vue-Element-Admin, for more flexibility and functionality.
  • Advanced File Editor. Support more advanced file editor features including drag-and-drop copying/moving of files, renaming, deleting, file editing, code highlighting, nav tabs, etc.
  • Customizable Table. Support more advanced built-in operations such as column adjustment, batch operations, searching, filtering, sorting, etc.
  • Nav Tabs. Support multiple nav tabs for viewing different pages.
  • Batch Creation. Support batch creation of objects including spiders, projects, schedules, etc.
  • Detail Navigation. Sidebar navigation in detail pages.
  • Enhanced Dashboard. More stats charts in home page dashboard.

TODOs

Because this is a beta release, some existing useful features such as Git and Scrapy integration may not be available. However, we are trying to include them in the official v0.6.0 release, as some of their core functionality is already in the code base; we will add them to the stable version only once they are fully tested.

  • Plugin Framework. Advanced features will exist in the form of plugins, or pluggable modules.
  • Git Integration. To be included as a plugin.
  • Scrapy Integration. To be included as a plugin.
  • Notifications. To be included as a plugin.
  • Associated Tasks. There will be main tasks and their sub-tasks if task mode is "all nodes" or "selected nodes".
  • Crontab Editor. Frontend component that visualizes crontab editing.
  • Results Deduplication.
  • Environment Variables.
  • Internationalization. Support Chinese.
  • Frontend Utility Enhancement. Advanced features such as saved table customization.
  • Log Auto Cleanup.
  • Documentation.

What Next

This beta release is only a preview and a testing ground for the core functionality in Crawlab v0.6. We therefore invite you to download it and run more tests. The official release is expected to be ready after major issues from the beta version are sorted out and the Plugin Framework and other key features are developed and fully tested. With that in mind, a second beta version before the main release is also possible.

v0.5.1

3 years ago

Features / Enhancement

  • Added error message details.
  • Added Golang programming language support.
  • Added web driver installation scripts for Chrome Driver and Firefox.
  • Support system tasks. A "system task" is similar to a normal spider task; it allows users to view logs of general tasks such as installing languages.
  • Changed the method of installing languages from RPC to system tasks.

Bug Fixes

  • Fixed 500 error on first repo download in Spider Market page. #808
  • Fixed some translation issues.
  • Fixed 500 error in task detail page. #810
  • Fixed password reset issue. #811
  • Fixed unable to download CSV issue. #812
  • Fixed unable to install node.js issue. #813
  • Fixed disabled status for batch adding schedules. #814

v0.5.0

3 years ago

Features / Enhancement

  • Spider Market. Allow users to download open-source spiders into Crawlab.
  • Batch actions. Allow users to perform actions in Crawlab in batches, e.g. batch run tasks, batch delete spiders, etc.
  • Migrate MongoDB driver to MongoDriver.
  • Refactor and optimize node-related logic.
  • Change default task.workers to 16.
  • Change default nginx client_max_body_size to 200m.
  • Support writing logs to ElasticSearch.
  • Display error details in Scrapy page.
  • Removed Challenge page.
  • Moved Feedback and Disclaimer pages to navbar.

Bug Fixes

  • Fixed log not expiring issue because of failure to create TTL index.
  • Set default log expire duration to 1 day.
  • Fixed task_id index not being created.
  • Fixed docker-compose.yml.
  • Fixed 404 page.
  • Fixed unable to create worker node before master node issue.
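
The two log-expiry fixes above rely on a MongoDB TTL index. As a minimal sketch, the helper below builds the key list and options for such an index with a 1-day expiry. The field name `ts` and the helper itself are assumptions for illustration; the changelog does not state which field or collection Crawlab uses.

```python
from datetime import timedelta


def ttl_index_spec(field_name: str, expire_after: timedelta):
    """Return (keys, options) describing a TTL index on a date field."""
    keys = [(field_name, 1)]  # ascending single-field index
    options = {"expireAfterSeconds": int(expire_after.total_seconds())}
    return keys, options


# "ts" is a hypothetical timestamp field name.
keys, options = ttl_index_spec("ts", timedelta(days=1))
print(keys, options)  # [('ts', 1)] {'expireAfterSeconds': 86400}

# With pymongo and a running MongoDB, the spec could be applied as:
#   collection.create_index(keys, **options)
```

MongoDB only expires documents whose indexed field holds a date value, so if the index is never created (the failure described above), logs are kept indefinitely.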