Crawlab Versions

Distributed web crawler admin platform for managing spiders, regardless of language or framework.

v0.4.10

4 years ago

Features / Enhancement

  • Enhanced Log Management. Centralized log storage in MongoDB, reducing the dependency on PubSub and allowing log error detection.
  • API Token. Allow users to generate API tokens and use them to integrate into their own systems.
  • Web Hook. Trigger a Web Hook HTTP request to a pre-defined URL when a task starts or finishes.
  • Auto Install Dependencies. Allow installing dependencies automatically from requirements.txt or package.json.
  • Auto Results Collection. Set results collection to results_<spider_name> if it is not set.
  • Optimized Project List. The "No Project" item is no longer displayed in the project list.
  • Upgrade Node.js. Upgrade Node.js version from v8.12 to v10.19.
  • Add Run Button in Schedule Page. Allow users to manually run a task from the Schedule Page.
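
The Auto Results Collection fallback listed above can be pictured with a small sketch. The helper name `resolve_results_collection` and its parameters are hypothetical, chosen only for illustration; this is not Crawlab's actual code, just the naming rule stated in the note (`results_<spider_name>` when no collection is set).

```python
from typing import Optional


def resolve_results_collection(spider_name: str, configured: Optional[str] = None) -> str:
    """Return the collection that task results would be written to.

    Hypothetical helper: mirrors the rule from the release note, not
    Crawlab's real implementation.
    """
    # Assumption: a blank or whitespace-only value counts as "not set".
    if configured and configured.strip():
        return configured
    # Fallback described in the release note: results_<spider_name>.
    return f"results_{spider_name}"
```

For example, a spider named `quotes` with no configured collection would resolve to `results_quotes`, while an explicitly configured name is returned unchanged.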

Bug Fixes

  • Cannot register. #670
  • Spider schedule tab cron expression shows seconds. #678
  • Missing daily stats in spider. #684
  • Results count not updated in time. #689

v0.4.9

4 years ago

Features / Enhancement

  • Challenges. Users can achieve different challenges based on their actions.
  • More Advanced Access Control. More granular access control, e.g. normal users can only view/manage their own spiders/projects and admin users can view/manage all spiders/projects.
  • Feedback. Allow users to send feedback and ratings to the Crawlab team.
  • Better Home Page Metrics. Optimized metrics display on home page.
  • Configurable Spiders Converted to Customized Spiders. Allow users to convert their configurable spiders into customized spiders which are also Scrapy spiders.
  • View Tasks Triggered by Schedule. Allow users to view tasks triggered by a schedule. #648
  • Support Results De-Duplication. Allow users to configure de-duplication of results. #579
  • Support Task Restart. Allow users to re-run historical tasks.

Bug Fixes

  • CLI unusable on Windows. #580
  • Re-upload error. #643 #640
  • Upload missing folders. #646
  • Unable to add schedules in Spider Page.

v0.4.8

4 years ago

Features / Enhancement

  • Support Installations of More Programming Languages. Now users can install or pre-install more programming languages including Java, .Net Core and PHP.
  • Installation UI Optimization. Users can better view and manage installations on Node List page.
  • More Git Support. Allow users to view the Git commit history and check out a specific commit.
  • Support Hostname Node Registration Type. Users can set the hostname as the node key, which serves as the unique identifier.
  • RPC Support. Added RPC support to better manage node communication.
  • Run On Master Switch. Users can determine whether to run tasks on master. If not, all tasks will be run only on worker nodes.
  • Disabled Tutorial by Default.
  • Added Related Documentation Sidebar.
  • Loading Page Optimization.

Bug Fixes

  • Duplicated Nodes. #391
  • Duplicated Spider Upload. #603
  • A failed dependency installation leaves the dependency installation feature unusable. #609
  • Create Tasks for Offline Nodes. #622

v0.4.7

4 years ago

Features / Enhancement

  • Better Support for Scrapy. Spiders identification, settings.py configuration, log level selection, spider selection. #435
  • Git Sync. Allow users to sync git projects to Crawlab.
  • Long Task Support. Users can add long-task spiders, which are expected to run without finishing. #425
  • Spider List Optimization. Tasks count by status, tasks detail popup, legend. #425
  • Upgrade Check. Check for the latest version and notify users to upgrade.
  • Spiders Batch Operation. Allow users to run/stop spider tasks and delete spiders in batches.
  • Copy Spiders. Allow users to copy an existing spider to create a new one.
  • Wechat Group QR Code.

Bug Fixes

  • Schedule Spider Selection Issue. Fields not responding to spider change.
  • Cron Jobs Conflict. Possible bug when two spiders have cron jobs set to the same time. #515 #565
  • Task Log Issue. Different tasks write to the same log file if triggered at the same time. #577
  • Task List Filter Options Incomplete.

v0.4.6

4 years ago

Features / Enhancement

  • SDK for Node.js. Users can apply SDK in their Node.js spiders.
  • Log Management Optimization. Log search, error highlight, auto-scrolling.
  • Task Execution Process Optimization. Allow users to be redirected to task detail page after triggering a task.
  • Task Display Optimization. Added "Param" in the Latest Tasks table in the spider detail page. #295
  • Spider List Optimization. Added "Update Time" and "Create Time" in spider list page.
  • Page Loading Placeholder.

Bug Fixes

  • Lost Focus in Schedule Configuration. #519
  • Unable to Upload Spider using CLI. #524

v0.4.5

4 years ago

Features / Enhancement

  • Interactive Tutorial. Guide users through the main functionalities of Crawlab.
  • Global Environment Variables. Allow users to set global environment variables, which will be passed into all spider programs. #177
  • Project. Allow users to link spiders to projects. #316
  • Demo Spiders. Added demo spiders when Crawlab is initialized. #379
  • User Admin Optimization. Restrict privileges of admin users. #456
  • Setting Page Optimization.
  • Task Results Optimization.

Bug Fixes

  • Unable to find spider file error. #485
  • Click delete button results in redirect. #480
  • Unable to create files in an empty spider. #479
  • Download results error. #465
  • crawlab-sdk CLI error. #458
  • Page refresh issue. #441
  • Results do not support JSON. #202
  • Getting all spiders after deleting a spider.
  • i18n warning.

v0.4.4

4 years ago

Features / Enhancement

  • Email Notification. Allow users to send email notifications.
  • DingTalk Robot Notification. Allow users to send DingTalk Robot notifications.
  • Wechat Robot Notification. Allow users to send Wechat Robot notifications.
  • API Address Optimization. Added relative URL path in frontend so that users don't have to specify CRAWLAB_API_ADDRESS explicitly.
  • SDK Compatibility. Allow users to integrate Scrapy or general spiders with Crawlab SDK.
  • Enhanced File Management. Added a tree-like file sidebar to allow users to edit files much more easily.
  • Advanced Schedule Cron. Allow users to edit schedule cron with a visual cron editor.

Bug Fixes

  • nil returned error.
  • Error when using HTTPS.

v0.4.3

4 years ago

Features / Enhancement

  • Dependency Installation. Allow users to install/uninstall dependencies and add programming languages (Node.js only for now) on the platform web interface.
  • Pre-install Programming Languages in Docker. Allow Docker users to set CRAWLAB_SERVER_LANG_NODE as Y to pre-install Node.js environments.
  • Add Schedule List in Spider Detail Page. Allow users to view / add / edit schedule cron jobs in the spider detail page. #360
  • Align Cron Expression with Linux. Change the cron expression from 6 fields to 5 fields, matching the Linux cron format.
  • Enable/Disable Schedule Cron. Allow users to enable/disable the schedule jobs. #297
  • Better Task Management. Allow users to batch delete tasks. #341
  • Better Spider Management. Allow users to sort and filter spiders in the spider list page.
  • Added Chinese CHANGELOG.
  • Added Github Star Button at Nav Bar.
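
The cron alignment listed above (6 fields down to 5) can be sketched as a small conversion. This assumes the extra field in the older 6-field form is a leading seconds field, which the release notes do not state explicitly; the function name `to_five_field_cron` is hypothetical and is not part of Crawlab.

```python
def to_five_field_cron(expr: str) -> str:
    """Convert a cron expression to the 5-field Linux form.

    Hypothetical helper. Assumption: a 6-field expression is laid out as
    "second minute hour day-of-month month day-of-week".
    """
    fields = expr.split()
    if len(fields) == 5:
        # Already in Linux form; only normalize the whitespace.
        return " ".join(fields)
    if len(fields) == 6:
        # Drop the leading seconds field. Any non-zero seconds value is
        # lost, because 5-field cron has minute resolution.
        return " ".join(fields[1:])
    raise ValueError(f"expected 5 or 6 cron fields, got {len(fields)}: {expr!r}")
```

Under that assumption, `0 */5 * * * *` (every five minutes, at second 0) becomes `*/5 * * * *`.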

Bug Fixes

  • Schedule Cron Task Issue. #423
  • Upload Spider Zip File Issue. #403 #407
  • Exit due to Network Failure. #340
  • Cron Jobs not Running Correctly
  • Schedule List Columns Mis-positioned
  • Clicking Refresh Button Redirected to 404 Page

v0.4.2

4 years ago

Features / Enhancement

  • Disclaimer. Added page for Disclaimer.
  • Call API to fetch version. #371
  • Configure to allow user registration. #346
  • Allow adding new users.
  • More Advanced File Management. Allow users to add / edit / rename / delete files. #286
  • Optimized Spider Creation Process. Allow users to create an empty customized spider before uploading the zip file.
  • Better Task Management. Allow users to filter tasks by selected criteria. #341

Bug Fixes

  • Duplicated nodes. #391
  • "mongodb no reachable" error. #373

v0.4.1

4 years ago

Features / Enhancement

  • Spiderfile Optimization. Stages changed from dictionary to array. #358
  • Baidu Tongji Update.
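
The Spiderfile change listed above, from a dictionary of stages to an array, can be illustrated with a generic conversion. The release note gives no schema, so the stage names and fields used here (`name`, `is_list`) are hypothetical, as is the helper `stages_dict_to_list`. One plausible motivation for such a change is that an array makes stage order explicit, whereas mappings in YAML or JSON carry no guaranteed order.

```python
from typing import Any, Dict, List


def stages_dict_to_list(stages: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn {stage_name: config} into [{"name": stage_name, **config}, ...].

    Hypothetical helper: the real Spiderfile schema is not given in the
    release notes. Order follows the dictionary's insertion order.
    """
    return [{"name": name, **config} for name, config in stages.items()]


# Illustrative data only; these stage names and fields are made up.
old_style = {"list": {"is_list": True}, "detail": {"is_list": False}}
new_style = stages_dict_to_list(old_style)
```

Here `new_style` is a list whose first element describes the `list` stage and whose second describes the `detail` stage, so the execution order no longer depends on how a mapping happens to be iterated.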

Bug Fixes

  • Unable to display schedule tasks. #353
  • Duplicate node registration. #334