Bomquote Transistor Versions

Transistor, a Python web scraping framework for intelligent use cases.

v0.2.4

3 years ago

If you follow the documentation example that uses newt.db, you must install RelStorage<=2.1.1.
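The pin itself is normally expressed at install time with pip's standard specifier syntax, `pip install "RelStorage<=2.1.1"`. As a minimal sketch, assuming you also want a runtime guard, the snippet below reads the installed RelStorage version via `importlib.metadata` and compares it against the 2.1.1 ceiling. The helper names `parse_release` and `relstorage_compatible` are hypothetical, and the comparison deliberately ignores pre/post/dev suffixes, so it is a simplification of full PEP 440 ordering.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Upper bound taken from the v0.2.4 release note.
MAX_RELSTORAGE = (2, 1, 1)


def parse_release(v: str) -> tuple:
    """Return the leading numeric release segment, e.g. '2.1.1rc1' -> (2, 1, 1).

    Simplification: pre/post/dev suffixes are dropped, and trailing zeros are
    stripped so that '2.1.1.0' compares equal to '2.1.1'.
    """
    m = re.match(r"\d+(?:\.\d+)*", v)
    if m is None:
        raise ValueError(f"unparseable version: {v!r}")
    parts = [int(p) for p in m.group(0).split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def relstorage_compatible(v: str) -> bool:
    """True if version string v is at or below the 2.1.1 ceiling."""
    return parse_release(v) <= MAX_RELSTORAGE


if __name__ == "__main__":
    try:
        installed = version("RelStorage")
    except PackageNotFoundError:
        print("RelStorage is not installed")
    else:
        status = "OK" if relstorage_compatible(installed) else "too new; need <=2.1.1"
        print(f"RelStorage {installed}: {status}")
```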

v0.2.2

5 years ago

Fixed a bug in the BaseWorker.load_items() method that caused scrape data to be lost when the number of workers did not equal the number of tasks. Now any number of workers or any pool size produces consistent export/save results; only the scrape time varies with the number of workers assigned. Added tests to verify this.
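The property this fix restores, that every task's result is kept no matter how many workers share the load, can be illustrated with a small thread pool. This is a hypothetical sketch using only the standard library (`queue`, `threading`); it is not Transistor's BaseWorker implementation, and the names `run_pool` and `scrape` are illustrative. Workers pull from a shared queue until it is empty, so surplus workers simply exit and fewer workers just take longer.

```python
import queue
import threading
from typing import Callable, Iterable, List


def run_pool(tasks: Iterable, scrape: Callable, workers: int) -> List:
    """Run every task exactly once across `workers` threads; return all results.

    Hypothetical sketch, not Transistor's BaseWorker. Results come back in
    completion order, which may differ from task order.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")

    todo: queue.Queue = queue.Queue()
    for task in tasks:
        todo.put(task)

    results: List = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                task = todo.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            item = scrape(task)
            with lock:  # serialize appends so no result is dropped
                results.append(item)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before returning
    return results
```

Calling `run_pool(range(7), lambda n: n * n, w)` with `w` set to 1, 3, or 10 should return the same seven squares (in some order) each time; only the elapsed time would differ.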

v0.2.1

5 years ago

Added a url parameter to WorkGroup, which makes for a more attractive API than passing the URL in a kwarg. The URL was originally passed as a kwarg because, depending on how the custom Spider is set up, the URL may already be specified there, making it redundant to specify it again. For API clarity, however, we now require the URL to be specified in the WorkGroup. At the very least, it is easier to read at a glance.
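To make the design choice concrete, the hypothetical dataclasses below contrast the two call styles. Neither is Transistor's actual WorkGroup; the class names and the `kwargs` field are assumptions for illustration only. With the URL buried in a generic dict, a missing URL only surfaces later as `None`; with a required named parameter, omitting it fails immediately at construction time, and the target is visible at the call site.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class LegacyGroupSketch:
    """Hypothetical pre-v0.2.1 style: the URL travels inside a kwargs dict."""
    name: str
    kwargs: Dict[str, Any] = field(default_factory=dict)

    @property
    def url(self) -> Optional[str]:
        # The reader must know the key name; a missing URL silently becomes None.
        return self.kwargs.get("url")


@dataclass
class GroupSketch:
    """Hypothetical v0.2.1 style: the URL is a required, named parameter."""
    name: str
    url: str
    kwargs: Dict[str, Any] = field(default_factory=dict)


legacy = LegacyGroupSketch(name="books", kwargs={"url": "http://example.com/"})
current = GroupSketch(name="books", url="http://example.com/")
```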

v0.2.0

5 years ago

Many API-breaking changes. See CHANGES at https://github.com/bomquote/transistor/blob/master/CHANGES

v0.1.1

5 years ago
  • standardized SplashScraper attributes: auth, baseurl, browser, cookies, crawlera_user, http_session_timeout, http_session_valid, LUA_SOURCE, max_retries, name, number, referrer, searchurl, splash_args, user_agent.
  • now, nearly all of the SplashScraper attributes can be set via **kwargs if desired.
  • when initializing a StatefulBook instance, use a **kwarg called keywords to set the name of the spreadsheet column heading which contains the target search terms. For example: keywords='titles' or keywords='part_numbers'. Defaults to "item".
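Both conventions in this list follow a common Python pattern, sketched below under stated assumptions. `ScraperSketch` is a hypothetical class (not SplashScraper) whose attributes start from placeholder defaults and can be overridden by keyword; only a few attribute names from the list are used, and the default values are invented. `load_search_terms` is a hypothetical helper (not StatefulBook) that picks the column whose heading matches `keywords`, defaulting to `"item"`; it reads CSV text for simplicity, whereas the release note describes a spreadsheet.

```python
import csv
import io
from typing import Any, Dict, List


class ScraperSketch:
    """Hypothetical class: placeholder defaults that keyword arguments can override."""

    DEFAULTS: Dict[str, Any] = {
        "baseurl": None,
        "max_retries": 3,  # placeholder default, not SplashScraper's
        "user_agent": None,
        "http_session_timeout": None,
    }

    def __init__(self, **kwargs: Any) -> None:
        unknown = set(kwargs) - set(self.DEFAULTS)
        if unknown:
            raise TypeError(f"unexpected attributes: {sorted(unknown)}")
        for attr, default in self.DEFAULTS.items():
            # Use the caller's value when given, otherwise the default.
            setattr(self, attr, kwargs.get(attr, default))


def load_search_terms(sheet: str, keywords: str = "item") -> List[str]:
    """Return non-empty values from the column headed `keywords`.

    Hypothetical helper; reads CSV text rather than a spreadsheet file.
    """
    reader = csv.DictReader(io.StringIO(sheet))
    if reader.fieldnames is None or keywords not in reader.fieldnames:
        raise KeyError(f"no column headed {keywords!r}")
    return [row[keywords] for row in reader if row[keywords]]
```

For example, `load_search_terms("part_numbers,qty\nA1,2\nB2,5\n", keywords="part_numbers")` selects the `part_numbers` column, while a sheet whose heading is `item` needs no `keywords` argument at all.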

v0.1.0

5 years ago