Grab

Web Scraping Framework


Project Status

Important notice: pycurl backend is dropped. The only network transport now is urllib3.

The project is in a slow refactoring stage. It is possible there will be no new features.

Planned work (no time estimates):

  • Refactoring the source code while keeping most of external API unchanged
  • Fixing bugs
  • Annotating source code with type hints
  • Improving quality of source code to comply with pylint and other linters
  • Moving some features into external packages or moving external dependencies inside Grab
  • Fixing memory leaks
  • Improving test coverage
  • Adding more platforms and Python versions to the test matrix
  • Releasing new versions on PyPI


Installation

    $ pip install -U grab

See details about installing Grab on different platforms here


Documentation

Get it here

Telegram chat groups

About Grab (very old description)

Grab is a Python web scraping framework. Grab provides a number of helpful methods to perform network requests, scrape websites and process the scraped content:

  • Automatic cookies (session) support
  • HTTPS/SOCKS proxy support with/without authentication
  • Keep-Alive support
  • IDN support
  • Tools to work with web forms
  • Easy multipart file uploading
  • Flexible customization of HTTP requests
  • Automatic charset detection
  • Powerful API to extract data from DOM tree of HTML documents with XPATH queries
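
The XPath-based extraction mentioned above can be sketched without Grab installed. The snippet below uses only the standard library's xml.etree on hypothetical sample markup; note that ElementTree supports only a limited XPath subset, while Grab's selectors are lxml-based and accept full XPath:

    import xml.etree.ElementTree as ET

    # Hypothetical sample markup standing in for a fetched page
    html = """
    <ul id="repos">
      <li><a href="/grab">grab</a></li>
      <li><a href="/weblib">weblib</a></li>
    </ul>
    """

    root = ET.fromstring(html)
    # ElementTree handles simple paths like .//li/a; Grab's lxml-based
    # selectors accept full XPath such as //h3[@class="..."]/a
    links = [(a.text, a.get('href')) for a in root.findall('.//li/a')]
    print(links)  # [('grab', '/grab'), ('weblib', '/weblib')]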

Grab provides an interface called Spider to develop multithreaded website scrapers:

  • Rules and conventions to organize crawling logic
  • Multiple parallel network requests
  • Automatic processing of network errors (failed tasks go back to task queue)
  • You can create network requests and parse responses with Grab API (see above)
  • Different backends for task queue (in-memory, redis, mongodb)
  • Tools to debug and collect statistics
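
The retry behaviour described above (failed tasks go back to the task queue) can be sketched with the standard library alone. All names and the simulated failure are hypothetical illustrations, not Grab::Spider internals:

    import queue
    import threading

    tasks = queue.Queue()
    results = []
    attempts = {}
    lock = threading.Lock()

    for name in ('python', 'ruby', 'perl'):
        tasks.put(name)

    def fetch(name, attempt):
        # Simulated network call: 'ruby' fails once, then succeeds
        if name == 'ruby' and attempt == 1:
            raise IOError('temporary network error')
        return name.upper()

    def worker():
        while True:
            try:
                name = tasks.get_nowait()
            except queue.Empty:
                return
            with lock:
                attempts[name] = attempts.get(name, 0) + 1
                attempt = attempts[name]
            try:
                result = fetch(name, attempt)
            except IOError:
                tasks.put(name)  # failed task goes back to the task queue
            else:
                with lock:
                    results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(sorted(results))  # ['PERL', 'PYTHON', 'RUBY']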

Grab Example

    import logging

    from grab import Grab

    logging.basicConfig(level=logging.DEBUG)

    g = Grab()
    # Log in (fill in real credentials)
    g.go('https://github.com/login')
    g.doc.set_input('login', '****')
    g.doc.set_input('password', '****')
    g.doc.submit()

    # Verify the login succeeded: the signout button must exist
    g.doc('//ul[@id="user-links"]//button[contains(@class, "signout")]').assert_exists()

    home_url = g.doc('//a[contains(@class, "header-nav-link name")]/@href').text()
    repo_url = home_url + '?tab=repositories'
    g.go(repo_url)

    # Print the name and URL of each repository
    for elem in g.doc.select('//h3[@class="repo-list-name"]/a'):
        print('%s: %s' % (elem.text(), g.make_url_absolute(elem.attr('href'))))

Grab::Spider Example

    import logging

    from grab.spider import Spider, Task

    logging.basicConfig(level=logging.DEBUG)


    class ExampleSpider(Spider):
        def task_generator(self):
            for lang in 'python', 'ruby', 'perl':
                # Build a search URL for each language
                url = 'https://www.google.com/search?q=%s' % lang
                yield Task('search', url=url, lang=lang)

        def task_search(self, grab, task):
            # Print the first search result for each language
            print('%s: %s' % (task.lang,
                              grab.doc('//div[@class="s"]//cite').text()))


    bot = ExampleSpider(thread_number=2)
    bot.run()

Open Source Agenda is not affiliated with "Grab" Project. README Source: lorien/grab
