Web scraping in parallel with Selenium Grid and Docker
Check out the blog post.
Fork/Clone
Create and activate a virtual environment
Install the requirements
Add your DigitalOcean access token to your environment:
(env)$ export DIGITAL_OCEAN_ACCESS_TOKEN=[your_token]
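Taken together, the setup steps above usually look like the following. The venv name "env" and a requirements.txt at the repo root are assumptions, not read from the repo:

```shell
# Create and activate a virtual environment, install dependencies, and
# export the DigitalOcean token. "env" and requirements.txt are assumptions.
python3 -m venv env
. env/bin/activate                 # the prompt then shows "(env)$"
if [ -f requirements.txt ]; then
    pip install -r requirements.txt
fi
export DIGITAL_OCEAN_ACCESS_TOKEN=[your_token]
```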
Spin up four DigitalOcean droplets and deploy Docker Swarm:
(env)$ sh project/create.sh
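create.sh itself isn't reproduced in this section, but a script with that job typically does four things: provision the droplets with docker-machine's digitalocean driver, initialize a Swarm on the first node, join the others as workers, and deploy the Selenium Grid stack. The sketch below is a hedged guess at its shape; the node names, docker-compose.yml, and the "selenium" stack name are all assumptions:

```shell
# Hedged sketch of a create.sh-style provisioning script, wrapped in a
# function so it can be sourced without immediately creating droplets.
create_swarm() {
    # 1) Provision four DigitalOcean droplets
    for i in 1 2 3 4; do
        docker-machine create \
            --driver digitalocean \
            --digitalocean-access-token "$DIGITAL_OCEAN_ACCESS_TOKEN" \
            node-$i
    done

    # 2) Initialize the Swarm on node-1 (the manager)
    eval "$(docker-machine env node-1)"
    docker swarm init --advertise-addr "$(docker-machine ip node-1)"

    # 3) Join node-2 through node-4 as workers
    local token
    token=$(docker swarm join-token worker -q)
    for i in 2 3 4; do
        eval "$(docker-machine env node-$i)"
        docker swarm join --token "$token" "$(docker-machine ip node-1):2377"
    done

    # 4) Deploy the Selenium Grid stack from the manager
    eval "$(docker-machine env node-1)"
    docker stack deploy --compose-file=docker-compose.yml selenium
}
```

The `selenium_hub` service queried in the next step would come from whatever stack this script deploys.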
Run the scraper:
(env)$ docker-machine env node-1
(env)$ eval $(docker-machine env node-1)
(env)$ NODE=$(docker service ps --format "{{.Node}}" selenium_hub)
(env)$ for i in {1..8}; do
         python project/script.py ${i} $(docker-machine ip $NODE) &
       done
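The loop above fans out eight scraper processes, each run in the background with `&`. A small wrapper (hypothetical, with a `wait` added so the shell blocks until all eight exit) makes the pattern explicit:

```shell
# Hypothetical wrapper around the fan-out loop above; "wait" at the end
# blocks until every background scraper has finished.
run_scrapers() {
    local hub_ip=$1
    for i in {1..8}; do
        python project/script.py "$i" "$hub_ip" &
    done
    wait
}
# Usage: run_scrapers "$(docker-machine ip $NODE)"
```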
Bring down the resources:
(env)$ sh project/destroy.sh
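destroy.sh isn't shown in this section either; tearing down docker-machine-provisioned droplets is usually a one-liner along these lines (the node names are assumptions):

```shell
# Hedged sketch of a destroy.sh-style teardown; -f skips the confirmation
# prompt. Wrapped in a function so sourcing it removes nothing by itself.
destroy_swarm() {
    docker-machine rm -f node-1 node-2 node-3 node-4
    # or, to remove every machine docker-machine tracks:
    # docker-machine rm -f $(docker-machine ls -q)
}
```

Removing the droplets also stops the billing for them, which is why this step shouldn't be skipped after a scraping run.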