This repository was archived by the owner on Aug 10, 2018. It is now read-only.

ARCHIVED--A Ruby script that scrapes Versionista's web interface to generate a csv summarizing which websites and pages have had recent changes.

edgi-govdata-archiving/versionista-outputter

Dependencies

  • Ruby 2.2.3 (rbenv is one option for managing Ruby versions)
  • phantomjs (if you have Homebrew: brew install phantomjs)
  • bundler (gem install bundler)

Getting started

  1. Make sure your gems are up to date by running: bundle
  2. Execute the script by running: EMAIL=<your versionista email> PASSWORD=<your password> N=<number of hours back> INDEX=<starting index of csv> ruby capybara_script.rb
  3. If the script completes successfully, new CSVs will have been written to the output/ directory.
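
The command in step 2 passes all of its settings through environment variables. As a rough illustration of how a script can collect and validate them, here is a minimal sketch. The helper name read_settings, the returned hash keys, and the choice to treat N and INDEX as integers are assumptions made for this example; they are not taken from capybara_script.rb.

```ruby
# Hypothetical helper (not from capybara_script.rb): collects the four
# environment variables named in step 2 and fails early if any is missing.
REQUIRED_KEYS = %w[EMAIL PASSWORD N INDEX].freeze

def read_settings(env = ENV)
  missing = REQUIRED_KEYS.select { |key| env[key].nil? || env[key].strip.empty? }
  raise ArgumentError, "missing variables: #{missing.join(', ')}" unless missing.empty?

  {
    email: env['EMAIL'],
    password: env['PASSWORD'],
    # Assumption: N is a whole number of hours and INDEX a whole number.
    # Integer() raises ArgumentError on anything that is not an integer.
    hours_back: Integer(env['N']),
    start_index: Integer(env['INDEX'])
  }
end
```

Failing before the browser session starts means a forgotten variable shows up as a clear error message rather than as a login failure partway through a scrape.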

Extra

  1. Sometimes the page the script is scraping does not contain the HTML it expects. In these cases, Capybara waits a set amount of time to see whether the content appears before giving up and raising an error (which we gracefully rescue for diff pages). The default wait time is 2 seconds. You can change it by passing the ENV variable "PAGE_WAIT_TIME" when executing the script, for example PAGE_WAIT_TIME='1.5' or PAGE_WAIT_TIME=10. Beware that with too short a wait time, pages other than the comparison pages may start failing.
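
For readers curious how such a setting is usually wired up, the following is a minimal sketch of parsing PAGE_WAIT_TIME with the 2-second default described above. The constant and method names are hypothetical, and the actual code in capybara_script.rb may differ.

```ruby
# Hypothetical names; only PAGE_WAIT_TIME and the 2-second default
# come from the README text above.
DEFAULT_PAGE_WAIT_TIME = 2.0 # seconds

def page_wait_time(env = ENV)
  raw = env['PAGE_WAIT_TIME']
  return DEFAULT_PAGE_WAIT_TIME if raw.nil? || raw.strip.empty?

  seconds = Float(raw) # accepts "1.5" and "10"; raises ArgumentError on "abc"
  raise ArgumentError, 'PAGE_WAIT_TIME must be greater than 0' unless seconds > 0

  seconds
end

# With Capybara loaded, the value would typically be applied with:
#   Capybara.default_max_wait_time = page_wait_time
# (Capybara versions before 2.5 call this setting default_wait_time.)
```

Both forms shown in the README parse the same way: PAGE_WAIT_TIME='1.5' yields 1.5 and PAGE_WAIT_TIME=10 yields 10.0.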
