
Web Scraping with Python - Colaboratory

The document shows code for web scraping turnstile data from the New York City Metropolitan Transportation Authority website using Python. It imports libraries for requests, BeautifulSoup and URL retrieval. It finds all the <a> tags on the page which link to turnstile data files and downloads one of the files by constructing the full URL and saving it locally under a new filename. It then adds a 1 second delay before completing.

Uploaded by

Ronaldo Zabalaga

30/11/21 15:11 WEB SCRAPING WITH PYTHON - Colaboratory

LUIS RONALDO ZABALAGA ESQUIA

import requests
import urllib.request
import time
from bs4 import BeautifulSoup

url = '[Link]
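# Fetch the page that lists the turnstile data files
response = requests.get(url)

# Parse the returned HTML with Python's built-in parser
soup = BeautifulSoup(response.text, "html.parser")

# List every <a> tag on the page
soup.find_all('a')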

<a href="data/nyct/turnstile/turnstile_110604.txt">Saturday, June 04, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110528.txt">Saturday, May 28, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110521.txt">Saturday, May 21, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110514.txt">Saturday, May 14, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110507.txt">Saturday, May 07, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110430.txt">Saturday, April 30, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110423.txt">Saturday, April 23, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110416.txt">Saturday, April 16, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110409.txt">Saturday, April 09, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110402.txt">Saturday, April 02, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110326.txt">Saturday, March 26, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110319.txt">Saturday, March 19, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110312.txt">Saturday, March 12, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110305.txt">Saturday, March 05, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110226.txt">Saturday, February 26, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110219.txt">Saturday, February 19, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110212.txt">Saturday, February 12, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110205.txt">Saturday, February 05, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110129.txt">Saturday, January 29, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110122.txt">Saturday, January 22, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110115.txt">Saturday, January 15, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110108.txt">Saturday, January 08, 2011</a>,
<a href="data/nyct/turnstile/turnstile_110101.txt">Saturday, January 01, 2011</a>,
<a href="data/nyct/turnstile/turnstile_101225.txt">Saturday, December 25, 2010</a>,
<a href="data/nyct/turnstile/turnstile_101218.txt">Saturday, December 18, 2010</a>,
<a href="data/nyct/turnstile/turnstile_101211.txt">Saturday, December 11, 2010</a>,
<a href="data/nyct/turnstile/turnstile_101204.txt">Saturday, December 04, 2010</a>,
<a href="data/nyct/turnstile/turnstile_101127.txt">Saturday, November 27, 2010</a>,
<a href="data/nyct/turnstile/turnstile_101120.txt">Saturday, November 20, 2010</a>,
<a href="data/nyct/turnstile/turnstile_101113.txt">Saturday, November 13, 2010</a>,
<a href="data/nyct/turnstile/turnstile_101106.txt">Saturday, November 06, 2010</a>,
<a href="data/nyct/turnstile/turnstile_101030.txt">Saturday, October 30, 2010</a>,
<a href="data/nyct/turnstile/turnstile_101023.txt">Saturday, October 23, 2010</a>,
<a href="data/nyct/turnstile/turnstile_101016.txt">Saturday, October 16, 2010</a>,
<a href="data/nyct/turnstile/turnstile_101009.txt">Saturday, October 09, 2010</a>,
<a href="data/nyct/turnstile/turnstile_101002.txt">Saturday, October 02, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100925.txt">Saturday, September 25, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100918.txt">Saturday, September 18, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100911.txt">Saturday, September 11, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100904.txt">Saturday, September 04, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100828.txt">Saturday, August 28, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100821.txt">Saturday, August 21, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100814.txt">Saturday, August 14, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100807.txt">Saturday, August 07, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100731.txt">Saturday, July 31, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100724.txt">Saturday, July 24, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100717.txt">Saturday, July 17, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100710.txt">Saturday, July 10, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100703.txt">Saturday, July 03, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100626.txt">Saturday, June 26, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100619.txt">Saturday, June 19, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100612.txt">Saturday, June 12, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100605.txt">Saturday, June 05, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100522.txt">Saturday, May 22, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100515.txt">Saturday, May 15, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100508.txt">Saturday, May 08, 2010</a>,
<a href="data/nyct/turnstile/turnstile_100505.txt">Wednesday, May 05, 2010</a>]
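
The file names in the listing appear to encode the date shown in each link's text as YYMMDD (for example, turnstile_110604.txt is labelled Saturday, June 04, 2011). As a minimal sketch under that assumption, the snippet below picks the data files out of a list of href values and recovers their dates. The helper name parse_turnstile_date and the "about.html" entry are hypothetical and used only for illustration.

```python
import re
from datetime import datetime

# Two hrefs copied from the listing above, plus one hypothetical non-data link.
hrefs = [
    "data/nyct/turnstile/turnstile_110604.txt",
    "data/nyct/turnstile/turnstile_100505.txt",
    "about.html",  # hypothetical, stands in for any other link on the page
]

# Assumption: data files are named turnstile_YYMMDD.txt
PATTERN = re.compile(r"turnstile_(\d{6})\.txt$")


def parse_turnstile_date(href):
    """Return the date encoded in the file name, or None for other links."""
    match = PATTERN.search(href)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%y%m%d").date()


# Keep only the links that look like turnstile data files.
data_files = {}
for href in hrefs:
    day = parse_turnstile_date(href)
    if day is not None:
        data_files[href] = day

for href, day in data_files.items():
    print(href, day.isoformat())
```

Filtering on the file-name pattern is one way to avoid relying on a fixed position in the list of links, which shifts whenever the page gains or loses an entry.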

# Pick one <a> tag by its position in the list and read its relative path
one_a_tag = soup.find_all("a")[38]
link = one_a_tag['href']

download_url = '[Link]
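# Save the file locally under the name that follows the last '/' before 'turnstile_'
urllib.request.urlretrieve(download_url, './' + link[link.find('/turnstile_') + 1:])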

('./turnstile_211120.txt', <http.client.HTTPMessage at 0x7f56c11adf90>)
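
The download step combines a base address with the relative href and derives a local file name from the end of the path. The sketch below illustrates both pieces without touching the network. BASE_URL is a hypothetical placeholder, since the notebook's real address is not visible here, and the slice is assumed to be built on str.find.

```python
import posixpath
from urllib.parse import urljoin

# Hypothetical base address, for illustration only.
BASE_URL = "https://example.com/developers/"

# A relative href copied from the listing above.
link = "data/nyct/turnstile/turnstile_110604.txt"

# urljoin resolves the relative path against the base address.
download_url = urljoin(BASE_URL, link)

# Slice-based name: everything after the '/' that precedes 'turnstile_'.
sliced_name = link[link.find("/turnstile_") + 1:]

# Alternative: take the last path component directly.
base_name = posixpath.basename(link)

print(download_url)
print(sliced_name)
print(base_name)
```

One difference worth noting: str.find returns -1 when the marker is missing, so the slice silently falls back to the whole string, while posixpath.basename always returns the final path component.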

# Pause for one second so the server is not hit with rapid requests
time.sleep(1)
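
A one-second pause matters most when several files are fetched in a row. The following is a minimal sketch of that pattern, not the notebook's own code: download_all and its parameters are hypothetical names, and the fetch argument stands in for whatever function actually performs the download.

```python
import time


def download_all(links, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(link) for each link, pausing between consecutive requests."""
    results = []
    for index, link in enumerate(links):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch(link))
    return results


# Usage with a stand-in fetch that only builds a local file name.
saved = download_all(
    ["turnstile_110604.txt", "turnstile_110528.txt"],
    fetch=lambda link: "./" + link,
    delay_seconds=0.01,  # shortened so the example finishes quickly
)
print(saved)
```

Passing the sleep function as a parameter keeps the pause easy to replace when the loop is exercised without real waiting.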
