Minicrawl dataset #18842
Open
Labels
comp-documentation: Documentation (docs, examples, READMEs).
dataset
Description
Download the front pages of several million websites with curl.
Record all metadata (headers, redirects, TLS version, cipher, and so on) as well as the data (HTTP body).
Create a dataset from it. The dataset will enable research similar to https://w3techs.com/
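A minimal sketch of how one front page could be fetched and recorded, assuming Python drives the `curl` binary through `subprocess`. The function names, the record layout, the timeout, and the redirect limit are all hypothetical choices, not something this issue specifies. It relies on curl's `--dump-header`, `--output`, and `--write-out '%{json}'` options (the last one needs curl 7.70.0 or newer). TLS version and cipher are not extracted here; that would likely need curl's verbose output, whose format depends on the TLS backend.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def build_curl_command(domain, header_path, body_path, timeout=10):
    """Build a curl invocation that saves headers, body, and transfer metadata."""
    return [
        "curl",
        "--silent", "--show-error",
        "--location",                       # follow redirects
        "--max-redirs", "10",               # assumed limit, not from the issue
        "--max-time", str(timeout),         # assumed per-site time budget
        "--dump-header", str(header_path),  # headers of every response in the chain
        "--output", str(body_path),         # HTTP body of the final response
        "--write-out", "%{json}",           # transfer metadata as JSON on stdout
        f"https://{domain}/",
    ]


def parse_header_dump(text):
    """Split a --dump-header file into one record per response in the redirect chain."""
    responses = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("HTTP/"):
            # A status line starts a new response (one per redirect hop).
            current = {"status_line": line, "headers": []}
            responses.append(current)
        elif line and current is not None and ":" in line:
            name, _, value = line.partition(":")
            current["headers"].append((name.strip().lower(), value.strip()))
    return responses


def fetch_front_page(domain, timeout=10):
    """Fetch https://<domain>/ with curl and return a dict describing the result."""
    with tempfile.TemporaryDirectory() as tmp:
        header_path = Path(tmp) / "headers.txt"
        body_path = Path(tmp) / "body.bin"
        proc = subprocess.run(
            build_curl_command(domain, header_path, body_path, timeout),
            capture_output=True,
        )
        record = {
            "domain": domain,
            "exit_code": proc.returncode,
            "error": proc.stderr.decode("utf-8", errors="replace").strip(),
            "transfer": None,
            "responses": [],
            "body": b"",
        }
        try:
            record["transfer"] = json.loads(proc.stdout.decode("utf-8", errors="replace"))
        except json.JSONDecodeError:
            pass  # curl printed no usable metadata, e.g. the transfer failed early
        if header_path.exists():
            # latin-1 decodes any byte sequence, so odd header bytes cannot raise.
            record["responses"] = parse_header_dump(header_path.read_text(encoding="latin-1"))
        if body_path.exists():
            record["body"] = body_path.read_bytes()
        return record
```

Calling `fetch_front_page("example.com")` would return one record per site. At the scale of several million sites, the calls would need to run concurrently and the records would need to be streamed to storage rather than held in memory.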
See also: https://commoncrawl.org/
See also: https://www.rukv.ru/ (created and abandoned by Aleksey Tutubalin)
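As an illustration of the W3Techs-style research mentioned above, the following sketch computes the share of web server products from collected `Server` headers. The input layout (one list of `(name, value)` header pairs per site) and the function name `server_share` are assumptions made for this example only.

```python
from collections import Counter


def server_share(records):
    """Return {server product: fraction of sites} from per-site header lists."""
    counts = Counter()
    for headers in records:
        # Header names are case-insensitive, so compare in lower case.
        server = next((v for k, v in headers if k.lower() == "server"), None)
        # Keep only the product name: "nginx/1.24.0" becomes "nginx".
        product = server.split("/")[0].strip().lower() if server else "(none)"
        counts[product] += 1
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}
```

The same pattern would apply to other recorded fields, such as counting sites per HTTP version or per redirect count.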