Minicrawl dataset

Download the front pages of several million websites with curl.
Record all metadata (headers, redirects, TLS version, cipher, and so on) as well as the data (the HTTP body).
Create a dataset from it. The dataset will enable research similar to https://w3techs.com/
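
A minimal sketch of the download-and-record step for one site, assuming curl is driven from Python through `subprocess`. The function names, the file layout, and the record fields are hypothetical and not part of the proposal. The TLS version and cipher are parsed from curl's verbose log line `* SSL connection using ...`, whose exact format depends on the curl version and TLS backend, so treat that regex as an assumption.

```python
import os
import re
import subprocess

# curl --write-out variables to capture for each site.
FIELDS = ("http_code", "num_redirects", "remote_ip", "time_total", "url_effective")
WRITE_OUT = "\t".join("%{" + name + "}" for name in FIELDS)

# Assumed format of curl's verbose TLS line (OpenSSL-style backends).
TLS_LINE = re.compile(r"^\* SSL connection using (\S+) / (\S+)", re.MULTILINE)


def build_curl_command(domain, body_path, headers_path, timeout=20):
    """Build the curl command line for fetching one front page."""
    return [
        "curl", "--silent", "--verbose", "--location",
        "--max-redirs", "5",
        "--max-time", str(timeout),
        "--output", body_path,          # HTTP body
        "--dump-header", headers_path,  # headers of every response in the redirect chain
        "--write-out", WRITE_OUT,       # summary printed to stdout
        "https://" + domain + "/",
    ]


def parse_write_out(text):
    """Turn the tab-separated --write-out line into a dict of strings."""
    return dict(zip(FIELDS, text.strip("\n").split("\t")))


def parse_tls(verbose_log):
    """Extract TLS version and cipher of the last connection from the verbose log."""
    matches = TLS_LINE.findall(verbose_log)
    if not matches:
        return {"tls_version": None, "cipher": None}
    version, cipher = matches[-1]
    return {"tls_version": version, "cipher": cipher}


def fetch(domain, workdir):
    """Fetch one front page and return a metadata record (hypothetical schema)."""
    body_path = os.path.join(workdir, domain + ".body")
    headers_path = os.path.join(workdir, domain + ".headers")
    cmd = build_curl_command(domain, body_path, headers_path)
    proc = subprocess.run(cmd, capture_output=True, text=True, errors="replace")
    record = {"domain": domain, "curl_exit_code": proc.returncode}
    record.update(parse_write_out(proc.stdout))
    record.update(parse_tls(proc.stderr))
    return record
```

Calling `fetch("example.com", "/tmp/minicrawl")` (the directory must already exist) would leave the body and headers on disk and return one row of metadata. Running this over millions of sites would additionally need concurrency and retries, which are left out here.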

See also: https://commoncrawl.org/
See also: https://www.rukv.ru/ (created and abandoned by Aleksey Tutubalin)

Minicrawl dataset #18842