Common Crawl Foundation
1,446 posts
Common Crawl Foundation
@CommonCrawl
Common Crawl is a non-profit foundation dedicated to the Open Web.
San Francisco, CA
commoncrawl.org
Joined February 2010
1,591 Following · 7,830 Followers
  • Common Crawl Foundation
    @CommonCrawl
    Dec 21, 2024
    Common Crawl - Blog - Host- and Domain-Level Web Graphs October, November, and December 2024
    From commoncrawl.org
    9.7K
  • Common Crawl Foundation
    @CommonCrawl
    Mar 31, 2025
    Common Crawl - Blog - Introducing Common Crawl AI Agent by ReadyAI
    From commoncrawl.org
    9K
  • Common Crawl Foundation
    @CommonCrawl
    Mar 25, 2025
    Our friends at Webrecorder have announced the launch of GovArchive.us, a dedicated site for exploring their US Government Web Archive on Browsertrix. More details in their blog post: webrecorder.net/blog/2025-03-2…
    A card saying "Webrecorder US Government Web Archive — Selected Archives", and "A project by Webrecorder"
    Webrecorder US Government Web Archive
    From govarchive.us
    3.4K
  • Common Crawl Foundation
    @CommonCrawl
    Feb 23, 2025
    February 2025 Crawl Archive Now Available. The data was crawled between February 6th and February 20th, and contains 2.6 billion web pages. Page captures are from 47.6 million hosts or 38.5 million registered domains and include 1 billion new URLs not visited in any of our prior
    1.7K
  • Common Crawl Foundation
    @CommonCrawl
    Sep 29, 2017
    Need 3 billion web pages in WARC, WAT, and WET? Here you go! #opendata
    Common Crawl - Blog - September 2017 Crawl Archive Now Available
    From commoncrawl.org
  • Common Crawl Foundation
    @CommonCrawl
    Sep 29, 2017
    Check it out! "Common Crawl And Unlocking Web Archives For Research" via @forbes
    Common Crawl And Unlocking Web Archives For Research
    From forbes.com
  • Common Crawl Foundation
    @CommonCrawl
    Dec 16, 2011
    MapReduce for the Masses: Zero to Hadoop in Five Minutes with Common Crawl by @stevesalevan bit.ly/vCu8uM
  • Common Crawl Foundation
    @CommonCrawl
    Oct 20, 2024
    commoncrawl.org/blog/october-2… The data was crawled between October 3rd and October 16th, and contains 2.49 billion web pages. Page captures are from 47.5 million hosts or 38.3 million registered domains and include 1.03 billion new URLs not visited in any of our prior crawls.
    Common Crawl - Blog - October 2024 Crawl Archive Now Available
    From commoncrawl.org
    2.9K
  • Common Crawl Foundation
    @CommonCrawl
    Jan 22, 2025
    We are happy to announce cc-downloader, an experimental command-line tool for downloading Common Crawl data via https:
    Common Crawl - Blog - Introducing cc-downloader
    From commoncrawl.org
    1.7K
  • Common Crawl Foundation
    @CommonCrawl
    Jun 3, 2024
    commoncrawl.org/blog/may-2024-… Our 100th crawl!!
    Common Crawl - Blog - May 2024 Crawl Archive Now Available
    From commoncrawl.org
    3.9K
  • Common Crawl Foundation
    @CommonCrawl
    Dec 19, 2024
    Common Crawl - Blog - December 2024 Crawl Archive Now Available
    From commoncrawl.org
    1.4K
  • Common Crawl Foundation
    @CommonCrawl
    Jan 10, 2012
    This is an awesome idea! @stephen_wolfram on a .data TLD bit.ly/w2mwhc HN discussion bit.ly/w2mwhc
  • Common Crawl Foundation
    @CommonCrawl
    Nov 18, 2024
    Common Crawl - Blog - November 2024 Crawl Archive Now Available
    From commoncrawl.org
    893
  • Common Crawl Foundation
    @CommonCrawl
    Jun 25, 2024
    Check out NVIDIA NeMo Curator - This GPU-accelerated data-curation library includes data download, document deduplication, language identification, filtering, and other features often requested by Common Crawl users. Helpful for preparing large-scale, high-quality datasets for
    2.5K
