
AI-powered tool to analyze company websites for blog presence/post counts via browser-use and Browserbase


moritzWa/CompanySEOAnalysis


Company SEO Analysis

A TypeScript/Bun tool that reads company websites from a CSV file and determines whether each site has a blog and how many blog posts it contains. It uses AI-driven browser automation through browser-use and Browserbase to handle complex blog structures and infinite-scroll sites with high accuracy.

Setup

  1. Install dependencies:

    bun install
  2. Set up API keys:

    # Create .env file and add your API keys
    echo "GEMINI_API_KEY=your_gemini_api_key_here" > .env
    echo "BROWSER_USE_API_KEY=your_browser_use_api_key_here" >> .env
    echo "BROWSER_BASE_API_KEY=your_browserbase_api_key_here" >> .env
    echo "BROWSER_BASE_PROJECT_ID=your_browserbase_project_id_here" >> .env

    Get API keys from:

Usage

cd scraper

# Put your CSV file in the data/ directory
cp your-leads.csv data/
bun run dev

# Or specify CSV file path
bun run src/index.ts path/to/your/leads.csv

# Run evaluations
bun run test

CSV Format

Your input CSV must have a "Website" column containing the company websites. Example:

Company,Website,Email
Acme Corp,acme.com,[email protected]
Example Inc,https://example.com,[email protected]
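The example mixes a bare domain (acme.com) with a full URL (https://example.com), so some normalization has to happen before a site can be visited. The sketch below shows one plausible way to locate the "Website" column and normalize its values. Both helper names are hypothetical, and the naive comma split assumes a header row without quoted fields; the tool's actual parsing may differ.

```typescript
// Hypothetical helpers: the README does not show how the tool parses the CSV
// or normalizes the "Website" column; this is one plausible approach.

// Naive header lookup: assumes the header row contains no quoted commas.
function websiteColumnIndex(headerLine: string): number {
  return headerLine.split(",").map((h) => h.trim()).indexOf("Website");
}

// Turn "acme.com" or "https://example.com/about" into an origin URL.
function normalizeWebsite(raw: string): string | null {
  const trimmed = raw.trim();
  if (trimmed === "") return null;
  // Bare domains have no scheme, so default to https.
  const withScheme = /^https?:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`;
  try {
    // Keep only the origin (scheme + host + port), dropping any path or query.
    return new URL(withScheme).origin;
  } catch {
    return null; // not a parseable URL
  }
}

console.log(websiteColumnIndex("Company,Website,Email")); // → 1
console.log(normalizeWebsite("acme.com")); // → https://acme.com
console.log(normalizeWebsite("https://example.com")); // → https://example.com
```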

Output

The tool creates a new CSV file with the same name plus a "-with-blog-analysis" suffix. It adds these columns:

  • hasBlog: Boolean indicating if the website has a blog
  • blogPostCount: Estimated number of blog posts found
  • blogUrl: Direct URL to the blog section
  • hasResources: Boolean indicating if the site has resources/case studies
  • resourcesCount: Number of resources found
  • resourcesUrl: Direct URL to resources section
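A minimal sketch of how the output filename could be derived, assuming the suffix is inserted before the ".csv" extension (for example, leads.csv becomes leads-with-blog-analysis.csv). The helper name outputPathFor is hypothetical; check the source for the exact naming rule.

```typescript
import { basename, dirname, extname, join } from "node:path";

// Hypothetical helper: assumes the suffix goes before the file extension.
function outputPathFor(inputPath: string): string {
  const ext = extname(inputPath); // e.g. ".csv"
  const base = basename(inputPath, ext); // e.g. "leads"
  return join(dirname(inputPath), `${base}-with-blog-analysis${ext}`);
}

console.log(outputPathFor("data/leads.csv")); // → data/leads-with-blog-analysis.csv (POSIX separators)
```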

Example Output

Company,Website,hasBlog,blogPostCount,blogUrl,hasResources,resourcesCount,resourcesUrl
Envoy B2B,https://envoyb2b.com,true,18,https://envoyb2b.com/news,true,4,https://envoyb2b.com/case-studies-and-research
Nalpeiron,https://nalpeiron.com,true,20,https://nalpeiron.com/blog,false,0,
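The rows above follow the column list directly, with a missing URL written as an empty field (note the trailing comma in the Nalpeiron row). The sketch below models the added columns as a TypeScript interface and serializes one row; the interface and function names are hypothetical, and the quoting rule is an RFC 4180-style assumption rather than the tool's confirmed behavior.

```typescript
// Shape of the columns the README lists; the interface name is hypothetical.
interface BlogAnalysis {
  hasBlog: boolean;
  blogPostCount: number;
  blogUrl: string | null;
  hasResources: boolean;
  resourcesCount: number;
  resourcesUrl: string | null;
}

// Quote a field only when it contains a comma, quote, or line break (RFC 4180 style).
function csvField(value: string): string {
  return /[",\n\r]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

function toCsvRow(company: string, website: string, a: BlogAnalysis): string {
  return [
    company,
    website,
    String(a.hasBlog),
    String(a.blogPostCount),
    a.blogUrl ?? "", // a missing URL becomes an empty field
    String(a.hasResources),
    String(a.resourcesCount),
    a.resourcesUrl ?? "",
  ].map(csvField).join(",");
}

const row = toCsvRow("Nalpeiron", "https://nalpeiron.com", {
  hasBlog: true,
  blogPostCount: 20,
  blogUrl: "https://nalpeiron.com/blog",
  hasResources: false,
  resourcesCount: 0,
  resourcesUrl: null,
});
console.log(row);
// → Nalpeiron,https://nalpeiron.com,true,20,https://nalpeiron.com/blog,false,0,
```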

How it works

The analysis pipeline works as follows:

  1. CSV Processing: Reads your CSV file containing company websites
  2. Browser Automation: Uses browser-use, an AI-powered browser automation framework that can navigate websites autonomously
  3. Cloud Infrastructure: Runs the browsers on Browserbase, a hosted headless-browser service for reliable, scalable web automation
  4. Blog Detection: The browser agent searches for blog sections by:
    • Trying common blog endpoints (/blog, /news, /insights, etc.)
    • Analyzing page structure and navigation menus
    • Using AI to understand page content and identify blog-like sections
  5. Content Analysis: Uses Gemini AI to analyze the discovered pages and count blog posts
  6. Fallback Logic: Falls back to alternative detection methods if the primary ones fail
  7. Results Export: Writes the blog and resource metrics to a new CSV file
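Step 4 mentions trying common blog endpoints. The sketch below illustrates only that endpoint-probing idea with plain fetch calls, not the browser-use agent the tool actually drives. The path list beyond /blog, /news, and /insights, the blog-subdomain guess, and both function names are assumptions.

```typescript
// Hypothetical sketch of "trying common blog endpoints"; not the tool's implementation.
const COMMON_BLOG_PATHS = ["/blog", "/news", "/insights"];

function candidateBlogUrls(origin: string): string[] {
  const root = new URL(origin);
  const urls = COMMON_BLOG_PATHS.map((p) => new URL(p, root).href);
  // Blogs are sometimes hosted on a subdomain (e.g. blog.craftview.de).
  const host = root.hostname.replace(/^www\./, "");
  urls.push(`${root.protocol}//blog.${host}/`);
  return urls;
}

// Return the first candidate that answers with a 2xx status, or null.
async function findBlogUrl(origin: string): Promise<string | null> {
  for (const url of candidateBlogUrls(origin)) {
    try {
      const res = await fetch(url, { redirect: "follow" });
      await res.body?.cancel(); // only the status matters here
      if (res.ok) return res.url; // final URL after redirects
    } catch {
      // DNS failure, TLS error, etc.: try the next candidate.
    }
  }
  return null;
}
```

A plain status check like this cannot tell a real blog from a placeholder page, which is presumably why the tool relies on an AI agent and Gemini for the actual judgment.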

Technology Stack

  • browser-use: AI-powered browser automation framework
  • Browserbase: Cloud-based headless browser infrastructure
  • Gemini AI: Analyzes page content and counts blog posts
  • TypeScript/Bun: Type safety and a fast, modern runtime

Rate Limiting

The tool processes websites in batches of 5 with 2-second delays between batches to be respectful to target servers and avoid rate limiting.
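A minimal sketch of the batching described above: up to five sites are processed concurrently, then the loop waits two seconds before starting the next batch. The function name and signature are hypothetical; the defaults mirror the stated batch size and delay.

```typescript
// Hypothetical sketch of batch processing with a pause between batches.
const BATCH_SIZE = 5;
const DELAY_MS = 2000;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function processInBatches<T, R>(
  items: T[],
  worker: (item: T) => Promise<R>,
  batchSize = BATCH_SIZE,
  delayMs = DELAY_MS,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Promise.all runs the batch concurrently and keeps results in input order.
    results.push(...(await Promise.all(batch.map(worker))));
    // Pause between batches, but not after the last one.
    if (i + batchSize < items.length) await sleep(delayMs);
  }
  return results;
}
```

Lowering batchSize or raising delayMs is the knob the troubleshooting notes refer to when a site or API starts rate limiting.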

Advanced Features

  • Multiple Analysis Methods: Combines browser automation, sitemap analysis, and AI-powered content detection
  • Infinite Scroll Support: Handles modern blogs with pagination and infinite scroll
  • Resource Detection: Finds case studies, whitepapers, and downloadable resources
  • Robust Fallbacks: Multiple detection strategies improve accuracy across different site architectures
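Sitemap analysis is one of the listed signals. As a rough illustration, the sketch below counts sitemap URLs that sit under a blog path prefix, excluding the blog index page itself. The function name and the "/blog/" default are assumptions; it handles a single plain XML sitemap, while real sites may use sitemap indexes or gzipped files.

```typescript
// Hypothetical sketch: count <loc> entries under a blog path prefix.
function countBlogPostsInSitemap(xml: string, blogPrefix = "/blog/"): number {
  let count = 0;
  for (const match of xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)) {
    let path: string;
    try {
      path = new URL(match[1] ?? "").pathname;
    } catch {
      continue; // skip malformed entries
    }
    // Count posts under the prefix, not the blog index page itself.
    if (path.startsWith(blogPrefix) && path.length > blogPrefix.length) count++;
  }
  return count;
}

const sitemap = `<urlset>
  <url><loc>https://acme.com/blog/</loc></url>
  <url><loc>https://acme.com/blog/first-post</loc></url>
  <url><loc>https://acme.com/about</loc></url>
</urlset>`;
console.log(countBlogPostsInSitemap(sitemap)); // → 1
```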

Evaluations & Testing

The repository includes an evaluation suite of 20+ manually verified test cases covering complex scenarios:

# Run evaluations
bun run test

# Results saved to actual-results.json

Test Coverage

  • Complex Blog Structures: Multi-section blogs, subdomains, hidden navigation
  • Infinite Scroll Sites: Modern SPAs with dynamic loading
  • Edge Cases: Sites with email walls, dropdown menus, non-standard URLs
  • Resource Detection: Case studies, whitepapers, downloadable content

Manual Verification Examples

  • Netpresenter: 212 blog posts across paginated sections
  • Craftview: Blog hosted on subdomain (blog.craftview.de)
  • Envoy B2B: No blog, but 8 case studies in dropdown menu
  • Nalpeiron: 38+ posts across two blog sections

Accuracy Metrics: Blog post counts are scored with a 30% tolerance relative to the manually verified counts; resource counts use a 50% tolerance.
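One reading of these tolerances is a relative-error check against the manually verified count. The sketch below assumes that interpretation, along with an exact-match rule when the expected count is zero; the evaluation suite's actual comparison may differ. The Netpresenter figure (212 posts) comes from the verification examples; the predicted values are made up to show a pass and a fail.

```typescript
// Hypothetical tolerance check: relative error must not exceed `tolerance`.
function withinTolerance(predicted: number, expected: number, tolerance: number): boolean {
  if (expected === 0) return predicted === 0; // relative error undefined; require exact match
  return Math.abs(predicted - expected) / expected <= tolerance;
}

const BLOG_TOLERANCE = 0.3;
const RESOURCE_TOLERANCE = 0.5;

// Netpresenter was manually verified at 212 posts:
console.log(withinTolerance(160, 212, BLOG_TOLERANCE)); // → true  (52/212 ≈ 0.245)
console.log(withinTolerance(140, 212, BLOG_TOLERANCE)); // → false (72/212 ≈ 0.340)
```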

Troubleshooting

  • Browser issues: Check browser-use-checker-problems.md for known issues and solutions
  • Debug output: Sitemap analysis debug files are saved to debug-sitemaps/
  • API rate limits: Reduce batch size or increase delays in the source code
  • Missing results: Some sites may block automated access - check manually if results seem incomplete
