A CLI-driven application to scrape and analyze Facebook group posts for insights using Selenium and Google Gemini AI.
This tool helps users identify potential capstone/thesis ideas, student problems, or other valuable insights from university Facebook group discussions by automating data collection (including posts and comments) and AI-powered categorization.
- Authenticated Facebook Group Scraping: Securely logs into Facebook to scrape posts and comments from private or public groups.
- Flexible AI Analysis:
  - Support for Google Gemini (default) and OpenAI-compatible providers (OpenAI, Ollama, LM Studio, etc.)
  - Configurable models (e.g., switch between Gemini 2.5 Pro, 2.0 Flash, or local LLMs)
  - Customizable Prompts: Override default AI prompts via JSON configuration
- Local Database Storage: Stores scraped data and AI insights in a local SQLite database.
- Data Export & Statistics: Export data to CSV/JSON formats and view detailed statistics.
- Advanced CLI Interface:
  - Dynamic Filtering: Filter posts by category, author, or potential ideas
  - Pagination: Limit results with the `--limit` option
  - Interactive Menus: User-friendly command selection
- Performance Optimizations:
  - Parallel processing for faster scraping
  - Asynchronous AI batch processing
  - Incremental data saving during scraping
- Enhanced Export Capabilities:
  - Flexible output paths
  - Multiple export formats (CSV/JSON)
  - Automatic directory creation
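As a quick illustration of how these features fit together, a typical session might look like the following; the group URL, post count, and output path are placeholders, and the individual commands are documented in the Usage section:

```bash
# Scrape posts from a group (you will be prompted for Facebook credentials)
python main.py scrape --group-url "https://www.facebook.com/groups/your-group" --num-posts 50 --headless

# Categorize the scraped posts and comments with the configured AI provider
python main.py process-ai

# Review potential ideas, then export everything to JSON
python main.py view --category "Project Idea" --limit 20
python main.py export --format json --output-path exports/ideas.json
```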
The application collects the following data from Facebook group posts and comments:
- Posts:
  - Post content
  - Post URL
  - Post timestamp
  - Author name
  - Author profile picture URL
- Comments:
  - Comment content
  - Comment timestamp
  - Author name
  - Author profile picture URL
  - Facebook comment ID
- AI-generated insights:
  - Category (e.g., "Project Idea", "Problem Statement")
  - Sub-category
  - Keywords
  - Summary
  - Potential idea flag
  - Sentiment analysis (for comments)
- Language: Python
- Web Scraping: `Selenium`, `webdriver-manager`, `BeautifulSoup4`
- AI & Machine Learning: `google-generativeai`
- Database: SQLite
- CLI: `click`
- Utilities: `python-dotenv`, `getpass`
Before you begin, ensure you have the following:
- Python 3.9+
- Git
- A modern Web Browser (e.g., Chrome, Firefox)
- Google Cloud Project & Gemini API Key
For most users, we recommend using the pre-compiled binaries:
- Download the latest version for your platform from the Releases page.
- Run the application:
  - Windows: Double-click `FBScrapeIdeas-windows-x64.exe`.
  - macOS/Linux: Open a terminal, make the file executable (`chmod +x FBScrapeIdeas-*`), and run it.
- Interactive Setup: On the first launch, the application will guide you through an interactive wizard to configure your API keys and credentials. No manual `.env` file creation is required!
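On macOS/Linux, those steps might look like this in practice (the binary name below is illustrative; use the exact file you downloaded from the Releases page):

```bash
chmod +x FBScrapeIdeas-linux-x64   # substitute the actual downloaded file name
./FBScrapeIdeas-linux-x64
```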
- Clone the repository:

  ```bash
  git clone https://github.com/MasuRii/FBScrapeIdeas.git
  cd FBScrapeIdeas
  ```

- Create and activate a virtual environment:

  ```bash
  # For Linux/macOS
  python3 -m venv venv
  source venv/bin/activate

  # For Windows (Command Prompt)
  python -m venv venv
  venv\Scripts\activate.bat
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
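After installing the dependencies, a quick way to confirm the CLI is wired up is to ask for help output; `click`-based CLIs expose `--help` by default, and it should list the commands described under Usage:

```bash
python main.py --help
```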
If you prefer to configure the application manually (e.g., for automated environments):
- Set up Environment Variables: Create a `.env` file in the project root:

  ```env
  # .env
  # Provider Selection (gemini or openai)
  AI_PROVIDER=gemini

  # Gemini Configuration
  GOOGLE_API_KEY=YOUR_GEMINI_API_KEY_HERE
  GEMINI_MODEL=models/gemini-2.5-flash
  ```

  (See AI Provider Configuration for more details.)

  Note: Facebook credentials are entered securely during scraping or saved during the first-run interactive session.

- WebDriver Setup: `webdriver-manager` will handle this automatically on the first run.
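For fully scripted runs, the same settings can also be exported as environment variables instead of written to `.env`; this assumes the application reads them from the process environment as well, which is the usual behavior of `python-dotenv`-based configuration (values below are placeholders):

```bash
# Placeholder values; see AI Provider Configuration for the available settings
export AI_PROVIDER=gemini
export GOOGLE_API_KEY=YOUR_GEMINI_API_KEY_HERE
export GEMINI_MODEL=models/gemini-2.5-flash
python main.py process-ai
```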
FB Scrape Ideas supports multiple AI providers, allowing you to choose between Google's Gemini models, OpenAI's official API, and local LLMs running via tools like Ollama or LM Studio.
You can configure these settings via the `.env` file or the CLI menu.
Google Gemini is the default provider. You only need a Google API Key.

Configuration (`.env`):

```env
AI_PROVIDER=gemini
GOOGLE_API_KEY=your_google_api_key
GEMINI_MODEL=models/gemini-2.0-flash # Optional: Change model
```

Available Gemini Models:

- `models/gemini-2.0-flash` (Fast, efficient)
- `models/gemini-1.5-flash`
- `models/gemini-1.5-pro` (Higher reasoning capability)
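If you want to sanity-check a Gemini key before running the app, you can list the models it has access to via the public Generative Language API; this check is independent of the tool itself:

```bash
# A JSON list of models indicates the key is valid; an error response means it is not
curl "https://generativelanguage.googleapis.com/v1beta/models?key=$GOOGLE_API_KEY"
```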
You can connect to any service that follows the OpenAI API standard, including local LLMs.

For the official OpenAI API:

```env
AI_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o
```

Run Ollama locally (`ollama serve`) and use the following config:
```env
AI_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama # Value doesn't matter for Ollama, but must be present
OPENAI_MODEL=llama3 # Or any model you have pulled
```
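Before pointing the app at Ollama, it can help to confirm that the model is available locally and that the OpenAI-compatible endpoint responds; assuming the default port and the `llama3` model from the config above:

```bash
# Pull the model if it isn't already available locally
ollama pull llama3

# The OpenAI-compatible endpoint should return a JSON list of models
curl http://localhost:11434/v1/models
```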
Start the local server in LM Studio and use:

```env
AI_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:1234/v1
OPENAI_API_KEY=lm-studio
OPENAI_MODEL=model-identifier
```

For hosted OpenAI-compatible providers such as OpenRouter, point the `OPENAI_BASE_URL` to the provider's endpoint:
```env
AI_PROVIDER=openai
OPENAI_BASE_URL=https://openrouter.ai/api/v1
OPENAI_API_KEY=your_openrouter_key
OPENAI_MODEL=anthropic/claude-3-opus
```

You can customize the instructions given to the AI by creating a `custom_prompts.json` file in the root directory. This allows you to tailor the categorization logic or sentiment analysis to your specific needs.
To use:
- Copy `custom_prompts.example.json` to `custom_prompts.json`.
- Edit the prompts in `custom_prompts.json`.
Example Structure:
```json
{
  "post_categorization": "You are an expert post categorizer. Analyze the following...",
  "comment_analysis": "You are an expert comment analyzer..."
}
```

The application is run via the CLI:
```bash
python main.py <command> [options]
```

Available Commands:

- `scrape`: Scrapes posts and comments from a Facebook group. You'll be prompted securely for Facebook credentials.

  ```bash
  python main.py scrape --group-url "GROUP_URL" [--num-posts 50] [--headless]
  ```

- `process-ai`: Processes scraped posts and comments with the configured AI provider.

  ```bash
  python main.py process-ai
  ```

- `view`: Views categorized posts and comments with filtering options (interactive field and value selection, pagination support).

  ```bash
  python main.py view [--category CATEGORY] [--author AUTHOR] [--limit N]
  ```

- `export`: Exports data to CSV or JSON format. Handles both posts and comments, with automatic directory creation.

  ```bash
  python main.py export --format csv|json [--output-path PATH] [--category CATEGORY]
  ```

- `stats`: Shows comprehensive statistics about collected data.

  ```bash
  python main.py stats
  ```
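Because everything is stored in a local SQLite database, you can also inspect the raw data with the `sqlite3` command-line shell; the database file, table, and column names below are assumptions for illustration, so check the project root for the actual file the app creates:

```bash
# File, table, and column names are illustrative assumptions
sqlite3 fb_scrape_ideas.db ".tables"
sqlite3 fb_scrape_ideas.db "SELECT category, COUNT(*) FROM posts GROUP BY category;"
```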
This tool is provided for educational purposes only. Users must:
- Comply with Facebook's Terms of Service
- Respect privacy and data protection laws
- Not use scraped data for commercial purposes
- Use responsibly and ethically
The developers assume no liability for misuse of this tool. Scraping may violate Facebook's Terms of Service; use at your own risk.
