OFFICIAL (CLOSED) \ NON-SENSITIVE
Open Facebook Crawler based on Python
Requirements and Deliverables: To implement and deliver an Open Facebook
Crawler in Python that automatically collects open Facebook posts and
comments from a given Open Facebook page, according to the specifications
for 1) the Open Facebook Crawler GUI and 2) the Open Facebook Python
Crawler. The crawler must be able to run on any Windows notebook.
The delivery will include: Python source code, installers for the necessary
Python libraries, and user instructions to set up and run the crawler on a
Windows notebook.
Specifications of the Open Facebook Crawler GUI
There will be a Graphical User Interface (GUI) where users can enter their
personal Facebook login information (User_ID and Password) and the Open
Facebook Page URL for configuring the crawl, as shown in Figure 1.
A SAVE button to save the information. It creates an Excel file in the same
folder as the application and saves the username, password, and page URL in
that file.
A START button to start the crawler.
A STOP button to provide a hard stop for the crawler.
The status of the crawler will be displayed as either “Scraper is Running”
or “Scraper is not Running”.
The status of the crawl will be updated every 5-10 seconds with the total
number of posts and comments crawled so far.
An incorrect User_ID and/or Password will invoke a warning prompt asking
the user to check and re-enter their personal Facebook information, as
shown in Figure 2.
An incorrect Open Facebook page URL will also invoke a warning prompt asking
the user to check and re-enter the URL, as shown in Figure 2.
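The URL check could start with a simple format test before the crawler tries to open the page. The sketch below is only an illustration: the function name and the list of accepted hostnames are assumptions, and it checks the shape of the URL only, not whether the page is published.

```python
from urllib.parse import urlparse

# Hostnames accepted as Facebook; this set is an assumption for the sketch.
FACEBOOK_HOSTS = {"facebook.com", "www.facebook.com", "m.facebook.com"}


def is_plausible_page_url(url: str) -> bool:
    """Return True if the text looks like a Facebook page address."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return False
    if parsed.netloc.lower() not in FACEBOOK_HOSTS:
        return False
    # A page URL needs at least one non-empty path segment (the page name).
    return any(segment for segment in parsed.path.split("/"))
```

If the function returns False, the GUI would show the warning prompt from Figure 2 instead of starting the crawl.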
Figure 1: GUI for configuring the Open Facebook crawl
Figure 2: Prompt dialog boxes shown when information is entered incorrectly
in the GUI for configuring the Open Facebook crawl
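The START, STOP, and status requirements above could be met by running the crawl in a background thread. The sketch below is one possible arrangement, not a prescribed design: the class name, the `crawl_step` callable, and the callback are all hypothetical. The stop is cooperative, so it takes effect between crawl steps rather than in the middle of one.

```python
import threading
import time


class CrawlerWorker:
    """Runs a crawl loop in a background thread (hypothetical helper)."""

    def __init__(self, crawl_step, status_interval=5.0, on_status=print):
        # crawl_step is assumed to return (new_posts, new_comments),
        # or None when there is nothing left to crawl.
        self._crawl_step = crawl_step
        self._status_interval = status_interval
        self._on_status = on_status
        self._stop_event = threading.Event()
        self._thread = None
        self.posts = 0
        self.comments = 0

    @property
    def running(self):
        return self._thread is not None and self._thread.is_alive()

    def status_text(self):
        state = "Scraper is Running" if self.running else "Scraper is not Running"
        return f"{state} | posts: {self.posts} | comments: {self.comments}"

    def start(self):
        if self.running:
            return  # ignore a second START while a crawl is active
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        # Signal the loop to exit, then wait for the current step to finish.
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()

    def _run(self):
        last_report = time.monotonic()
        while not self._stop_event.is_set():
            result = self._crawl_step()
            if result is None:
                break
            new_posts, new_comments = result
            self.posts += new_posts
            self.comments += new_comments
            now = time.monotonic()
            if now - last_report >= self._status_interval:
                self._on_status(self.status_text())
                last_report = now
```

The START button would call `start()`, the STOP button `stop()`, and the status callback would pass the text on to the GUI label. With a toolkit such as Tkinter, the callback would typically hand the text over to the main thread rather than touch widgets directly.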
Specifications of the Open Facebook Python Crawler
The Open Facebook Python crawler must be able to crawl all posts and
comments found on the given Facebook page.
The following items are to be extracted from all the posts and comments
found on the Open Facebook page and placed in an output Excel file, as
shown in Figure 3 and Table 1:
Facebook Posts: Brand, Post ID, Date, Content, No. of Likes, No. of
Shares, No. of Comments
Facebook Comments: User Name, Comment, Post ID
Table 1: Typical items to be crawled
Figure 3: A typical Open Facebook Post and all the items to be crawled
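The listed items map naturally onto two record types, one for posts and one for comments, linked by Post ID. The sketch below shows one possible in-memory layout; the class names, field names, and column order are assumptions, and the real column order should follow the sample output file.

```python
from dataclasses import astuple, dataclass

# Column headers taken from the item list; the order is an assumption.
POST_COLUMNS = ("Brand", "Post ID", "Date", "Content",
                "No. of Likes", "No. of Shares", "No. of Comments")
COMMENT_COLUMNS = ("User Name", "Comment", "Post ID")


@dataclass
class Post:
    brand: str
    post_id: str
    date: str
    content: str
    likes: int
    shares: int
    comments: int


@dataclass
class Comment:
    user_name: str
    comment: str
    post_id: str  # links the comment back to its post


def to_rows(columns, records):
    """Return a header row followed by one row per record."""
    return [list(columns)] + [list(astuple(record)) for record in records]
```

The rows produced by `to_rows` can then be written to a worksheet by whichever Excel library the implementation uses.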
How to get the post ID
To get the post ID, click on the date of the post. The post ID will then be
visible in the URL.
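Once the crawler has the URL behind the post date, the ID can be pulled out of it programmatically. The sketch below assumes two URL shapes, `.../<page>/posts/<id>` and `permalink.php?story_fbid=<id>`; Facebook uses other shapes as well and may change them, so treat the patterns as illustrative. The function name is hypothetical.

```python
import re
from urllib.parse import parse_qs, urlparse


def extract_post_id(post_url: str):
    """Return the post ID found in a post URL, or None if none is found."""
    parsed = urlparse(post_url)
    # Assumed shape 1: https://www.facebook.com/<page>/posts/<id>
    match = re.search(r"/posts/([^/]+)", parsed.path)
    if match:
        return match.group(1)
    # Assumed shape 2: https://www.facebook.com/permalink.php?story_fbid=<id>&id=...
    query = parse_qs(parsed.query)
    if "story_fbid" in query:
        return query["story_fbid"][0]
    return None
```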
Location of the output Excel file
The output file should be saved in the same folder where the application is
placed. The name of the output Excel file should be the name of the crawled
Facebook page.
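Because page names can contain characters that Windows does not allow in file names, the output path may need some cleaning. The sketch below is one way to build it; the function name, the `.xlsx` extension, and the fallback name are assumptions.

```python
import re
from pathlib import Path


def output_path(page_name: str, app_dir: Path) -> Path:
    """Build the output file path from the page name (hypothetical helper)."""
    # Windows forbids these characters in file names: \ / : * ? " < > |
    safe = re.sub(r'[\\/:*?"<>|]', "_", page_name).strip(" .")
    # Fall back to a generic name if nothing usable is left (assumption).
    return app_dir / f"{safe or 'facebook_page'}.xlsx"
```

When the crawler is started as a script, the application folder could be obtained with `Path(sys.argv[0]).resolve().parent`.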
How the output file should be formatted
A sample output file is provided. Follow its format.
Meaning of open Facebook pages
Open pages are those that are published and visible on Facebook. If a page
is unpublished and cannot be accessed, show an error (as discussed above).
Note
A company Facebook page may have, say, 300 posts and perhaps 3000
comments. You must ensure your tool is able to scrape as many of the posts
and comments as possible.
Scrape data in a way that does not cause Facebook to block the account due
to scraping activities.
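One common way to make crawling less aggressive is to pause for a random interval between page requests. The sketch below illustrates the idea only: the function name and the default delay range are arbitrary assumptions, and no delay setting guarantees that an account will not be blocked.

```python
import random
import time


def polite_pause(min_seconds=2.0, max_seconds=6.0,
                 sleep=time.sleep, rng=random.uniform):
    """Sleep for a random time in [min_seconds, max_seconds] and return it.

    The default range is an assumption, not a value from the specification.
    The sleep and rng parameters exist so the pause can be replaced in tests.
    """
    delay = rng(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

The crawler would call `polite_pause()` after loading each post or each batch of comments.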
Final Delivery
As discussed, please follow the specifications and the example output file,
and deliver the following: 1) source code of the Python Facebook posts and
comments scraper, 2) a user guide on installing the code to run on the
Windows platform, and 3) installers for the necessary Python libraries, so
that we can run it on our end. Thanks.
Example List of Typical Open Facebook Pages for Potential Crawling
The following are some examples of open Facebook pages:
No. Open Facebook Page
1 [Link]
2 [Link]
3 [Link]
4 [Link]
5 [Link]
6 [Link]
7 [Link]