Question 1
1.1
Organised Data: This refers to data that is structured and arranged in a logical format,
often in tables or databases. It can be easily accessed, processed, and analyzed.
Examples: Excel spreadsheets, SQL databases.
Unorganised Data: This refers to data that is not structured or arranged systematically,
making it difficult to process or analyze directly.
Examples: free-form text files, videos, images, and raw logs.
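As a minimal sketch (with made-up records), the difference can be seen by comparing the same facts stored as a pandas table versus buried in free text:

```python
import pandas as pd

# Organised data: rows and columns with a fixed schema,
# so it can be filtered and aggregated directly.
organised = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 35],
})
print(organised[organised["age"] > 28])  # easy to query like a database

# Unorganised data: free-form text with no fixed structure;
# extracting the same facts would need parsing or NLP first.
unorganised = "Alice is 30, Bob just turned 25, and Carol is 35."
print(unorganised)
```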
1.2 Purpose of the following libraries:
a. Pandas is used for data manipulation and analysis. It provides data structures like
DataFrame and Series that make it easy to clean, filter, group, and analyze structured data.
b. Matplotlib is a data visualization library used to create static, animated, and interactive
plots in Python. It is commonly used for line charts, bar graphs, scatter plots, etc.
c. NumPy is used for numerical computing in Python. It provides support for multi-
dimensional arrays and matrices, along with a collection of mathematical functions to operate
on these arrays efficiently.
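A short sketch tying the three libraries together, using made-up daily temperature readings:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display window needed
import matplotlib.pyplot as plt

# NumPy: fast numerical arrays and math functions
temps = np.array([21.5, 23.0, 19.8, 24.1, 22.7])
print("Mean temperature:", temps.mean())

# Pandas: labelled, table-like data built on top of NumPy arrays
df = pd.DataFrame({"day": ["Mon", "Tue", "Wed", "Thu", "Fri"],
                   "temp": temps})
print(df[df["temp"] > 22])  # filter rows like a database query

# Matplotlib: visualise the same data as a line chart
plt.plot(df["day"], df["temp"], marker="o")
plt.title("Daily temperature")
plt.savefig("temps.png")  # write the chart to an image file
```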
1.3
Purpose:
BeautifulSoup is a Python library used for web scraping. It helps in parsing HTML and XML
documents, allowing users to extract data from websites easily by navigating the DOM
(Document Object Model).
from bs4 import BeautifulSoup
import requests
# Send a request to the website
response = requests.get("https://example.com")
# Parse the HTML content
soup = BeautifulSoup(response.content, "html.parser")
# Extract and print all the paragraph texts
for paragraph in soup.find_all('p'):
    print(paragraph.get_text())
requests.get() fetches the web page.
BeautifulSoup() parses the HTML content.
find_all('p') is used to locate all <p> tags on the page.
get_text() extracts the text inside each paragraph.
In this example, BeautifulSoup is used to extract all paragraph texts from a webpage.
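Since fetching https://example.com needs a live network connection, the same parsing steps can also be tried offline on an in-memory HTML string (a made-up snippet):

```python
from bs4 import BeautifulSoup

# Parsing a local HTML string avoids any network request,
# which is handy for testing scraping logic offline.
html = """
<html><body>
  <h1>Sample page</h1>
  <p>First paragraph.</p>
  <p>Second <b>paragraph</b>.</p>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all('p') returns every <p> tag; get_text() flattens nested tags
for p in soup.find_all("p"):
    print(p.get_text())
```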
Question 2
import numpy as np
# Create an array from the given data
daily_steps = np.array([6532, 8741, 5403, 7829, 9126, 6087,
7324, 8560, 5972, 7645, 6891, 8102, 7456, 6213, 9034])
# Sort the array in descending order
sorted_steps = np.sort(daily_steps)[::-1]
print("Sorted Steps (descending order):", sorted_steps)
# Calculate the mean and standard deviation of the daily steps
mean_steps = round(np.mean(daily_steps))
std_dev_steps = round(np.std(daily_steps))
print("Mean Steps:", mean_steps)
print("Standard Deviation of Steps:", std_dev_steps)
# Determine the 25th, 50th (median), and 75th percentiles of the data
percentiles = np.percentile(daily_steps, [25, 50, 75])
print("25th Percentile:", percentiles[0])
print("Median (50th Percentile):", percentiles[1])
print("75th Percentile:", percentiles[2])
# Find how many participants averaged more than 7500 steps daily
participants_above_7500 = np.sum(daily_steps > 7500)
print("Participants averaging more than 7500 steps daily:", participants_above_7500)
Output:
1. Sorted Steps (descending order): [9126 9034 8741 8560
8102 7829 7645 7456 7324 6891 6532 6213 6087 5972 5403]
2. Mean Steps: 7394
Standard Deviation of Steps: 1146
3. 25th Percentile: 6372.5
Median (50th Percentile): 7456.0
75th Percentile: 8331.0
4. Participants averaging more than 7500 steps daily: 7
Question 3
a. P(Small and Service) = 10/170
= 1/17 ≈ 0.059
b. P(Small and Medium) = 48/170 ≈ 0.282
c. P(Small or Service or Both) = (36 + 24 - 10)/170
= 50/170 ≈ 0.294
d. P(Retail | Medium) = 13/48 ≈ 0.271
e. P(Small and Retail) = 14/170 ≈ 0.082
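Assuming the counts above are read off the question's contingency table (36 Small firms, 24 Service firms, 10 that are both, out of 170), the inclusion-exclusion arithmetic in part (c) can be checked in a few lines:

```python
total = 170
small = 36            # firms classified as Small
service = 24          # firms classified as Service
small_and_service = 10

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B)
p_small_or_service = (small + service - small_and_service) / total
print(round(p_small_or_service, 3))  # 50/170 ≈ 0.294
```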
Question 4
Question 5
1.
2.