Data Science

The document discusses organized and unorganized data, highlighting their characteristics and examples. It explains the purposes of various Python libraries such as Pandas for data manipulation, Matplotlib for visualization, NumPy for numerical computing, and BeautifulSoup for web scraping. Additionally, it includes code examples demonstrating data analysis with NumPy and probability calculations.

Uploaded by

thuto1017

Question 1

1.1

• Organised Data: Data that is structured and arranged in a logical format,
often in tables or databases. It can be easily accessed, processed, and analyzed.

Example: Excel spreadsheets, SQL databases.

• Unorganised Data: Data that is not structured or arranged systematically,
making it difficult to process or analyze directly.

Example: text files, videos, images, or raw logs.
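
The difference can be illustrated by turning unorganised text into organised records. The following is a minimal sketch: the log lines and the regular expression are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical raw log lines (unorganised): free text with no fixed columns.
raw_log = [
    "2024-01-05 ERROR disk full on server-a",
    "2024-01-05 INFO backup finished on server-b",
]

# Assumed layout: a date, a level in capitals, then a free-text message.
pattern = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) (?P<message>.+)"
)

# Each matching line becomes a record with named fields (organised).
records = [pattern.match(line).groupdict() for line in raw_log]

for record in records:
    print(record)
```

Once every line has the same named fields, the records can be loaded into a table or database and queried like any other organised data.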

1.2 Purpose of the following libraries:

a. Pandas is used for data manipulation and analysis. It provides data structures like
DataFrame and Series that make it easy to clean, filter, group, and analyze structured data.

b. Matplotlib is a data visualization library used to create static, animated, and interactive
plots in Python. It is commonly used for line charts, bar graphs, scatter plots, etc.

c. NumPy is used for numerical computing in Python. It provides support for multi-
dimensional arrays and matrices, along with a collection of mathematical functions to operate
on these arrays efficiently.
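
As a rough illustration of how the three libraries fit together, the sketch below uses a small made-up data set. The day names, step counts, and the 7000-step threshold are assumptions for the example, not part of the assignment.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: four days of step counts, for illustration only.
steps = np.array([6500, 8700, 5400, 7800])

# NumPy: arithmetic is applied to the whole array at once.
steps_in_thousands = steps / 1000

# Pandas: label the values and filter rows with a boolean condition.
df = pd.DataFrame({"day": ["Mon", "Tue", "Wed", "Thu"], "steps": steps})
active_days = df[df["steps"] > 7000]

# Matplotlib: draw a bar chart and write it to an in-memory PNG.
fig, ax = plt.subplots()
ax.bar(df["day"], df["steps"])
ax.set_xlabel("Day")
ax.set_ylabel("Steps")
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)

print(steps_in_thousands)
print(active_days)
```

Saving to a buffer keeps the sketch runnable without a screen; in an interactive session, plt.show() would typically be used instead.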

1.3

Purpose:

BeautifulSoup is a Python library used for web scraping. It parses HTML and XML
documents into a tree of tags, allowing users to extract data from web pages easily
by searching and navigating that tree (similar to the DOM, the Document Object Model).

from bs4 import BeautifulSoup
import requests

# Send a request to the website
response = requests.get("https://example.com")

# Parse the HTML content
soup = BeautifulSoup(response.content, "html.parser")

# Extract and print all the paragraph texts
for paragraph in soup.find_all('p'):
    print(paragraph.get_text())

requests.get() fetches the web page.

BeautifulSoup() parses the HTML content.

find_all('p') is used to locate all <p> tags on the page.

get_text() extracts the text inside each paragraph.

In this example, BeautifulSoup is used to extract all paragraph texts from a webpage.
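
The same calls can be tried without a network connection. The sketch below parses a hypothetical HTML string (invented for this example) and also reads a tag attribute, which the example above does not cover.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for a downloaded page.
html = """
<html><body>
  <h1>Sample page</h1>
  <p>First paragraph.</p>
  <p>Second paragraph with a <a href="https://example.com/about">link</a>.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# get_text() returns the text of a tag, including text inside nested tags.
paragraphs = [p.get_text() for p in soup.find_all("p")]

# Tag attributes are read like dictionary entries.
links = [a["href"] for a in soup.find_all("a")]

print(paragraphs)
print(links)
```

Working from a fixed string makes it easier to check the extraction logic before pointing it at a live website.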
Question 2

import numpy as np

# Create an array from the given data
daily_steps = np.array([6532, 8741, 5403, 7829, 9126, 6087,
                        7324, 8560, 5972, 7645, 6891, 8102, 7456, 6213, 9034])

# Sort the array in descending order
sorted_steps = np.sort(daily_steps)[::-1]
print("Sorted Steps (descending order):", sorted_steps)

# Calculate the mean and standard deviation of the daily steps
mean_steps = round(np.mean(daily_steps))
std_dev_steps = round(np.std(daily_steps))
print("Mean Steps:", mean_steps)
print("Standard Deviation of Steps:", std_dev_steps)

# Determine the 25th, 50th (median), and 75th percentiles of the data
percentiles = np.percentile(daily_steps, [25, 50, 75])
print("25th Percentile:", percentiles[0])
print("Median (50th Percentile):", percentiles[1])
print("75th Percentile:", percentiles[2])

# Find how many participants averaged more than 7500 steps daily
participants_above_7500 = np.sum(daily_steps > 7500)
print("Participants averaging more than 7500 steps daily:",
      participants_above_7500)

Output:
1. Sorted Steps (descending order): [9126 9034 8741 8560
8102 7829 7645 7456 7324 6891 6532 6213 6087 5972 5403]

2. Mean Steps: 7394
Standard Deviation of Steps: 1146

3. 25th Percentile: 6372.5
Median (50th Percentile): 7456.0
75th Percentile: 8331.0

4. Participants averaging more than 7500 steps daily: 7
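
Two NumPy defaults affect these figures: np.percentile interpolates linearly between the two nearest sorted values, and np.std divides by n (ddof=0) rather than n - 1. The sketch below recomputes the percentiles by hand to show the interpolation step. The helper name linear_percentile is hypothetical and written only for this illustration.

```python
import numpy as np

daily_steps = np.array([6532, 8741, 5403, 7829, 9126, 6087,
                        7324, 8560, 5972, 7645, 6891, 8102, 7456, 6213, 9034])


def linear_percentile(values, q):
    """Percentile by linear interpolation between the two nearest ranks."""
    ordered = np.sort(values)
    position = (q / 100) * (len(ordered) - 1)  # fractional index into sorted data
    lower = int(np.floor(position))
    upper = int(np.ceil(position))
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


# Compare the hand-written version with NumPy's default method.
for q in (25, 50, 75):
    print(q, linear_percentile(daily_steps, q), np.percentile(daily_steps, q))

# Population (ddof=0, the default) versus sample (ddof=1) standard deviation.
population_std = np.std(daily_steps)
sample_std = np.std(daily_steps, ddof=1)
print(population_std, sample_std)
```

If the question expects a sample standard deviation, passing ddof=1 would be the change to make; the assignment text shown here does not say which one is intended.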


Question 3

a. P(Small and Service) = 10/170 = 1/17 ≈ 0.059

b. P(Small and Medium) = 48/170 ≈ 0.282

c. P(Small or Service or Both) = (36 + 24 - 10)/170 = 50/170 ≈ 0.294

d. P(Retail | Medium) = 13/48 ≈ 0.271

e. P(Small and Retail) = 14/170 ≈ 0.082
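
The arithmetic in these answers can be checked with exact fractions. This is a minimal sketch using only the counts that appear above. The underlying table is not shown here, so reading 36 as the Small total and 24 as the Service total is an assumption.

```python
from fractions import Fraction

# Counts taken from the answers above (the full table is not shown here).
total = 170
small = 36              # assumed to be the Small total
service = 24            # assumed to be the Service total
small_and_service = 10
medium = 48
retail_and_medium = 13

# Joint probability: favourable count over the grand total.
p_small_and_service = Fraction(small_and_service, total)

# Addition rule: subtract the overlap so it is not counted twice.
p_small_or_service = Fraction(small + service - small_and_service, total)

# Conditional probability: the denominator shrinks to the Medium total.
p_retail_given_medium = Fraction(retail_and_medium, medium)

results = {
    "P(Small and Service)": p_small_and_service,
    "P(Small or Service)": p_small_or_service,
    "P(Retail | Medium)": p_retail_given_medium,
}
for name, p in results.items():
    print(f"{name} = {p} ≈ {float(p):.3f}")
```

Using Fraction keeps each result exact until the final rounding, which makes a misplaced bracket in the addition rule easy to spot.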

Question 4
Question 5

1.

2.
