Machine Exercise: 1
What is Data Science?
Data science is an interdisciplinary field that involves extracting insights and knowledge
from large volumes of data. It combines statistical methods, machine learning algorithms,
and domain expertise to solve complex problems. Data scientists play a crucial role in
transforming raw data into actionable information that drives decision-making.
Impact of Data Science
Data science has revolutionized industries across the globe. Its applications range from
healthcare and finance to marketing and e-commerce. Some key impacts include:
- Improved decision-making through data-driven insights
- Enhanced customer experience through personalized recommendations
- Fraud detection and prevention
- Optimization of processes and resource allocation
- Advancements in scientific research and discovery
Essential Python Libraries for Data Science
Python has become one of the most widely used languages for data science because of its
simplicity, readability, and extensive libraries.
Objective
This exercise aims to demonstrate basic data manipulation techniques using Python's
Pandas library.
Dataset Overview
The Raw Housing Prices dataset provides detailed information about housing sales. This
dataset is useful for understanding pricing trends, property characteristics, and market
behaviors.
Data Description
- Date House was Sold: The date when the house was sold.
- Sale Price: The price at which the house was sold.
- Zipcode: The ZIP (postal) code of the property location.
- Bedrooms: The number of bedrooms in the house.
- Bathrooms: The number of bathrooms in the house.
- Living Area (sqft): The living space size in square feet.
- Lot Area (sqft): The size of the lot in square feet.
- Floors: The number of floors in the house.
- Waterfront View: Whether the house has a view of the waterfront.
- Condition: The overall condition of the property.
Purpose
- Price Trend Analysis: Identifying pricing trends over time and across locations.
- Property Segmentation: Analyzing features that affect property prices.
- Location Insights: Understanding how location impacts housing prices.
- Market Behavior: Evaluating market behaviors to assist in real estate decision-making.
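As a rough illustration of the Price Trend Analysis goal, the sketch below averages sale prices by month. The rows are invented sample values rather than records from the dataset, and the date format is an assumption; only the two column names are taken from the selections used in this exercise.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
sample = pd.DataFrame({
    'Date House was Sold': ['2017-01-05', '2017-01-20', '2017-02-11'],
    'Sale Price': [300000.0, 500000.0, 450000.0],
})

# Convert the text dates to datetime so they can be grouped by month.
sample['Date House was Sold'] = pd.to_datetime(sample['Date House was Sold'])

# Average sale price for each calendar month.
month = sample['Date House was Sold'].dt.to_period('M')
monthly_avg = sample.groupby(month)['Sale Price'].mean()
print(monthly_avg)
```

The same pattern could be applied to the full DataFrame once its date column has been parsed, provided the dates are stored in a format that pd.to_datetime recognizes.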
Q 1.1) Basic Data Manipulation Tasks
# Import Libraries
import pandas as pd
# Load Data from the provided CSV file
df = pd.read_csv('/content/Raw_Housing_Prices3.csv')
# Display the Data
print(df.head())
Q 1.2) Selecting Multiple Columns
# Selecting relevant columns
selected_columns = df[['Date House was Sold', 'Sale Price', 'Zipcode', 'Waterfront View']]
print(selected_columns)
Q 1.3) Displaying a Concise Summary of the DataFrame
df.info()
Q 1.4) Generating Descriptive Statistics
df.describe()
Q 1.5) Display the Rows and Columns of the Dataset
df.shape
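The shape attribute holds a (rows, columns) tuple, so it can be unpacked into two separate counts. A minimal sketch, using a small made-up frame in place of the housing data:

```python
import pandas as pd

# Hypothetical three-row frame standing in for the housing data.
sample = pd.DataFrame({
    'Zipcode': [98001, 98002, 98003],
    'Sale Price': [300000, 450000, 520000],
})

# shape is an attribute (no parentheses) holding (row count, column count).
n_rows, n_cols = sample.shape
print(f'{n_rows} rows x {n_cols} columns')
```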
Q 2) Exporting Data
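# Build a one-row DataFrame from a dictionary of column name -> list of values
user_data = {'Uniroll': [2234219], 'Name': ['meet'], 'Percentage': [80]}
user_df = pd.DataFrame(user_data)
# Write the DataFrame to a CSV file; index=False leaves out the row-index column
user_df.to_csv('user_data.csv', index=False)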
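One way to check an export is to read the file back and inspect it. The sketch below assumes a placeholder record and a hypothetical file name (example.csv), and writes to a temporary folder so that no existing file is touched.

```python
import os
import tempfile

import pandas as pd

# Hypothetical record; the values are placeholders.
record = pd.DataFrame({'Name': ['example'], 'Percentage': [80]})

# Write to a temporary folder so no existing file is overwritten.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.csv')
    record.to_csv(path, index=False)  # index=False omits the row-index column
    restored = pd.read_csv(path)      # read the file back before the folder is removed

print(restored)
```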
Q 3) Filtering Data
filtered_data = df[df['Sale Price'] > 500000][['Zipcode', 'Sale Price']]
print(filtered_data)
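Filters can also combine several conditions. The sketch below is an illustration on invented rows; the 'Bedrooms' column name is an assumption based on the Data Description and may differ in the actual CSV file.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
sample = pd.DataFrame({
    'Zipcode': [98001, 98002, 98003, 98004],
    'Sale Price': [300000, 650000, 520000, 900000],
    'Bedrooms': [2, 3, 4, 5],
})

# Each condition needs its own parentheses; & means "and", | means "or".
mask = (sample['Sale Price'] > 500000) & (sample['Bedrooms'] >= 4)

# .loc selects rows by the boolean mask and columns by name in one step.
result = sample.loc[mask, ['Zipcode', 'Sale Price']]
print(result)
```

Using .loc with a mask and a column list does the row filter and column selection in a single indexing operation, rather than chaining two sets of brackets.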
Q 3) Sorting Data
sorted_df = df.sort_values(by='Sale Price', ascending=False)
print(sorted_df[['Zipcode', 'Sale Price']])
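Sorting can use more than one key. As a small sketch on invented rows, the example below orders by Zipcode ascending and, within each Zipcode, by Sale Price descending.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
sample = pd.DataFrame({
    'Zipcode': [98002, 98001, 98001],
    'Sale Price': [450000, 300000, 520000],
})

# One ascending flag per sort key: Zipcode ascending, Sale Price descending.
ordered = sample.sort_values(
    by=['Zipcode', 'Sale Price'],
    ascending=[True, False],
).reset_index(drop=True)  # renumber the rows 0..n-1 after sorting
print(ordered)
```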
Q 3) Grouping Data
grouped_df = df.groupby('Zipcode')['Sale Price'].sum()
print(grouped_df)
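A total per Zipcode depends heavily on how many houses were sold there, so it can help to look at the count and the mean alongside the sum. A minimal sketch on invented rows:

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
sample = pd.DataFrame({
    'Zipcode': [98001, 98001, 98002],
    'Sale Price': [300000.0, 500000.0, 450000.0],
})

# Compute several statistics per Zipcode in one pass.
summary = sample.groupby('Zipcode')['Sale Price'].agg(['count', 'mean', 'sum'])
print(summary)
```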