
Tribhuvan University
Institute of Science and Technology

A Final Year Internship Report


On
“Data Analyst”
At
Global Fast Enterprises Pvt. Ltd.

Submitted to:
Department of Computer Science and Information Technology
Himalaya College of Engineering
Chyasal, Lalitpur

In partial fulfillment of the requirements
for the Bachelor's Degree in Computer Science and Information Technology

Submitted by:
Unik Shyaula (T.U. Symbol No. 26565/077)

Date: 28th June, 2025


Acknowledgement
I would like to express my sincere gratitude to my supervisor and Head of the Department of CSIT,
Er. Himal Chand Thapa, for his guidance, advice, help, and supervision
throughout the internship period. As the internship progressed, it was thanks to his inputs
and expert comments that I was able to overcome the increasing hurdles.

I would also like to thank Er. Mahesh Kr. Yadav for having faith in me and guiding me
while I was an intern at Global Fast Enterprises Pvt. Ltd. Being a member of such a welcoming
and vibrant team was a wonderful experience for me.

I am very thankful to the entire Global Fast Enterprises team for providing the
internship opportunity with full support and collaboration.

I sincerely appreciate the assistance of all these people.

I also want to express my gratitude to the entire teaching team at the Department of
Computer Science and Information Technology for their ongoing support, encouragement,
and advice, all of which contributed to the smooth progress of my internship.

I am really appreciative of the department staff members and friends who assisted me in
completing this internship successfully.

Unik Shyaula

T.U Exam Roll No: 26565/077

Abstract
This report presents an overview of my 12-week Data Analytics internship at Global Fast
Enterprises Pvt. Ltd., where I gained hands-on experience in data preprocessing, analysis,
and visualization. Using tools such as Python (pandas, NumPy, matplotlib, seaborn), Excel,
and Power BI, I worked on real-world financial datasets to extract actionable insights. The
internship focused on key tasks like data cleaning, exploratory data analysis, linear
regression modeling, and dashboard development.

Throughout the internship, I enhanced my technical and analytical skills while learning to
communicate insights effectively within a collaborative, Agile-based work environment.
The experience helped me bridge academic learning with practical applications in the field
of business analytics. This report highlights my learning journey, major project
contributions, and the essential skills developed during this valuable professional
experience.

Keywords: Data Analytics, Python Programming, Power BI, Data Visualization, Statistical Analysis, Machine Learning, Data Cleaning, Exploratory Data Analysis (EDA)

LIST OF ABBREVIATIONS

BI Business Intelligence

CSV Comma-Separated Values

DAX Data Analysis Expressions

EDA Exploratory Data Analysis

JSON JavaScript Object Notation

KPI Key Performance Indicator

Table of Contents
Acknowledgement ................................................................................................................ i
Abstract ................................................................................................................................ii
LIST OF ABBREVIATIONS ........................................................................................... iii
LIST OF FIGURES ............................................................................................................ vi
LIST OF TABLES .............................................................................................................vii
Chapter 1: Introduction ........................................................................................................ 1
1.1 Introduction ................................................................................................................ 1
1.2 Problem Statement ..................................................................................................... 1
1.3 Objectives .................................................................................................................. 2
1.4 Report Organization ................................................................................................... 2
Chapter 2: Organization Details and Literature Review...................................................... 3
2.1 Introduction to Organization ...................................................................................... 3
2.1.1 Contact Information ............................................................................................ 3
2.2 Organizational Hierarchy ........................................................................................... 4
2.3 Working Domains of Organization ............................................................................ 5
2.4 Description of Intern Department/Unit ...................................................................... 5
2.5 Literature Review....................................................................................................... 6
Chapter 3: Internship Activities ........................................................................................... 7
3.1 Roles and Responsibilities ......................................................................................... 7
3.2 Weekly log ................................................................................................................. 7
3.3 Description of the Project(s) Involved During Internship ......................................... 9
3.4 Tasks / Activities Performed .................................................................................... 10
3.4.1 Data Collection and Import .............................................................................. 10
3.4.2 Data Cleaning and Preprocessing ..................................................................... 10
3.4.3 Feature Engineering .......................................................................................... 11
3.4.4 Exploratory Data Analysis (EDA) .................................................................... 12
3.4.5 Data Visualization ............................................................................................ 12
3.4.6 Outlier Detection and Removal ........................................................................ 15
3.4.7 Regression Modeling ........................................................................................ 16
3.4.8 Model Interpretation and Visualization ............................................................ 17
3.4.9 Modeling with Random Forest ......................................................................... 18
3.4.10 Power BI Dashboard Creation ........................................................................ 19
Chapter 4: Conclusion and Learning Outcomes ................................................................ 20
4.1 Conclusion ............................................................................................................... 20
4.2 Learning Outcome ................................................................................................... 20
References ......................................................................................................................... 21

LIST OF FIGURES
Figure 2.1: Organizational Hierarchy……………………………………………………...4
Figure 3.1: Data Collection and Import…………………………………………………..10
Figure 3.2: Column Renaming………………….…………………………………….......11
Figure 3.3: Drop and Fill N/A…………………………………………………………….11
Figure 3.4: Feature Engineering….……………………………………………………….11
Figure 3.5: Top Product Sales…………………………………………………………….12
Figure 3.6: Total Sales Per Month…………….………………………………….……….13
Figure 3.7: Heatmap Visualization……………………………………………………….13
Figure 3.8: Pair Plot of months………………..………………………………………….14
Figure 3.9: Removing Outliers………………...………………………………………….15
Figure 3.10: Box Plot after removing Outliers……………………………………………15
Figure 3.11: Linear Regression……………………...……………………………………16
Figure 3.12: Visualization Of Linear Regression…………………………………………17
Figure 3.13: Random Forest Regressor………..………………………………………….18
Figure 3.14: Visualization Of Random Forest Regression……………………………….18
Figure 3.15: Power BI Dashboard………………………………………………………...19

LIST OF TABLES
Table 2.1: Internship Details ................................................................................................ 5
Table 3.1: Weekly log .......................................................................................................... 7

Chapter 1: Introduction
1.1 Introduction
The internship provided a practical foundation in data analytics, with a strong focus on
working with real-world financial data and gaining hands-on experience in tools widely
used across the industry. It aimed to bridge the gap between theoretical knowledge and
professional application by guiding interns through end-to-end data analysis workflows.
Core tasks such as data wrangling, exploratory data analysis (EDA), visualization, and
linear regression modeling were emphasized to help participants develop problem-solving
skills relevant to business analytics.

Throughout the program, interns worked extensively with Python libraries such as pandas,
NumPy, matplotlib, and seaborn. These tools were used to clean, analyze, and visualize
monthly budget data, identify trends and outliers, and build predictive models. The
internship emphasized structured project execution—from data acquisition and
preprocessing to insight generation and report presentation—ensuring that participants not
only mastered technical concepts but also understood how to communicate their findings
effectively. This experience served as a strong introduction to real-world data analytics
projects in a corporate setting.

1.2 Problem Statement


Organizations often collect financial and sales data across multiple systems and formats,
which makes consistent, reliable analysis difficult. The internship equipped participants
with strategies to integrate these disparate data sources through standardization and
automation techniques. Interns learned how to use Python scripts to merge datasets from
different systems while maintaining data integrity.
Additionally, they practiced implementing validation checks and error-handling procedures
to ensure the reliability of processed data before analysis. These skills are crucial for
organizations dealing with complex, multi-source data environments where accuracy and
consistency directly impact decision-making.

1.3 Objectives
The objectives of this project are:

1. To clean and prepare budget data using Python and pandas.

2. To explore data patterns through visualizations and EDA.

3. To build and evaluate a linear regression model.

4. To analyze correlations and identify outliers in sales data.

1.4 Report Organization


The report is organized as follows:

Chapter 1 provides a brief description of the Data Analyst internship and the project on
which this report focuses. It gives an overview of the internship's introduction, problem
statement, and objectives.

Chapter 2 describes the organization, its hierarchy, its working domains, and the
internship department. It also presents a literature review of research related to the
internship's scope and project.

Chapter 3 describes the roles and responsibilities of the intern. It also contains the
weekly log, a description of the project involved, and the various activities performed
during the internship period.

Chapter 4 summarizes the internship and the outcomes and knowledge gained from it.

Chapter 2: Organization Details and Literature Review
2.1 Introduction to Organization
Established in 2078 B.S., Global Fast Enterprises Pvt. Ltd. is an emerging IT solutions
company driven by a team of passionate and dynamic professionals. With a strong
commitment to innovation and excellence, we focus on delivering strategic and customized
IT solutions to address a wide range of business challenges.

Currently, our team consists of 15 dedicated experts who bring a collaborative spirit and
technical expertise to every project. In just 4 years, we have expanded our footprint across
various sectors of the IT industry, offering high-quality services in mobile, web, desktop,
and network solutions. Our diverse portfolio reflects our adaptability and focus on client
satisfaction.

At Global Fast Enterprises, we specialize in Mobile Development, Software Solutions, IT
Consulting, Digital Transformation, and System Integration, empowering businesses to
stay ahead in the fast-paced digital world.

2.1.1 Contact Information

Organization: Global Fast Enterprises Pvt. Ltd.

Organization Type: Private Limited

Address: Madhyapur Thimi, Lokanthali, Bhaktapur

Telephone Number: +977-9768748616

Email: [email protected]

2.2 Organizational Hierarchy

Global Fast Enterprises Pvt. Ltd. operates with a well-defined hierarchical structure to
ensure efficient project execution and continuous innovation. At the top of the hierarchy is
the Chief Executive Officer (CEO), who provides strategic direction for the organization.
Supporting the CEO are key roles such as the Chief Technology Officer (CTO) and Project
Manager, who oversee technological advancement and project delivery respectively. The
organization is structured into several core departments led by specialized Team Leads in
Data, Mobile Development, and AI/ML.

A dedicated Quality Assurance (QA) team ensures that all applications meet high standards
of performance and reliability. Additionally, the company employs Human Resources and
Administrative staff to manage internal operations and employee welfare. Interns are also
integrated into the development teams across different departments, gaining practical
experience and contributing to real-world projects under the guidance of experienced
professionals.

Figure 2.1: Organizational Hierarchy

2.3 Working Domains of Organization

Global Fast Enterprises Pvt. Ltd. operates in multiple technology-driven domains, including
software development (mobile and web applications), data management and analytics, and
artificial intelligence/machine learning (AI/ML) solutions. The company emphasizes
quality assurance through a dedicated QA team, efficient project management, and
continuous innovation led by its CTO. Additionally, it focuses on human resources, internal
operations, and talent development by integrating interns into its core departments for
hands-on experience in real-world projects.

2.4 Description of Intern Department/Unit


I interned in the Data Analytics unit at Global Fast Enterprises Pvt. Ltd., a unit focused on
transforming raw data into business insights. My role involved data cleaning, analysis, and
dashboard development using Python and Power BI. The team followed agile workflows,
allowing me to contribute to real projects like financial and sales analytics while
collaborating with cross-functional teams. This experience provided hands-on exposure to
the end-to-end data pipeline and its impact on organizational decision-making.

Table 2.1: Internship Details


Position Data Analytics Intern
Project Supervisor Er. Himal Chand Thapa
Mentor Er. Mahesh Kr. Yadav
Start Date 17th March, 2025
End Date 16th June, 2025
Working Hours 10 AM to 6 PM
Working Days Monday to Friday
Internship Duration 3 months

2.5 Literature Review
Wickham (2014) established fundamental principles for organizing datasets through the
concept of "tidy data," which became instrumental in modern data analysis workflows.
These principles, particularly relevant for structured data formats like CSV and JSON,
were directly applied during the internship's data wrangling tasks to ensure efficient
processing and analysis. This systematic approach to data structuring proved invaluable
when working with the internship's financial and sales datasets.

McKinney (2017) revolutionized data manipulation through the development of pandas, a
Python library that became central to the internship's analytical work. His text Python for
Data Analysis provided essential methodologies for handling common data challenges,
particularly in managing missing values and outliers, which were frequent issues
encountered when processing the budget.xlsx and sales.xlsx datasets during the internship.

Healy (2018) offered practical guidance on creating effective data visualizations that
informed the internship's dashboard development. The principles outlined in this work
were implemented in both Power BI dashboards and Python-generated visualizations
(using matplotlib and seaborn), ensuring clear communication of key insights from the
analyzed datasets. These visualization techniques enhanced the presentation of KPIs such
as regional sales performance and budget utilization.

Chapter 3: Internship Activities
3.1 Roles and Responsibilities
As a Data Analytics intern at Global Fast Enterprises Pvt. Ltd., my primary responsibilities
included learning organizational workflows in data-driven decision making and mastering
the end-to-end data analysis process. I gained hands-on experience with key technologies
including Python (pandas, NumPy) and Power BI to clean, analyze, and visualize data from
various business domains. Working within an Agile framework, I participated in daily
stand-ups to report progress on tasks like data preprocessing, exploratory analysis, and
dashboard development, while collaborating with team members on projects such as sales
performance tracking and budget analysis.

A core focus of the internship was developing practical data solutions from raw data to
actionable insights. I designed interactive Power BI dashboards to communicate KPIs,
performed statistical analysis to identify trends, and documented my methodologies to
ensure reproducibility. Through projects like financial expenditure analysis and regional
sales pattern identification, I applied data wrangling techniques, created visualizations, and
presented findings to stakeholders, all while adhering to best practices in data validation
and quality assurance. This experience strengthened both my technical skills in data
analytics and my ability to work effectively in a professional team environment.

3.2 Weekly log


Table 3.1: Weekly log
Week 1
• Attended internship orientation and understood company objectives.
• Explored the provided sales dataset in Excel format.
• Loaded the dataset into Python using pandas.
• Identified the data structure, key columns, and total sales.

Week 2
• Renamed all column headers for uniformity and readability.
• Converted data types appropriately: strings for categorical data, numeric for monthly values.
• Removed duplicate entries and dropped irrelevant or fully empty columns.
• Handled missing data by filling in zeros for sales and dropping rows with missing key identifiers.

Week 3
• Applied the IQR method to detect and remove monthly outliers.
• Created boxplots (before and after cleaning) to visualize the effect of outlier removal.
• Finalized a cleaned dataset for further analysis.

Week 4
• Created a line chart showing the monthly sales trend (Jan–Dec 2016).
• Aggregated sales across all products to analyze total monthly performance.
• Developed bar plots showing total sales by category and the top 5 products.

Week 5
• Built a correlation matrix to analyze relationships between months.
• Visualized it using a heatmap and identified the strongest and weakest correlated month pairs.
• Created pair plots to examine relationships in selected months.
• Gained insights into seasonal trends and product performance variability.

Week 6
• Engineered a new feature, Total Sales (sum of monthly sales, Jan–Dec).
• Added this as a new column to the cleaned dataset.
• This served as the target variable for modeling and also supported summary analysis.

Week 7
• Built a basic linear regression model to predict Total Sales using limited monthly data.
• Evaluated performance.
• Identified that a simple model with few features had low predictive power due to insufficient training data.

Week 8
• Used all 12 months (Jan–Dec 2016) as input features to predict Total Sales.
• Trained a Random Forest Regressor, which significantly improved accuracy.
• Visualized actual vs. predicted total sales to evaluate the model's predictive performance.

Week 9
• Filtered and exported cleaned data for dashboarding in Power BI.
• Defined key metrics and charts to include in the dashboard.
• Structured data to allow slicing by Category, Subcategory, and Month.

Week 10
• Developed visuals in Power BI:
o Line chart: Monthly Sales Trend
o Bar charts: Sales by Category/Subcategory
o Cards: Highest, Lowest, and Average Sales
• Implemented filters and slicers for interactivity (Category, Month, etc.).

Week 11
• Removed the Grand Total from visuals to ensure clean chart outputs.
• Enhanced visuals by adjusting axis titles, fonts, labels, and layout.
• Tested filter combinations (multi-select) and interactions between visuals.

Week 12
• Compiled a final internship report summarizing:
o Data cleaning steps
o Visual analysis in Python and Power BI
o Predictive modeling (Random Forest)
o Key insights and business recommendations
• Presented the dashboard and findings to the mentor/supervisor.

3.3 Description of the Project(s) Involved During Internship


During my internship, I worked on a project titled “Budget Data Analysis and Forecasting
Using Python” under the Mathematical Modeling department. The objective was to clean,
explore, and analyze a real-world budget dataset containing monthly sales figures for
various product categories. I began by importing the data from an Excel file and conducted
data wrangling tasks such as renaming columns, converting data types, and handling
missing values. I created a new Total Sales column by aggregating monthly sales and used
summary statistics and visualizations to understand the distribution and relationships within
the data. Outliers were detected and removed using statistical techniques like the
Interquartile Range (IQR) method.

As the project progressed, I applied exploratory data analysis (EDA) techniques and
developed a simple linear regression model to predict total sales based on individual
monthly sales inputs. I split the data into training and testing sets, evaluated the model using
MAE and RMSE, and visualized predictions against actual values. Throughout the project,

I used Python tools including pandas, matplotlib, seaborn, and scikit-learn within a Jupyter
Notebook environment. The project helped me gain hands-on experience in data
preprocessing, visualization, statistical modeling, and interpretation—demonstrating how
mathematical and analytical techniques can be applied to business datasets for insight and
forecasting.

3.4 Tasks / Activities Performed

3.4.1 Data Collection and Import


The first step in the project involved collecting and importing the dataset into the Python
environment. The dataset was provided in Excel format and contained monthly sales data
for various product categories. Using the pandas library, the Excel file was read into a
DataFrame, allowing for structured access and manipulation of the data. During the import
process, the data was explored to understand its initial structure, which revealed that several
columns were unnamed or not formatted properly.

Figure 3.1: Data Collection and Import
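The import step can be sketched as below. The actual workbook name and column layout are not given in the report, so a small synthetic DataFrame (with assumed names) stands in for the Excel file to keep the inspection steps reproducible.

```python
import pandas as pd

# In the internship the data came from an Excel workbook, e.g.:
#   df = pd.read_excel("budget.xlsx")
# The frame below is a synthetic stand-in (names assumed) so the
# inspection steps can be reproduced without the original file.
df = pd.DataFrame({
    "Unnamed: 0": ["Chairs", "Desks", "Lamps"],
    "Unnamed: 1": ["Furniture", "Furniture", "Lighting"],
    "Jan, 2016": ["1200", "2500", None],
})

# First look at the structure: shape, headers, and dtypes.
print(df.shape)             # number of rows and columns
print(df.columns.tolist())  # default "Unnamed: N" headers need renaming
print(df.dtypes)            # sales arrive as object (string) type
```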

3.4.2 Data Cleaning and Preprocessing


Once the dataset was imported, a comprehensive data cleaning process was conducted. This
included renaming ambiguous or default column headers such as “Unnamed: 1” to
meaningful names like "Category" or month labels such as "Jan, 2016". Data types were
converted appropriately—for example, sales data was converted from object or string types
to numeric types for accurate calculations. Missing values were addressed either by
imputing zeros (where appropriate) or by removing rows or columns with excessive null
entries.

Figure 3.2: Column Renaming

Figure 3.3: Drop and Fill N/A
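A minimal sketch of the cleaning steps above, on a synthetic stand-in frame (the real headers and values are assumptions):

```python
import pandas as pd

# Synthetic stand-in for the imported sheet; real headers and values differ.
df = pd.DataFrame({
    "Unnamed: 0": ["Chairs", "Desks", None],
    "Unnamed: 1": ["Furniture", "Furniture", "Lighting"],
    "Unnamed: 2": ["1200", "2500", None],
})

# Rename default headers to meaningful names.
df = df.rename(columns={
    "Unnamed: 0": "Product",
    "Unnamed: 1": "Category",
    "Unnamed: 2": "Jan, 2016",
})

# Convert sales figures from strings to numeric values.
df["Jan, 2016"] = pd.to_numeric(df["Jan, 2016"], errors="coerce")

# Impute zeros for missing sales, then drop rows missing the key identifier.
df["Jan, 2016"] = df["Jan, 2016"].fillna(0)
df = df.dropna(subset=["Product"]).reset_index(drop=True)

print(df)
```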

3.4.3 Feature Engineering


After cleaning, the dataset was enhanced by introducing new features. One important feature
was the “Total Sales” column, which was created by summing the monthly sales values
across each row. This new feature allowed for more effective grouping, sorting, and
modeling later in the analysis. Additional lists, such as month_columns, were also defined
in the code to manage operations involving multiple months simultaneously and streamline
the analysis process.

Figure 3.4: Feature Engineering
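The Total Sales feature described above amounts to a row-wise sum over the month columns. The sketch below uses a hypothetical three-month slice; the internship used the full Jan–Dec range.

```python
import pandas as pd

# Hypothetical three-month slice; the real month_columns list spanned Jan-Dec.
month_columns = ["Jan, 2016", "Feb, 2016", "Mar, 2016"]

df = pd.DataFrame({
    "Product": ["Chairs", "Desks"],
    "Jan, 2016": [100, 200],
    "Feb, 2016": [150, 180],
    "Mar, 2016": [120, 220],
})

# Row-wise sum of the monthly columns becomes the "Total Sales" target.
df["Total Sales"] = df[month_columns].sum(axis=1)
print(df[["Product", "Total Sales"]])
```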

3.4.4 Exploratory Data Analysis (EDA)

Exploratory Data Analysis was a crucial phase, where the focus was on understanding
patterns and distributions within the dataset. Using descriptive statistics and data
visualization tools, the analysis uncovered trends such as seasonality in sales, high-performing
categories, and underperforming products. pandas functions like .describe() and
.groupby() provided quantitative insights, while visualizations gave a clearer view of how
the data was spread across different dimensions.
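The two pandas calls mentioned above can be illustrated on a small frame with assumed values:

```python
import pandas as pd

# Illustrative category/sales frame (values assumed for the example).
df = pd.DataFrame({
    "Category": ["Furniture", "Furniture", "Lighting", "Lighting"],
    "Total Sales": [370, 600, 150, 90],
})

# Summary statistics (count, mean, quartiles, etc.) of the target column.
print(df["Total Sales"].describe())

# Aggregate sales per category to spot high and low performers.
by_cat = df.groupby("Category")["Total Sales"].sum().sort_values(ascending=False)
print(by_cat)
```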

3.4.5 Data Visualization


A wide range of data visualization techniques was employed using matplotlib and seaborn.
Bar plots were created to compare total sales across products and categories. Boxplots and
violin plots were used to identify the presence of outliers in monthly sales. Correlation
heatmaps were used to assess the relationships between different months, while pair plots
helped to explore potential multivariate relationships.

Figure 3.5: Top Product Sales

Figure 3.6: Total Sales Per Month

Figure 3.7: Heatmap Visualization

Figure 3.8: Pair Plot of months
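A minimal matplotlib sketch of the product-sales bar chart (the seaborn calls used in the internship, such as sns.barplot, sns.heatmap, and sns.pairplot, follow the same pattern of preparing a frame and rendering it). Data values and the output file name are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display required
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative totals (values assumed).
df = pd.DataFrame({
    "Product": ["Chairs", "Desks", "Lamps"],
    "Total Sales": [370, 600, 150],
})

# Bar plot of total sales per product, sorted descending.
top = df.sort_values("Total Sales", ascending=False)
fig, ax = plt.subplots()
ax.bar(top["Product"], top["Total Sales"])
ax.set_title("Top Product Sales")
ax.set_ylabel("Sales")
fig.savefig("top_products.png")
plt.close(fig)
```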

3.4.6 Outlier Detection and Removal
Outliers were detected using the Interquartile Range (IQR) method. This statistical approach
involves calculating the first and third quartiles and identifying any data points that fall
significantly outside this range. The rows containing such extreme values were removed to
improve the robustness and accuracy of the analysis. This step was essential before moving
to modeling, as outliers can skew results and reduce the predictive power of regression
models.

Figure 3.9: Removing Outliers

Figure 3.10: Box Plot after removing Outliers
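The IQR fences described above can be sketched as follows, on a toy series with one deliberately extreme value (all numbers assumed):

```python
import pandas as pd

# Toy monthly sales series with one extreme value.
sales = pd.Series([100, 110, 105, 98, 102, 950])

# 1.5 * IQR fences, as used during the internship's cleaning step.
q1, q3 = sales.quantile(0.25), sales.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only the observations inside the fences.
cleaned = sales[(sales >= lower) & (sales <= upper)]
print(cleaned.tolist())  # the 950 outlier is removed
```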

3.4.7 Regression Modeling
The cleaned dataset was used to build a multiple linear regression model aimed at predicting
Total Sales using all monthly sales values (January to December) as input features. The
scikit-learn library was used to split the dataset into training and testing sets. The
LinearRegression model was then fitted on the training data. After training, predictions were
made on the test set. The model’s performance was evaluated using metrics such as Mean
Absolute Error (MAE), Root Mean Squared Error (RMSE), and R² Score, which helped
assess how accurately the model predicted the total sales.

Figure 3.11: Linear Regression
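The train/test/evaluate workflow above can be sketched with scikit-learn on synthetic data (the real dataset is not reproduced in the report, so shapes and values here are assumptions; since Total Sales is the row sum of the months, the relation is exactly linear):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic monthly sales: 200 products x 12 months.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1000, size=(200, 12))
y = X.sum(axis=1)  # Total Sales is the row sum, so the relation is linear

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# Evaluate with the same metrics used in the internship.
mae = mean_absolute_error(y_test, pred)
rmse = mean_squared_error(y_test, pred) ** 0.5  # root of the MSE
r2 = r2_score(y_test, pred)
print(f"MAE={mae:.4f}  RMSE={rmse:.4f}  R2={r2:.4f}")
```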

3.4.8 Model Interpretation and Visualization

To interpret the model’s performance, the actual versus predicted total sales were visualized
using a line plot. This comparison helped in identifying how closely the predictions matched
the actual values across test samples. Additionally, the model’s regression coefficients and
intercept were analyzed to understand the contribution of each month’s sales towards the
overall annual total. This analysis revealed the relative importance of different months in
influencing total product sales over the year.

Figure 3.12: Visualization Of Linear Regression
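The interpretation step above can be sketched as below: print the fitted coefficients and intercept, then plot actual against predicted totals. Data shapes are synthetic assumptions; with a pure row-sum target, each month's coefficient comes out close to 1.0.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Small synthetic example: 50 products, 3 "months" for brevity.
rng = np.random.default_rng(1)
X = rng.uniform(0, 100, size=(50, 3))
y = X.sum(axis=1)

model = LinearRegression().fit(X, y)
pred = model.predict(X)

# Each coefficient is that month's contribution to the annual total.
print("coefficients:", np.round(model.coef_, 3))
print("intercept:", round(model.intercept_, 3))

# Actual vs predicted totals across samples, in the spirit of Figure 3.12.
fig, ax = plt.subplots()
ax.plot(y, label="Actual")
ax.plot(pred, linestyle="--", label="Predicted")
ax.set_title("Actual vs Predicted Total Sales")
ax.legend()
fig.savefig("lr_actual_vs_pred.png")
plt.close(fig)
```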

3.4.9 Modeling with Random Forest

To explore a more complex model, a Random Forest Regressor was implemented using all
monthly sales as input features. The model achieved an R² score of 0.89, showing that
total sales could be predicted effectively from monthly data, although it did not surpass
the linear model on this dataset. Feature importance also showed that certain months had
a stronger influence on total sales than others.

Figure 3.13: Random Forest Regressor

Figure 3.14: Visualization Of Random Forest Regression

In this analysis, Linear Regression proved to be the better-performing model. It
demonstrated significantly lower prediction error, as reflected in both the Mean Absolute
Error (MAE) and Root Mean Squared Error (RMSE), compared to the Random Forest
model. Additionally, Linear Regression achieved a very high R² score of 0.9998, indicating
that it was able to explain 99.98% of the variance in total sales—almost a perfect fit. On the
other hand, although Random Forest Regression is generally a more complex and powerful
model, it underperformed in this case. This is likely because the relationship between
monthly sales and total sales in the dataset is highly linear, which aligns well with the
assumptions of Linear Regression. Moreover, Random Forest may have struggled to
generalize effectively, possibly due to the limited sample size or the structure of the data,
leading to overfitting or suboptimal predictions.
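This comparison can be reproduced in miniature: on data where the target is an exact linear combination of the inputs, a Random Forest (which predicts by averaging training leaves and cannot extrapolate a linear trend) trails Linear Regression. Shapes and values here are synthetic assumptions, not the internship's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.uniform(0, 1000, size=(200, 12))  # synthetic 12-month inputs
y = X.sum(axis=1)                         # perfectly linear target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

lr = LinearRegression().fit(X_train, y_train)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

r2_lr = r2_score(y_test, lr.predict(X_test))
r2_rf = r2_score(y_test, rf.predict(X_test))
print(f"Linear Regression R2: {r2_lr:.4f}")
print(f"Random Forest R2:     {r2_rf:.4f}")

# feature_importances_ ranks which months drive Total Sales most.
print("importances sum:", round(rf.feature_importances_.sum(), 3))
```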

3.4.10 Power BI Dashboard Creation


The cleaned dataset was imported into Power BI, where interactive dashboards were
developed to effectively visualize key aspects of the sales data. These included monthly
sales trends, category and subcategory performance, and key performance indicators (KPIs)
using card visuals. To enhance user experience and data exploration, slicers and filters were
added, allowing users to interact with the visuals dynamically. This made it easier to drill
down into specific time periods, categories, or products.

Figure 3.15: PowerBI Dashboard

Chapter 4: Conclusion and Learning Outcomes
4.1 Conclusion
During this internship, I gained hands-on experience in real-world data analysis, modeling,
and visualization using Python. By working with a raw and unstructured budget dataset, I
learned how to handle typical data quality issues such as missing values, incorrect data
types, and ambiguous column names. I successfully performed data cleaning and
transformation, followed by exploratory data analysis (EDA) to extract meaningful patterns,
trends, and insights from the data. Visualizing the dataset using libraries like matplotlib and
seaborn helped in better understanding sales behavior across months, products, and
categories.

In addition to data analysis, I also developed a basic understanding of statistical modeling
through the implementation of a simple linear regression model. This process taught me
how to split datasets, train models, make predictions, and evaluate model performance using
MAE and RMSE metrics. I also learned how to interpret model results and visually compare
predicted outcomes with actual data. Overall, this internship strengthened my skills in
Python programming, data manipulation with pandas, and data-driven decision-making
through mathematical modeling. It gave me valuable exposure to how mathematical and
computational techniques can be applied in business and analytics scenarios.

4.2 Learning Outcome


Various learning outcomes were attained over the internship period. Some of them are listed
below:

⚫ Gained hands-on experience in data cleaning and preprocessing using Python (pandas).

⚫ Learned to handle missing values, incorrect data types, and outliers effectively.

⚫ Developed skills in exploratory data analysis (EDA) and feature engineering.

⚫ Created various data visualizations using matplotlib and seaborn.

⚫ Understood and implemented simple linear regression models using scikit-learn.

⚫ Improved ability to interpret statistical results and visualize model predictions.

References

Healy, K. (2018). Data visualization: A practical introduction. Princeton University Press. https://socviz.co/

McKinney, W. (2017). Python for data analysis (2nd ed.). O'Reilly Media. https://wesmckinney.com/book/

Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), 1-23. https://doi.org/10.18637/jss.v059.i10
