Unik Intern Report
Technology
Submitted to:
Department of Computer Science and Information Technology
Himalaya College of Engineering
Chyasal, Lalitpur
Submitted by:
Unik Shyaula (T.U. Symbol No. 26565/077)
Acknowledgement
I would like to thank Er. Mahesh Kr. Yadav for having faith in me and guiding me while I was an intern at Global Fast Enterprises Pvt. Ltd. Being a member of such a friendly and vibrant team was a wonderful experience for me.
I am very thankful to the entire team at Global Fast Enterprises Solutions for providing me with the internship opportunity and for their full support and collaboration.
It is with great pleasure that I acknowledge the assistance of these people and express my sincere thanks to them.
I also want to express my gratitude to the entire teaching team at the Department of
Computer Science and Information Technology for their ongoing support, encouragement,
and advice, all of which contributed to the smooth progress of my internship.
I am also grateful to the department staff members and friends who helped me complete this internship successfully.
Unik Shyaula
Abstract
This report presents an overview of my 12-week Data Analytics internship at Global Fast
Enterprises Pvt. Ltd., where I gained hands-on experience in data preprocessing, analysis,
and visualization. Using tools such as Python (pandas, NumPy, matplotlib, seaborn), Excel,
and Power BI, I worked on real-world financial datasets to extract actionable insights. The
internship focused on key tasks like data cleaning, exploratory data analysis, linear
regression modeling, and dashboard development.
Throughout the internship, I enhanced my technical and analytical skills while learning to
communicate insights effectively within a collaborative, Agile-based work environment.
The experience helped me bridge academic learning with practical applications in the field
of business analytics. This report highlights my learning journey, major project
contributions, and the essential skills developed during this valuable professional
experience.
LIST OF ABBREVIATIONS
BI Business Intelligence
Table of Contents
Acknowledgement
Abstract
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
Chapter 1: Introduction
1.1 Introduction
1.2 Problem Statement
1.3 Objectives
1.4 Report Organization
Chapter 2: Organization Details and Literature Review
2.1 Introduction to Organization
2.1.1 Contact Information
2.2 Organizational Hierarchy
2.3 Working Domains of Organization
2.4 Description of Intern Department/Unit
2.5 Literature Review
Chapter 3: Internship Activities
3.1 Roles and Responsibilities
3.2 Weekly log
3.3 Description of the Project(s) Involved During Internship
3.4 Tasks / Activities Performed
3.4.1 Data Collection and Import
3.4.2 Data Cleaning and Preprocessing
3.4.3 Feature Engineering
3.4.4 Exploratory Data Analysis (EDA)
3.4.5 Data Visualization
3.4.6 Outlier Detection and Removal
3.4.7 Regression Modeling
3.4.8 Model Interpretation and Visualization
3.4.9 Modeling with Random Forest
3.4.10 Power BI Dashboard Creation
Chapter 4: Conclusion and Learning Outcomes
4.1 Conclusion
4.2 Learning Outcome
Reference
LIST OF FIGURES
Figure 2.1: Organizational Hierarchy
Figure 3.1: Data Collection and Import
Figure 3.2: Column Renaming
Figure 3.3: Drop and Fill N/A
Figure 3.4: Feature Engineering
Figure 3.5: Top Product Sales
Figure 3.6: Total Sales Per Month
Figure 3.7: Heatmap Visualization
Figure 3.8: Pair Plot of months
Figure 3.9: Removing Outliers
Figure 3.10: Box Plot after Removing Outliers
Figure 3.11: Linear Regression
Figure 3.12: Visualization of Linear Regression
Figure 3.13: Random Forest Regressor
Figure 3.14: Visualization of Random Forest Regression
Figure 3.15: Power BI Dashboard
LIST OF TABLES
Table 2.1: Internship Details
Table 3.1: Weekly log
Chapter 1: Introduction
1.1 Introduction
The internship provided a practical foundation in data analytics, with a strong focus on
working with real-world financial data and gaining hands-on experience in tools widely
used across the industry. It aimed to bridge the gap between theoretical knowledge and
professional application by guiding interns through end-to-end data analysis workflows.
Core tasks such as data wrangling, exploratory data analysis (EDA), visualization, and
linear regression modeling were emphasized to help participants develop problem-solving
skills relevant to business analytics.
Throughout the program, interns worked extensively with Python libraries such as pandas,
NumPy, matplotlib, and seaborn. These tools were used to clean, analyze, and visualize
monthly budget data, identify trends and outliers, and build predictive models. The
internship emphasized structured project execution—from data acquisition and
preprocessing to insight generation and report presentation—ensuring that participants not
only mastered technical concepts but also understood how to communicate their findings
effectively. This experience served as a strong introduction to real-world data analytics
projects in a corporate setting.
1.3 Objectives
The objectives of this project are:
1.4 Report Organization
Chapter 1 gives a brief description of the data analyst internship and the project on which this report focuses. It provides an overview of the objectives, problem statement, scope and limitations of the internship.
Chapter 2 describes the organization, its hierarchy, its working domains and the department in which the internship took place. It also reviews literature related to the internship scope and project.
Chapter 3 describes my roles and responsibilities during the internship. It also contains the weekly log, a description of the project involved and the various activities performed during the internship period.
Chapter 4 summarizes the internship and the learning outcomes and knowledge gained from it.
Chapter 2: Organization Details and Literature Review
2.1 Introduction to Organization
Established in 2078 B.S., Global Fast Enterprises Pvt. Ltd. is an emerging IT solutions company driven by a team of passionate and dynamic professionals. With a strong commitment to innovation and excellence, the company focuses on delivering strategic, customized IT solutions to a wide range of business challenges.
The team currently consists of 15 dedicated experts who bring a collaborative spirit and technical expertise to every project. In just 4 years, the company has expanded its footprint across various sectors of the IT industry. It offers high-quality services in mobile, web, desktop and network solutions, and its diverse portfolio reflects its adaptability and focus on client satisfaction.
2.2 Organizational Hierarchy
Global Fast Enterprises Pvt. Ltd. operates with a well-defined hierarchical structure to
ensure efficient project execution and continuous innovation. At the top of the hierarchy is
the Chief Executive Officer (CEO), who provides strategic direction for the organization.
Supporting the CEO are key roles such as the Chief Technology Officer (CTO) and Project
Manager, who oversee technological advancement and project delivery respectively. The
organization is structured into several core departments led by specialized Team Leads in
Data, Mobile Development, and AI/ML.
A dedicated Quality Assurance (QA) team ensures that all applications meet high standards
of performance and reliability. Additionally, the company employs Human Resources and
Administrative staff to manage internal operations and employee welfare. Interns are also
integrated into the development teams across different departments, gaining practical
experience and contributing to real-world projects under the guidance of experienced
professionals.
2.3 Working Domains of Organization
Global Fast Enterprises Pvt. Ltd. operates in multiple technology-driven domains, including
software development (mobile and web applications), data management and analytics, and
artificial intelligence/machine learning (AI/ML) solutions. The company emphasizes
quality assurance through a dedicated QA team, efficient project management, and
continuous innovation led by its CTO. Additionally, it focuses on human resources, internal
operations, and talent development by integrating interns into its core departments for
hands-on experience in real-world projects.
2.5 Literature Review
Wickham (2014) established fundamental principles for organizing datasets through the concept of "tidy data," which has become instrumental in modern data analysis workflows. These principles are particularly relevant for structured data formats like CSV and JSON. They were applied directly during the internship's data wrangling tasks to ensure efficient processing and analysis. Wickham's systematic approach to structuring data proved invaluable when working with the internship's financial and sales datasets.
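The following small illustration shows the tidy-data idea in pandas. It reshapes a hypothetical wide table (one column per month, as in a typical budget sheet) into one row per product-month observation using melt(). The product names, month columns and values are invented for illustration and are not taken from the internship datasets.

```python
import pandas as pd

# Hypothetical wide-format table: one row per product, one column per month.
# Names and values are illustrative, not taken from budget.xlsx.
wide = pd.DataFrame({
    "Product": ["A", "B"],
    "January": [100, 80],
    "February": [120, 90],
    "March": [110, 95],
})

# Tidy form: each row is one observation (product, month, sales).
tidy = wide.melt(id_vars="Product", var_name="Month", value_name="Sales")
print(tidy)

# In tidy form, a monthly total is a single group-by.
monthly_totals = tidy.groupby("Month", sort=False)["Sales"].sum()
print(monthly_totals)
```

Once the data is in this long shape, filtering, grouping and plotting by month no longer require handling each month column separately.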
McKinney (2017) revolutionized data manipulation through the development of pandas, a Python library that became central to the internship's analytical work. His book "Python for Data Analysis" provided essential methods for handling common data challenges. It was particularly useful for managing missing values and outliers, which were frequent issues when processing the budget.xlsx and sales.xlsx datasets during the internship.
Healy (2018) offered practical guidance on creating effective data visualizations, which informed the internship's dashboard development. The principles outlined in this work were applied in both the Power BI dashboards and the Python-generated visualizations (using Matplotlib and Seaborn), ensuring clear communication of key insights from the analyzed datasets. These visualization techniques improved the presentation of KPIs such as regional sales performance and budget utilization.
Chapter 3: Internship Activities
3.1 Roles and Responsibilities
As a Data Analytics intern at Global Fast Enterprises Solutions, my primary responsibilities
included learning organizational workflows in data-driven decision making and mastering
the end-to-end data analysis process. I gained hands-on experience with key technologies
including Python (pandas, NumPy) and Power BI to clean, analyze, and visualize data from
various business domains. Working within an Agile framework, I participated in daily
stand-ups to report progress on tasks like data preprocessing, exploratory analysis, and
dashboard development, while collaborating with team members on projects such as sales
performance tracking and budget analysis.
A core focus of the internship was developing practical data solutions from raw data to
actionable insights. I designed interactive Power BI dashboards to communicate KPIs,
performed statistical analysis to identify trends, and documented my methodologies to
ensure reproducibility. Through projects like financial expenditure analysis and regional
sales pattern identification, I applied data wrangling techniques, created visualizations, and
presented findings to stakeholders, all while adhering to best practices in data validation
and quality assurance. This experience strengthened both my technical skills in data
analytics and my ability to work effectively in a professional team environment.
Week 3
• Applied the IQR method to detect and remove monthly outliers.
• Created boxplots (before and after cleaning) to visualize the effect of outlier removal.
• Finalized a cleaned dataset for further analysis.
• Structured the data to allow slicing by Category, Subcategory, and Month.
As the project progressed, I applied exploratory data analysis (EDA) techniques and developed a multiple linear regression model to predict total sales from the individual monthly sales values. I split the data into training and testing sets, evaluated the model using MAE and RMSE, and visualized its predictions against the actual values. Throughout the project, I used Python tools including pandas, matplotlib, seaborn, and scikit-learn within a Jupyter Notebook environment. The project gave me hands-on experience in data preprocessing, visualization, statistical modeling, and interpretation. It also demonstrated how mathematical and analytical techniques can be applied to business datasets for insight and forecasting.
imputing zeros (where appropriate) or by removing rows or columns with excessive null
entries.
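Below is a minimal pandas sketch of these two strategies. The column names and sample values are assumptions chosen for illustration, as is the 50% completeness threshold for keeping a column. None of these are the settings used on the internship data.

```python
import numpy as np
import pandas as pd

# Hypothetical raw extract with gaps; values are illustrative only.
df = pd.DataFrame({
    "Product": ["A", "B", "C", "D"],
    "January": [100.0, np.nan, 90.0, np.nan],
    "February": [120.0, 85.0, np.nan, np.nan],
    "Notes": [np.nan, np.nan, np.nan, "check"],
})

# Drop columns with excessive nulls: keep a column only if at least half
# of its values are present (the 50% threshold is an assumption).
min_non_null = int(np.ceil(0.5 * len(df)))
df = df.dropna(axis=1, thresh=min_non_null)

# Drop rows in which every monthly value is missing.
month_cols = ["January", "February"]
df = df.dropna(subset=month_cols, how="all")

# Where a remaining gap plausibly means "no sales", impute zero.
df = df.fillna({col: 0 for col in month_cols})
print(df)
```

Here the mostly empty Notes column and the all-empty row for product D are removed. The two remaining gaps become zeros.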
3.4.4 Exploratory Data Analysis (EDA)
Exploratory Data Analysis was a crucial phase, where the focus was on understanding
patterns and distributions within the dataset. Using descriptive statistics and data
visualization tools, the analysis uncovered trends such as seasonality in sales, high-performing categories, and underperforming products. The pandas functions .describe() and .groupby() provided quantitative insights, while visualizations gave a clearer view of how
data was spread across different dimensions.
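The sketch below shows how .describe() and .groupby() can produce this kind of summary. The table and its Category, Product, Month and Sales columns are hypothetical stand-ins for the internship's sales data.

```python
import pandas as pd

# Hypothetical tidy sales records; categories, products and figures are invented.
sales = pd.DataFrame({
    "Category": ["Office", "Office", "Tech", "Tech", "Tech"],
    "Product": ["Paper", "Pens", "Laptop", "Mouse", "Laptop"],
    "Month": ["January", "January", "January", "February", "February"],
    "Sales": [200, 50, 1500, 40, 1300],
})

# Count, mean, standard deviation, min, quartiles and max of the sales values.
print(sales["Sales"].describe())

# Category totals, highest first, to separate strong and weak categories.
by_category = sales.groupby("Category")["Sales"].sum().sort_values(ascending=False)

# Product totals, e.g. as input for a "top products" chart.
by_product = sales.groupby("Product")["Sales"].sum().sort_values(ascending=False)

# Month totals, a first look at seasonality.
by_month = sales.groupby("Month", sort=False)["Sales"].sum()

print(by_category, by_product.head(3), by_month, sep="\n\n")
```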
Figure 3.6: Total Sales Per Month
Figure 3.8: Pair Plot of months
3.4.6 Outlier Detection and Removal
Outliers were detected using the Interquartile Range (IQR) method. This statistical approach computes the first and third quartiles (Q1 and Q3) and their difference, IQR = Q3 − Q1, and flags any data point that falls well outside this range. The conventional rule treats values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR as outliers. Rows containing such extreme values were removed to improve the robustness and accuracy of the analysis. This step was essential before modeling, because outliers can skew results and reduce the predictive power of regression models.
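Here is a minimal sketch of IQR-based row removal in pandas. It assumes the conventional 1.5 multiplier and a hypothetical table of monthly sales columns, and it drops a row if any checked month lies outside its fences. The helper name remove_iqr_outliers is illustrative, not taken from the report.

```python
import pandas as pd

# Hypothetical monthly sales table; the 400 in January is a planted outlier.
df = pd.DataFrame({
    "January": [100, 105, 98, 102, 400],
    "February": [90, 95, 92, 97, 94],
})

def remove_iqr_outliers(frame, cols, k=1.5):
    """Drop rows where any column in `cols` lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = frame[cols].quantile(0.25)
    q3 = frame[cols].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Comparisons align each column with its own fences; keep rows inside all of them.
    inside = ((frame[cols] >= lower) & (frame[cols] <= upper)).all(axis=1)
    return frame[inside]

cleaned = remove_iqr_outliers(df, ["January", "February"])
print(cleaned)
```

For January, Q1 = 100 and Q3 = 105, so the fences are 92.5 and 112.5, and only the row containing 400 is removed.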
3.4.7 Regression Modeling
The cleaned dataset was used to build a multiple linear regression model aimed at predicting
Total Sales using all monthly sales values (January to December) as input features. The
scikit-learn library was used to split the dataset into training and testing sets. The
LinearRegression model was then fitted on the training data. After training, predictions were
made on the test set. The model’s performance was evaluated using metrics such as Mean
Absolute Error (MAE), Root Mean Squared Error (RMSE), and R² Score, which helped
assess how accurately the model predicted the total sales.
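This workflow can be sketched with scikit-learn as follows. Because the internship dataset is not reproduced here, the sketch generates synthetic monthly sales and defines Total Sales as their sum plus small noise. The 80/20 split, random seeds and data sizes are also assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Synthetic stand-in for the cleaned dataset (hypothetical values):
# 200 products with random monthly sales; Total Sales = sum of months + small noise.
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.uniform(50, 500, size=(200, 12)), columns=MONTHS)
y = X.sum(axis=1) + rng.normal(0, 5, size=200)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # RMSE = square root of MSE
r2 = r2_score(y_test, y_pred)
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R²={r2:.4f}")
```

Taking the square root of mean_squared_error explicitly avoids depending on version-specific keyword arguments for RMSE.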
3.4.8 Model Interpretation and Visualization
To interpret the model’s performance, the actual versus predicted total sales were visualized
using a line plot. This comparison helped in identifying how closely the predictions matched
the actual values across test samples. Additionally, the model’s regression coefficients and
intercept were analyzed to understand the contribution of each month’s sales towards the
overall annual total. This analysis revealed the relative importance of different months in
influencing total product sales over the year.
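The following sketch shows how the coefficients and intercept can be read from a fitted LinearRegression. It also draws actual and predicted totals as a line plot. The data is again hypothetical, with Total Sales defined as the exact sum of the twelve months. On such data every coefficient comes out close to 1 and the intercept close to 0; the coefficients from the real dataset would differ.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Hypothetical data: Total Sales is exactly the annual sum of the months.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.uniform(50, 500, size=(200, 12)), columns=MONTHS)
y = X.sum(axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# One coefficient per month: expected change in Total Sales per unit of that month's sales.
coefs = pd.Series(model.coef_, index=MONTHS)
print(coefs.round(3))
print("intercept:", round(model.intercept_, 3))

# Actual versus predicted totals across the test samples, drawn as two lines.
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(y_test.to_numpy(), label="Actual", marker="o")
ax.plot(y_pred, label="Predicted", linestyle="--")
ax.set_xlabel("Test sample")
ax.set_ylabel("Total Sales")
ax.legend()
fig.tight_layout()
```

To write the figure to disk or show it interactively, call fig.savefig() with a file name or use plt.show() with an interactive backend.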
3.4.9 Modeling with Random Forest
To test whether a more flexible model could improve accuracy, a Random Forest Regressor was implemented using all monthly sales as input features. The Random Forest model achieved an R² score of 0.89. This showed that total sales could still be predicted effectively from monthly data, although the model did not match the linear model. Feature importance also showed that certain months had a stronger influence on total sales than others.
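Here is a minimal sketch of fitting a RandomForestRegressor and reading its feature importances, again on hypothetical synthetic data. Every synthetic month contributes equally to the total, so the importances here come out roughly uniform. The stronger influence of particular months reported above comes from the real dataset. The n_estimators and random_state values are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Hypothetical data: Total Sales is the sum of twelve random monthly values.
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.uniform(50, 500, size=(200, 12)), columns=MONTHS)
y = X.sum(axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# n_estimators and random_state are illustrative, not the report's settings.
forest = RandomForestRegressor(n_estimators=200, random_state=1)
forest.fit(X_train, y_train)
r2 = r2_score(y_test, forest.predict(X_test))

# Impurity-based importances are non-negative and sum to 1 across features.
importances = pd.Series(forest.feature_importances_, index=MONTHS).sort_values(ascending=False)
print(f"R²={r2:.3f}")
print(importances.head(5))
```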
In this analysis, Linear Regression proved to be the better-performing model. It showed substantially lower prediction error than the Random Forest model on both Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). Linear Regression also achieved an R² score of 0.9998, meaning that it explained 99.98% of the variance in total sales, an almost perfect fit. Although Random Forest Regression is generally a more complex and flexible model, it underperformed in this case. This is likely because the relationship between monthly sales and total sales in the dataset is highly linear, which matches the assumptions of Linear Regression. Random Forest may also have struggled to generalize, possibly because of the limited sample size or the structure of the data, leading to overfitting or suboptimal predictions.
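Assume the Total Sales column is the sum of the twelve monthly columns x_1 to x_12. This matches the earlier reading of the coefficients as each month's contribution to the annual total. Under that assumption, the near-perfect linear fit follows directly:

```latex
\text{Total} = \sum_{m=1}^{12} x_m
\quad\Longrightarrow\quad
\hat{y} = \beta_0 + \sum_{m=1}^{12} \beta_m x_m
\ \text{ is exact for }\ \beta_0 = 0,\ \beta_1 = \beta_2 = \dots = \beta_{12} = 1
```

A random forest works differently. It predicts by averaging the training targets that fall in its leaves, so it can only approximate this plane with a piecewise-constant surface. It also cannot predict totals outside the range seen during training.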
Chapter 4: Conclusion and Learning Outcomes
4.1 Conclusion
During this internship, I gained hands-on experience in real-world data analysis, modeling,
and visualization using Python. By working with a raw and unstructured budget dataset, I
learned how to handle typical data quality issues such as missing values, incorrect data
types, and ambiguous column names. I successfully performed data cleaning and
transformation, followed by exploratory data analysis (EDA) to extract meaningful patterns,
trends, and insights from the data. Visualizing the dataset using libraries like matplotlib and
seaborn helped in better understanding sales behavior across months, products, and
categories.
4.2 Learning Outcome
• Gained hands-on experience in data cleaning and preprocessing using Python (pandas).
• Learned to handle missing values, incorrect data types, and outliers effectively.
Reference
McKinney, W. (2017). Python for data analysis (2nd ed.). O'Reilly Media.
https://wesmckinney.com/book/