Pawan
CONTENTS
● Declaration
● Acknowledgement
● Internship Certificate
● Abstract
● Introduction
● Task Descriptions
● Learning Outcomes
● Key Contributions
● Conclusion
DECLARATION
This internship was carried out as part of my academic curriculum under the esteemed guidance
of Ms. Prerna, and it provided me with hands-on experience in the domain of Structured Query
Language (SQL). The report outlines the various tasks I completed, which included solving
real-world SQL problems, writing optimized queries, working with relational database schemas,
and understanding the principles of data retrieval and manipulation.
Throughout the internship, I worked on a series of structured tasks and a capstone project that
enabled me to apply theoretical knowledge in practical scenarios, enhancing both my technical
skills and logical reasoning capabilities. This experience also strengthened my foundation in
database concepts and query performance optimization.
Date: ________________
Place: ____________________
Acknowledgement
I would like to express my heartfelt gratitude to Celebal Technologies for providing me with the
invaluable opportunity to complete my internship in the field of SQL and Database Management.
This internship has played a significant role in enhancing my technical capabilities and
understanding of real-world database systems.
I am especially thankful to my mentor, Ms. Prerna, for her consistent support, encouragement,
and expert guidance throughout the internship. Her mentorship helped me deepen my knowledge
of SQL, write more efficient queries, and develop a structured approach to solving
database-related problems.
I would also like to extend my sincere thanks to my faculty members and the Amity Institute of
Information Technology (AIIT), Amity University Rajasthan, for their unwavering support and
for facilitating this enriching internship opportunity. Their academic guidance laid the strong
foundation upon which I could build my technical experience.
This internship has significantly contributed to my growth as a student and as an aspiring data
professional. It has not only strengthened my skills in SQL, relational databases, and data analysis
but also helped me develop discipline, independence, and a professional work ethic.
I am truly grateful for the lessons learned and the confidence gained during this journey, which I
believe will serve as a stepping stone in my future career.
Abstract
Throughout the internship period, I focused on solving SQL problems involving data retrieval,
filtering, aggregation, subqueries, joins, and performance optimization. I worked on a structured
series of tasks, each designed to enhance specific technical skills such as writing efficient queries,
understanding relational database schemas, and applying logical problem-solving techniques.
These tasks simulated real business scenarios and helped me understand how data is handled and
processed in professional environments.
In addition to the individual tasks, I also developed a capstone SQL project that consolidated the
skills I acquired. This project involved designing a clean database schema, writing complex
queries, and hosting the entire solution on GitHub for collaborative and academic review.
The tools and technologies I used included MySQL, PostgreSQL, and GitHub, along with basic
command-line and IDE support. The experience significantly improved my fluency in SQL
syntax, logical reasoning, and understanding of how efficient querying plays a crucial role in
data-centric industries.
This internship was not only a technical learning experience but also a step forward in developing
professionalism, self-discipline, and accountability. It gave me valuable insight into the
workflows, expectations, and best practices followed in the industry and further fueled my
interest in data and backend development.
Introduction
In the digital age, data has become one of the most valuable assets, and the ability to efficiently
store, retrieve, and analyze this data is critical for any organization. At the core of this process lies
Structured Query Language (SQL) — the industry-standard language for interacting with
relational databases. Mastery of SQL is essential for developers, analysts, and data professionals
across various domains.
The primary goal of this internship was to bridge the gap between academic knowledge and
practical industry requirements by working on real-time SQL challenges and tasks. I was given
the chance to explore SQL operations in depth — including SELECT, JOIN, GROUP BY, nested
subqueries, data filtering, and performance tuning — while understanding how these apply in the
context of professional data workflows.
In addition to solving structured SQL problems, I also designed and implemented a mini-project
that allowed me to integrate everything I learned into a single, cohesive application. This
experience helped me improve my logical thinking, code structuring, and query optimization
skills.
This report details my learning journey during the internship, the tasks I performed, tools I used,
and the skills I developed. It also includes a description of the project I completed, which served
as a capstone to my experience and growth in the SQL domain.
● Data Engineering
With a strong focus on innovation, Celebal has built several AI-powered products and platforms
that help businesses leverage their data for smarter decisions. The company is also a Microsoft
Gold Partner, demonstrating its excellence in delivering Microsoft-based enterprise solutions.
My responsibilities were structured around learning and applying the core principles of
Structured Query Language (SQL) through hands-on tasks, research, and independent
problem-solving. I was expected to complete technical assignments, develop clean and efficient
query logic, and follow best practices in database handling and data retrieval.
TASK DESCRIPTION
This task focused on improving proficiency in SQL for business intelligence, including filtering,
joining tables, performing aggregations, applying grouping, and handling schema-specific
constraints.
Overview:
This task consisted of solving 42 SQL questions based on the AdventureWorksDW2014
database — a Microsoft-provided sample data warehouse used for analytics and reporting
practice. The queries addressed real-world business scenarios, such as customer analysis, sales
trends, product performance, and regional revenue distribution.
● The project used the AdventureWorksDW2014 schema, which differs from the
transactional (OLTP) version.
● Certain columns available in the OLTP version (e.g., FaxNumber in DimCustomer) were
not present in the DW schema. In such cases, assumptions were made, or alternate fields
were used.
● The queries were designed to work specifically within the data warehouse context,
focusing on dimensions and fact tables.
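To illustrate the style of query involved, the sketch below totals reseller revenue by country. It is an illustrative example rather than one of the 42 submitted answers; the table and column names follow the standard AdventureWorksDW2014 schema.

-- Illustrative query: reseller revenue by country in AdventureWorksDW2014
SELECT   g.EnglishCountryRegionName AS Country,
         SUM(f.SalesAmount)         AS TotalRevenue
FROM     FactResellerSales AS f
JOIN     DimReseller       AS r ON f.ResellerKey  = r.ResellerKey
JOIN     DimGeography      AS g ON r.GeographyKey = g.GeographyKey
GROUP BY g.EnglishCountryRegionName
ORDER BY TotalRevenue DESC;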
How to Perform:
1. Set up the SQL Server environment and restore the AdventureWorksDW2014 database (see the sketch after this list).
2. Explore the schema, understand key tables such as FactResellerSales, DimCustomer,
DimProduct, DimGeography, and others.
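A minimal sketch of the restore in step 1 is given below, in T-SQL; the backup path and logical file names are placeholders that depend on the downloaded .bak file.

-- Sketch of restoring the sample data warehouse (paths and logical names are placeholders)
RESTORE DATABASE AdventureWorksDW2014
FROM DISK = 'C:\Backups\AdventureWorksDW2014.bak'
WITH MOVE 'AdventureWorksDW2014_Data' TO 'C:\Data\AdventureWorksDW2014_Data.mdf',
     MOVE 'AdventureWorksDW2014_Log'  TO 'C:\Data\AdventureWorksDW2014_Log.ldf',
     REPLACE;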
Tools Used:
Outcome:
Successfully wrote and documented 42 queries covering various business analytics scenarios.
This task improved my skills in querying large schemas, handling real-world data warehouse
design, and using SQL as a data reporting tool.
Objective:
The objective of this task was to solve intermediate-level SQL problems focused on real-world
analytics using the AdventureWorksDW2014 data warehouse. This task aimed to strengthen my
ability to perform business reporting and draw insights using SQL functions like aggregation,
filtering, joins, and grouping.
Overview:
As part of this task, I wrote 9 SQL queries to address various data reporting and analytical
questions, such as identifying top customers, unsold products, customer demographics, and
revenue-based ranking. These queries simulated practical use cases that a data analyst might
encounter in a business intelligence environment.
Questions Covered:
1. List the names and emails of all customers who purchased more than 5 products in a
single order.
2. List the names and emails of customers who have placed more than 3 orders (see the sketch after this list).
3. List customers who have placed orders in the year 2010.
4. List the top 5 customers with the highest total purchase amount.
8. List countries and the number of customers from each country.
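As an example, question 2 can be answered along the lines of the sketch below, joining the customer dimension to the internet-sales fact table; the exact shape of my submitted query may have differed.

-- Sketch for question 2: customers with more than 3 orders (AdventureWorksDW2014 naming)
SELECT   c.FirstName, c.LastName, c.EmailAddress,
         COUNT(DISTINCT s.SalesOrderNumber) AS OrderCount
FROM     DimCustomer       AS c
JOIN     FactInternetSales AS s ON s.CustomerKey = c.CustomerKey
GROUP BY c.FirstName, c.LastName, c.EmailAddress
HAVING   COUNT(DISTINCT s.SalesOrderNumber) > 3;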
How to Perform:
5. Document the queries with comments for clarity and version control via GitHub.
Tools Used:
Outcome:
Completed 9 analytical SQL queries that solved reporting-based use cases. This task significantly
improved my understanding of intermediate SQL logic, data aggregation, and practical business
analysis using structured queries.
Overview:
Each SQL problem was crafted to represent a unique challenge frequently encountered in
real-world enterprise databases. From salary analysis and binary trees to user creation and cost
distribution, the diversity of queries tested my ability to think analytically and code efficiently, in some cases without relying on familiar constructs such as ORDER BY or joins.
1. Departmental Average Salary Analysis – Find departments with above-average salaries.
2. Top Earners by Department – Retrieve the highest-paid employee in each department.
3. Employees Without Projects – Identify employees not working on any project.
4. Duplicate Emails Detection – Detect repeated email addresses in employee records.
5. Hackers Contest Statistics – Daily participant count and top submitters.
7. Prime Numbers Up to 1000 – Print all primes ≤1000 separated by &.
8. Occupation Pivot Table – Transform row data into columns grouped by occupation.
9. Binary Tree Node Type – Classify nodes as Root, Inner, or Leaf.
11. Best Friend Salary Comparison – Students whose best friends got higher offers.
12. Cost Distribution: India vs International – Compare job family costs across locations.
15. Top 5 Salaries (No ORDER BY) – Retrieve top salaries without ORDER BY (see the sketch after this list).
17. Create User and Assign DB Owner Role – SQL user creation and permission assignment.
20. Copy New Data Without Identifiers – Transfer only new records between tables without using unique indicators.
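For instance, problem 15 can be solved with a correlated subquery instead of ORDER BY, as in the sketch below; the Employees table and its columns are illustrative stand-ins for the actual problem schema.

-- Sketch for problem 15: top 5 distinct salaries without using ORDER BY
-- (table and column names are illustrative)
SELECT DISTINCT e1.Salary
FROM   Employees AS e1
WHERE  (SELECT COUNT(DISTINCT e2.Salary)
        FROM   Employees AS e2
        WHERE  e2.Salary > e1.Salary) < 5;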
How to Perform:
3. Break down each problem into logical components before coding.
5. Optimize query performance and validate accuracy with sample inputs.
Tools Used:
Outcome:
Completed 20 high-level SQL problems spanning analytics, admin operations, and database
logic. These challenges significantly improved my fluency in writing advanced queries and taught
me how to think beyond traditional SQL patterns.
The objective of this task was to create an automated subject allotment system using a stored
procedure in SQL. The system simulates a real-world use case where students are allotted
subjects based on their preferences and academic performance (GPA), while considering seat
availability constraints. This task enhanced my understanding of stored procedures, control flow
logic, cursors, and dynamic decision-making in SQL.
Problem Statement:
● Each student selects five preferred subjects, ranked in order.
● Each subject has a limited number of available seats.
● Students must be allotted a subject based on preference order and GPA, subject to seat availability.
Database Design:
Execution Steps:
Outcome:
Successfully implemented a functional and intelligent subject allotment system in SQL. The
task demonstrated the power of stored procedures and logic-driven SQL operations in solving
real-world automation problems involving conditional allocation and resource limits.
Objective:
The objective of this task was to design and implement a subject allotment system in SQL that
automatically assigns students to their preferred open elective subjects based on their GPA and
seat availability. The task involved developing a stored procedure to replicate a realistic scenario
where limited resources must be fairly distributed according to academic merit and preference
hierarchy.
Problem Statement:
Each student submits five subject preferences. The system must allot subjects using the
following logic:
● A student is allotted the first subject from their preferences that still has remaining
seats.
● If none of the five preferred subjects are available, the student is added to an unallotted list.
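A condensed sketch of the core allotment loop is shown below, written in MySQL with a cursor. The table names (StudentDetails, StudentPreference, SubjectDetails, Allotments, UnallotedStudents) and columns are illustrative stand-ins for the schema actually used in the task.

-- Illustrative allotment procedure (schema names are assumptions, not the exact task schema)
DELIMITER //
CREATE PROCEDURE AllotSubjects()
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE v_student_id VARCHAR(20);
    DECLARE v_subject_id VARCHAR(20);
    -- process students by merit: highest GPA first
    DECLARE cur_students CURSOR FOR
        SELECT StudentId FROM StudentDetails ORDER BY GPA DESC;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    OPEN cur_students;
    read_loop: LOOP
        FETCH cur_students INTO v_student_id;
        IF done = 1 THEN LEAVE read_loop; END IF;

        -- pick the highest-ranked preference that still has a free seat
        SET v_subject_id = NULL;
        SELECT p.SubjectId INTO v_subject_id
        FROM   StudentPreference p
        JOIN   SubjectDetails    s ON s.SubjectId = p.SubjectId
        WHERE  p.StudentId = v_student_id
          AND  s.RemainingSeats > 0
        ORDER BY p.Preference
        LIMIT 1;
        SET done = 0;  -- the SELECT above also fires the NOT FOUND handler when empty

        IF v_subject_id IS NOT NULL THEN
            INSERT INTO Allotments (SubjectId, StudentId) VALUES (v_subject_id, v_student_id);
            UPDATE SubjectDetails
            SET    RemainingSeats = RemainingSeats - 1
            WHERE  SubjectId = v_subject_id;
        ELSE
            INSERT INTO UnallotedStudents (StudentId) VALUES (v_student_id);
        END IF;
    END LOOP;
    CLOSE cur_students;
END //
DELIMITER ;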
Outcome:
This task successfully simulated a real-world allocation problem using SQL. It deepened my
understanding of stored procedures, cursor control flow, seat management logic, and data integrity
handling within multi-step transactions. The project also tested my ability to design clean
database schemas aligned with business logic.
Overview:
This assignment consisted of solving diverse SQL problems covering a range of foundational
topics such as SELECT, WHERE, JOINs, GROUP BY, aggregations, subqueries, and CASE
statements. Each query was written using standard SQL syntax, properly commented, and
successfully tested on LeetCode’s platform.
Outcome:
Successfully solved 15 real-world inspired SQL problems, which helped improve my ability to
write clean, readable, and logical SQL queries. This task significantly enhanced my confidence in
core SQL operations and served as valuable practice for both interviews and real-time backend
problem-solving.
Outcome:
This task provided deep insight into enterprise-level data warehouse maintenance, especially how
to manage historical data effectively using different SCD strategies. It also reinforced my ability
to write reusable, logic-driven stored procedures for scalable ETL and reporting systems.
Objective:
The objective of this task was to develop a Stored Procedure in MySQL that dynamically
populates a Time Dimension table for an entire year based on a single input date. This task is
crucial in the context of data warehousing where a fully populated and richly attributed date
dimension is used in reporting, dashboards, and analytical models.
Task Description:
When a user passes any date (e.g., '2020-07-14') to the stored procedure, it extracts the year (2020) and populates all 365 or 366 dates of that year (handling leap-year logic) with complete date-related attributes.
This includes calendar details (day, month, quarter), fiscal information, ordinal suffixes, and
week details — making the Time Dimension a complete reference dataset.
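A possible shape of such a reference table is sketched below; the column list is an assumed, reduced subset chosen to mirror the attributes just listed, not the exact schema created by the task scripts.

-- Illustrative (reduced) TimeDimension definition; the real table carries many more attributes
CREATE TABLE TimeDimension (
    SKDate          DATE PRIMARY KEY,   -- primary key as described in the task
    CalendarDay     INT,
    CalendarMonth   INT,
    MonthName       VARCHAR(10),
    CalendarQuarter INT,
    CalendarYear    INT,
    DayOfWeekName   VARCHAR(10),
    WeekOfYear      INT,
    FiscalQuarter   INT,
    FiscalYear      INT,
    DaySuffix       VARCHAR(4)          -- 'st', 'nd', 'rd', 'th'
);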
Implementation Breakdown:
1. Script: create_time_dimension_table.sql
Defines the schema of the TimeDimension table using appropriate data types, with Date or
INT fields depending on column meaning. SKDate is set as the Primary Key.
2. Script: populate_time_dimension_procedure.sql
Contains the stored procedure:
Outcome:
Successfully created a reusable, dynamic Time Dimension builder using only one efficient insert
query and rich SQL logic. This task enhanced my skills in dimensional modeling, date-based
analytics, and procedure-driven automation — which are key skills in modern data engineering
and reporting.
Minor Project
Departmental Average Salary Analysis with Threshold Comparison
Objective:
The purpose of this project was to analyze and compare department-wise average salaries
against the overall company-wide average salary using SQL. The goal was to identify
departments that maintain higher salary standards than the organizational average, which can help
in workforce planning, budgeting, and retention strategies.
Overview:
This project demonstrates how SQL can be used not just for data retrieval, but also for analytical
comparison and threshold-based filtering. Using aggregation and filtering techniques, I
retrieved departments whose average employee salary exceeds the overall company average.
The final output includes:
● Department Name
● Average Salary
● Number of Employees
Sample Data:
○ Average salary
○ Employee count
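The heart of the project is a single aggregation with a threshold comparison, roughly as sketched below; the Employees table and column names are illustrative stand-ins for the actual sample data.

-- Sketch of the core query (table and column names are illustrative)
SELECT   Department,
         AVG(Salary) AS AverageSalary,
         COUNT(*)    AS EmployeeCount
FROM     Employees
GROUP BY Department
HAVING   AVG(Salary) > (SELECT AVG(Salary) FROM Employees);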
Tools Used:
● MySQL
Outcome:
This project enhanced my ability to use aggregate functions, subqueries, and comparative
logic in SQL. It also taught me how to perform department-level business intelligence analysis —
a common requirement in HR and financial reporting systems.
One of the most intellectually demanding tasks was building a Stored Procedure to populate a
Time Dimension table using only a single INSERT INTO ... SELECT statement. The traditional
approach would have involved inserting records one by one or using loops — however, these
were restricted in the problem constraints.
Challenge:
I had to dynamically generate 365 (or 366) rows based on a single input date and extract multiple
attributes like day names, fiscal periods, quarters, suffixes, and more — without using multiple
insert statements.
Solution:
After researching advanced SQL patterns, I implemented a Recursive Common Table Expression
(CTE) — a powerful feature in MySQL 8.0+. I used this to dynamically generate a sequence of
dates for the entire year. Within that single recursive result set, I calculated all required fields
using built-in date functions like DATE_FORMAT(), DAYOFYEAR(), WEEK(), QUARTER(),
and conditional logic for suffixes and fiscal periods. This was my first time writing such a
complex recursive query, and it taught me how much logic SQL can handle when written with
care.
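A condensed sketch of that approach is shown below; the TimeDimension column list matches the reduced sketch given earlier in this report, and p_input_date stands for the procedure's input parameter (the real procedure derives many more attributes).

-- Condensed sketch of the single INSERT ... SELECT built on a recursive CTE (MySQL 8.0+)
INSERT INTO TimeDimension (SKDate, CalendarDay, CalendarMonth, MonthName,
                           CalendarQuarter, CalendarYear, DayOfWeekName, WeekOfYear)
WITH RECURSIVE dates AS (
    SELECT MAKEDATE(YEAR(p_input_date), 1) AS d              -- 1 January of the input year
    UNION ALL
    SELECT d + INTERVAL 1 DAY
    FROM   dates
    WHERE  YEAR(d + INTERVAL 1 DAY) = YEAR(p_input_date)     -- stop after 31 December
)
SELECT d,                          -- SKDate
       DAY(d),                     -- CalendarDay
       MONTH(d),                   -- CalendarMonth
       DATE_FORMAT(d, '%M'),       -- MonthName
       QUARTER(d),                 -- CalendarQuarter
       YEAR(d),                    -- CalendarYear
       DATE_FORMAT(d, '%W'),       -- DayOfWeekName
       WEEK(d)                     -- WeekOfYear
FROM   dates;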
Another key challenge was building stored procedures for all six types of Slowly Changing
Dimensions (SCD) — an essential but rarely explored concept at the college level.
Challenge:
Each SCD type (0, 1, 2, 3, 4, and 6) had a distinct purpose, requiring careful understanding of
how to track changes, retain history, update live records, and structure versioned tables. Some
types required additional columns like start_date, end_date, current_flag, and even a separate
history table.
Solution:
I studied data warehousing concepts from Microsoft documentation, blog articles, and GitHub
repositories to understand the best practices. For each type:
● I wrote modular stored procedures with MERGE-like logic using IF EXISTS, UPDATE,
and INSERT.
● I built logic to set current_flag and versioning columns based on the update strategy.
● I used temporary staging tables (stg_customer) and designed the dim_customer schema in
a way that supported rollback-friendly testing.
Implementing SCD Type 6, which combines features of Type 1, 2, and 3, was the most
complicated. But solving this taught me to carefully plan schema structures and logic flows in
SQL — something I had never done before this internship.
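As an illustration of the Type 2 flow, the sketch below expires a changed record and inserts its new version, written here in T-SQL; apart from stg_customer, dim_customer, start_date, end_date, and current_flag, which are named above, the column names are assumptions.

-- Sketch of an SCD Type 2 load (T-SQL); customer_id and customer_name are illustrative columns

-- Step 1: expire the current version of every customer whose attributes changed
UPDATE d
SET    d.end_date = GETDATE(),
       d.current_flag = 0
FROM   dim_customer AS d
JOIN   stg_customer AS s ON s.customer_id = d.customer_id
WHERE  d.current_flag = 1
  AND  s.customer_name <> d.customer_name;

-- Step 2: insert a fresh current version for new customers and for those just expired
INSERT INTO dim_customer (customer_id, customer_name, start_date, end_date, current_flag)
SELECT s.customer_id, s.customer_name, GETDATE(), NULL, 1
FROM   stg_customer AS s
LEFT JOIN dim_customer AS d
       ON d.customer_id = s.customer_id AND d.current_flag = 1
WHERE  d.customer_id IS NULL;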
Early in the internship, while solving LeetCode SQL problems and project-level queries, I
frequently encountered issues with wrong output, Cartesian products, and slow-running queries.
Challenge:
At times, the results seemed correct at first glance, but subtle logic flaws in JOINs, GROUP BY,
or missing WHERE clauses led to incorrect aggregations or duplication.
Solution:
I started writing queries with a "debugging mindset." I learned to:
This helped me not just fix existing problems, but also write more confident and scalable
queries in my later tasks.
During my internship at Celebal Technologies, I worked with a range of tools, platforms, and
technologies that supported both learning and task execution. These tools enabled me to write,
test, deploy, and document SQL queries and stored procedures in real-world data environments.
Here is a comprehensive list of the technologies used during the course of the internship:
1. SQL Server Management Studio (SSMS)
The primary environment used for writing and executing SQL queries. SSMS provided a
user-friendly interface to connect with SQL Server, manage databases, and debug queries in
real-time. I used it extensively for:
2. MySQL Workbench
Used in scenarios where tasks were required to be tested using MySQL-specific syntax, especially
for stored procedures, dynamic date functions, and recursive Common Table Expressions (CTEs).
MySQL Workbench was particularly useful in building the Time Dimension generator.
3. GitHub
Used for version control and project documentation. All SQL tasks and project files were hosted
in a public GitHub repository, where I maintained clean formatting, comments, and structure.
GitHub also allowed me to demonstrate my work to mentors and future recruiters.
4. LeetCode
Used to solve real-world SQL problems and validate solutions against test cases. It helped me
enhance my logical thinking and apply theoretical concepts in structured scenarios, especially for
beginner and intermediate-level SQL logic.
Used only as a reference sheet for columns and data format in the Time Dimension task. While no
static data was copied, it guided the schema design for various date-related attributes like fiscal
periods, day suffixes, and calendar values.
An advanced SQL feature used to dynamically generate a sequence of dates without using loops.
This was key in solving the Time Dimension task with a single INSERT INTO ... SELECT
statement, respecting the constraints.
Through regular commit messages, README files, and structured comments, I learned how to
document SQL code professionally, which made my work reusable and easy to understand.
These tools, combined with a logical mindset and structured learning, helped me develop both
confidence and competency in SQL development and data warehousing practices.
Here are the most important learnings and takeaways from my internship:
Before the internship, my SQL knowledge was limited to basic queries and simple use cases.
Over the course of this internship:
● I mastered advanced SQL topics such as joins, subqueries, window functions, and aggregations (a window-function example is sketched below).
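For example, the window-function pattern below returns the highest-paid employee in each department, the same idea behind the "Top Earners by Department" problem; the Employees table and columns are illustrative.

-- Illustrative window-function query: highest-paid employee per department
SELECT Department, EmployeeName, Salary
FROM (
    SELECT Department, EmployeeName, Salary,
           ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC) AS rn
    FROM   Employees
) AS ranked
WHERE rn = 1;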
One of the highlights of the internship was learning about Slowly Changing Dimensions (SCD
Types) — a core part of dimensional modeling. I implemented:
● Type 1 (overwrite)
● Type 2 (versioning)
This gave me clarity on how businesses track historical changes in data and structure ETL
pipelines.
For the first time, I wrote dynamic stored procedures involving loops, cursors, conditional logic,
and recursive CTEs. This helped me:
From subject allotment systems to salary analysis and time dimension generation — every task
was rooted in real-world logic. I learned how to:
Using GitHub taught me the importance of organizing code, writing meaningful commit
messages, and maintaining clean documentation. This is a crucial skill in collaborative software
development.
At times, queries didn’t work as expected. Joins broke, procedures failed, or logic was wrong. But
I learned to:
I will carry these lessons forward into my academic projects, future internships, and eventually
my career in software engineering and data systems. The experience made me more disciplined,
confident, and inspired to keep learning.
Conclusion
The internship at Celebal Technologies has been a transformative experience in my academic and
professional journey. Over the course of two months, I gained hands-on exposure to real-world
SQL tasks, database optimization techniques, and data analysis workflows. Each assignment —
from writing basic SELECT queries to developing stored procedures for subject allotment and
handling Slowly Changing Dimensions — challenged me to think analytically and apply
theoretical concepts in practical scenarios.
Working with tools like SQL Server Management Studio (SSMS), MySQL, and GitHub helped
me understand the standard practices followed in the industry. The projects and challenges pushed
me to explore advanced SQL topics such as window functions, pivoting, recursive queries, and
dynamic stored procedures, significantly strengthening my logical reasoning and problem-solving
abilities.
Beyond the technical skills, this internship also helped me grow as a professional. I learned the
importance of writing clean, maintainable code, managing time effectively, documenting work
properly, and communicating within a structured workflow. Collaborating with mentors and
adhering to deadlines taught me discipline, attention to detail, and how to align with real industry
expectations.
In conclusion, this internship has not only equipped me with a solid foundation in SQL and
database systems but also inspired me to continue exploring the fields of data engineering,
analytics, and backend development. I am truly grateful for this opportunity and confident that the
skills acquired will play a pivotal role in shaping my future career in the tech industry.