Intern Intelligence
Data Analytics
ABOUT US
Intern Intelligence is a pioneering organization dedicated to nurturing interns and achieving ambitious goals. Our mission is to help interns build successful
careers through innovative approaches and unwavering support. We aim to make significant strides and set new benchmarks in the industry, fostering the next
generation of leaders.
INSTRUCTION
ID: Your unique ID, provided in the offer letter, is crucial. Keep it safe as you will
need it for task submission.
Task Submission Link: The link for task submission will be emailed to you within
approximately one week.
Submission
Task Submission: You need to complete at least 2 tasks to complete the internship successfully.
Task Completion: As part of your internship, you will be assigned several tasks.
After completing each task, please record a video demonstrating your work and
share it on LinkedIn, using the hashtag #internintelligence and tagging
@InternIntelligence.
GitHub Repository: Upload all completed tasks to GitHub. Name your repository in
the format InternIntelligence_ProjectName.
About Internship
Completion Certificate
Placement Support
Network Opportunity
DATA ANALYTICS
Task list
You need to complete at least 2 tasks to complete the internship successfully.
TASK 1
Perform Patient Risk Analysis Using Open Health Datasets
Task: The objective of this assignment is to conduct comprehensive patient data analysis within the healthcare sector. The process includes sourcing high-quality healthcare
datasets, performing data cleaning and preprocessing, conducting exploratory data analysis to identify key insights, and developing predictive models to support clinical
decision-making. This project aims to enhance practical skills in data analytics, visualization, and machine learning applied to real-world healthcare data.
1. Data Acquisition and Preparation
Identify and source reliable, open-access healthcare datasets relevant to patient demographics, diagnoses, treatments, and outcomes.
Perform initial data cleaning, including handling missing values, eliminating duplicates, and ensuring data consistency.
Analyze dataset attributes such as patient demographics, diagnosis codes, treatment types, and laboratory results.
Compile a summary report detailing data sources, quality assessments, and preliminary observations.
Recommended Data Sources:
MIMIC-III Clinical Database ([Link])
Kaggle Healthcare Datasets ([Link])
CDC Data & Statistics ([Link])
[Link] ([Link])
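The cleaning steps listed above could look roughly like the following pandas sketch. The table, its column names (patient_id, age, sex, diagnosis_code), and the fill strategies are illustrative assumptions, not part of any recommended dataset; real clinical data needs choices justified by the data dictionary.

```python
import numpy as np
import pandas as pd

# Hypothetical patient records; column names and values are illustrative only.
raw = pd.DataFrame({
    "patient_id": [1, 2, 2, 3, 4],
    "age": [34, 51, 51, np.nan, 67],
    "sex": ["F", "m", "m", "F", " M "],
    "diagnosis_code": ["E11", "I10", "I10", None, "E11"],
})

# 1. Eliminate exact duplicate rows.
clean = raw.drop_duplicates().copy()

# 2. Ensure consistency: trim whitespace and upper-case the category labels.
clean["sex"] = clean["sex"].str.strip().str.upper()

# 3. Handle missing values: median for the numeric column,
#    an explicit placeholder label for the missing code.
clean["age"] = clean["age"].fillna(clean["age"].median())
clean["diagnosis_code"] = clean["diagnosis_code"].fillna("UNKNOWN")

# 4. Keep a small quality summary that can go into the report.
quality = {
    "rows_before": len(raw),
    "rows_after": len(clean),
    "missing_after": int(clean.isna().sum().sum()),
}
print(quality)
```

Median imputation is only one option; dropping rows or flagging missingness in a separate column may suit some analyses better.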
2. Data Processing and Exploration
Execute data preprocessing steps such as normalization, encoding categorical variables, and feature engineering where applicable.
Apply descriptive statistical analysis and create visualizations including histograms, boxplots, and correlation matrices to identify trends and anomalies.
Conduct exploratory data analysis focusing on patient outcomes, disease prevalence, and treatment effectiveness.
Develop visual reports or dashboards to effectively communicate analytical findings.
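As a minimal sketch of the preprocessing and descriptive steps above, assuming a small cleaned table with hypothetical columns (age, glucose, treatment), min-max normalization, one-hot encoding, and a correlation matrix can be produced with pandas alone:

```python
import pandas as pd

# Hypothetical cleaned data; names and values are illustrative only.
df = pd.DataFrame({
    "age": [30, 40, 50, 60],
    "glucose": [90.0, 110.0, 130.0, 150.0],
    "treatment": ["A", "B", "A", "B"],
})
numeric_cols = ["age", "glucose"]

# Min-max normalization rescales each numeric column to the range [0, 1].
col_min = df[numeric_cols].min()
col_max = df[numeric_cols].max()
scaled_numeric = (df[numeric_cols] - col_min) / (col_max - col_min)

# One-hot encoding turns the categorical column into indicator columns.
dummies = pd.get_dummies(df["treatment"], prefix="treatment", dtype=int)
prepared = pd.concat([scaled_numeric, dummies], axis=1)

# Descriptive statistics and a correlation matrix on the original scale.
summary = df[numeric_cols].describe()
corr = df[numeric_cols].corr()

print(prepared)
print(summary)
print(corr)
```

The correlation matrix can then be passed to a plotting library of choice to draw the heatmap; the plotting step is left out here to keep the sketch dependency-light.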
3. Predictive Modeling and Reporting
Select relevant features based on exploratory analysis to develop predictive models.
Build and evaluate machine learning models such as logistic regression, decision trees, and random forests to predict patient outcomes.
Validate models through cross-validation techniques and assess performance using accuracy, precision, recall, and AUC-ROC metrics.
Deliver a comprehensive report detailing methodology, results, limitations, and actionable recommendations for healthcare stakeholders.
Tools: Python (Scikit-learn), Jupyter Notebook, Excel, PowerPoint
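One possible shape for the modeling and validation step is sketched below with scikit-learn. The data is synthetic (generated with make_classification) and stands in for a prepared patient feature matrix with a binary outcome; the choice of five stratified folds and a logistic regression baseline is an assumption for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for prepared patient features and a binary outcome.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Scaling lives inside the pipeline so each fold is scaled on its own
# training split, which avoids leaking test information.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "precision", "recall", "roc_auc"]
scores = cross_validate(model, X, y, cv=cv, scoring=scoring)

for name in scoring:
    fold_scores = scores[f"test_{name}"]
    print(f"{name}: mean={fold_scores.mean():.3f} std={fold_scores.std():.3f}")
```

Swapping LogisticRegression for DecisionTreeClassifier or RandomForestClassifier in the same pipeline gives a like-for-like comparison across the model families named in the task.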
TASK 2
Energy Consumption and Savings Analysis
Task: This assignment focuses on analyzing energy consumption data to identify patterns, inefficiencies, and potential savings opportunities. The tasks involve
acquiring relevant datasets, performing data cleaning and preprocessing, conducting exploratory analysis to understand consumption trends, and developing
predictive models to forecast energy use and optimize savings. The project aims to provide actionable insights to support energy management and conservation
efforts.
1. Data Acquisition and Preparation
Identify and source reliable, open-access datasets related to energy consumption, production, and savings.
Perform data cleaning including handling missing values, removing duplicates, and ensuring data consistency.
Examine dataset features such as consumption by sector, time period, geographic location, and energy types.
Prepare a summary report outlining data sources, quality evaluation, and initial observations.
Recommended Data Sources:
U.S. Energy Information Administration (EIA) ([Link])
OpenEI (Open Energy Information) ([Link])
Kaggle Energy Datasets ([Link])
International Energy Agency (IEA) ([Link])
2. Data Processing and Exploration
Conduct preprocessing including normalization, encoding categorical variables, and feature engineering as needed.
Use statistical methods and visualizations such as time series plots, histograms, and correlation matrices to analyze consumption patterns and anomalies.
Explore relationships between energy consumption and factors such as seasonality, sector, and geography.
Develop reports and dashboards to effectively present insights derived from the data.
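To illustrate how consumption can be related to seasonality and sector, the sketch below builds a hypothetical monthly table (two sectors, one year, values in kWh) and averages it by season. The figures are made up, and the month-to-season mapping assumes the Northern Hemisphere.

```python
import pandas as pd

# Hypothetical monthly consumption (kWh) for two sectors over one year.
dates = pd.date_range("2023-01-01", periods=12, freq="MS")
residential = [900, 850, 700, 600, 550, 650, 800, 820, 600, 580, 700, 880]
industrial = [1200] * 12
usage = pd.DataFrame({
    "date": list(dates) * 2,
    "sector": ["residential"] * 12 + ["industrial"] * 12,
    "kwh": residential + industrial,
})

# Assumed Northern Hemisphere seasons; adjust for the dataset's region.
season_of_month = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}
usage["season"] = usage["date"].dt.month.map(season_of_month)

# Mean consumption per season and sector.
by_season = usage.pivot_table(
    index="season", columns="sector", values="kwh", aggfunc="mean"
)
print(by_season.round(1))
```

In this toy data the residential sector varies by season while the industrial sector is flat, which is the kind of contrast a time series plot per sector would make visible.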
3. Predictive Modeling and Reporting
Select key features for predictive modeling based on exploratory analysis.
Build and validate machine learning models (e.g., regression models, decision trees, random forests) to forecast energy consumption and identify savings potential.
Evaluate model performance using cross-validation and metrics such as RMSE, MAE, and R².
Prepare a detailed report summarizing methodology, findings, limitations, and recommendations for energy efficiency improvements.
Tools: Python (Scikit-learn), Jupyter Notebook, Excel, PowerPoint
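The evaluation described above (cross-validation with RMSE, MAE, and R²) might be sketched as follows. The features (temperature, hour) and the formula generating the synthetic consumption values are assumptions made purely so the example runs on its own.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in for an energy dataset: consumption rises as the
# temperature moves away from 18 degrees and during daytime hours.
rng = np.random.default_rng(0)
n = 200
temperature = rng.uniform(-5, 35, n)
hour = rng.integers(0, 24, n)
daytime = (hour >= 8) & (hour <= 20)
kwh = 50 + 2.0 * np.abs(temperature - 18) + 5.0 * daytime + rng.normal(0, 2, n)

X = np.column_stack([temperature, hour])
y = kwh

model = RandomForestRegressor(n_estimators=100, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# scikit-learn reports error metrics negated so that higher is better.
scoring = {
    "rmse": "neg_root_mean_squared_error",
    "mae": "neg_mean_absolute_error",
    "r2": "r2",
}
scores = cross_validate(model, X, y, cv=cv, scoring=scoring)

rmse = -scores["test_rmse"].mean()
mae = -scores["test_mae"].mean()
r2 = scores["test_r2"].mean()
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  R2={r2:.3f}")
```

Shuffled K-fold is acceptable here because the synthetic rows are independent; for genuinely time-ordered readings, scikit-learn's TimeSeriesSplit is usually the safer choice because it never trains on data from after the test period.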
TASK 3
Social Media and Customer Sentiment Analysis
Task: Analyze and interpret customer emotions and opinions expressed on social media channels. The project involves collecting relevant data, processing textual content to
extract sentiment information, and delivering insights that support marketing and customer service improvements.
Phase 1: Data Sourcing and Verification
Locate and retrieve social media datasets via public repositories or by using platform-specific APIs (e.g., Twitter, Reddit).
Conduct an initial examination to identify and filter out irrelevant entries, duplicates, and noisy data.
Evaluate dataset features, including message content, timing, user profiles, and engagement statistics.
Summarize findings in a report describing data origin, quality checks, and initial impressions.
Suggested Platforms and Libraries:
Twitter API, Reddit API (Tweepy, PRAW)
Kaggle Social Media Data Collections
Tools: Python (Pandas, NumPy), Excel, Jupyter Notebook
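The filtering of duplicates and noisy entries in Phase 1 can be sketched with the standard library alone. The sample posts and the normalization rules (dropping URLs and a leading retweet prefix, collapsing whitespace, lower-casing) are assumptions; what counts as noise depends on the platform and the question being asked.

```python
import re

# Hypothetical raw posts; real data would come from an API or a dataset.
posts = [
    "Love the new update!! https://example.com/x",
    "love the new update!!",
    "@support my order never arrived",
    "   ",
    "RT @brand: Love the new update!!",
]

def normalize(text: str) -> str:
    """Reduce a post to a comparison key for near-duplicate detection."""
    text = re.sub(r"https?://\S+", "", text)    # drop URLs
    text = re.sub(r"^RT\s+@\w+:\s*", "", text)  # drop a leading retweet prefix
    text = re.sub(r"\s+", " ", text).strip()    # collapse whitespace
    return text.lower()

seen = set()
kept = []
for post in posts:
    key = normalize(post)
    # Skip empty posts and posts whose normalized text was already seen.
    if not key or key in seen:
        continue
    seen.add(key)
    kept.append(post)

print(kept)
```

The original text of each kept post is preserved; only the comparison key is normalized, so later sentiment steps still see the unmodified message.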
Phase 2: Textual Analysis and Sentiment Detection
Preprocess the text data with techniques such as tokenization, stop word elimination, and normalization (stemming or lemmatization).
Transform text into machine-readable formats using vectorization techniques like TF-IDF or word embeddings.
Apply sentiment analysis models to classify customer messages by emotional tone (positive, negative, neutral).
Visualize sentiment distributions, time-based sentiment shifts, and highlight frequent themes using charts or dashboards.
Produce a detailed analytical report outlining sentiment patterns and significant observations.
Recommended Tools:
Python (NLTK, TextBlob, SpaCy, Scikit-learn)
Optional Visualization: Tableau, Power BI
Phase 3: Modeling, Evaluation, and Presentation
Develop and train predictive models to enhance sentiment classification or uncover trending topics with machine learning or deep learning algorithms.
Validate models rigorously using metrics like accuracy, precision, recall, and F1-score.
Create comprehensive visual dashboards or slide presentations to communicate results and insights to relevant stakeholders.
Compile a final report documenting methodologies, outcomes, limitations, and actionable recommendations for business use.
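A baseline for the modeling and evaluation phase could be a TF-IDF plus logistic regression pipeline scored with accuracy, precision, recall, and F1. The twelve labeled messages below are invented and far too few for meaningful scores; they only make the sketch runnable end to end.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Tiny invented dataset: four messages per sentiment class.
texts = [
    "I love this product", "Great service and fast delivery",
    "Absolutely fantastic experience", "Very happy with my order",
    "Terrible support, very disappointed", "The item arrived broken",
    "Worst purchase I have made", "I hate the new update",
    "The package arrived today", "I ordered it on Monday",
    "The store opens at nine", "It comes in two colors",
]
labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4

# Stratify so every class appears in both the training and the test split.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, pred)
# Macro averaging weights the three classes equally.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, pred, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

With a realistically sized dataset, the same pipeline can be passed to cross_validate to get per-fold scores instead of a single split.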
TASK 4
Logistics and Delivery Performance Analysis
Task: Evaluate and improve logistics and delivery operations by analyzing performance data. This project focuses on collecting shipment and delivery records, processing
key metrics, identifying bottlenecks, and providing actionable insights to optimize supply chain efficiency.
Phase 1: Data Collection and Quality Assessment
Source logistics datasets from publicly available repositories or company-provided shipment and delivery records.
Perform data cleaning to remove inconsistencies, duplicates, and incomplete entries.
Analyze key variables such as shipment dates, delivery times, transportation methods, and order statuses.
Document a data quality report summarizing dataset characteristics and any issues encountered.
Suggested Data Sources and Tools:
Public logistics datasets on platforms like Kaggle ([Link])
Company shipment and delivery logs (if available)
Python libraries: Pandas, NumPy
Phase 2: Performance Metrics Analysis
Calculate critical logistics KPIs such as on-time delivery rate, average delivery time, transit delays, and order fulfillment accuracy.
Identify patterns and trends in delivery performance across different regions, carriers, or product types.
Visualize data through charts and dashboards to highlight areas of inefficiency or success.
Prepare an analytical report summarizing key performance indicators and notable trends.
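The KPIs named above can be computed directly from shipment timestamps. The sketch below assumes a hypothetical table with shipped, promised, and delivered dates per shipment and defines "on time" as delivered on or before the promised date; other definitions are possible.

```python
import pandas as pd

# Hypothetical shipment log; column names and dates are illustrative only.
shipments = pd.DataFrame({
    "carrier": ["A", "A", "B", "B", "B"],
    "shipped": pd.to_datetime(
        ["2024-03-01", "2024-03-02", "2024-03-01", "2024-03-03", "2024-03-04"]),
    "promised": pd.to_datetime(
        ["2024-03-04", "2024-03-05", "2024-03-03", "2024-03-06", "2024-03-07"]),
    "delivered": pd.to_datetime(
        ["2024-03-03", "2024-03-06", "2024-03-03", "2024-03-08", "2024-03-06"]),
})

# Delivery time in days, on-time flag, and delay (never negative).
shipments["delivery_days"] = (shipments["delivered"] - shipments["shipped"]).dt.days
shipments["on_time"] = shipments["delivered"] <= shipments["promised"]
shipments["delay_days"] = (
    (shipments["delivered"] - shipments["promised"]).dt.days.clip(lower=0)
)

# KPIs per carrier.
kpis = shipments.groupby("carrier").agg(
    on_time_rate=("on_time", "mean"),
    avg_delivery_days=("delivery_days", "mean"),
    avg_delay_days=("delay_days", "mean"),
)
overall_on_time_rate = shipments["on_time"].mean()

print(kpis.round(2))
print(f"Overall on-time rate: {overall_on_time_rate:.0%}")
```

Replacing "carrier" in the groupby with a region or product-type column gives the same KPIs along the other dimensions mentioned in this phase.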
Phase 3: Optimization Modeling and Reporting
Develop predictive models to forecast delivery times or detect potential delays using machine learning techniques.
Evaluate model accuracy and reliability with metrics that match the task, such as RMSE for delivery-time forecasts and precision and recall for delay detection.
Create clear and concise presentations or dashboards to communicate findings and recommendations to logistics teams and management.
Compile a comprehensive final report detailing data sources, analysis methods, model outcomes, and strategic suggestions for operational
improvements.
TASK 5
E-commerce Customer Segmentation and Purchasing Behavior Analysis
Task: Conduct a comprehensive analysis of e-commerce customers to identify distinct segments and understand purchasing behaviors. The project involves acquiring
relevant transactional data, performing segmentation using clustering techniques, analyzing purchase patterns, and delivering insights to support targeted marketing
strategies.
Phase 1: Data Acquisition and Preprocessing
Obtain e-commerce customer transaction datasets from publicly accessible platforms or company-provided data.
Clean the data by handling missing values, duplicates, and inconsistencies.
Explore dataset attributes such as customer demographics, transaction history, product categories, and purchase timestamps.
Prepare an initial report outlining the data sources, quality assessments, and summary statistics.
Suggested Data Sources:
UCI Machine Learning Repository - Online Retail Data Set: [Link]
Kaggle - E-commerce Customer Behavior Datasets: [Link]
Amazon Customer Reviews Dataset: [Link]
Recommended Tools:
Python (Pandas, NumPy), Jupyter Notebook, Excel
Phase 2: Customer Segmentation and Behavioral Analysis
Apply clustering algorithms (e.g., K-Means, Hierarchical Clustering) to segment customers based on purchasing frequency, monetary value, and recency.
Analyze buying behaviors across segments to identify preferences, high-value customers, and churn risks.
Visualize segment distributions and behavioral trends using charts and dashboards.
Generate a detailed report summarizing segmentation results and key customer insights.
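Segmenting on recency, frequency, and monetary value can be sketched as below with pandas and scikit-learn's KMeans. The transactions are invented, the snapshot date (one day after the last order) is an assumption, and two clusters are chosen only because the toy data has two obvious groups; on real data the number of clusters should be chosen with a method such as the elbow plot or silhouette score.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical transactions; column names and values are illustrative only.
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3, 4],
    "order_date": pd.to_datetime([
        "2024-05-01", "2024-05-20", "2024-06-10",
        "2024-05-05", "2024-06-08",
        "2024-01-15",
        "2024-02-01",
    ]),
    "amount": [100, 150, 120, 200, 180, 20, 25],
})

# Recency is measured against an assumed snapshot date.
snapshot = tx["order_date"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("customer_id").agg(
    recency_days=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

# Standardize so that no single feature dominates the distance measure.
scaled = StandardScaler().fit_transform(rfm)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
rfm["segment"] = kmeans.fit_predict(scaled)

print(rfm)
```

The cluster numbers themselves carry no meaning; segments are interpreted by comparing the average recency, frequency, and monetary value within each one.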
Phase 3: Predictive Modeling and Strategic Recommendations
Build predictive models to forecast customer lifetime value, purchase likelihood, or product preferences using machine learning techniques.
Evaluate model performance through metrics such as accuracy, ROC-AUC, and precision-recall scores.
Create presentations or dashboards that clearly communicate findings and actionable recommendations to business stakeholders.
Submit a comprehensive final report covering data acquisition, methodology, analytical results, and strategic marketing suggestions.
Connect with us
YouTube: Intern Intelligence
Website: [Link]
E-mail: [Link]@[Link]
LinkedIn: @Intern Intelligence
Instagram: InternIntelligence
Telegram: InternIntelligence