Computer Applications in Manufacturing
Submitted To: Col Dr Imran Shafi
Submitted By:
Noman Ali
Reg. No.369524
ME-DE-43
Syndicate-C
Assignment # 02
DEPARTMENT OF MECHANICAL ENGINEERING
COLLEGE OF E&ME, NUST, RAWALPINDI
Task 1: Dataset Exploration at Various Sites
Kaggle:
Kaggle is a popular online platform for data science and machine learning enthusiasts. It offers a
variety of datasets contributed by individuals, researchers, companies, and organizations. These
datasets cover a wide range of topics including finance, healthcare, sports, and social sciences.
Datasets on Kaggle are crowdsourced from multiple contributors around the world, including
businesses like Google, government bodies, universities, and individuals. The data is gathered
from various sources such as research studies, surveys, public databases, sensors, social media
platforms, and web scraping. Kaggle datasets are widely used for educational purposes,
research, competitions, and industrial applications. Companies and individuals use them to build
and test machine learning models, with some datasets contributing to real-world solutions like
fraud detection, sentiment analysis, and predictive analytics. Users can access Kaggle datasets
to practice data manipulation, apply machine learning techniques, and participate in competitions.
The platform also offers notebooks (formerly called kernels) to help beginners and advanced users work with the
data easily. A few datasets found on Kaggle are shown below:
Sr. No. | Dataset | Explanation
1 | Student Performance Factors | The “Student Performance Factors” dataset on Kaggle contains student demographic information, academic support information, and personal factors intended for modelling and predicting academic performance. It covers demographic data (e.g., gender, parental education), school support, participation in extracurricular activities, and family background. It can be used to train models that predict academic success, or to study how factors such as demographics and financial aid correlate with student outcomes. It gives educators and researchers a basis for identifying the principal factors behind academic success. It contains about 6,600 samples.
2 | Mobile Device Usage and User Behaviour Dataset | This dataset describes mobile phone usage patterns and how users interact with their phones. Its features include demographics, app usage frequency, mobile device type, battery drain, data usage, device operating system, and time spent on different activities. It is useful for understanding segments of mobile users and for identifying trends in app usage, device ownership, and user behaviour. The most common applications are in marketing, behavioural studies, and mobile app development. It contains 700 samples of user data. Each entry is assigned to one of five user behaviour classes, ranging from light to extreme usage, which supports classification analysis and modelling.
3 | Electric Vehicle Sales by State in India | This dataset presents detailed data on electric vehicle sales across Indian states. It includes the number of EVs sold, the distribution of sales by state, and possibly other demographic or geographic indicators that influence sales trends. It is useful for analysing the adoption rate of electric vehicles across regions in India and for identifying patterns in state-wise EV penetration. It contains almost 98,000 records.
4 | 3D Printer Dataset for Mechanical Engineers | This dataset covers data relevant to 3D printing, including key parameters such as print speed, layer height, and material used, together with the resulting print quality. It shows how 3D printing settings affect the final output, which makes it useful for engineers optimizing 3D printing processes. It can be applied to performance analysis, material efficiency, and quality improvement in additive manufacturing. It contains about 50 samples.
5 | Materials and their Mechanical Properties | This dataset lists tensile strength, hardness, ductility, and elasticity for different materials. It is well suited to materials science research, engineering projects, and analysis of material performance across applications. It helps researchers and engineers understand material behaviour under different conditions, which is useful in mechanical design and in selecting the best-fit material. It contains about 1,500 samples.
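As a minimal sketch of the first-pass exploration such tabular datasets invite, the following counts the records in each user behaviour class and averages one numeric feature per class, using only the Python standard library. The column names (`behaviour_class`, `data_usage_mb`) and the four inline rows are hypothetical placeholders, not values from the actual Kaggle file; a downloaded CSV would be opened with `open(path, newline="")` in place of the in-memory string.

```python
import csv
import io
from collections import defaultdict

# Hypothetical placeholder rows; the real Kaggle file has its own columns and values.
SAMPLE_CSV = """behaviour_class,data_usage_mb
1,100
1,300
3,900
5,2000
"""


def summarize(csv_file):
    """Return {class_label: (record_count, mean_data_usage)} for a CSV file object."""
    totals = defaultdict(lambda: [0, 0.0])  # class label -> [count, running sum]
    for row in csv.DictReader(csv_file):
        entry = totals[row["behaviour_class"]]
        entry[0] += 1
        entry[1] += float(row["data_usage_mb"])
    return {cls: (count, total / count) for cls, (count, total) in totals.items()}


summary = summarize(io.StringIO(SAMPLE_CSV))
for cls, (count, mean_usage) in sorted(summary.items()):
    print(f"class {cls}: {count} records, mean data usage {mean_usage:.1f} MB")
```

The class labels stay as strings because `csv.DictReader` does not convert types; a real analysis would likely convert them explicitly or use a dataframe library.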
GitHub:
GitHub is a version control platform where developers can share code, collaborate on software
projects, and host open-source projects. It has repositories with datasets and machine learning
models contributed by users. Datasets on GitHub are created by developers, data scientists,
researchers, and organizations. The data hosted on GitHub varies in source. Some may be the
results of scraping, experiments, API integrations, or manually gathered data. It’s a mix of code-
generated datasets and manually curated data files. GitHub datasets are used by developers and
data scientists to test algorithms, run machine learning models, or build web and mobile
applications. GitHub hosts a wide variety of code repositories that include ready-to-use datasets.
Developers can clone these repositories, access the datasets, and integrate them into their
projects. GitHub’s collaborative features also enable contributions to ongoing datasets, improving
or expanding the available data. A few datasets found on GitHub are shown below:
Sr. No. | Dataset | Explanation
1 | Pipe Specification Selection | This repository provides a user interface for industrial pipe material specifications, including material grade, material type, pressure rating (flange), and corrosion allowance for a specific fluid. It is designed to help engineers and industry professionals select appropriate piping materials and configurations for fluid transportation systems, ensuring safety and efficiency under the given environmental and operating conditions.
2 | Car Simulations | This repository simulates the multibody dynamics of a car body as it moves over uneven terrain. It includes code for simulating suspension systems, body movement, and interaction with irregular surfaces. It is useful for automotive engineers and researchers working on vehicle dynamics and suspension design, providing a foundation for modelling and analysing real-world driving scenarios.
3 | Introduction to Mechanical Manufacturing | This repository offers resources on the fundamentals of mechanical manufacturing. It includes lecture notes, assignments, and potentially data or examples on manufacturing processes such as machining, casting, forming, and additive manufacturing. It is beneficial for students and professionals who want to understand key concepts in manufacturing engineering and improve their practical skills in mechanical production.
4 | Structural Analysis | This repository is a Julia package for topology optimization. It provides tools for optimizing the material layout within a given design space, under specified constraints and boundary conditions, to achieve the best performance. The package is useful for mechanical engineers and researchers working on structural optimization, offering customizable simulations for designing lightweight and efficient structures.
5 | Composite Materials | This repository is a Python package for solving problems on laminated composite materials using classical laminate theory (CLT). It provides tools for computing mechanical properties, stresses, strains, and failure criteria for composite laminates. The package is particularly useful for mechanical and aerospace engineers working on the analysis and optimization of composite structures.
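To make the classical laminate theory mentioned in the last entry concrete, here is a minimal sketch of one CLT step: computing the plane-stress reduced stiffness of an orthotropic ply and summing it over the plies to obtain the extensional stiffness terms. It is not the API of the package listed above. The function names are hypothetical, the material constants are illustrative round numbers rather than data for a specific material, and the sketch is restricted to 0 and 90 degree plies so that no general rotation is needed.

```python
def reduced_stiffness(e1, e2, nu12, g12):
    """Plane-stress reduced stiffness terms (Q11, Q22, Q12, Q66) of one orthotropic ply."""
    nu21 = nu12 * e2 / e1  # reciprocal relation between the two Poisson's ratios
    denom = 1.0 - nu12 * nu21
    return e1 / denom, e2 / denom, nu12 * e2 / denom, g12


def extensional_stiffness_cross_ply(q, plies):
    """Extensional stiffness terms (A11, A22, A12, A66) for a cross-ply laminate.

    q     -- (Q11, Q22, Q12, Q66) from reduced_stiffness
    plies -- list of (angle_in_degrees, thickness); only 0 and 90 are handled
    """
    q11, q22, q12, q66 = q
    a11 = a22 = a12 = a66 = 0.0
    for angle, thickness in plies:
        if angle == 0:
            qbar11, qbar22 = q11, q22
        elif angle == 90:
            qbar11, qbar22 = q22, q11  # a 90 degree rotation swaps directions 1 and 2
        else:
            raise ValueError("this sketch handles only 0 and 90 degree plies")
        a11 += qbar11 * thickness
        a22 += qbar22 * thickness
        a12 += q12 * thickness  # Q12 and Q66 are unchanged by a 90 degree rotation
        a66 += q66 * thickness
    return a11, a22, a12, a66


# Illustrative (assumed) ply constants in Pa and ply thickness in m.
q = reduced_stiffness(e1=140e9, e2=10e9, nu12=0.3, g12=5e9)
layup = [(0, 0.125e-3), (90, 0.125e-3), (90, 0.125e-3), (0, 0.125e-3)]
a_matrix = extensional_stiffness_cross_ply(q, layup)
print("A11, A22, A12, A66 in N/m:", a_matrix)
```

Because the layup has equal total thickness at 0 and 90 degrees, A11 and A22 come out equal, which is a quick sanity check on the ply summation.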
UCI Machine Learning Repository:
The UCI Machine Learning Repository is a collection of databases, domain theories, and datasets
used by the machine learning community for empirical research in algorithms and data
exploration. Created by the University of California, Irvine (UCI), this repository has contributions
from academic researchers, engineers, and scientists. The datasets were collected from various
sources such as academic research papers, experiments, web scraping, and public records.
Some datasets are also results from competitions. These datasets are widely used in academic
research to test machine learning algorithms, explore data analysis techniques, and benchmark
performance. They're frequently cited in papers and research projects. Researchers, students,
and practitioners can access these datasets to validate algorithms, conduct data analysis, or for
teaching purposes. Many use the UCI datasets for educational projects and tutorials to learn data
preprocessing and algorithm development. A few datasets found on UCI are shown below:
Sr. No. | Dataset | Explanation
1 | Heart Disease | The Heart Disease dataset from the UCI Machine Learning Repository contains medical records used to diagnose heart disease. It includes features such as age, gender, blood pressure, cholesterol levels, and results from various medical tests. The target variable indicates the presence or absence of heart disease. The dataset is widely used for classification and prediction tasks in medical diagnostics, particularly for predicting the likelihood of heart disease from patient data.
2 | Car Evaluation | This dataset is used to evaluate car acceptability from features such as buying price, maintenance cost, number of doors, passenger capacity, trunk size, and safety. It is useful for classification tasks in which the target variable gives the overall evaluation of the car (unacceptable, acceptable, good, or very good). It is widely used as a benchmark for classification and decision-making models.
3 | Online Retail | This dataset consists of transactional data from a UK-based online retailer. It includes features such as invoice numbers, product descriptions, quantities, prices, and customer identifiers. It is useful for analysing customer behaviour, sales patterns, and inventory management, and is commonly used for tasks such as market basket analysis, customer segmentation, and sales forecasting.
4 | Individual Household Electric Power Consumption | This dataset contains measurements of electric power consumption in one household over a period of almost four years. It includes features such as date, time, global active power, voltage, and energy sub-metering. It is valuable for time-series analysis and for research on energy usage patterns, efficiency, and forecasting in household settings.
5 | Concrete Compressive Strength | This dataset contains data on concrete samples with varying compositions and curing times. It includes features such as the amounts of cement, water, and aggregates, as well as the compressive strength of the concrete. It is useful for regression analysis and for modelling the strength of concrete from its constituents, aiding materials science and construction engineering studies.
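The regression use case mentioned for the Concrete Compressive Strength dataset can be illustrated with a minimal ordinary least squares fit of a straight line to one feature. This is only a sketch under stated assumptions: the function name `fit_line` is hypothetical, and the numbers are synthetic points lying exactly on y = 2x + 1, not measurements from the UCI file. A real study would use several mix components as features and a multivariate model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic placeholder data (not concrete measurements): points on y = 2x + 1.
feature = [1.0, 2.0, 3.0, 4.0]
response = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(feature, response)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

With real data the points will not lie on a line, so the fitted slope and intercept should be judged alongside an error measure such as the mean squared residual.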
Deep Learning Datasets with Features:
Commonly used deep learning datasets are shown below:
Sr. No. | Dataset | Explanation
1 | MNIST | A dataset of handwritten digit images (0 to 9), widely used to teach AI models to recognize patterns.
2 | MS COCO | A large dataset that supports tasks such as object detection and image segmentation. Its images cover a wide range of content, which makes it very diverse.
3 | ImageNet | A large collection of images categorized according to the WordNet hierarchy. It helps AI models learn visual concepts.
4 | VisualQA | A dataset of questions about images, challenging AI to combine vision and language skills.
5 | CIFAR-10 | A dataset of images in ten categories. Its main purpose is to train AI models to recognize images.
6 | Fashion MNIST | An alternative to MNIST that replaces digits with images of fashion items. It helps AI models classify fashion items effectively.
7 | Street View House Numbers | Like MNIST, this dataset targets digit recognition, but the digits come from house numbers in street-level photographs.
8 | Sentiment140 | A dataset designed specifically for sentiment analysis, helping AI understand emotions expressed in text data.
9 | WordNet | An extensive lexical database that groups words into sets of synonyms and links them to their associated concepts. It greatly assists AI in language understanding.
10 | Wikipedia Corpus | A repository of text sourced from Wikipedia articles. It serves as a resource for AI language learning.
11 | Free Spoken Digit | A dataset of audio recordings created for identifying spoken digits.
12 | Free Music Archive | A large music analysis dataset that consists of high-quality audio, precomputed features, and metadata.
13 | Ballroom | A collection of audio excerpts representing various dance styles, enabling AI to analyse musical patterns.
14 | Million Song | A collection of audio features and metadata for a million music tracks, well suited to AI research.
15 | LibriSpeech | A dataset containing about a thousand hours of read English speech, used to train speech recognition models.
16 | VoxCeleb | A speaker identification dataset derived from YouTube videos, featuring the voices of celebrities.
17 | Urban Sound Classification | A set of urban sound clips for AI to classify into different categories.
18 | IMDB Reviews | A dataset of public movie reviews, commonly used for sentiment analysis.
19 | Twenty Newsgroups | A collection of roughly 20,000 Usenet posts spread across twenty newsgroups, used for text classification and analysis.
20 | Yelp Reviews | A dataset of user reviews of businesses with accompanying images, distributed in files of varying size; a rich resource for AI study.
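Image datasets such as MNIST are usually preprocessed before training. The sketch below shows two common steps, assuming the standard MNIST layout of 28 x 28 grayscale images with pixel values from 0 to 255 and ten digit classes: flattening and scaling the pixels to the range [0, 1], and one-hot encoding the label. The function names are hypothetical and the image is a synthetic blank grid with one bright pixel, not an actual MNIST sample.

```python
def flatten_and_scale(image):
    """Flatten a 2-D list of 0-255 pixel values into a 1-D list scaled to [0, 1]."""
    return [pixel / 255.0 for row in image for pixel in row]


def one_hot(label, num_classes=10):
    """Encode an integer class label as a vector with a single 1.0 entry."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


# Synthetic 28 x 28 placeholder image: all black except one white pixel.
image = [[0] * 28 for _ in range(28)]
image[14][14] = 255

features = flatten_and_scale(image)  # 784 values, row-major order
target = one_hot(3)                  # label 3 chosen arbitrarily for illustration

print(len(features), features[14 * 28 + 14], target)
```

Deep learning frameworks provide their own loaders and tensor types for these steps; the plain lists here only illustrate the shape of the transformation.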