
Data Analytics Handouts-Open Course

The document is an introduction to data analytics presented by Professor Mohsen Yahyaei at the Julius Kruttschnitt Minerals Research Centre. It highlights the University of Queensland's commitment to research and interdisciplinary collaboration, particularly in the mining and minerals sector. The content covers the Sustainable Minerals Institute's research programs, services, and the importance of data analytics in making informed decisions in the industry.


13/09/2022

Introduction to data analytics

Professor Mohsen Yahyaei

Julius Kruttschnitt Minerals Research Centre (JKMRC)

Acknowledgement of Country
• The University of Queensland acknowledges the Traditional Owners and their custodianship of the lands on which we meet.
• We pay our respects to their Ancestors and their descendants, who continue cultural and spiritual connections to Country.
• We recognise their valuable contributions to Australian and global society.

CRICOS code 00025B


The University of Queensland


The University of Queensland specialises in research that creates meaningful impact

UQ continually builds on its global reputation in key areas of national and international significance through interdisciplinary collaboration with more than 400 international industry partners.

UQ’s world-leading research is delivered by an interdisciplinary research community of more than 1500 scientists across six faculties, four research institutes and 100+ research centres.

• Global top 50 university
• 200+ active license agreements
• US$16 billion gross product sales
• 100+ companies created
• 87 US patents granted
• 400+ institutional partners in 50+ countries
• #1 in the world for mining and minerals engineering
• 4 world-leading research institutes (AIBN, IMB, SMI, QBI)
• 100 per cent of UQ research at or above world standard (ERA assessment 2015)


Process modelling and simulation | Lecture 1-Introduction © 2022 Julius Kruttschnitt Minerals Research Centre (JKMRC) CRICOS code 00025B

Sustainable Minerals Institute (SMI)


SMI Centres

• Julius Kruttschnitt Mineral Research Centre (JKMRC)
• Minerals Industry Safety and Health Centre (MISHC)
• W.H.Bryan Mining and Geology Research Centre (BRC)
• Centre for Mined Land Rehabilitation (CMLR)
• Centre for Social Responsibility in Mining (CSRM)
• Centre for Water in the Minerals Industry (CWiMI)

Strategic Research Programs
• Unlocking Complex Orebodies
• Future Autonomous Systems and Technologies
• Transforming the Mine Lifecycle
• Development Minerals
• Governance and Leadership

Technology and Knowledge Transfer
• JKTech
• International Centre of Excellence in Chile (SMI-ICE Chile)



Sustainable Minerals Institute

Production research centres
Technological innovation for the mines of future
• Julius Kruttschnitt Mineral Research Centre (JKMRC)
• Minerals Industry Safety and Health Centre (MISHC)
• W.H.Bryan Mining and Geology Research Centre (BRC)

Research groups
• BRC
- Mine waste transformation through characterisation
- Total deposit knowledge
- Deep mining geoscience
• JKMRC
- Novel separation
- Flotation chemistry
- High-Temperature Processing
- Advanced Process Prediction and Control
- Mine energy transformation and integration
• MISHC
- Artisanal and small-scale mining
- Risk
- Mining automation human system integration



Sustainable Minerals Institute


Sustainability research centres
Enterprise transformation for the mines of future
• Centre for Mined Land Rehabilitation (CMLR)
• Centre for Social Responsibility in Mining (CSRM)
• Centre for Water in the Minerals Industry (CWiMI)

Research groups
• CMLR
- Ecological engineering of mine waste
- Ecosystem assessment, restoration & resilience
- Industrial ecology & circular economy
- Environmental Geochemistry
• CSRM research consortia
- Social aspects of mine closure
- Mining and resettlement
- Area of expertise: Communities, governance, Agreement, Development, cultural heritage, Resettlement, ASM, Human rights, Indigenous peoples, mine closure, conflict, gender
• CWiMI
- Regional water and land resources



Sustainable Minerals Institute

Strategic Research Program
Agile structure to address emerging challenges of mining
Facilitate cross-disciplinary collaboration between domain experts
• Unlocking Complex Orebodies
• Future Autonomous Systems and Technologies
• Transforming the Mine Lifecycle
• Development Minerals
• Governance and Leadership

• Unlocking resources of the future
• Transforming the industry through automation
• Circular economy for mining
• Highly reliable organisations
• Repurposing mine waste

Sustainable Minerals Institute


Knowledge and Technology Transfer
• JKTech
• International Centre of Excellence in Chile (SMI-ICE Chile)

• Services
- Consulting
- Laboratory services
• Products
- Testing equipment
- Software solutions
• Professional development
- Short courses
- Online courses
- Webinars
- Tailored courses



Sustainable Minerals Institute


Future-ready workforce
Industry leaders


SMI snapshot


Julius Kruttschnitt Mineral Research Centre


Groups
• Advanced Process Prediction and Control (APPCo)
• Mine Energy Transformation and Integration (METI)
• Separation
• Flotation Chemistry
• High Temperature Processing (HTP)

Facilities and highlighted projects
• Indooroopilly Pilot Plant
• Mineral Characterisation Research Facility
• Model-Informed Process Control
• Flexi-lab Testing Facility
• Novel flotation reagents
• Coarse Particle Flotation
• High Voltage Pulse Comminution


Pilot Plant and Laboratories


Professor Mohsen Yahyaei


B.E. (Mining), M. Phil., Ph. D. (Mineral processing), MBA

• Process Autonomy
• Sustainable Minerals Institute (SMI)
• Director
Julius Kruttschnitt Minerals Research Centre (JKMRC)
• Program leader
Future Autonomous Systems and Technologies (FAST)

• Research leader and researcher (JKMRC)


• Plant Superintendent – Coal processing – Interkarbon


Practical experience


Students
• Name

• Organisation

• Role

• Experience in mineral processing

• Expectation from the module


Sustainable Minerals Institute (SMI)


Julius Kruttschnitt Mineral Research Centre (JKMRC)

What is data analytics

Process modelling and simulation | Lecture 19-Data Analytics © 2022 Julius Kruttschnitt Minerals Research Centre (JKMRC) CRICOS code 00025B

Why data analytics?


What is data analytics?

What do you understand by data analytics?

And why? Why should we do data analytics?



Asking the best question

• What is the best car?
- Cheapest?
  - To buy?
  - To run?
  - Lifetime costs?
    - What is expected lifetime?
    - What about changing fuel prices?
    - What level of insurance?
    - Who will be driving it?
    - How often will they be driving it?
    - How far will they be driving it?
- Fastest?
- Colour?

    mpg  cylinders  displacement  horsepower  weight  acceleration  year  origin  name
0   18   8          307           130         3504    12            70    usa     chevrolet chevelle malibu
1   15   8          350           165         3693    11.5          70    usa     buick skylark 320
2   18   8          318           150         3436    11            70    usa     plymouth satellite
3   16   8          304           150         3433    12            70    usa     amc rebel sst
4   17   8          302           140         3449    10.5          70    usa     ford torino
5   15   8          429           198         4341    10            70    usa     ford galaxie 500
6   14   8          454           220         4354    9             70    usa     chevrolet impala
7   14   8          440           215         4312    8.5           70    usa     plymouth fury iii
8   14   8          455           225         4425    10            70    usa     pontiac catalina
9   15   8          390           190         3850    8.5           70    usa     amc ambassador dpl
10  15   8          383           170         3563    10            70    usa     dodge challenger se

Photo Attribution:
https://commons.wikimedia.org/wiki/File:1970_ford_torino_cobra_sportsroof_chiolero.jpg
https://commons.wikimedia.org/wiki/File:66Sat.jpg
https://commons.wikimedia.org/wiki/File:Pontiac_Catalina_front.jpg

Deployment
What does the answer look like?

Buy! 42

Image Attribution: https://commons.wikimedia.org/wiki/File:Sotheby%27s_auctioneer_Adrian_Biddell_(6440873251).jpg


https://commons.wikimedia.org/wiki/File:Reading_metrics_dashboard_in_Superset.png


What is data and where do we get it?


How do we get data?
• Measurement
• Observation
• Query
• Analysis

Image Attribution:
https://commons.wikimedia.org/wiki/File:Data_types_-_en.svg
https://commons.wikimedia.org/wiki/File:Tape_measure_colored.jpeg
https://commons.wikimedia.org/wiki/File:Postgres_Query.jpg
https://commons.wikimedia.org/wiki/File:Voltmeter_and_ammeter.svg
https://commons.wikimedia.org/wiki/File:Reliance_Web_Client_-_example_of_the_visualization_of_the_tannery_technology.jpg
https://commons.wikimedia.org/wiki/File:FourMetricInstruments.JPG
https://commons.wikimedia.org/wiki/File:The_Argo_Merchant_oil_spill_-_a_preliminary_scientific_report_-_edited_by_Peter_L._Grose_and_James_S._Mattson;_NOAA_Environmental_Data_Service,_Center_for_Experiment_Design_and_Data_Analysis_(1977)_(19711748004).jpg

What is data and where do we get it?

Types of data:
• Quantitative vs Qualitative
• Numeric vs text
• Fact vs opinion
• Analysis

Image Attribution: https://commons.wikimedia.org/wiki/File:Average_daily_maximum_temperature_in_July_in_Australia.svg


Typical mining data

Image Attribution: http://www.ccgalberta.com/pygeostat/plotting.html



Why data analytics?


Data generated by instrumentation and saved in databases is inherently time series data.

What are the implications of dealing with time series?


This data generation is a good example of the “data deluge” and of what are termed the four V’s of Big Data: volume, velocity, variety, and veracity (The Four V's of Big Data, 2013; Laney, 2001).



Why data analytics?


Data analytics is the science of analysing raw data in order to draw conclusions about that information.

Value increases with Time / Effort / Cost, from Data Extraction through to Application/Learning:

• Data Extraction
- Data requirements, structure
- Access to data and data collation
• Data Preparation
- Cleaning bad data
- Removing duplicates and errors
- Curation of corrupted data
- Checking accuracy of data
- Data units and ranges
- Validity of the data
- Grouping relevant data
• Data Analysis
- Purpose of the analysis
- Definition of the problem and objectives
- Defining appropriate analytical technique
- Analysing and interpreting data
- Rescoping and manipulating analysing tools
- Generating insight from analysis
• Application/Learning
- Implementation of insights
- Transferring the knowledge and learning
- Embedding the knowledge by adapting the processes
- Restart the data analytics process


Types of data analytics


Basic types of data analytics based on purpose

• Descriptive
- What happened?
- When did it happen?
• Diagnostic
- Why did it happen?
- How did it happen?
• Predictive
- When is it going to happen?
- How will it happen?
• Prescriptive
- How to make it happen?
- How to manage it?

Time / Effort / Cost increases from Descriptive through to Prescriptive.
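As a rough illustration that is not part of the original slides, the first three types can be sketched on the car table shown earlier. The rows are copied from that table. Using a straight-line fit for the predictive step is an assumption made only for this example, not a recommended model.

```python
import numpy as np
import pandas as pd

# Rows copied from the car table shown earlier in these slides
cars = pd.DataFrame({
    "mpg":        [18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15],
    "horsepower": [130, 165, 150, 150, 140, 198, 220, 215, 225, 190, 170],
    "weight":     [3504, 3693, 3436, 3433, 3449, 4341, 4354, 4312, 4425, 3850, 3563],
})

# Descriptive: what happened? Summarise the fuel economy that was observed.
mean_mpg = cars["mpg"].mean()

# Diagnostic: why did it happen? Check how mpg moves with weight.
corr_mpg_weight = cars["mpg"].corr(cars["weight"])

# Predictive: what is likely to happen? Fit mpg = slope * weight + intercept
# (a straight line is an assumption chosen for simplicity).
slope, intercept = np.polyfit(cars["weight"], cars["mpg"], deg=1)
predicted_mpg_4000 = slope * 4000 + intercept

print(f"mean mpg: {mean_mpg:.2f}")
print(f"correlation of mpg with weight: {corr_mpg_weight:.2f}")
print(f"predicted mpg at weight 4000: {predicted_mpg_4000:.1f}")
```

Prescriptive analytics would then use such a model to choose an action, for example selecting the weight that meets a target fuel economy.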



Data analytics tools


Many software options are available:
• Excel + VBA
• MATLAB
• R
• Python (SciPy, NumPy, pandas, etc.)
• Power BI

All figures and analysis in these slides were created using Python code, but similar packages and tools are available for any of the software listed above.


Cross-industry standard process for data mining


APPCo’s Data analytics approach


Diagram elements: Data, Modelling, Visualisation, Machine Learning, Expert Review


Step 0 – Data wrangling


Process of cleaning, validating, organising and combining one or more data sets so that the subsequent
processing steps are easily done.

Typical steps, but not limited to:


• Identification of goals
- Have a clear understanding of what your goal is. Without a clear goal, one is likely to struggle to make
a definitive conclusion or to get lost in the data

• Validation
- Never assume that the data you have received is correct. Ideally each variable should be
systematically analysed, even with simple checks such as ensuring that percentage values lie in the
range 0 to 100
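
The percentage check mentioned above could look like the following minimal sketch. The data and the column names (`sample_id`, `cu_recovery_pct`) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical assay data; the column names are invented for this example
df = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3", "S4"],
    "cu_recovery_pct": [87.5, 92.1, 104.3, -1.0],
})

# Flag rows where a percentage falls outside the possible range 0 to 100
in_range = df["cu_recovery_pct"].between(0, 100)
suspect = df.loc[~in_range]

# Report the suspect rows instead of silently dropping them
print(suspect)
```

Reporting the suspect rows, rather than deleting them straight away, keeps the decision about what to do with them visible.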

• Conversion
- Standardise units of measurement and convert data types (numeric, text, etc.). For time series data, take care of time zones and daylight saving to ensure consistent time steps
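
A minimal sketch of these conversions in pandas follows. The table, the column names and the choice of the `Australia/Brisbane` time zone are assumptions for illustration only; storing timestamps in UTC is one common way to keep time steps consistent, not the only one.

```python
import pandas as pd

# Hypothetical logger export: text values, local timestamps, throughput in kg/h
raw = pd.DataFrame({
    "timestamp": ["2022-09-13 08:00", "2022-09-13 09:00", "2022-09-13 10:00"],
    "feed_kg_per_h": ["1500000", "1520000", "bad"],
})

# Convert data types: text to datetime, text to numeric (unparseable -> NaN)
raw["timestamp"] = pd.to_datetime(raw["timestamp"])
raw["feed_kg_per_h"] = pd.to_numeric(raw["feed_kg_per_h"], errors="coerce")

# Standardise units: kg/h to t/h
raw["feed_t_per_h"] = raw["feed_kg_per_h"] / 1000

# Attach the local time zone (assumed), then convert to UTC so that the
# spacing between time steps does not depend on local clock changes
raw["timestamp_utc"] = (
    raw["timestamp"].dt.tz_localize("Australia/Brisbane").dt.tz_convert("UTC")
)

print(raw[["timestamp_utc", "feed_t_per_h"]])
```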

• Missing Data
- For time series data with a fixed interval it is easy enough to check that data exists for each expected time step. If it does not, report this and fill the gap with a standard value (0, NaN, empty string, etc.)
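
One way to run this check in pandas is to build the expected index and compare it with the index actually present. The series below is hypothetical, and filling gaps with NaN is just one of the options listed above.

```python
import pandas as pd

# Hypothetical 1-hour sensor series with the 02:00 reading missing
readings = pd.Series(
    [10.0, 11.0, 12.5, 13.0],
    index=pd.to_datetime([
        "2022-09-13 00:00", "2022-09-13 01:00",
        "2022-09-13 03:00", "2022-09-13 04:00",
    ]),
)

# Build the index expected for a fixed 1-hour interval
expected = pd.date_range(readings.index.min(), readings.index.max(), freq="1h")

# Report which expected time steps have no data
missing_steps = expected.difference(readings.index)
print("Missing time steps:", list(missing_steps))

# Reindex so that every expected step exists; the gaps are filled with NaN
complete = readings.reindex(expected)
```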

• Combining and organising
- Combine multiple files into a single dataset, ensuring a consistent master index. Organise your dataset following a standard. We recommend the Tidy data format (Wickham 2014): one observation per row and a single column for each variable
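
The sketch below shows one possible way to combine files under a single master index. The two tables stand in for separate files, and all names and values are hypothetical.

```python
import pandas as pd

# Two hypothetical files, represented here as in-memory tables
shift_a = pd.DataFrame({
    "timestamp": pd.to_datetime(["2022-09-13 00:00", "2022-09-13 01:00"]),
    "feed_tph": [1500.0, 1520.0],
    "power_kw": [9800.0, 9900.0],
})
shift_b = pd.DataFrame({
    "timestamp": pd.to_datetime(["2022-09-13 02:00", "2022-09-13 03:00"]),
    "feed_tph": [1490.0, 1510.0],
    "power_kw": [9750.0, 9850.0],
})

# Combine into a single dataset with the timestamp as the master index
combined = (
    pd.concat([shift_a, shift_b], ignore_index=True)
    .set_index("timestamp")
    .sort_index()
)

# A consistent master index should not contain repeated timestamps
has_unique_index = combined.index.is_unique

print(combined)
```

The result has one observation (time step) per row and one column per variable, which matches the layout recommended above.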


Sustainable Minerals Institute (SMI)


Julius Kruttschnitt Mineral Research Centre (JKMRC)

Basics of data cleaning


Basics of data wrangling and cleaning


Why clean your data?
• Being prepared
• Ensure data is high quality
• Ensure all necessary data is available
• Helps to understand data
• Does the data make sense?

www.welldatalabs.com


Basics of data wrangling and cleaning


How to clean your data?
• Duplicate records
• Erroneous records
• Rectify inconsistencies
• Missing data
• Filtering data and outliers
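
The items above can be sketched in pandas as follows. The data is hypothetical, and the 1.5 x IQR rule used for outliers is only one common convention; the right rule depends on the process being analysed.

```python
import numpy as np
import pandas as pd

# Hypothetical plant data containing the problems listed above
df = pd.DataFrame({
    "tag": ["A", "A", "B", "C", "D", "E", "F"],
    "grade_pct": [1.2, 1.2, 1.3, -5.0, np.nan, 1.1, 9.0],
})

# Duplicate records: keep the first copy of identical rows
df = df.drop_duplicates()

# Erroneous records: a grade cannot be negative, so treat it as missing
df.loc[df["grade_pct"] < 0, "grade_pct"] = np.nan

# Missing data: count the gaps, then drop rows that cannot be used
n_missing = int(df["grade_pct"].isna().sum())
df = df.dropna(subset=["grade_pct"])

# Outliers: flag values more than 1.5 x IQR beyond the quartiles
q1, q3 = df["grade_pct"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = ~df["grade_pct"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outliers = df.loc[is_outlier]

print("Missing values found:", n_missing)
print(outliers)
```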


Basics of data wrangling and cleaning


Data Wrangling
• Merging multiple files
• Transforming variables
• Converting formats
• Structuring data
• Organising files
• Renaming files
https://www.twoeggz.com/int/5775819.html
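
A few of these wrangling steps are sketched below on three rows copied from the car table. Treating `mpg` as miles per US gallon is an assumption, and the new column names are invented for the example.

```python
import pandas as pd

# Three rows copied from the car table in these slides
cars = pd.DataFrame({
    "mpg": [18, 15, 18],
    "weight": [3504, 3693, 3436],
    "name": ["chevrolet chevelle malibu", "buick skylark 320", "plymouth satellite"],
})

# Transforming variables: miles per US gallon (assumed) to litres per 100 km
cars["l_per_100km"] = 235.215 / cars["mpg"]

# Structuring data: split the free-text name into make and model
cars[["make", "model"]] = cars["name"].str.split(" ", n=1, expand=True)

# Converting formats: export the table as CSV text
csv_text = cars.to_csv(index=False)

print(csv_text)
```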


Basics of data wrangling and cleaning


import pandas as pd
df = pd.read_excel(r"C:\data.xlsx")

Structure of the resulting DataFrame: the Index labels each row, Columns hold the <Variables>, Rows hold the <Observations>, and the cells hold the <Values>.

    mpg  cylinders  displacement  horsepower  weight  acceleration  model_year  origin  name
0   18   8          307           130         3504    12            70          usa     chevrolet chevelle malibu
1   15   8          350           165         3693    11.5          70          usa     buick skylark 320
2   18   8          318           150         3436    11            70          usa     plymouth satellite
3   16   8          304           150         3433    12            70          usa     amc rebel sst
4   17   8          302           140         3449    10.5          70          usa     ford torino
5   15   8          429           198         4341    10            70          usa     ford galaxie 500
6   14   8          454           220         4354    9             70          usa     chevrolet impala
7   14   8          440           215         4312    8.5           70          usa     plymouth fury iii
8   14   8          455           225         4425    10            70          usa     pontiac catalina
9   15   8          390           190         3850    8.5           70          usa     amc ambassador dpl
10  15   8          383           170         3563    10            70          usa     dodge challenger se



Basics of data wrangling and cleaning


Exploratory Data Analysis - Understand your data
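
A first look at a dataset often starts with a handful of pandas calls such as those sketched below. The rows are copied from the car table in these slides; only a subset of its columns is used to keep the example short.

```python
import pandas as pd

# Rows copied from the car table shown in these slides (subset of columns)
cars = pd.DataFrame({
    "mpg":        [18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15],
    "cylinders":  [8] * 11,
    "horsepower": [130, 165, 150, 150, 140, 198, 220, 215, 225, 190, 170],
    "weight":     [3504, 3693, 3436, 3433, 3449, 4341, 4354, 4312, 4425, 3850, 3563],
    "origin":     ["usa"] * 11,
})

print(cars.shape)         # number of rows and columns
print(cars.dtypes)        # data type of each column
print(cars.isna().sum())  # missing values per column

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns
summary = cars.describe()
print(summary)
```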


Basics of data wrangling and cleaning

# Downsample to daily means (df needs a DatetimeIndex for resample to work)
df_daily = df.resample('1D').mean()
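
For a self-contained illustration of the resample call, the sketch below builds a hypothetical hourly series first. The column name `feed_tph` and the values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly sensor readings covering two full days
index = pd.date_range("2022-09-12", periods=48, freq="1h")
df = pd.DataFrame({"feed_tph": np.arange(48, dtype=float)}, index=index)

# '1D' groups the rows into calendar days; mean() averages each group
df_daily = df.resample("1D").mean()

print(df_daily)
```

Averaging to a coarser interval smooths out short-term noise, at the cost of hiding events shorter than the new interval.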


Basics of data wrangling and cleaning


Different graphs allow for investigating the data

• Histograms
- A histogram removes time dependence, so be careful
- Break the data down into chunks to check consistency
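
The sketch below illustrates why chunking matters, using a synthetic signal whose mean shifts halfway through the record. The signal, bin edges and chunk boundary are all assumptions made for the example; counts are computed with NumPy so that no plotting library is needed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
index = pd.date_range("2022-01-01", periods=2000, freq="1h")

# Hypothetical signal whose mean shifts halfway through the record
values = np.concatenate([rng.normal(50, 2, 1000), rng.normal(60, 2, 1000)])
signal = pd.Series(values, index=index)

bins = np.linspace(40, 70, 16)

# A single histogram of the whole record hides when the shift happened
overall_counts, _ = np.histogram(signal, bins=bins)

# Break the record into two chunks and compare their distributions
first, second = signal.iloc[:1000], signal.iloc[1000:]
first_counts, _ = np.histogram(first, bins=bins)
second_counts, _ = np.histogram(second, bins=bins)

print("Most populated bin starts at:",
      bins[first_counts.argmax()], "(first chunk),",
      bins[second_counts.argmax()], "(second chunk)")
```

If the chunks disagree, as they do here, the overall histogram mixes different operating conditions and should be interpreted with care.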


Basics of data wrangling and cleaning


What question(s) am I trying to answer?
• Is A better than B?
• Which month was best?
• Is X related to Y?
• Which equipment performs the worst?

What am I trying to show /convey in my visualisation?


• Should be evident at a glance
• Clear and correct labels
• Shouldn’t need someone to explain it


Data visualisation
Seaborn example gallery

https://seaborn.pydata.org/examples/index.html


Thank you
Mohsen Yahyaei | Professor
Director
Julius Kruttschnitt Mineral Research Centre (JKMRC)
Program Leader
Future Autonomous Systems and Technologies Research Program
T: 07 3346 5989

Sustainable Minerals Institute (SMI)


Julius Kruttschnitt Mineral Research Centre (JKMRC)

facebook.com/uqsmi

twitter.com/smi_uq

linkedin/school/sustainable-minerals-institute
