Report On Wavelabs Pipeline

This document provides an overview of data science and its application to renewable energy. It discusses the stages of data science including problem identification, data collection, data preparation, modeling, and evaluation. It then describes how data science can be used in renewable energy for improving technology, predicting energy consumption and production, optimizing production based on weather data, reducing costs through forecasting, and providing efficient backup facilities for power plants. Python is identified as a key programming language used in data science.

A REPORT

ON

PIPELINE

YAGANTI SIVAKRISHNA
2017A7PS0045P

AT

WAVELABS TECHNOLOGY, HYDERABAD

A Practice School-I station of

BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE,


PILANI
Acknowledgements

We would like to acknowledge the support rendered by our instructor, Prof. Mohammad Saleem
Bagewadi.
We sincerely thank Wavelabs for initiating this report and for motivating us to complete it
successfully. We would like to express our gratitude to all the guides of the respective
departments who were allotted to us, for providing us with the documents necessary to prepare
the report. I would also like to thank Venkateswara Rao (Senior Manager at Wavelabs
Technology, Hyderabad) for assigning us this project and guiding us through its initial stages.

BIRLA INSTITUTE OF TECHNOLOGY &
SCIENCE
PILANI (RAJASTHAN)
Practice School Division
Station : WAVELABS TECHNOLOGY, HYDERABAD
Date of submission : 10th March 2021
Title of the project : Learning Pipeline using Python and AWS

NAME: YAGANTI SIVAKRISHNA ID: 2017A7PS0045P

Name of PS Faculty : Dr. Mohammad Saleem Bagewadi

Contents
1 Data Science Introduction
1.1 Data Science
1.2 Stages of Data Science
1.3 Use of Data Science in Renewable energy
2 Python Programming language basics
2.1 Why only Python?
2.2 Data structures in Python
2.3 Operators
2.4 Condition statements
2.4.1 If else statements
2.4.2 elif statements
2.5 Loops in python
2.5.1 For loop
2.5.2 While loop
2.6 Module, Package and Functions
3 Libraries in Python
3.1 Matplotlib
3.2 Pandas
3.3 NumPy
4 Data collection
5 Global Wind and Solar Production from 1990 to 2014
6 Ethanol production from 1981 to 2019 in USA
Bibliography
List of Figures
List of Tables

Chapter 1

Data Science Introduction

1.1 Data Science


Data science is the field of data analytics and data visualization in which raw or
unstructured data is cleaned and prepared for analysis. Data scientists use this data to
extract information for later use.[1] "Data science uses many processes and methods on the big
data, the data may be structured or unstructured". The data sets available on the internet are
the raw data we start from; they may be in an unstructured or semi-structured format. This
data is filtered and cleaned, and the required analysis tasks are then performed using a
high-level programming language. The results are then presented for better understanding and
evaluation.

One must be clear that data science is not about building complicated models, making
impressive visualizations or writing code; it is about using data to create an impact for
your company. Complicated data models and data visualizations are the tools we need to
achieve that impact.

1.2 Stages of Data Science


Many tools are used to handle the big data available to us. [2] "Data scientists use
programming tools such as Python, R, SAS, Java, Perl, and C/C++ to extract knowledge from
prepared data".
Data scientists apply many algorithms and mathematical models to the data.
The following stages form the cycle performed on the unstructured data.[3]

• Identify the problem

• Identify available data sources

• Identify if additional data sources are needed

• Statistical analysis

• Implementation, development

• Communicate results

• Maintenance

Figure 1.1: 7 steps that together constitute this life-cycle model of Data science[3]

Data science finds applications in many fields. With the assistance of data science, search
engines can return results for a query in very little time. The role of a data scientist is
to have a deep understanding of the data as well as a good command of a programming language;
a data scientist should also know how to work with the raw data extracted from the data
source. Many programming languages and tools are used to analyze and evaluate data, such as
Python, Java, MATLAB, Scala, Julia, R, SQL and TensorFlow. Among these, Python is the most
user-friendly and widely used programming language in the field of data science.
This life cycle is applied in every field. In this project we will consider all seven stages
of data science to analyze the data. The process starts with data collection, followed by
data preparation, data modeling and finally data evaluation. For instance, as we have a huge
amount of data, we can create an energy model for a particular country by collecting its
previous energy data, and we can also predict its future requirements with the same data.

1.3 Use of Data Science in Renewable energy


As the number of renewable energy systems increases, the volume of renewable energy data
collected through sensors and other parts of energy systems increases as well. This big data
is helpful not only in understanding the current state of the renewable energy sector but
also in forecasting both the consumption and the production of renewable energy.

The following applications of data science play a major role in the field of renewable
energy.

• Improving the current technology: This is mostly used in the field of solar energy. Data
from solar panels is collected using sensors, and by analysing the patterns in that data we
can improve the efficiency as well as the life span of a particular solar panel.

• Renewable energy consumption prediction: The consumption of renewable energy by customers
can be predicted with the help of their past energy consumption data. This is helpful in
fulfilling customers' requirements in the future.

• Renewable energy production forecasting: Solar and wind energy production can be optimized
by considering weather and environmental condition data. With this data, forecasting can be
done easily.

• Reducing renewable energy production costs: With the help of the big energy data available
to us, we can predict the production cost of renewable energy from a forecasting model. The
price of renewable energy is declining because of the big data and forecasting models
available to us.[4] Renewable energy will become cost competitive with its conventional
counterparts.

• Efficient backup facility for power plants: With the help of computational models we can
identify periods of high and low power usage. When power is abundant, we can store the power
that would otherwise be wasted; conversely, when there is a shortage of power, we can supply
it with the help of our renewable energy systems.

The figure below shows the role of Data Science and Big Data Analytics in the Renewable
Energy Sector.

Figure 1.2: Role of Data Science and Big Data Analytics in the Renewable Energy Sector [5]

For instance, a wind energy project needs a specific location where all the demands of the
project are fulfilled. These demands can be captured by considering multiple data sets and
factors that help in setting up the project, for example weather data and wind data.

Chapter 2

Python Programming language basics

2.1 Why only Python?


"Python is an interpreted, object-oriented, high-level programming language with dynamic
semantics".[6] The language has built-in data structures which make it easy for data
scientists to analyse data effectively. It helps not only in forecasting and analysis but
also in connecting components written in different languages. Two of the best features of
this programming language are the absence of a separate compilation step, in contrast to
languages in which the program is compiled before it is executed, and the reuse of code:
Python supports modules and packages, so previously written code can be used anywhere in a
program whenever it is required.
Several languages, for example R, Java, SQL, Julia, Scala and MATLAB, can be used to analyze
and evaluate data, but thanks to these features Python is the most popular language in the
field of data science.

Python is widely used and easier than other programming languages for the following
reasons.

2.2 Data structures in Python


Data structures are ways of storing data so that we can easily perform different operations
on it whenever required. When data has been collected from a data source, it is available in
different forms. Once the data is sorted into different data structures, it is easy for data
scientists to perform different operations on it.
Data structures are classified into two main categories, which are further divided into the
subcategories shown below.

1. Primitive Data Structures.


They are also called basic data structures. These data structures contain simple values.[7]

• Integers - All whole numbers, whether negative, zero or positive, come under the integer
data type, for example 4, 9, -2, -6.

• Float - Numbers with a decimal point come under the float data type, for example 3.1, 2.2,
8.96.

• Strings - A sequence of characters is called a string. In Python we enclose a string in
either single or double quotes, for example 'hello' and "bread".

• Boolean - This built-in data type takes two values, True and False. True represents 1 and
False represents 0 in Python.
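The four primitive types listed above can be inspected directly. The following is a minimal sketch; the variable names and values are illustrative and do not come from the report.

```python
# Illustrative values only; the variable names are hypothetical.
count = -6        # int
ratio = 8.96      # float
word = 'hello'    # str, written with single quotes
food = "bread"    # str, written with double quotes
flag = True       # bool

for value in (count, ratio, word, food, flag):
    print(type(value).__name__, value)

# bool is a subclass of int, so True behaves like 1 and False like 0.
print(True == 1, False == 0, True + True)
```

The last line shows why Boolean values can take part in arithmetic: `True + True` evaluates to 2.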

2. Non-Primitive Data Structures


These are derived (reference) data structures. They are called derived data structures
because they are built from the basic data structures such as integer and float. Python has
five main types of non-primitive data structures, described below.
Array - An array is a collection of elements of the same type. Array data structures are
mostly used through the NumPy library of Python. In the example below we first import the
array function from the NumPy library and define the array as the variable arr, then divide
the array by 7 and print it to get the output.

Figure 2.1: Array example
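A minimal sketch of the array example described above, following the same steps (import `array` from NumPy, define `arr`, divide by 7, print). The element values are an assumption chosen for illustration; they are not taken from the original figure.

```python
from numpy import array

arr = array([7, 14, 21, 28])  # hypothetical values
arr = arr / 7                 # division is applied to every element
print(arr)
```

Dividing an integer array with `/` produces an array of floats, because the division is carried out element by element.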

List - "A list is a value that contains multiple values in an ordered sequence".[8] The term
list value refers to the list itself, which can be stored in a variable or passed to a
function like any other value. Lists are changeable, and the values in a list are enclosed in
square brackets. We can perform multiple operations on lists such as indexing, slicing,
adding and multiplying.

Figure 2.2: list example
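The list operations named above (indexing, slicing, adding and multiplying) can be sketched as follows. The list contents are illustrative assumptions, not the values from the original figure.

```python
fruits = ["apple", "banana", "cherry"]  # hypothetical values

first = fruits[0]             # indexing
tail = fruits[1:]             # slicing returns a new list
combined = fruits + ["kiwi"]  # adding (concatenation) returns a new list
doubled = [1, 2] * 2          # multiplying (repetition)

fruits[1] = "mango"           # lists are changeable in place

print(first, tail, combined, doubled, fruits)
```

Note that `tail` and `combined` were built before the in-place change, so they still contain "banana".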

Tuple - A tuple is an ordered sequence of objects that cannot be changed after it is created.
The differences between tuples and lists are that tuples cannot be changed and that tuples
use parentheses, whereas lists use square brackets.

Figure 2.3: tuple example
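A minimal sketch of the difference described above: a tuple is written with parentheses and rejects item assignment. The values are illustrative assumptions.

```python
point = (3, 4)   # hypothetical values
x, y = point     # a tuple can be unpacked into variables
print(point[0], x + y)

try:
    point[0] = 10  # tuples do not support item assignment
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
```

The attempted assignment raises `TypeError`, and the tuple keeps its original contents.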

Dictionary - A dictionary is a data structure which consists of key-value pairs enclosed in
curly brackets. It is similar to the dictionary we use in day-to-day life to find the meaning
of a particular word. Comparing an ordinary dictionary with the Python dictionary data
structure, a word is the key and its meaning is the value. In the figure, name, occupation
and hobby are the keys, and Suraj, data analyst and vlogging are the values assigned to the
keys.

Figure 2.4: dictionary example
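The dictionary described above can be sketched with the keys and values named in the text. The variable name `person` and the lookups that follow are assumptions added for illustration.

```python
# Keys and values as named in the text; the variable name is hypothetical.
person = {"name": "Suraj", "occupation": "data analyst", "hobby": "vlogging"}

print(person["name"])        # look up a value by its key
print(list(person.keys()))   # all keys
print(list(person.values())) # all values

for key, value in person.items():
    print(key, "->", value)
```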

Sets - Sets are unordered collections of unique elements. They are used for mathematical
operations such as union, intersection and symmetric difference.
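The three set operations mentioned above can be sketched as follows, using illustrative values that are not from the report.

```python
a = {1, 2, 3, 4}  # hypothetical values
b = {3, 4, 5}

union = a | b         # elements in a or b
intersection = a & b  # elements in both a and b
sym_diff = a ^ b      # elements in exactly one of a and b

print(union, intersection, sym_diff)
```

The same results are available through the methods `a.union(b)`, `a.intersection(b)` and `a.symmetric_difference(b)`.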

Below is the data structure tree, which shows the categories and sub-categories of the data
types.

Figure 2.5: A data structure tree at a glance [7]

2.3 Operators
Operators - Operators are the symbols in Python that are used to perform arithmetic or
logical operations. Following are the different types of operators in Python.

Arithmetic operators - Arithmetic operators carry out mathematical operations and are mostly
used with numeric values.

Arithmetic operators

Operator   Name             Example
+          Addition         A + B
-          Subtraction      A - B
*          Multiplication   A * B
/          Division         A / B
%          Modulus          A % B
**         Exponentiation   A ** B
//         Quotient         A // B

Table 2.1: Arithmetic operators


A and B are the numeric values.
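The operators in Table 2.1 can be tried out with two small numbers. This is a minimal sketch; the values of A and B are illustrative assumptions.

```python
A, B = 7, 2  # hypothetical values

results = {
    "A + B": A + B,    # addition
    "A - B": A - B,    # subtraction
    "A * B": A * B,    # multiplication
    "A / B": A / B,    # division, always gives a float
    "A % B": A % B,    # modulus (remainder)
    "A ** B": A ** B,  # exponentiation
    "A // B": A // B,  # quotient (floor division)
}

for expression, value in results.items():
    print(expression, "=", value)
```

The difference between `/` and `//` is visible here: `7 / 2` gives 3.5, while `7 // 2` gives 3.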

Assignment operators - As the name suggests, these operators are used for assigning values to
variables.

Assignment operators

Operator   Example    May also be written
=          a = 6      a = 6
+=         a += 3     a = a + 3
-=         a -= 4     a = a - 4
*=         a *= 5     a = a * 5
/=         a /= 6     a = a / 6
%=         a %= 7     a = a % 7
//=        a //= 8    a = a // 8
**=        a **= 9    a = a ** 9
&=         a &= 1     a = a & 1

Table 2.2: Assignment Operators

Here a is any value, and a number of operations are performed on it.
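A short sketch of the assignment operators from Table 2.2 applied one after another. The starting values and the order of operations are assumptions chosen so that every step stays easy to follow; `/=` is shown on a separate variable because it produces a float, and `&=` is only defined for integers.

```python
a = 6
a += 3   # a = a + 3   -> 9
a -= 4   # a = a - 4   -> 5
a *= 5   # a = a * 5   -> 25
a //= 8  # a = a // 8  -> 3
a **= 2  # a = a ** 2  -> 9
a %= 7   # a = a % 7   -> 2
a &= 1   # a = a & 1   -> 0 (bitwise AND)
print(a)

b = 9
b /= 6   # b = b / 6   -> 1.5, true division gives a float
print(b)
```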

Logical operators - These operators are used to join conditional statements.

Logical operators

Operator   Description                                                   Example
and        If both statements are true it returns true                   x < 5 and x < 10
or         If any of the two statements is true it returns true          x < 4 or x < 8
not        If the result is true it reverses the result and gives false  not (x < 4 and x < 8)
Table 2.3: Logical Operators

Here x is any value provided by us, on which the conditions are evaluated.

Comparison operators - These operators are used to compare two different values.

Comparison operators

Operator   Name                       Example
==         Equal                      a == b
!=         Not equal                  a != b
>          Greater than               a > b
<          Less than                  a < b
>=         Greater than or equal to   a >= b
<=         Less than or equal to      a <= b

Table 2.4: Comparison operators

Here a and b are two different values and these values are compared.
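The comparison operators above, together with the logical operators from Table 2.3, can be sketched as follows. The values of a, b and x are illustrative assumptions.

```python
a, b = 3, 8  # hypothetical values

comparisons = {
    "a == b": a == b,
    "a != b": a != b,
    "a > b": a > b,
    "a < b": a < b,
    "a >= b": a >= b,
    "a <= b": a <= b,
}
for expression, value in comparisons.items():
    print(expression, "->", value)

x = 6  # hypothetical value
both = x < 5 and x < 10         # False, because x < 5 is False
either = x < 4 or x < 8         # True, because x < 8 is True
negated = not (x < 4 and x < 8) # True, because the inner result is False
print(both, either, negated)
```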

Membership operators - These operators are used to check the membership of a particular
value, that is, whether a specific value is present in an object or not.

Membership operators

Operator   Description                                                   Example
in         Returns True if the value is present inside the object        a in b
not in     Returns True if the value is not present inside the object    a not in b

Table 2.5: Membership operators
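A minimal sketch of the membership operators on a list, a string and a dictionary. All values are illustrative assumptions.

```python
b = [2, 4, 6]  # hypothetical values
a = 4

present = a in b     # True, 4 is an element of the list
absent = a not in b  # False, for the same reason
print(present, absent)

# Membership also works on strings (substring test) and dictionaries (key test).
in_string = "ea" in "bread"
in_dict = "name" in {"name": "Suraj"}
print(in_string, in_dict)
```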

2.4 Condition statements
2.4.1 If else statements
"The most common type of statement is the if statement. if statement consist of a block which
is called as clause",[8] that is, the block that follows the if statement. The clause is
executed if the condition is True and skipped if the condition is False, in which case the
statements in the else part are executed instead.

An if statement consists of the following -

• The if keyword itself

• A condition which may be True or False

• A colon

• The if clause, that is, a block of code

The figure below shows how if and else statements are used, with a description inside it.

Figure 2.6: if else statement
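A minimal sketch of an if else statement with the four parts listed above (keyword, condition, colon, clause). The function name, the threshold and the labels are hypothetical and are not taken from the original figure.

```python
def describe_wind(speed_m_per_s):
    """Label a wind speed; the threshold of 10 is an arbitrary assumption."""
    if speed_m_per_s >= 10:  # keyword, condition, colon
        label = "windy"      # if clause: runs when the condition is True
    else:
        label = "calm"       # else clause: runs when the condition is False
    return label

print(describe_wind(12))
print(describe_wind(3))
```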

2.4.2 elif statements


Often we want exactly one of several possible clauses to execute. "The elif statement is an
else if statement that always follows an if or another elif statement"[8]. The elif statement
provides another condition that is checked only if all of the previous conditions were False.
The only difference between an elif statement and an else statement is that an elif statement
has a condition, whereas an else statement does not. An elif statement consists of the
following -

• The elif keyword itself

• A condition which may be True or False

• A colon

• The elif clause, that is, a block of code

The figure below shows how the elif statement is used, with a description inside it.

Figure 2.7: elif example
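A minimal sketch of an if, elif, else chain in which only one clause runs. The function name, thresholds and labels are hypothetical and are not taken from the original figure.

```python
def classify_output(megawatts):
    """Classify a power output; the thresholds are arbitrary assumptions."""
    if megawatts >= 80:
        return "high"
    elif megawatts >= 50:  # checked only if the condition above was False
        return "medium"
    else:                  # runs only if every condition above was False
        return "low"

print(classify_output(95), classify_output(60), classify_output(10))
```

A value of 95 also satisfies `megawatts >= 50`, but the elif condition is never evaluated because the first condition was already True.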

2.5 Loops in python


2.5.1 For loop
When do we use for loops?
for loops are traditionally used when you have a block of code which you want to repeat a
fixed number of times. The Python for statement iterates over the members of a sequence in
order, executing the block each time.[9]

Range statement - The range() function is used with for loops. You can specify a single
value: for example, if you specify 10, the loop starts from 0 and ends with 9, which is n-1.
You can also specify both the start and the end value. The following examples demonstrate
loop statements.

Figure 2.8: for example with range statement
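A minimal sketch of for loops with range(), first with a single value and then with explicit start and end values. The specific numbers are illustrative assumptions.

```python
single = []
for i in range(10):      # starts at 0 and stops before 10
    single.append(i)
print(single)

start_end = []
for i in range(2, 6):    # starts at 2 and stops before 6
    start_end.append(i)
print(start_end)

for name in ["wind", "solar"]:  # a for loop over the members of a list
    print(name)
```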

2.5.2 While loop


While loops are also used for repeating a section of code, but unlike a for loop, a while
loop does not run n times; it runs until a defined condition is no longer met. If the
condition is initially false, the loop body will not be executed at all.

Figure 2.9: While loop example
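A minimal sketch of both behaviours described above: a loop that runs until its condition stops holding, and a loop whose condition is false from the start. The limit of 20 is an arbitrary assumption.

```python
count = 0
total = 0
while total < 20:   # repeat until the running total reaches 20
    count += 1
    total += count
print(count, total)

runs = 0
while runs > 5:     # condition is False at the start, so the body never runs
    runs += 1
print(runs)
```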

2.6 Module, Package and Functions


• Module
A module is a Python file with the extension .py. The name of the module is the name of the
file. A Python module can define and implement a set of functions, classes or variables.

The reason for using modules is that they organize your Python code by grouping related code
together, so that it is easier to use.

• Package
A package is a collection of modules in a directory that also contains a file named
__init__.py. Each directory on the Python path which contains a file named __init__.py will
be treated as a package by Python. Packages are used for organizing modules by using dotted
names.

For example -
We have a package named simple_package which consists of two modules, a and b. We import the
modules from the package in the following way: from simple_package import a,

Figure 2.10: packages example [10]
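The package layout described above can be reproduced in a throwaway directory. This is a sketch under assumptions: the contents of the modules a and b (a `hello` function in each) are hypothetical, and the import line assumes that both modules are imported together.

```python
import sys
import tempfile
from pathlib import Path

# Build simple_package/{__init__.py, a.py, b.py} in a temporary directory.
root = Path(tempfile.mkdtemp())
package_dir = root / "simple_package"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")  # marks the directory as a package
(package_dir / "a.py").write_text("def hello():\n    return 'hello from a'\n")
(package_dir / "b.py").write_text("def hello():\n    return 'hello from b'\n")

sys.path.insert(0, str(root))  # make the package importable

from simple_package import a, b  # assumed form of the import

print(a.hello())
print(b.hello())
print(a.__name__)  # dotted name: simple_package.a
```

In a real project the files would simply exist on disk; the temporary directory is used here only so the sketch runs on its own.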

• Functions

A function is a piece of Python code which can be reused at any time anywhere in the program.
A function performs a specific task whenever it is called. With the help of functions, a
program is divided into multiple smaller pieces of code.

• Built in functions - The functions which are already part of the Python language and have a
specific action to perform are called built in functions. Their definitions are provided by
Python itself. Some examples are:
chr() - returns the character for a given code point as a string
print() - prints an object to the terminal
min() - returns the minimum of the given values

• User defined functions - These functions are defined by the user, and their definition
starts with the keyword 'def', as shown in the example below. We have defined a function
named temperature and the task it performs when called.

Figure 2.11: function example
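A minimal sketch of a user defined function next to the built in functions named above. The function name temperature follows the text, but its task (converting Celsius to Fahrenheit) and its parameter are assumptions, since the original figure is not available.

```python
def temperature(celsius):
    """Convert Celsius to Fahrenheit (hypothetical task for this sketch)."""
    return celsius * 9 / 5 + 32

# Calling the user defined function
print(temperature(25))

# Built in functions mentioned in the text
print(chr(65))       # character for code point 65
print(min(3, 1, 2))  # smallest of the given values
```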

Chapter 3

Libraries in Python
Python's standard library is vast. It contains built-in modules written in C that provide
access to system functionality, such as file input and output, which would otherwise not be
accessible to Python programmers. These modules and libraries provide solutions to many
problems in programming.

Following are some Python libraries.

Matplotlib
Pandas
TensorFlow
NumPy
Keras
PyTorch
LightGBM
ELI5
SciPy

3.1 Matplotlib
"Matplotlib is a plotting library for the Python programming language and its numerical
mathematics extension NumPy"[11]. Matplotlib provides an interface for embedding plots in
applications that use graphical user interface toolkits. It also offers pylab, an interface
that closely resembles MATLAB.

It is a library for 2D graphics and finds application in web application servers, graphical
user interface toolkits and the shell. Below is an example of a basic plot in Python.

Figure 3.1: Matplotlib basic example
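A minimal sketch of a basic Matplotlib line plot. The data values, labels and title are made up for illustration and are not from the original figure; the Agg backend and the in-memory buffer are used only so that the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt

years = [2010, 2011, 2012, 2013, 2014]  # hypothetical values
output = [10, 12, 15, 19, 24]           # made-up numbers, arbitrary units

fig, ax = plt.subplots()
ax.plot(years, output, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Output (arbitrary units)")
ax.set_title("Basic line plot")

# Render the figure to an in-memory PNG instead of a window.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
print(buffer.getbuffer().nbytes > 0)
```

In an interactive session the backend line and the buffer can be dropped, and `plt.show()` displays the figure instead.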

3.2 Pandas
Pandas is a data analysis library for Python. It is mostly used for data analysis and data
manipulation, and it provides data structures and tools for working with time series.
Pandas is applied in many fields such as economics, recommendation systems (Spotify, Netflix
and Amazon), stock prediction, neuroscience, statistics, advertising, analytics and natural
language processing. Data can be analyzed in pandas in two ways -
Data frames - Here the data is two dimensional and consists of multiple series. The data is
always represented as a rectangular table.
Series - Here the data is one dimensional and consists of a single list of values with an
index.

Figure 3.2: series and data frame in pandas
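A minimal sketch of the two pandas structures described above. The column names and numbers are made up for illustration and are not taken from the report's data.

```python
import pandas as pd

# Series: one dimensional data with an index
series = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(series)

# Data frame: a two dimensional table whose columns are series
frame = pd.DataFrame({
    "year": [2012, 2013, 2014],  # hypothetical values
    "wind": [1.0, 1.5, 2.0],
    "solar": [0.5, 0.9, 1.4],
})
print(frame)
print(type(frame["wind"]).__name__)  # a single column is a Series
```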

3.3 NumPy
"NumPy is a library for the Python programming language, adding support for large,
multidimensional arrays and matrices, along with a large collection of high-level
mathematical functions to operate on these arrays". The ancestor of NumPy, Numeric, was
originally created by Jim Hugunin with contributions from several other developers. In 2005,
Travis Oliphant created NumPy by incorporating features of the competing Numarray into
Numeric, with extensive modifications. [12] It is an open source library and free of cost.

Figure 3.3: NumPy basic example
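A minimal sketch of a multidimensional NumPy array and a few of the mathematical functions that operate on it. The matrix values are illustrative assumptions.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])  # hypothetical 2 x 2 matrix

print(m.shape)     # dimensions of the array
print(m.T)         # transpose
print(m @ m)       # matrix product
print(m.mean())    # mean of all elements
print(np.sqrt(m))  # element-wise square root
```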

Bibliography
[1] Data science, https://en.wikipedia.org/wiki/Data_science Accessed on 27-06-2020.

[2] A book on data science by Dr. Ossama Embarak,
https://www.academia.edu/37886932/Data_Analysis_and_Visualization_Using_Python_-_Dr._Ossama_Embarak.pdf
Accessed on 27-06-2020.

[3] A blog on Quora, https://www.quora.com Accessed on 27-06-2020.

[4] Smart Data Collective site, https://www.smartdatacollective.com Accessed on 28-06-2020.

[5] A blog by Grenoble School of Business, https://www.stoodnt.com/index.php/blog/ Accessed
on 28-06-2020.

[6] Python website, https://www.python.org/doc/essays/blurb/ Accessed on 29-06-2020.

[7] DataCamp tutorial, https://www.datacamp.com/community/tutorials/data-structures-python#adt
Accessed on 29-06-2020.

[8] Al Sweigart, Automate the Boring Stuff with Python: Practical Programming for Total
Beginners. Accessed on 29-06-2020.

[9] https://wiki.python.org/ Accessed on 03-07-2020.

[10] https://www.python-course.eu/python3_packages.php Accessed on 03-07-2020.

[11] Matplotlib, https://en.wikipedia.org/wiki/Matplotlib Accessed on 04-07-2020.

[12] NumPy, https://en.wikipedia.org/wiki/NumPy Accessed on 07-07-2020.
