LAB: ADVANCED PROGRAMMING LANGUAGES
BUSINESS INTELLIGENCE
By: Kamel BENRAIS
Top programming languages
for data science
1. Python
2. R
3. SQL
4. Java
5. Julia
6. Scala
7. C and C++
8. JavaScript
9. Swift
10. Go
Python
Python has a rich ecosystem of libraries, so it can
handle most data science tasks: data preprocessing,
visualization, and statistical analysis, as well as
the deployment of machine learning and deep learning
models.
The Top 4 ETL Python Frameworks
Bonobo
Bubbles
Pygrametl
Mara
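These frameworks all organize a job as a chain of extract, transform, and load stages. A minimal sketch of that pattern using plain Python generators follows; it does not use any of the frameworks listed, and the function names and sample records are hypothetical.

```python
# Plain-Python sketch of an extract -> transform -> load chain.
# All names and data here are illustrative assumptions.

def extract():
    """Yield raw records one at a time (stand-in for a file or database)."""
    yield from ["  alice ", "BOB", " carol"]

def transform(rows):
    """Normalize each record: trim whitespace and capitalize."""
    for row in rows:
        yield row.strip().title()

def load(rows):
    """Collect the cleaned records (stand-in for writing to a target)."""
    return list(rows)

result = load(transform(extract()))
print(result)
```

Because each stage is a generator, records flow through the chain one at a time rather than being held in memory all at once.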
Getting started with Bonobo
Requirements
You need a working Python 3.5+ environment. Linux and
macOS are fully supported, while Windows environments
are supported on a best-effort basis.
Installation
C:\> pip install bonobo
Create your first virtual environment
mkdir project
cd project
python -m venv env
cd env\Scripts
[Link]
Basic statements
Data structures
x = [20, 30, 60]                   # list
y = {'Name': 'Osama', 'Age': 40}   # dictionary
z = {20, 30, 50}                   # set
c = (20, 30, 60)                   # tuple
Iteration
for i in range(len(x)):
    print(i)
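The loop above prints only the indices of the list. As a small sketch of how each of the four structures can be iterated directly, the following uses the same values as the slide; the variable names `pairs` and `total` are added for illustration.

```python
x = [20, 30, 60]                   # list: ordered, mutable
y = {'Name': 'Osama', 'Age': 40}   # dict: key -> value
z = {20, 30, 50}                   # set: unique items, no guaranteed order
c = (20, 30, 60)                   # tuple: ordered, immutable

# enumerate yields (index, value) pairs, avoiding range(len(x))
pairs = [(i, value) for i, value in enumerate(x)]
print(pairs)

# dict.items() yields (key, value) pairs
for key, value in y.items():
    print(key, value)

# sets are unordered, so sort them for a stable display
print(sorted(z))

# tuples iterate like lists
total = sum(c)
print(total)
```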
Control flow
x = 5
if x < 10:
    print('Ok')
else:
    print('Not')
Function
def test():
    x = "Hello"
    return x
print(test())
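The two ideas on this slide combine naturally: a function can take the value as a parameter and return the result of the branch. A minimal sketch, where the name `check_value` is hypothetical:

```python
def check_value(x):
    """Return 'Ok' when x is below 10, otherwise 'Not'."""
    if x < 10:
        return 'Ok'
    else:
        return 'Not'

print(check_value(5))
print(check_value(12))
```

Returning the value instead of printing it inside the function lets the caller decide what to do with the result.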
Modules
import math as x
data = [Link](3,2)
print(data)
**************
import random
data = [Link](0,10)
print(data)
*************
import random
data = [Link](0,2)
print(data)
******************
import requests
x = [Link]('[Link][Link]')
print([Link])
******************
import subprocess as x
data = [Link]('[Link]')
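The exact calls on this slide are elided, so the following sketch only illustrates the general pattern of importing a module, optionally under an alias, and calling one of its functions. The choice of `math.pow` and `random.randint` is an assumption, not a recovery of the original code.

```python
import math as m      # alias a module, as the slide does with "as x"
import random

power = m.pow(3, 2)   # math.pow always returns a float
print(power)

random.seed(0)        # fix the seed so repeated runs give the same draws
draw = random.randint(0, 10)   # inclusive on both ends
print(draw)
```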
import pandas as pd
data = pd.read_excel('[Link]')
print(data)
for i in data:
    print(i)
print(data['Sal'])
import pandas as pd
df = pd.read_excel('[Link]')
print(df)
import pandas as x
data = x.read_csv('[Link]')
print(data)
*************
import PyPDF2 as x
file = open('[Link]', 'br')
reader = [Link](file)
page1 = [Link](0)
# print([Link]())
data = [Link]()
print(type(data))
print([Link]())
*************
linelist = []
with open('[Link]', 'r') as f:
    for i in f:
        print([Link]())
dir(str)
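The text-file snippet above declares `linelist` but never fills it. Assuming the intent was to collect each line without its trailing newline, a self-contained sketch could look like this; the temporary file and its contents are made up for the example.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    path = tmp.name

linelist = []
with open(path, 'r') as f:      # the file is closed automatically on exit
    for line in f:              # iterating a file yields one line at a time
        linelist.append(line.strip())   # strip() drops the trailing newline

os.remove(path)
print(linelist)
```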
HTML file
import pandas as pd
data = pd.read_html('[Link]', header=0)
data = data[0]
print(data)
import pandas as pd
data = pd.read_html('[Link]', header=0)
data = data[8]
[Link]()
SQL Server connection (pymssql)
import pymssql as x
# connection_string.txt holds server, user, password, and database, one per line
with open('connection_string.txt', 'r') as file:
    server = [Link]().strip()
    user = [Link]().strip()
    password = [Link]().strip()
    database = [Link]().strip()
con = [Link](server, user, password, database)
sql = 'select id, name, price from product'
cur = [Link]()
[Link](sql)
for i in cur:
    print('============')
    for j in i:
        print(j)
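The connect, cursor, execute, iterate sequence above is the standard Python database API pattern and can be tried without a server. The sketch below uses the standard-library `sqlite3` module with an in-memory database as a stand-in; the table contents are hypothetical sample rows.

```python
import sqlite3

# In-memory database as a stand-in for the real server (assumption).
con = sqlite3.connect(':memory:')
cur = con.cursor()
cur.execute('create table product (id integer, name text, price real)')
cur.executemany(
    'insert into product values (?, ?, ?)',
    [(1, 'Pen', 1.5), (2, 'Book', 12.0)],   # hypothetical sample rows
)

cur.execute('select id, name, price from product order by id')
rows = cur.fetchall()          # each row is a tuple of column values
for row in rows:
    print('============')
    for value in row:
        print(value)

con.close()
```

The `?` placeholders pass values separately from the SQL text, which avoids building queries by string concatenation.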
Connecting to the database server
pip install cx_Oracle
import cx_Oracle
dsn_tns = cx_Oracle.makedsn('Host Name', 'Port Number', service_name='Service Name')
conn = cx_Oracle.connect(user=r'User Name', password='Personal Password', dsn=dsn_tns)
c = [Link]()
[Link]('select * from [Link]')
for row in c:
    print(row[0], '-', row[1])
[Link]()
DATA VISUALIZATION
Visualization
Line chart
grade = [70,90,80,65,70]
subject = ['Math', 'Marketing','Production', 'Programming','Accounting']
import [Link] as plt
[Link](subject,grade)
[Link]('Osama Hassan - Student Result')
[Link]()
Bar chart
[Link](subject,grade)
[Link]('Osama Hassan - Student Result')
[Link]()
##########
[Link](subject,grade)
[Link]('Osama Hassan - Student Result')
[Link]()
import pandas as pd
data = pd.read_csv('[Link]')
data
First project
C:\> pip install markupsafe==2.0.1
bonobo init [Link]
DATA EXTRACTION
germany = []
egypt = []
years = []
for i in range(len(data)):
    if data['country'][i] == 'Germany':
        # years are collected once, from the Germany rows
        [Link](data['year'][i])
        x = data['population'][i] / 1000000   # population in millions
        [Link](round(x, 0))
    elif data['country'][i] == 'Egypt':
        x = data['population'][i] / 1000000
        [Link](round(x, 0))
print(germany)
print(egypt)
print(years)
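The same filtering can be expressed with a small helper and list comprehensions. This is a sketch only: the helper name `population_in_millions` is hypothetical, and the rows are made-up values rather than real statistics, used in place of the CSV data.

```python
# Made-up sample rows, not real statistics; the real data comes from the CSV.
rows = [
    {'country': 'Germany', 'year': 1950, 'population': 50_000_000},
    {'country': 'Egypt',   'year': 1950, 'population': 20_000_000},
    {'country': 'Germany', 'year': 1960, 'population': 60_400_000},
    {'country': 'Egypt',   'year': 1960, 'population': 25_600_000},
]

def population_in_millions(rows, country):
    """Return the populations of one country, in millions, rounded."""
    return [round(r['population'] / 1_000_000) for r in rows
            if r['country'] == country]

# Years are taken from one country only, as in the loop above.
years = [r['year'] for r in rows if r['country'] == 'Germany']
germany = population_in_millions(rows, 'Germany')
egypt = population_in_millions(rows, 'Egypt')
print(years, germany, egypt)
```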
Visualization
%matplotlib inline
[Link](years, germany)
[Link]('German Population since 1950')
[Link]()
%matplotlib inline
[Link](years, germany)
[Link](years,egypt)
[Link]('Egyptian Population Compared with Germany')
[Link](['Germany', 'Egypt'])
[Link]()
import [Link] as plt
from matplotlib_venn import venn2
# First way to call the 2 group Venn diagram:
venn2(subsets = (10, 5, 2), set_labels = ('Group A', 'Group B'))
[Link]()
# Second way
venn2([set(['A', 'B', 'C', 'D']), set(['D', 'E', 'F'])])
[Link]()
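In the first form, the three numbers passed as `subsets` are the sizes of "A only", "B only", and "A and B". As a sketch of how the two forms relate, those sizes can be computed by hand for the sets used in the second call; the variable names are added for illustration.

```python
a = {'A', 'B', 'C', 'D'}
b = {'D', 'E', 'F'}

only_a = len(a - b)    # elements in A but not in B
only_b = len(b - a)    # elements in B but not in A
both = len(a & b)      # elements shared by A and B

subsets = (only_a, only_b, both)
print(subsets)
```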