I grew up in Europe, in France, in a world where the address seemed to go without saying. A house was first located on a street, then placed along that street by a number. In town, we learned very early to find our way using what might be called the ordinary grammar of urban space: numbers increase from the beginning of the street, even numbers are on one side, odd numbers on the other, and the ascending order generally follows a logic of distance from the centre. To go from number 56 on a street to number 72, you know you will walk past the handful of buildings carrying the even numbers in between. In more rural areas, I also came across another logic, less intuitive at first but ultimately very telling: one where the number no longer gives the rank of a house, but a distance. Number 72 is now 16 metres further along; in other words, it may well be the house next door. French addressing guides thus speak of metric numbering, where the number corresponds to the distance, in metres, between the beginning of the street and the dwelling, with a "point zero" that may be the town hall or the village church (as the Guide pratique pour un bon adressage of the communes of Charente recalls).
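The two schemes can be contrasted in a couple of lines (a toy sketch, using the 56 and 72 from the text):

```r
# Sequential numbering: the difference between two numbers counts houses,
# on one side of the street (even numbers only, hence the division by 2)
houses_between <- (72 - 56) / 2   # about 8 buildings to walk past
# Metric numbering: the same difference is a distance in metres from the
# street's "point zero", so number 72 may well be the house next to 56
metres_between <- 72 - 56
```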
Category Archives: Maps
Extracting information from a picture, round 1
This week, I wanted to extract the information displayed on the nice map below. I could not get access to the original dataset, per zip code… and I was wondering whether (assuming the map had a high enough resolution) it was actually possible to extract it using a few simple R functions…

As we can see, there is red and green on the map, and I would love to know which cities, in France, are green and which are red. One important issue is actually the background. Here it is nice, it is white… but white is a strange colour, achromatic and very light. More specifically, if I search for red areas, the background is very red, and very green too. So, to avoid those issues, I used gimp to turn the background black: where it is black, it is neither red nor green!
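The gimp step can also be sketched directly in R (a minimal illustration on a toy 2×2 image; the `.9` threshold is an assumption):

```r
img <- array(0, dim = c(2, 2, 3))   # toy RGB image
img[1, 1, ] <- c(1, 1, 1)           # a white background pixel
img[2, 1, ] <- c(1, 0, 0)           # a red pixel
# flag near-white pixels (high in all three channels) and force them to
# black, so they no longer match when searching for red or green areas
white <- (img[,,1] > .9) & (img[,,2] > .9) & (img[,,3] > .9)
for (k in 1:3) img[,,k][white] <- 0
```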

Let us get the map, and extract information from the file
url="https://freakonometrics.hypotheses.org/files/2018/12/inondation3.png"
download.file(url,"inondation3.png")
image="inondation3.png"
library(pixmap)
library(png)
IMG=readPNG(image)
Information is stored in an array. Dimension 1 is the height of the picture (in pixels), dimension 2 is the width, and the third one is either 1 (red), 2 (green) or 3 (blue), based on the RGB decomposition of each pixel. Then, let us try to find the borders of the map
nl=dim(IMG)[1]
nc=dim(IMG)[2]
MAT=(IMG[,,1]+IMG[,,2])/2
x=apply(MAT,2,max)
plot(x,type="l")

When it is zero, there is no colour in that column of the matrix, i.e. it is completely black (initially, I used the mean function, but the maximum really behaves like a step function)
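The difference between the two summaries can be seen on a toy matrix, a black image with a single bright pixel (the pixel position is made up):

```r
M <- matrix(0, 10, 10)   # completely black toy image
M[5, 6] <- 1             # one bright pixel, in column 6
# one bright pixel is enough to push the column maximum to 1 (a clean step),
# while the column mean barely moves away from zero
apply(M, 2, max)[6]      # 1
apply(M, 2, mean)[6]     # 0.1
```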
y=apply(MAT,1,max)
plot(y,type="l")

Let us find cutoff values, on the left and on the right, on top and on the bottom
image(1:nc,1:nl,t(MAT))
abline(v=min(which(x>.2)),col="blue")
abline(v=max(which(x>.2)),col="blue")
abline(h=min(which(y>.2)),col="blue")
abline(h=max(which(y>.2)),col="blue")
We obtain the following (forget about the fact that – somehow – France is upside-down)

We can zoom in, just to make sure that our borders are fine
par(mfrow=c(1,2))
image(min(which(x>.2))+(-5):5,1:nl,t(MAT)[min(which(x>.2))+(-5):5,])
abline(v=min(which(x>.2))+(-5):5,col="white")
abline(v=min(which(x>.2)),col="blue")
x1=min(which(x>.2))-1

and on the right border
image(max(which(x>.2))+(-5):5,1:nl,t(MAT)[max(which(x>.2))+(-5):5,])
abline(v=max(which(x>.2))+(-5):5,col="white")
abline(v=max(which(x>.2)),col="blue")
x2=max(which(x>.2))+1

So far so good. Let us keep the subpart of the picture,
# vertical cutoffs, obtained from y exactly as x1 and x2 were obtained from x
y1=min(which(y>.2))-1
y2=max(which(y>.2))+1
image(x1:x2,y1:y2,t(MAT)[x1:x2,y1:y2])

Now, let us focus on the red part / component of that picture
ROUGE=t(IMG[,,1])[x1:x2,]
ROUGE=ROUGE[,y2:y1]
library(scales)
image(x1:x2,y1:y2,ROUGE,col=alpha(colour=rgb(1,0,0,1), alpha=seq(0,1,by=.01)))

Not bad, is it? And we can get a similar graph for the green component
VERT=t(IMG[,,2])[x1:x2,]
VERT=VERT[,y2:y1]
image(x1:x2,y1:y2,VERT,col=alpha(colour=rgb(0,1,0,1), alpha=seq(0,1,by=.01)))

Now, I wanted to adjust a map of France on top of it. Using shapefiles of administrative regions, it would be possible to get the proportion of red and of green within each area (départements, cantons, etc.). As a starting point (before moving to départements), let us use a standard shapefile for France
library(maptools)
library(PBSmapping)
url="http://biogeo.ucdavis.edu/data/gadm2.8/rds/FRA_adm0.rds"
download.file(url,"FRA_adm0.rds")
FR=readRDS("FRA_adm0.rds")
PP = SpatialPolygons2PolySet(FR)
PP=PP[(PP$X<=8.25)&(PP$Y>=42.2),]
u=(x1:x2)-x1
v=(y1:y2)-y1
ax=min(PP$X)
bx=max(PP$X)-min(PP$X)
ay=min(PP$Y)
by=max(PP$Y)-min(PP$Y)
PP$X=(PP$X-ax)/bx*max(u)
PP$Y=(PP$Y-ay)/by*max(v)
image(u,v,ROUGE,col=alpha(colour=rgb(1,0,0,1), alpha = seq(0,1,by=.01)))
points(PP$X,PP$Y)
Here we try to rescale it: the left edge of the polygon should match the left edge of the picture, the right edge the right one, and the same holds for the top and the bottom,

Unfortunately, even when changing the projection technique, I could not perfectly match the contour of France. I am quite sure it is a projection problem! But I tried a dozen popular projections, with no success… so if anyone has a clever idea…
Some sort of Otto Neurath (isotype picture) map
Yesterday evening, I was walking in Budapest, and I saw a nice map in some sort of Otto Neurath style. It was hand-made, but I thought it should be possible to produce it automatically in R.
A few years ago, Baptiste Coulmont published a nice blog post on the osmar package, which can be used to import OpenStreetMap objects (polygons, lines, etc.) into R. We can start from there. More precisely, consider the city of Douai, in France,

The code to read information from OpenStreetMap is the following
library(osmar)
src = osmsource_api()
bb = center_bbox(3.07758808135, 50.37404355, 1000, 1000)
ua = get_osm(bb, source = src)
We can extract a lot of things: buildings, parks, churches, roads, etc. There are two kinds of objects, so we will use two functions (one matching on tag keys, one on tag values)
listek = function(vc,type="polygons"){
  nat_ids = find(ua, way(tags(k %in% vc)))
  nat_ids = find_down(ua, way(nat_ids))
  nat = subset(ua, ids = nat_ids)
  nat_poly = as_sp(nat, type)}
listev = function(vc,type="polygons"){
  nat_ids = find(ua, way(tags(v %in% vc)))
  nat_ids = find_down(ua, way(nat_ids))
  nat = subset(ua, ids = nat_ids)
  nat_poly = as_sp(nat, type)}
For instance to get rivers, use
W=listek(c("waterway"))
and to get buildings
M=listek(c("building"))
We can also get churches
C=listev(c("church","chapel"))
but also train stations, airports, universities, hospitals, etc. It is also possible to get streets, or roads
H1=listek(c("highway"),"lines")
H2=listev(c("residential","pedestrian","secondary","tertiary"),"lines")
but those will be harder to use later on, so let us forget about them.
We can check that we have everything we need
# P, B and T are other layers (e.g. parks, train stations), extracted with
# the same listek/listev helpers; their extraction is not shown in the post
plot(M)
plot(W,add=TRUE,col="blue")
plot(P,add=TRUE,col="green")
if(!is.null(B)) plot(B,add=TRUE,col="red")
if(!is.null(C)) plot(C,add=TRUE,col="purple")
if(!is.null(T)) plot(T,add=TRUE,col="red")

Now, let us consider a rectangular grid. If there is a river in a cell, I want a river. If there is a church, I want a church, etc. Since there will be one (and only one) picture per cell, there will be priorities. But first we have to check intersections with polygons, between our grid, and the OpenStreetMap polygons.
library(sp)
library(raster)
library(rgdal)
library(rgeos)
library(maptools)
identification = function(xy,h,PLG){
  b=data.frame(x=rep(c(xy[1]-h,xy[1]+h),each=2),
               y=c(xy[2]-h,xy[2]+h,xy[2]+h,xy[2]-h))
  pb1=Polygon(b)
  Pb1=list(Polygons(list(pb1), ID=1))
  SPb1=SpatialPolygons(Pb1, proj4string = CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"))
  UC=gUnionCascaded(PLG)
  return(gIntersection(SPb1,UC))
}
and then, we identify each cell's label as follows
whichidtf = function(xy,h){
  h=.7*h
  label="EMPTY"
  if(!is.null(identification(xy,h,M))) label="HOUSE"
  if(!is.null(identification(xy,h,P))) label="PARK"
  if(!is.null(identification(xy,h,W))) label="WATER"
  if(!is.null(identification(xy,h,U))) label="UNIVERSITY"
  if(!is.null(identification(xy,h,C))) label="CHURCH"
  return(label)
}
Let us use coloured rectangles to make sure it works
# vx and vy are the breakpoints of the rectangular grid, and h half the cell
# size; they are not defined in the post (presumably built with seq() over
# the bounding box)
nx=length(vx)
vx=as.numeric((vx[2:nx]+vx[1:(nx-1)])/2)
ny=length(vy)
vy=as.numeric((vy[2:ny]+vy[1:(ny-1)])/2)
plot(M,border="white")
for(i in 1:(nx-1)){
  for(j in 1:(ny-1)){
    lb=whichidtf(c(vx[i],vy[j]),h)
    if(lb=="HOUSE") rect(vx[i]-h,vy[j]-h,vx[i]+h,vy[j]+h,col="grey")
    if(lb=="PARK") rect(vx[i]-h,vy[j]-h,vx[i]+h,vy[j]+h,col="green")
    if(lb=="WATER") rect(vx[i]-h,vy[j]-h,vx[i]+h,vy[j]+h,col="blue")
    if(lb=="CHURCH") rect(vx[i]-h,vy[j]-h,vx[i]+h,vy[j]+h,col="purple")
}}

As a first draft, let us agree that it works. For the pictures, I borrowed icons from https://fontawesome.com/. For instance, we can have a tree
library(png)
library(grid)
download.file("http://freakonometrics.hypotheses.org/files/2018/05/tree.png","tree.png")
tree = readPNG("tree.png")
Unfortunately, the colour is not right (the icon is black), but that is easy to fix using the RGB decomposition of the PNG: we copy the alpha channel into the green channel
rev_tree=tree
rev_tree[,,2]=tree[,,4]
We can do the same for houses, churches and water actually
download.file("http://freakonometrics.hypotheses.org/files/2018/05/angle-double-up.png","angle-double-up.png")
download.file("http://freakonometrics.hypotheses.org/files/2018/05/home.png","home.png")
download.file("http://freakonometrics.hypotheses.org/files/2018/05/church.png","church.png")
water = readPNG("angle-double-up.png")
rev_water=water
rev_water[,,3]=water[,,4]
home = readPNG("home.png")
rev_home=home
rev_home[,,4]=home[,,4]*.5
church = readPNG("church.png")
rev_church=church
rev_church[,,1]=church[,,4]*.5
rev_church[,,3]=church[,,4]*.5
and that’s almost it. We can then add it on the map
plot(M,border="white")
for(i in 1:(nx-1)){
  for(j in 1:(ny-1)){
    lb=whichidtf(c(vx[i],vy[j]),h)
    if(lb=="HOUSE") rasterImage(rev_home,vx[i]-h*.8,vy[j]-h*.8,vx[i]+h*.8,vy[j]+h*.8)
    if(lb=="PARK") rasterImage(rev_tree,vx[i]-h*.9,vy[j]-h*.8,vx[i]+h*.9,vy[j]+h*.8)
    if(lb=="WATER") rasterImage(rev_water,vx[i]-h*.8,vy[j]-h*.8,vx[i]+h*.8,vy[j]+h*.8)
    if(lb=="CHURCH") rasterImage(rev_church,vx[i]-h*.8,vy[j]-h*.8,vx[i]+h*.8,vy[j]+h*.8)
}}
Nice, isn't it? (at least as a first draft, done during the lunch break of the R conference in Budapest, today)
All (potential) terrorists?
At the end of last week, I started reading Surveiller et prévenir : l'ère de la pénalité prédictive by Nicolas Bourgoin. Early in the book, he mentions reinforced security measures within a 20 km radius of train stations, airports and ports, which could be introduced as part of the reform of the laws on domestic security. This was recently reported on rtl,
The perimeter is also enlarged: checks may now take place in the vicinity of international train stations (and no longer only inside them), as well as within a radius of 20 kilometres around airports and ports.
or on the website of l'Obs
these check operations will be carried out "in the vicinity" of 373 train stations, ports and airports, within a 20-kilometre radius. A considerable extension, since until now these checks were confined to the inside of those spaces open to the public.
Reading the book, I found this 20 km story incredible, because everyone lives within 20 km of a train station (not knowing which 373 stations were meant, I will assume that all stations could be concerned). This is more or less what la cimade notes, writing
The bill thus allows identity checks at the borders for a duration of 12 hours (against 6 today), extends them "to the vicinity" of 373 train stations, ports and airports, as well as within a 20 km radius of the 118 border crossing points. Far beyond the mere borders of mainland France, almost the entire territory is covered. The measure thus disproportionately infringes on people's rights and freedoms. Entire cities, such as Paris and the whole Île-de-France region, Lyon, Nantes, Rennes, Bordeaux, Montpellier, Toulouse or Marseille, would be subject to a regime of legalised racial profiling. People perceived by the police as foreign, whatever their actual situation in France, risk being the victims of these identity checks.
Being (very) sceptical by nature, I wanted to check for myself, not only in terms of area (which is what is claimed here) but above all in terms of population. The first step is to get the list of geolocated train stations, from https://ressources.data.sncf.com/. One could also get the list of airports from https://www.data.gouv.fr/ to go further, but let us stick to train stations for now. The second step is to get the contours of the communes and their population. Indeed, more than the area, what interests me most is the number of people living there. Such a file can also be found on https://www.data.gouv.fr/. But let us start by loading all our cartography packages,
library(maptools)
library(rgeos)
library(rgdal)
library(ggplot2)
library(plyr)
We can then load the train station data
loc = "/home/arthur/referentiel-gares-voyageurs.shp"
gare = readOGR(loc)
To see where those stations are, we get a base map
Proj = CRS("+proj=longlat +datum=WGS84")
France = readShapePoly("departements-20140306-100m.shp", verbose=TRUE, proj4string=Proj)
metropolitaine = as.character(1:95)
metropolitaine[1:9] = paste("0",1:9,sep="")
France = France[France$code_insee%in%metropolitaine,]
We also use a database with the contours of the communes,
loc2 = "/home/arthur/communes-20150101-5m.shp"
communes_lim = readOGR(loc2)
communes_lim = spTransform(communes_lim, CRS("+proj=longlat +datum=WGS84"))
and a database with the population of each commune
base = read.csv("http://freakonometrics.free.fr/popfr19752010.csv", header=TRUE)
base$insee = base$dep*1000+base$com
This time we are ready. Consider a 20 km radius
r=20
We will build all the polygons corresponding to a circle of radius 20 km, centred on the French train stations
PL = data.frame(i=1:(3000*120), id=rep(1:3000,each=120), lon=NA, lat=NA)
# u and v parameterize half a unit circle (120 points per circle in total);
# they are not defined in the post, presumably something like
# u=seq(-1,1,length=60) and v=sqrt(1-u^2)
for(i in 1:nrow(gare)){
x=as.numeric(as.character(gare$longitude_w[i]))
y=as.numeric(as.character(gare$latitude_wg[i]))
vx=c(x+u*r/111,x+rev(u)*r/111)
vy=c(y+v*r/111,y-rev(v)*r/111)
polygon(vx,vy,border=NA,col="blue")
PL[PL$id==i,2:4]=data.frame(id=i,lon=vx,lat=vy)
}
(since one degree is roughly 111 km). We then have to tinker a bit to build a collection of polygons that we can manipulate at will afterwards
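A side note on that 111 km: it is (roughly) exact for latitudes, but a degree of longitude shrinks with the cosine of the latitude, so the circles above are slightly squeezed horizontally at French latitudes. A quick sketch:

```r
km_per_deg_lat <- 111                                    # roughly constant
km_per_deg_lon <- function(lat) 111 * cos(lat * pi/180)  # shrinks with latitude
km_per_deg_lon(47)   # around 75.7 km at French latitudes
```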
PL=PL[!is.na(PL$lat),]
PLdf=PL[,c(3,4,2)]
PLdf[,3]=as.factor(PLdf[,3])
PL_list <- split(PLdf, PL$id)
PL_list <- lapply(PL_list,
function(x) { x["id"] <- NULL; x })
PPL <- sapply(PL_list, Polygon)
PPL1 <- lapply(seq_along(PPL),
function(i) Polygons(list(PPL[[i]]),
ID = names(PL_list)[i] ))
PPLs <- SpatialPolygons(PPL1,
proj4string = CRS("+proj=longlat +datum=WGS84") )
Visualising those security perimeters, we get
G=gUnionCascaded(PPLs)
F=gUnionCascaded(France)
FG=gIntersection(F,G)
plot(F)
plot(G,add=TRUE,col="blue")

and, looking at the intersection of the two,
plot(F)
plot(FG,add=TRUE)

To find the population living in that vast region, we will assume that the population is uniformly distributed within each commune, and look at the proportion of the commune's area that lies within 20 km of a train station (this is roughly the technique used in Kernel density estimation based on Ripley's correction, with Ewen Gallic, to correct for edge effects in spatial smoothing on maps). In short, for a given commune, the following code returns the proportion of its area located within 20 km of a station, together with its population
f=function(i){
insee=as.numeric(as.character(communes_lim@data$insee[i]))
POPULATION=base[base$insee==insee,"pop_2010"]
B_list=list()
for(j in 1:length(communes_lim@polygons[[i]]@Polygons)){ B_list[[j]]=data.frame(communes_lim@polygons[[i]]@Polygons[[j]]@coords,id=j)}
B_list <- lapply(B_list,function(x) { x["id"] <- NULL; x })
BL <- sapply(B_list, Polygon)
BL1 <- lapply(seq_along(BL), function(i) Polygons(list(BL[[i]]),ID = i ))
BLs <- SpatialPolygons(BL1, proj4string = CRS("+proj=longlat +datum=WGS84") )
t=try(FGB<-gIntersection(BLs,FG),silent=TRUE)
t1=try(l<-length(BLs@pointobj@coords),silent=TRUE)
if((!inherits(t1, "try-error"))){
a_list=list()
for(j in 1:length(BLs@pointobj@coords)){
a_list[[j]]=BLs@polyobj@polygons[[j]]@area}
a1=sum(unlist(a_list))}
if((inherits(t1, "try-error"))){
a1=BLs@polygons[[1]]@area}
a2=0
if(!is.null(FGB)){
t2=try(l<-length(FGB@pointobj@coords),silent=TRUE)
if((!inherits(t2, "try-error"))){
a_list=list()
for(j in 1:length(FGB@pointobj@coords)){
a_list[[j]]=FGB@polyobj@polygons[[j]]@area}
a2=sum(unlist(a_list))}
if((inherits(t2, "try-error"))){
a2=FGB@polygons[[1]]@area}}
p=c(1,NA,0)
if((!inherits(t, "try-error"))&(!is.null(t))&(length(POPULATION)==0)) p=c(a2/a1,a1,0)
if((!inherits(t, "try-error"))&(!is.null(t))&(length(POPULATION)>0)) p=c(a2/a1,a1,POPULATION)
cat(i,insee,sum(unlist(lapply(B_list,nrow))),p,"\n")
return(p)}
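Once the area shares are known, the estimator itself boils down to an area-weighted sum. With toy numbers (three hypothetical communes):

```r
share <- c(1, .5, 0)         # fraction of each commune's area within 20 km
pop   <- c(1000, 2000, 500)  # commune populations (assumed uniformly spread)
# covered population over total population
sum(share * pop) / sum(pop)  # here (1000 + 1000 + 0) / 3500
```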
All that remains is to run it over our 36,000 communes,
F = lapply(unique(communes_lim_df$id),f)
# communes_lim_df: a data.frame version of communes_lim, with one id per
# commune (its construction is not shown in the post)
We then obtain
> F0=F[!is.na(F[,2]),]
> sum(F0[,1]*F0[,2])/sum(F0[,2])
[1] 0.9031736
> F=F1
> sum(F[,1]*F[,3])/sum(F[,3])
[1] 0.9661656
In other words, within a 20 km radius of train stations (only), we cover 90% of the territory and more than 95% of the population. With 10 km, we cover about 75% of the territory and more than 90% of the population; with 5 km, less than 50% of the area, but still about 75% of the population.


These results are of the same order as those obtained in a previous post, which established that 80% of the French population lives within 3 km of a bank branch. We could of course add ports and airports, and remove a few stations… but I doubt the conclusion would be very different… In any case the code is available; it just needs to be adapted to another set of circle centres…
Visualising the first-round results
Just a few lines of code, to visualise the results of the first round of the French presidential election. The idea is to draw a fairly minimalist map, with circles centred on the centroids of the départements. We start by getting the data for the base map, a 7z file from the IGN website.
download.file("https://wxs-telechargement.ign.fr/oikr5jryiph0iwhw36053ptm/telechargement/inspire/GEOFLA_THEME-DEPARTEMENTS_2016$GEOFLA_2-2_DEPARTEMENT_SHP_LAMB93_FXX_2016-06-28/file/GEOFLA_2-2_DEPARTEMENT_SHP_LAMB93_FXX_2016-06-28.7z",destfile = "dpt.7z")
This file contains information about the centroids
library(maptools)
library(maps)
departements<-readShapeSpatial("DEPARTEMENT.SHP")
plot(departements)
points(departements@data$X_CENTROID,departements@data$Y_CENTROID,pch=19,col="red")

Since this does not work very well, we will redo it by hand, for instance for Ille-et-Vilaine,
pos=which(departements@data[,"CODE_DEPT"]==35)
Poly_35=departements[pos,]
plot(departements)
plot(Poly_35,col="yellow",add=TRUE)
departements@data[pos,c("X_CENTROID","Y_CENTROID")]
points(departements@data[pos,c("X_CENTROID","Y_CENTROID")],pch=19,col="red")
library(rgeos)
(ctd=gCentroid(Poly_35,byid=TRUE))
points(ctd,pch=19,col="blue")

Since this works better, we will use these centroids.
ctd=as.data.frame(gCentroid(departements,byid=TRUE))
plot(departements)
points(ctd,pch=19,col="blue")

Now, we need the election results, by département. We can scrape the website of the Ministry of the Interior. There is one page per département, so looping over them is easy. However, the url requires the region code. Being a bit lazy, rather than building a correspondence table, we try them all until one works. The idea is then to fetch the number of votes obtained by one of the candidates.
candidat="M. Emmanuel MACRON"
library(XML)
voix=function(no=35){
testurl=FALSE
i=1
vect_reg=c("084","027","053","024","094","044","032","028","075","076","052",
"093","011","001","002","003","004")
region=NA
while((testurl==FALSE)&(i<=20)){
reg=vect_reg[i]
nodpt=paste("0",no,sep="")
# if(!is.na(as.numeric(no))){if(as.numeric(no)<10) nodpt=paste("00",no,sep="")}
url=paste("http://elections.interieur.gouv.fr/presidentielle-2017/",reg,"/",nodpt,"/index.html",sep="")
test=try(htmlParse(url),silent =TRUE)
if(!inherits(test, "try-error")){testurl=TRUE
region=reg}
i=i+1
}
tabs=readHTMLTable(url)
tab=tabs[[2]]
nb=tab[tab[,1]==candidat,"Voix"]
a<-unlist(strsplit(as.character(nb)," "))
as.numeric(paste(a,collapse=""))}
We can then test
> voix(35)
[1] 84648
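The only slightly tricky part in `voix` is the final string cleaning: counts are displayed as "84 648", with a space as thousands separator, and must be turned into a number. In isolation:

```r
nb <- "84 648"   # a vote count, as scraped from the page
# split on the space, paste the pieces back together, and coerce to numeric
as.numeric(paste(unlist(strsplit(nb, " ")), collapse = ""))
```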
Since it seems to work, we do it for all the départements
liste_dpt=departements@data$CODE_DEPT
nbvoix=Vectorize(voix)(liste_dpt)
We can then visualise it on a map.
plot(departements,border=NA)
points(ctd,pch=19,col=rgb(1,0,0,.25),cex=nbvoix/50000)

And we can also try another candidate,
candidat="Mme Marine LE PEN"
and we get the following map
plot(departements,border=NA)
points(ctd,pch=19,col=rgb(0,0,1,.25),cex=nbvoix/50000)
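One caveat on those circles: `cex` scales the radius, so the perceived disk area grows with the square of the vote count, which visually overstates differences. For area-proportional circles, one would take a square root (a sketch; the 50000 divisor is the one used above):

```r
# radius proportional to sqrt(votes): the disk area is then proportional
# to the vote count itself
cex_area <- function(n, n0 = 50000) sqrt(n / n0)
cex_area(200000)   # 2, i.e. four times the area of a 50000-vote disk
```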

Locating road accidents with casualties
Another post as part of the R project of the Data Science for Actuaries program, with a few lines of code to visualise road accidents involving casualties, in France. There will be more posts on these data in the coming days. This first one draws on the project of Guillaume Gerber. The idea is to normalise by the population of the département when visualising the number of accidents.
We start by getting our base map
library(rgdal)
library(sp)
library(data.table)
library(dplyr)
library(plyr)
download.file(url = "http://professionnels.ign.fr/sites/default/files/GEOFLA_1-1_SHP_LAMB93_FR-ED111.tar.gz",
destfile="GEOFLA.tar.gz")
untar("GEOFLA.tar.gz")
These data contain the population, but per commune, so we aggregate to get it per département
com <- readOGR(dsn = "./GEOFLA_1-1_SHP_LAMB93_FR-ED111/COMMUNES/", layer="COMMUNE", stringsAsFactors=FALSE,encoding = "UTF-8")
dep <- readOGR(dsn = "./GEOFLA_1-1_SHP_LAMB93_FR-ED111/DEPARTEMENTS/", layer="DEPARTEMENT", stringsAsFactors=FALSE,encoding = "UTF-8")
pop <- as.data.table(tapply(X=com$POPULATION, INDEX = com$CODE_DEPT, FUN = sum), keep.rownames=TRUE)
colnames(pop) <- c("CODE_DEPT","POPULATION")
superficie <- as.data.table(tapply(X=com$SUPERFICIE, INDEX = com$CODE_DEPT, FUN = sum), keep.rownames=TRUE)
colnames(superficie) <- c("CODE_DEPT","SUPERFICIE")
dep@data <- inner_join(dep@data, pop)
dep@data <- inner_join(dep@data, superficie)
dep@data$POPULATION <- dep@data$POPULATION * 1000
We then get the road accident data
debut_url_base_accident = "https://www.data.gouv.fr/s/resources/"
acc_caract <- read.csv(file = paste(debut_url_base_accident,"base-de-donnees-accidents-corporels-de-la-circulation-sur-6-annees/20150806-155035/caracteristiques_2010.csv",sep=''),colClasses=c("com"="character","dep"="character"))
acc_caract <- rbind(acc_caract,read.csv(file = paste(debut_url_base_accident,"base-de-donnees-accidents-corporels-de-la-circulation-sur-6-annees/20150806-154723/caracteristiques_2011.csv",sep=""), colClasses=c("com"="character","dep"="character")))
acc_caract <- rbind(acc_caract,read.csv(file = paste(debut_url_base_accident,"base-de-donnees-accidents-corporels-de-la-circulation-sur-6-annees/20150806-154431/caracteristiques_2012.csv",sep=""), colClasses=c("com"="character","dep"="character")))
acc_caract <- rbind(acc_caract,read.csv(file = paste(debut_url_base_accident,"base-de-donnees-accidents-corporels-de-la-circulation-sur-6-annees/20150806-154105/caracteristiques_2013.csv",sep=""), colClasses=c("com"="character","dep"="character")))
acc_caract <- rbind(acc_caract,read.csv(file = paste(debut_url_base_accident,"base-de-donnees-accidents-corporels-de-la-circulation-sur-6-annees/20150806-153701/caracteristiques_2014.csv",sep=""), colClasses=c("com"="character","dep"="character")))
acc_caract <- rbind(acc_caract,read.csv(file = paste(debut_url_base_accident,"base-de-donnees-accidents-corporels-de-la-circulation/20160909-181230/caracteristiques_2015.csv",sep=""), colClasses=c("com"="character","dep"="character")))
acc_caract$dep[which(acc_caract$dep %in% "201")] <- "2A0"
acc_caract$dep[which(acc_caract$dep %in% "202")] <- "2B0"
acc_caract$dep <- substr(acc_caract$dep, 1, 2)
Now, we can count accidents, per year and per département (or aggregating over time)
dep_with_nb_acc <- function(acc_caract, dep,nb_an=1,normalize=FALSE){
acc_nb_par_dep <- count(acc_caract,"dep")
acc_nb_par_dep$dep <- substr(acc_nb_par_dep$dep, 1, 2)
names(acc_nb_par_dep)[names(acc_nb_par_dep) == "dep"] <- "CODE_DEPT"
dep@data <- inner_join(dep@data, acc_nb_par_dep)
dep@data$SUPERFICIE <- dep@data$SUPERFICIE/100 # en km^2
dep@data$freq_par_hab <- (dep@data$freq/nb_an)/dep@data$POPULATION
m <- sum(dep@data$freq/nb_an)/sum(dep@data$POPULATION)
if(normalize) dep@data$freq_par_hab <- log(dep@data$freq_par_hab/m)
return(dep)
}
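The normalised version is just the log of the ratio between a département's accident frequency and the national one: 0 means exactly at the national level, positive means more accidents per inhabitant. With toy counts:

```r
acc <- c(20, 10, 40)            # accidents per departement (toy numbers)
pop <- c(10000, 10000, 10000)   # populations
m   <- sum(acc) / sum(pop)      # national frequency
log((acc / pop) / m)            # normalised (log) index, as in the code above
```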
We then build our datasets,
data_plot <- list(
"2010_2015" = dep_with_nb_acc(acc_caract, dep, nb_an = 6),
"2010_2015_n" = dep_with_nb_acc(acc_caract, dep, nb_an = 6, normalize = TRUE))
The first one is simply the number of accidents over 6 years, normalised by the population (which can be seen as a frequency of accidents with casualties)
zmax = max(data_plot[[1]]@data$freq_par_hab)
spplot(obj = data_plot$'2010_2015',"freq_par_hab",at = seq(0, zmax, by = zmax/10),main = "")

but we can also normalise by the national frequency, to make the most dangerous départements stand out. We also switch to a logarithmic scale,
zmin = min(data_plot[["2010_2015_n"]]@data$freq_par_hab)
zmax = max(data_plot[["2010_2015_n"]]@data$freq_par_hab)
spplot(obj = data_plot$'2010_2015_n',"freq_par_hab",at = seq(zmin, zmax, by = (zmax-zmin)/10),main = "")

Visualising the evolution of the housing tax (taxe d'habitation)
As part of the R project of the Data Science for Actuaries program, we keep exploring the data available on data.gouv, for instance https://data.gouv.fr/fr/datasets/impots-locaux/, on local taxes. This is what Nicolas Jaudel had proposed,
library(sp)
tax_hab=read.csv("http://freakonometrics.free.fr/F7815_taxe_hab.csv",
  col.names=c("CODDPT","LIBDPT","E2001","E2002","E2003","E2004",
              "E2005","E2006","E2007","E2008","E2009","E2010"),
  header=FALSE, dec=",", sep=";", stringsAsFactors = FALSE)
Let us extract an interesting piece of information, for instance the variation of the average housing-tax rate over 10 years,
tax_hab$taux_evo=tax_hab$E2010/tax_hab$E2001-1
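A quick sanity check of that formula (with hypothetical rates): a rate going from 20% in 2001 to 25% in 2010 is a +25% relative variation, not +5 points:

```r
E2001 <- 20; E2010 <- 25   # hypothetical tax rates, in %
E2010 / E2001 - 1          # relative variation over the decade: 0.25
```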
For the map of the départements, a base map can be downloaded from biogeo.ucdavis.edu.
download.file("http://freakonometrics.free.fr/FRA_adm2.rds","FRA_adm2.rds")
FR=readRDS("FRA_adm2.rds")
donnees_carte=data.frame(FR@data)
donnees_carte=cbind(donnees_carte,"taux_evo"=0)
for (i in 1:96){
donnees_carte[i,"taux_evo"]=tax_hab[tax_hab$CODDPT==donnees_carte$CCA_2[i],'taux_evo']*100
}
We now have all the data, and we can draw a map
library(cartography)
cols <- rev(carto.pal(pal1 = "red.pal",n1 = 10,pal2="green.pal",n2=10))
plot(FR, col = "grey", border = "gray1", bg="#A6CAE0",xlim=c(-5.2,12))
choroLayer(spdf = FR,
df = donnees_carte,
var = "taux_evo",
breaks = seq(-30,70,5),
col = cols,
legend.pos = "topright",
legend.title.txt = "Evolution",
legend.values.rnd = 2,
add = TRUE)

How far from a bank do we live?
As part of the R project of the Data Science for Actuaries program, I will keep putting online pieces of code that can be useful in a spatial context. The previous post, on mapping the Brexit vote, was picked up (and much improved) by our neighbours at rgeomatic. Today, I will draw on the work of Etienne Flichy, who combines the spatial distribution of the population with the location of bank branches.
We talk about banks here, but with a database of hairdressers, bakeries, etc., we could do exactly the same! (needless to say, we will have fun when the sirene database is opened up, in the coming weeks). We assume we have a database of geocoded banks. For the exercise, we will use the location of bank branches, using data from cbanque.com. The site is fairly easy to scrape, given how the pages are organised, e.g. http://cbanque.com/pratique/agences/credit-cooperatif/35/. There, we get the (postal) addresses, and we can use https://adresse.data.gouv.fr/csv/ (or various other tools) to geocode them.
Non-Uniform Population Density in some European Countries
A few months ago, I did mention that France was a country with strong inequalities, especially when you look at higher education, and research teams. Paris has almost 50% of the CNRS researchers, while only 3% of the population lives there.
CNRS, "répartition des chercheurs en SHS" https://t.co/39dcJJBwrF, Paris 47.52% IdF 66.85% (pop 3.39% et 18.18% resp) pic.twitter.com/OsEXiFywPf
— Arthur Charpentier (@freakonometrics) 28 septembre 2015
It looks like Paris is the only city, in France. And I wanted to check that, indeed, France is a country with strong inequalities, when we look at population density.
Using data from sedac.ciesin.columbia.edu, it is possible to get population counts at a rather fine granularity,
> rm(list=ls())
> base=read.table(
+ "/home/charpentier/glp00ag.asc",
+ skip=6)
> X=t(as.matrix(base,ncol=8640))
> X=X[,ncol(X):1]
The scales for latitudes and longitudes can be obtained from the text file,
> #ncols 8640
> #nrows 3432
> #xllcorner -180
> #yllcorner -58
> #cellsize 0.0416666666667
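Rather than copying those header values by hand, the six lines can be parsed into a named vector; this small helper is my own addition, assuming the whitespace-separated key/value layout shown above.

```r
# Parse Arc/Info ASCII grid header lines ("key value") into a named vector
read_asc_header = function(lines){
  parts = strsplit(trimws(lines), "[[:space:]]+")
  setNames(sapply(parts, function(p) as.numeric(p[2])),
           sapply(parts, function(p) p[1]))
}
h = read_asc_header(c("ncols 8640", "nrows 3432", "xllcorner -180",
                      "yllcorner -58", "cellsize 0.0416666666667"))
h["ncols"]
```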
Hence, we have
> library(maps)
> world=map(database="world")
> vx=seq(-180,180,length=nrow(X)+1)
> vx=(vx[2:length(vx)]+vx[1:(length(vx)-1)])/2
> vy=seq(-58,85,length=ncol(X)+1)
> vy=(vy[2:length(vy)]+vy[1:(length(vy)-1)])/2
If we plot our density, as in a previous post, on Where People Live,
> I=seq(1,nrow(X),by=10)
> J=seq(1,ncol(X),by=10)
> image(vx[I],vy[J],log(1+X[I,J]),
+ col=rev(heat.colors(101)))
> lines(world[[1]],world[[2]])

we can see that the big population matrix and the country polygons match.
Consider France, for instance. We can download the contour polygon with higher precision,
> library(rgdal)
> fra=download.file(
+ "http://biogeo.ucdavis.edu/data/gadm2.8/rds/FRA_adm0.rds",
+ "fr.rds")
> Fra=readRDS("fr.rds")
> n=length(Fra@polygons[[1]]@Polygons)
> L=rep(NA,n)
> for(i in 1:n) L[i]=nrow(Fra@polygons[[1]]@Polygons[[i]]@coords)
> idx=which.max(L)
> polygon_Fr=
+ Fra@polygons[[1]]@Polygons[[idx]]@coords
> min_poly=apply(polygon_Fr,2,min)
> max_poly=apply(polygon_Fr,2,max)
> idx_i=which((vx>min_poly[1])&(vx<max_poly[1]))
> idx_j=which((vy>min_poly[2])&(vy<max_poly[2]))
> sub_X=X[idx_i,idx_j]
> image(vx[idx_i],vy[idx_j],
+ log(sub_X+1),col=rev(heat.colors(101)),
+ xlab="",ylab="")
> lines(polygon_Fr)

We are now able to extract population information for France only (actually, mainland France only: islands are not considered here, to avoid complicated computations).
> library(sp)
> xy=expand.grid(x = vx[idx_i], y = vy[idx_j])
> dim(xy)
[1] 65730     2
Here, we have 65,730 small squares, in France.
> pip=point.in.polygon(xy[,1],xy[,2],
+ polygon_Fr[,1],polygon_Fr[,2])>0
> dim(pip)=dim(sub_X)
> Fr=sub_X[pip]
> sum(Fr)
[1] 58105272
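As a quick check of the point.in.polygon logic used here, consider a toy unit square with one point inside and one outside:

```r
library(sp)
# Unit square; (0.5, 0.5) lies inside, (2, 2) lies outside
sq_x = c(0, 1, 1, 0)
sq_y = c(0, 0, 1, 1)
pip_toy = point.in.polygon(c(.5, 2), c(.5, 2), sq_x, sq_y)
pip_toy > 0
```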
Observe that the total population within the French polygon is close to 60 million people, which is consistent with actual figures. Now, let us look more carefully at the distribution over the French territory,
> library(ineq)
> Gini(Fr)
[1] 0.7296936
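To read that number on a familiar scale, recall that the Gini index is 0 for a perfectly even distribution, and close to 1 when a single cell concentrates everything:

```r
library(ineq)
Gini(rep(1, 10))       # perfectly even distribution: index is 0
Gini(c(rep(0, 9), 1))  # one cell holds everything: index is (n-1)/n = 0.9
```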
The Gini coefficient is rather high (over 70%), but it is also possible to visualize the Lorenz curve,
> plot(Lc(Fr))

Observe that almost 54% of the population lives on just 5% of the territory,
> LcF=Lc(Fr)
> 1-min(LcF$L[LcF$p>.95])
[1] 0.5462632
In order to compare with other countries, consider the following function, which wraps the previous computations,
> LC=function(rds="fr.rds"){
+ Fra=readRDS(rds)
+ n=length(Fra@polygons[[1]]@Polygons)
+ L=rep(NA,n)
+ for(i in 1:n) L[i]=nrow(Fra@polygons[[1]]@Polygons[[i]]@coords)
+ idx=which.max(L)
+ polygon_Fr=
+ Fra@polygons[[1]]@Polygons[[idx]]@coords
+ min_poly=apply(polygon_Fr,2,min)
+ max_poly=apply(polygon_Fr,2,max)
+ idx_i=which((vx>min_poly[1])&(vx<max_poly[1]))
+ idx_j=which((vy>min_poly[2])&(vy<max_poly[2]))
+ sub_X=X[idx_i,idx_j]
+ xy=expand.grid(x = vx[idx_i], y = vy[idx_j])
+ pip=point.in.polygon(xy[,1],xy[,2],
+ polygon_Fr[,1],polygon_Fr[,2])>0
+ dim(pip)=dim(sub_X)
+ Fr=sub_X[pip]
+ return(list(gini=Gini(Fr),LC=Lc(Fr)))
+ }
> FRA=LC()
For instance, consider Germany, or Italy
> deu=download.file(
+ "http://biogeo.ucdavis.edu/data/gadm2.8/rds/DEU_adm0.rds",
+ "deu.rds")
> DEU=LC("deu.rds")
> ita=download.file(
+ "http://biogeo.ucdavis.edu/data/gadm2.8/rds/ITA_adm0.rds",
+ "ita.rds")
> ITA=LC("ita.rds")
It is possible to plot the Lorenz curves together,
> plot(FRA$LC,col="blue")
> lines(DEU$LC,col="black")
> lines(ITA$LC,col="red")

Observe that France is clearly below the others. Compared with Germany, the difference is significant,
> FRA$gini
[1] 0.7296936
> DEU$gini
[1] 0.5088853
More precisely, while 54% of the French population lives on 5% of the territory, the share is only 39% in Italy, and 33% in Germany,
> 1-min(FRA$LC$L[FRA$LC$p>.95])
[1] 0.5462632
> 1-min(ITA$LC$L[ITA$LC$p>.95])
[1] 0.3933227
> 1-min(DEU$LC$L[DEU$LC$p>.95])
[1] 0.3261124
Where People Live, part 2
Following my previous post, I wanted to use another dataset to visualize where people live on Earth. The dataset comes from sedac.ciesin.columbia.edu. Once you register, you can download the database,
> base=read.table("glp00ag15.asc",skip=6)
The database is a 'big' 1440×572 matrix; each cell (a latitude and longitude square) contains the population count,
> X=t(as.matrix(base,ncol=1440))
> dim(X)
[1] 1440  572
The dataset looks like
> image(seq(-180,180,length=nrow(X)),
+ seq(-90,90,length=ncol(X)),
+ log(X+1)[,ncol(X):1],col=rev(heat.colors(101)),
+ axes=FALSE,xlab="",ylab="")

Now, if we keep only the places where people actually live (i.e. removing cold deserts and oceans), we get
> M=X>0
> image(seq(-180,180,length=nrow(X)),
+ seq(-90,90,length=ncol(X)),
+ M[,ncol(X):1],col=c("white","light green"),
+ axes=FALSE,xlab="",ylab="")

Then, we can visualize where 50% of the population lives,
> Order=matrix(rank(X,ties.method="average"),
+ nrow(X),ncol(X))
> idx=cumsum(sort(as.numeric(X),
+ decreasing=TRUE))/sum(X)
> M=(X>0)+(Order>length(X)-min(which(idx>.5)))
> image(seq(-180,180,length=nrow(X)),
+ seq(-90,90,length=ncol(X)),
+ M[,ncol(X):1],col=c("white",
+ "light green","red"),
+ axes=FALSE,xlab="",ylab="")
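The rank-and-cumulate trick above can be read off a toy grid: sort cell populations in decreasing order, accumulate, and count how many of the top cells are needed to pass 50% of the total,

```r
# Toy 'population' grid with four cells
Xt = c(6, 2, 1, 1)
idx_t = cumsum(sort(Xt, decreasing = TRUE)) / sum(Xt)
min(which(idx_t > .5))  # a single top cell already holds more than half
```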

50% of the population lives in the red area, and 50% in the green area. More precisely, 50% of the population lives on 0.75% of the Earth,
> table(M)/length(X)*100
M
         0          1          2
69.6233974 29.6267968  0.7498057
And 90% of the population lives in the red area below (5% of the surface of the Earth)
> M=(X>0)+(Order>length(X)-min(which(idx>.9)))
> table(M)/length(X)*100
M
        0         1         2
69.623397 25.512335  4.864268
> image(seq(-180,180,length=nrow(X)),
+ seq(-90,90,length=ncol(X)),
+ M[,ncol(X):1],col=c("white",
+ "light green","red"),
+ axes=FALSE,xlab="",ylab="")

Where People Live
There was an interesting map on reddit this morning, with a visualisation of the latitude and longitude of where people live on Earth. So I tried to reproduce it. To compute the density, I used a kernel-based approach,
> library(maps)
> data("world.cities")
> X=world.cities[,c("lat","pop")]
> liss=function(x,h){
+ w=dnorm(x-X[,"lat"],0,h)
+ sum(X[,"pop"]*w)
+ }
> vx=seq(-80,80)
> vy=Vectorize(function(x) liss(x,1))(vx)
> vy=vy/max(vy)
> plot(world.cities$lon,world.cities$lat,pch=19)
> for(i in 1:length(vx))
+ abline(h=vx[i],col=rgb(1,0,0,vy[i]),lwd=2.7)

For the other axis, we use a mirror technique, to ensure that -180 is close to +180,
> Y=world.cities[,c("long","pop")]
> Ya=Y; Ya[,1]=Y[,1]-360
> Yb=Y; Yb[,1]=Y[,1]+360
> Y=rbind(Y,Ya,Yb)
> liss=function(y,h){
+ w=dnorm(y-Y[,"long"],0,h)
+ sum(Y[,"pop"]*w)
+ }
> vx=seq(-180,180)
> vy=Vectorize(function(x) liss(x,1))(vx)
> vy=vy/max(vy)
> plot(world.cities$lon,world.cities$lat,pch=19)
> for(i in 1:length(vx))
+ abline(v=vx[i],col=rgb(1,0,0,vy[i]),lwd=2.7)
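A quick check that replicating the data at ±360 does what we want: with the three copies, the smoothed density is numerically the same at -180 and +180, so the two edges of the map glue together,

```r
# Two toy cities near the date line, replicated at -360 and +360
Yt = data.frame(long = c(-170, 175), pop = c(1, 1))
Y3 = rbind(Yt,
           transform(Yt, long = long - 360),
           transform(Yt, long = long + 360))
liss_t = function(y, h) sum(Y3$pop * dnorm(y - Y3$long, 0, h))
all.equal(liss_t(-180, 5), liss_t(180, 5))
```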

Now we can add the two, on the same graph

Spatial and Temporal Viz of Gas Price, in France
A great thing in France is that we can play with a great database of gas prices, covering all gas stations, almost every day. The file is rather big, so let's make sure we have enough memory to run our code,
> rm(list=ls())
To extract the data, first, we should extract the xml file, and then convert it in a more common R object (say a list)
> year=2014
> loc=paste("http://donnees.roulez-eco.fr/opendata/annee/",year,sep="")
> download.file(loc,destfile="oil.zip")
Content type 'application/zip' length 15248088 bytes (14.5 MB)
> unzip("oil.zip", exdir="./")
> fichier=paste("PrixCarburants_annuel_",year,
+ ".xml",sep="")
> library(plyr)
> library(XML)
> library(lubridate)
> l=xmlToList(fichier)
We have a large dataset, with prices for various types of gas, for almost every gas station in France, almost every day of 2014. It is a 1.4 Gb list, with 11,064 elements (each of them being a gas station),
> length(l)
[1] 11064
There are two ways to look at the data. A first idea is to consider a gas station, and to extract the time series.
> time_series=function(no,type_gas="Gazole"){
+ prix=list()
+ date=list()
+ nom=list()
+ j=0
+ for(i in 1:length(l[[no]])){
+ v=names(l[[no]])
+ if(!is.null(v[i])){
+ if(v[i]=="prix"){
+ j=j+1
+ date[[j]]=as.character(l[[no]][[i]]["maj"])
+ prix[[j]]=as.character(l[[no]][[i]]["valeur"])
+ nom[[j]]=as.character(l[[no]][[i]]["nom"])
+ }}
+ }
+ id=which(unlist(nom)==type_gas)
+ n=length(id)
+ jour=function(j) as.Date(substr(date[[id[j]]],1,10),"%Y-%m-%d")
+ jour_heure=function(j) as.POSIXct(substr(date[[id[j]]],1,19),
+ format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
+ ext_y=function(j) substr(date[[id[j]]],1,4)
+ ext_m=function(j) substr(date[[id[j]]],6,7)
+ ext_d=function(j) substr(date[[id[j]]],9,10)
+ ext_h=function(j) substr(date[[id[j]]],12,13)
+ ext_mn=function(j) substr(date[[id[j]]],15,16)
+ prix_essence=function(i) as.numeric(prix[[id[i]]])/1000
+ base1=data.frame(indice=no,
+ id=l[[no]]$.attrs["id"],
+ adresse=l[[no]]$adresse,
+ ville=l[[no]]$ville,
+ lat=as.numeric(l[[no]]$.attrs["latitude"])/100000,
+ lon=as.numeric(l[[no]]$.attrs["longitude"])/100000,
+ cp=l[[no]]$.attrs["cp"],
+ saufjour=l[[no]]$ouverture["saufjour"],
+ Y=unlist(lapply(1:n,ext_y)),
+ M=unlist(lapply(1:n,ext_m)),
+ D=unlist(lapply(1:n,ext_d)),
+ H=unlist(lapply(1:n,ext_h)),
+ MN=unlist(lapply(1:n,ext_mn)),
+ prix=unlist(lapply(1:n,prix_essence)))
+ base1=base1[!is.na(base1$prix),]
+ date_d=paste(year,"-01-01 12:00:00",sep="")
+ date_f=paste(year,"-12-31 12:00:00",sep="")
+ vecteur_date=seq(as.POSIXct(date_d, format = "%Y-%m-%d %H:%M:%S"),
+ as.POSIXct(date_f, format = "%Y-%m-%d %H:%M:%S"),by="days")
+ date=paste(base1$Y,"-",base1$M,"-",base1$D,
+ " ",base1$H,":",base1$MN,":00",sep="")
+ date_base=as.POSIXct(date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
+ idx=function(t) sum(vecteur_date[t]>=date_base)
+ vect_idx=Vectorize(idx)(1:length(vecteur_date))
+ P=c(NA,base1$prix)
+ base2=ts(P[1+vect_idx],
+ start=year,frequency=365)
+ list(base=base1,
+ ts=base2)
+ }
To get the time series, some filling-in is necessary, since prices are observed at irregular dates: the last observed price is carried forward, day by day. Here, for instance, for the second gas station, we get
> plot(time_series(2)$ts,ylim=c(1,1.6),col="red")
> lines(time_series(2,"SP98")$ts,col="blue")

An alternative is to study gas price from a spatial perspective. Given a date, we want the price in all stations. As previously, we keep the last price observed, in each station,
> spatial=function(dt){
+ base=NULL
+ for(no in 1:length(l)){
+ prix=list()
+ date=list()
+ j=0
+ for(i in 1:length(l[[no]])){
+ v=names(l[[no]])
+ if(!is.null(v[i])){
+ if(v[i]=="prix"){
+ j=j+1
+ date[[j]]=as.character(l[[no]][[i]]["maj"])
+ }}
+ }
+ n=j
+ D=as.Date(substr(unlist(date),1,10),"%Y-%m-%d")
+ k=which(D==D[which.max(D[D<=dt])])
+ if(length(k)>0){
+ B=Vectorize(function(i) l[[no]][[k[i]]])(1:length(k))
+ if("nom" %in% rownames(B)){
+ k=which(B["nom",]=="Gazole")
+ prix=as.numeric(B["valeur",k])/1000
+ if(length(prix)==0) prix=NA
+ base1=data.frame(indice=no,
+ lat=as.numeric(l[[no]]$.attrs["latitude"])/100000,
+ lon=as.numeric(l[[no]]$.attrs["longitude"])/100000,
+ gaz=prix)
+ base=rbind(base,base1)
+ }}}
+ return(base)}
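The date-selection step inside this function (keep the most recent update that is not later than the target date) can be isolated on a toy example,

```r
# Update dates of a hypothetical station, and a target date
D_toy = as.Date(c("2014-01-03", "2014-04-20", "2014-07-01"))
dt_toy = as.Date("2014-05-05")
max(D_toy[D_toy <= dt_toy])  # the price update to keep: 2014-04-20
```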
For instance, for the 5th of May, 2014, we get the following dataset
> B=spatial(as.Date("2014-05-05"))
To visualize prices, consider only mainland France (excluding islands in the Pacific, or close to the Caribbean),
> idx=which((B$lon>(-10))&(B$lon<20)&
+ (B$lat>35)&(B$lat<55))
> B=B[idx,]
> Q=quantile(B$gaz,seq(0,1,by=.01),na.rm=TRUE)
> Q[1]=0
> x=as.numeric(cut(B$gaz,breaks=unique(Q)))
> CL=c(rgb(0,0,1,seq(1,0,by=-.025)),
+ rgb(1,0,0,seq(0,1,by=.025)))
> plot(B$lon,B$lat,pch=19,col=CL[x])
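The quantile-based color index maps each price to its rank bucket, so colors reflect relative rather than absolute prices. On a toy vector, with quartile breaks (include.lowest=TRUE plays the role of the Q[1]=0 trick above):

```r
p_toy = c(1.30, 1.45, 1.32, 1.60, 1.38)
Qp = quantile(p_toy, seq(0, 1, by = .25))
x_toy = as.numeric(cut(p_toy, breaks = unique(Qp), include.lowest = TRUE))
x_toy  # bucket index of each price, from cheapest to most expensive
```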

Red dots are the most expensive gas stations, that particular day.
If we add contours of the French regions, we get
> library(maps)
> map("france")
> points(B$lon,B$lat,pch=19,col=CL[x])

We can also focus on some specific region, say the South of Brittany.
> library(OpenStreetMap)
> map <- openmap(c(lat= 48, lon= -3),
+ c(lat= 47, lon= -2))
> map <- openproj(map)
> plot(map)
> points(B$lon,B$lat,pch=19,col=CL[x])

As we can see on that map, there are regions that are rather empty, where the closest gas station might be a bit far away. Actually, it is possible to add Voronoi sets on the map,
> dB=data.frame(lon=B$lon,lat=B$lat)
> id=which(!duplicated(dB))
which could help to get the price of the closest gas station.
> library(tripack)
> V <- voronoi.mosaic(dB$lon[id],dB$lat[id])
> plot(V,add=TRUE)

It is possible to fill each polygon with the color of the gas station it contains. Actually, it is a bit tricky, and I could not find an R function to do this, so I did it manually,
> plot(map)
> P <- voronoi.polygons(V)
> library(sp)
> point_in_i=function(i,point)
+ point.in.polygon(point[1],point[2],P[[i]][,1],P[[i]][,2])
> which_point=function(i)
+ which(Vectorize(function(j)
+ point_in_i(i,c(dB$lon[id[j]],dB$lat[id[j]])))(1:length(id))>0)
> for(i in 1:length(P))
+ polygon(P[[i]],col=CL[x[id[which_point(i)]]],border=NA)

With this map, we can see that we have blue areas, i.e. all stations in a given area are cheap (because of competition), but in some places, a very expensive one is next to a very cheap one. I guess we should look closer at the dynamics… [to be continued….]
Playing with Leaflet (and Radar locations)
Yesterday, my friend Fleur showed me some interesting features of the leaflet package in R.
library(leaflet)
In order to illustrate, consider locations of (fixed) radars, in several European countries. To get the data, use
download.file("http://carte-gps-gratuite.fr/radars/zones-de-danger-destinator.zip","radar.zip")
unzip("radar.zip")
ext_radar=function(nf){
radar=read.table(file=paste("destinator/",nf,sep=""), sep = ",", header = FALSE, stringsAsFactors = FALSE)
radar$type <- sapply(radar$V3, function(x) {z=as.numeric(unlist(strsplit(x, " ")[[1]])); return(z[!is.na(z)])})
radar <- radar[,c(1,2,4)]
names(radar) <- c("lon", "lat", "type")
return(radar)}
L=list.files("./destinator/")
nl=nchar(L)
id=which(substr(L,4,8)=="Radar" & substr(L,nl-2,nl)=="csv")
radar_E=NULL
for(i in id) radar_E=rbind(radar_E,ext_radar(L[i]))
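The speed-limit extraction inside ext_radar (split the label on spaces, keep the numeric token) can be checked on its own; this standalone version of mine only adds suppressWarnings around the coercion,

```r
extract_speed = function(x){
  z = suppressWarnings(as.numeric(strsplit(x, " ")[[1]]))
  z[!is.na(z)]  # keep the token(s) that parsed as numbers
}
extract_speed("Radar fixe 110")
```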
(to be honest, if you run that code, you will get several countries, but not France… though if you want to add it, you should be able to do so). The first tool is based on popups: if you click on a point on the map, you get some information, such as the speed limit enforced by the radar. To get a nice pictogram, use
fileUrl <- "http://evadeo.typepad.fr/.a/6a00d8341c87ef53ef01310f9238e6970c-800wi"
download.file(fileUrl,"radar.png", mode = 'wb')
RadarICON <- makeIcon(iconUrl = fileUrl,
iconWidth = 20, iconHeight = 20)
Then use the following code to get a dynamic map, specifying the variable to be used for the popup,
m <- leaflet(data = radar_E)
m <- m %>% addTiles()
m <- m %>% addMarkers(~lon, ~lat, icon = RadarICON,
popup = ~as.character(type))
m

Because the picture is a bit heavy, with almost 20,000 points, let us focus only on France,
Construction de cartes minimalistes
Last weekend, following the publication of See the world differently with these minimalist maps by Matthew Champion, there was quite a lot of activity around minimalist maps. In particular, Reka (aka @visionscarto) and Philippe (aka @recifs) suggested I write a post for Visions Carto on how these maps are built. I am flattered, even though I find my own contribution to this project incredibly modest (and I always feel humble next to Reka's superb drawings).

I refer you to the post Cartes Minimalistes for more details, but for the curious, here are two additional, more French, maps. The first one shows the railways,
library(maptools)
setwd("/home/freakonometrics/Documents/data")
loc="http://www.mapcruzin.com/download-shapefile/france-railways-shape.zip"
download.file(loc,destfile="rail_france.zip")
unzip("rail_france.zip", exdir="./rail_france/")
shap=readShapeLines("./rail_france/railways.shp")
plot(shap,lwd=.7)

and the second one the roads in the Paris region,
loc="http://www.mapcruzin.com/download-shapefile/france-roads-shape.zip"
download.file(loc,destfile="road_france.zip")
unzip("road_france.zip", exdir="./road_france/")
shap=readShapeLines("./road_france/roads.shp")
plot(shap,lwd=.7,ylim=48.85+c(-.5,.5),
xlim=2.35+c(-.5,.5))

Next time, I will explain how to fix shapefiles when they cause trouble (I am thinking of the comment pointing out that it was a shame to have a road between the United Kingdom and Iceland).
Minimalist Maps
This week, I mentioned a series of maps, on Twitter,
some minimalist maps https://t.co/YCNPf3AR9n (poke @visionscarto) pic.twitter.com/Ip9Tylsbkv
— Arthur Charpentier (@freakonometrics) 2 Septembre 2015
Friday evening, just before leaving the office to pick up the kids after their first week back in class, Matthew Champion (aka @matthewchampion) sent me an email asking for more details. He wanted to know if I had produced those graphs, and whether he could mention them in a post. The truth is, I have no idea who produced them, but I told him one can easily reproduce them. For instance, for the cities, in R, use
> library(maps)
> data("world.cities")
> plot(world.cities$lon,world.cities$lat,
+ pch=19,cex=.7,axes=FALSE,xlab="",ylab="")

It is possible to get a more minimalist map by plotting only cities with more than 100,000 inhabitants, e.g.,
> world.cities2 = world.cities[
+ world.cities$pop>100000,]
> plot(world.cities2$lon,world.cities2$lat,
+ pch=19,cex=.7,axes=FALSE,xlab="",ylab="")

For the airports, it was slightly more complex, since 6,977 airports are listed on http://openflights.org/data.html#airport. But on http://www.naturalearthdata.com/, I found another dataset with only 891 airports.
> library(maptools)
> shape <- readShapePoints(
+ "~/data/airport/ne_10m_airports.shp")
> plot(shape,pch=19,cex=.7)

On the same website, one can find a dataset for ports,
> shape <- readShapePoints(
+ "~/data/airport/ne_10m_ports.shp")
> plot(shape,pch=19,cex=.7)

This was for graphs based on points. For those based on lines, for instance rivers, shapefiles can be downloaded from https://github.com/jjrom/hydre/tree/, and then, use
> require(maptools)
> shape <- readShapeLines(
+ "./data/river/GRDC_687_rivers.shp")
> plot(shape,col="blue")

For roads, the shapefile can be downloaded from http://www.naturalearthdata.com/
> shape <- readShapeLines(
+ "./data/roads/ne_10m_roads.shp")
> plot(shape,lwd=.5)

Last, but not least, for lakes, we need the polygons,
> shape <- readShapePoly(
+ "./data/lake/ne_10m_lakes.shp")
> plot(shape,col="blue",border="blue",lwd=2)

Nice, isn’t it? See See the world differently with these minimalist maps for Matthew Champion‘s post.