Scraping http://storia.camera.it/ is possible, but the data are large (roughly 800 MB of HTML files for legislatures 9-16) and do not cover legislature 17, so scraping does not provide a unified method for getting more and better data across all legislatures.
SPARQL endpoints
Example Camera query. Some values are missing, and deputies with several party affiliations or committee memberships appear on several rows (replace 00 with the legislature number):
SELECT DISTINCT
  ?url ?name ?surname ?born ?sex ?constituency
  ?party ?start ?end ?committee ?photo
WHERE {
  ?url ocd:rif_mandatoCamera ?mandato; a foaf:Person.
  ?d a ocd:deputato; ocd:aderisce ?aderisce;
    ocd:rif_leg <http://dati.camera.it/ocd/legislatura.rdf/repubblica_00>;
    ocd:rif_mandatoCamera ?mandato.
  ?d foaf:firstName ?name; foaf:surname ?surname.
  OPTIONAL { ?d foaf:gender ?sex. }
  OPTIONAL { ?d foaf:depiction ?photo. }
  OPTIONAL {
    ?url <http://purl.org/vocab/bio/0.1/Birth> ?nascita.
    ?nascita <http://purl.org/vocab/bio/0.1/date> ?born.
  }
  OPTIONAL {
    ?mandato ocd:rif_elezione ?elezione.
    ?elezione dc:coverage ?constituency.
  }
  OPTIONAL {
    ?aderisce ocd:startDate ?start.
  }
  OPTIONAL {
    ?aderisce ocd:endDate ?end.
  }
  OPTIONAL {
    ?aderisce ocd:rif_gruppoParlamentare ?gruppo.
    ?gruppo dc:title ?party.
  }
  OPTIONAL {
    ?d ocd:membro ?membro.
    ?membro ocd:rif_organo ?organo.
    ?organo dc:title ?committee.
  }
}
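Because the query above returns one row per combination of deputy, party affiliation and committee, the results usually need to be collapsed to one row per deputy. The following is a minimal sketch in base R, not the method used by this project: the `rows` data frame and its values are hypothetical stand-ins for parsed query results, and `collapse_unique` is an illustrative helper name.

```r
# Hypothetical result rows, shaped like the output of the query above:
# one row per (deputy, party, committee) combination.
rows <- data.frame(
  url       = c("d1", "d1", "d1", "d2"),
  surname   = c("ROSSI", "ROSSI", "ROSSI", "BIANCHI"),
  party     = c("A", "A", "B", "C"),
  committee = c("X", "Y", "X", NA),
  stringsAsFactors = FALSE
)

# Paste the distinct non-missing values of a column into a single string;
# return NA when every value is missing.
collapse_unique <- function(x) {
  x <- unique(x[!is.na(x)])
  if (length(x) == 0) NA_character_ else paste(x, collapse = "; ")
}

# One row per deputy: group by the columns that identify a deputy and
# collapse the columns that vary within each group.
per_deputy <- aggregate(
  rows[c("party", "committee")],
  by  = rows[c("url", "surname")],
  FUN = collapse_unique
)
```

Keeping the affiliations as a delimited string loses the link between each party and its start and end dates, so a separate table of affiliations may be preferable when the dates matter.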
Example Senato query (all senators from legislature 9; the mandate is required so that the legislature filter applies to each senator's own mandates, while the date of birth stays optional to avoid filtering out senators with missing data):
PREFIX osr: <http://dati.senato.it/osr/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?senatore ?nome ?cognome ?legislatura ?mandato
WHERE {
  ?senatore a osr:Senatore.
  ?senatore foaf:firstName ?nome.
  ?senatore foaf:lastName ?cognome.
  # tie each mandate to its senator before filtering on the legislature
  ?senatore osr:mandato ?mandato.
  ?mandato osr:legislatura ?legislatura.
  OPTIONAL { ?senatore osr:dataNascita ?dataNascita. }
  FILTER(?legislatura = 9)
}
Example Camera query to get all bills sponsored by a particular MP (replace 00 with the legislature number and uid with the unique identifier that contains both the MP uid and the legislature; the query is limited to 10,000 signatures, but that should not be an issue):
SELECT DISTINCT
  ?role ?ref ?date ?title
WHERE {
  {
    ?atto ?ruolo ?deputato;
      dc:date ?date;
      dc:identifier ?ref;
      dc:title ?title;
      dc:type ?tipo.
    FILTER(?ruolo = ocd:primo_firmatario)
  }
  UNION {
    ?atto ?ruolo ?deputato;
      dc:date ?date;
      dc:identifier ?ref;
      dc:title ?title;
      dc:type ?tipo.
    FILTER(?ruolo = ocd:altro_firmatario)
  }
  ## filter bills
  FILTER(?tipo = 'Progetto di Legge')
  ?ruolo rdfs:label ?role.
  ## filter sponsor (STR() turns the IRI into a string for REGEX)
  ?deputato ocd:rif_leg <http://dati.camera.it/ocd/legislatura.rdf/repubblica_00>.
  FILTER(REGEX(STR(?deputato), 'uid', 'i'))
}
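The 00 and uid placeholders in the Camera queries can be filled programmatically. The sketch below is one possible way to do it in base R: `fill_query` is a hypothetical helper, the two-digit zero padding of the legislature number is an assumption suggested by the 00 placeholder, and the identifier passed in the usage line is made up for illustration.

```r
# Fill the "00" (legislature) and 'uid' (sponsor identifier) placeholders
# of a query template. Zero-padding the legislature to two digits is an
# assumption based on the "00" placeholder.
fill_query <- function(template, legislature, uid = NULL) {
  leg <- sprintf("repubblica_%02d", as.integer(legislature))
  q <- sub("repubblica_00", leg, template, fixed = TRUE)
  if (!is.null(uid)) {
    # Replace only the quoted placeholder, so other text is left untouched.
    q <- sub("'uid'", sprintf("'%s'", uid), q, fixed = TRUE)
  }
  q
}

# Usage with a shortened, hypothetical template and identifier:
template <- paste(
  "?d ocd:rif_leg <http://dati.camera.it/ocd/legislatura.rdf/repubblica_00>.",
  "FILTER(REGEX(STR(?d), 'uid', 'i'))"
)
query <- fill_query(template, legislature = 9, uid = "d123_9")
```

Matching with `fixed = TRUE` avoids treating characters in the template or the identifier as regular expression syntax.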
All queries can be sent to the endpoint with httr, passing the query text as the query element of the request. The results are RDF files that xml2 can parse as if they were HTML.
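A minimal sketch of that workflow follows, under several assumptions that are not stated above: the endpoint URL is assumed to be http://dati.camera.it/sparql, the endpoint is assumed to honour an Accept header asking for the W3C SPARQL Query Results XML Format (rather than returning RDF), and `run_sparql` and `sparql_to_df` are hypothetical helper names. It parses the response as XML instead of HTML, which is a different technique from the one described above.

```r
library(httr)
library(xml2)

# Assumed endpoint URL; adjust if the real one differs.
endpoint <- "http://dati.camera.it/sparql"

# Send a SPARQL query as the `query` parameter of a GET request and
# return the response parsed as XML.
run_sparql <- function(query, endpoint) {
  r <- GET(endpoint,
           query = list(query = query),
           accept("application/sparql-results+xml"))
  stop_for_status(r)
  read_xml(content(r, as = "text", encoding = "UTF-8"))
}

# Turn a SPARQL XML results document into a data frame with one row per
# <result> and one column per variable; unbound variables become NA.
sparql_to_df <- function(doc) {
  doc <- xml_ns_strip(doc)  # drop the default namespace to simplify XPath
  vars <- xml_attr(xml_find_all(doc, "//head/variable"), "name")
  results <- xml_find_all(doc, "//results/result")
  rows <- lapply(results, function(res) {
    vapply(vars, function(v) {
      xpath <- sprintf("./binding[@name='%s']/*", v)
      xml_text(xml_find_first(res, xpath))
    }, character(1))
  })
  as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
}

# Usage (requires network access, so it is left commented out):
# df <- sparql_to_df(run_sparql(query, endpoint))
```

If the endpoint ignores the Accept header and returns another format, `sparql_to_df` would need to be adapted to that format.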
RDF data dumps
Camera
The dumps should be parsable with xml2, but coercing the RDF to an HTML structure will create errors that make the dumps unusable without extra software like Apache Jena, which is available from within R only through the Java-dependent rrdf package.
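For comparison, the sketch below parses a small RDF/XML fragment with xml2 as namespace-aware XML rather than as HTML. The fragment is hypothetical and far simpler than the real dumps, so this only illustrates the approach; whether it scales to the full dumps has not been checked here.

```r
library(xml2)

# Hypothetical RDF/XML fragment standing in for a small piece of a dump.
rdf <- '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.org/person/1">
    <foaf:firstName>Maria</foaf:firstName>
    <foaf:surname>Rossi</foaf:surname>
  </rdf:Description>
</rdf:RDF>'

doc <- read_xml(rdf)
ns <- xml_ns(doc)  # prefixes declared in the document (rdf, foaf)

# Select resources and read their properties using prefixed XPath.
people <- xml_find_all(doc, "//rdf:Description", ns)
people_df <- data.frame(
  uri     = xml_attr(people, "rdf:about", ns),
  name    = xml_text(xml_find_first(people, "./foaf:firstName", ns)),
  surname = xml_text(xml_find_first(people, "./foaf:surname", ns)),
  stringsAsFactors = FALSE
)
```

This treats the RDF as a plain XML tree, so it only works when the serialization is regular; a proper RDF parser remains the safer choice for arbitrary graphs.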
Senato
The dumps return very few senators for past legislatures (e.g. 7).