Papers by Ana I. Torre-Bastida
2022 7th International Conference on Smart and Sustainable Technologies (SpliTech)
2022 IEEE 19th International Conference on Software Architecture Companion (ICSA-C)

More and more users aim to take advantage of the existing Linked Open Data environment to formulate a query over a dataset and then process the same query over different datasets, one after another, in order to obtain a broader set of answers. However, the heterogeneity of the vocabularies used in the datasets, on the one hand, and the scarcity of alignments among those datasets, on the other, make that querying task difficult. Considering this scenario, we present in this paper a proposal that allows on-demand translations of queries formulated over an original dataset into queries expressed using the vocabulary of a targeted dataset. Our approach relieves users from knowing the vocabulary used in the targeted datasets and, moreover, considers situations where alignments do not exist or are not suitable for the formulated query. Therefore, in order to favour the possibility of getting answers, sometimes there is no guarantee of obtaining...
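The core idea of translating a query across vocabularies via alignments can be illustrated with a minimal sketch. This is not the paper's implementation: the alignment table, the triple-pattern representation, and the function names below are all invented for illustration; unaligned terms are reported so the caller knows only an approximate rewriting is possible.

```python
# Hypothetical alignment table from source-dataset terms to target-dataset terms.
ALIGNMENTS = {
    "dbo:Film": "movie:film",
    "dbo:director": "movie:director_name",
    "rdf:type": "rdf:type",   # shared vocabulary maps to itself
}

def rewrite_terms(triple_patterns, alignments):
    """Replace each aligned term; collect terms with no alignment."""
    rewritten, missing = [], []

    def swap(term):
        if term.startswith("?"):        # variables pass through unchanged
            return term
        if term in alignments:
            return alignments[term]
        missing.append(term)            # no alignment: approximation needed
        return term

    for s, p, o in triple_patterns:
        rewritten.append((swap(s), swap(p), swap(o)))
    return rewritten, missing

# A toy query: "films and who stars in them".
query = [("?f", "rdf:type", "dbo:Film"), ("?f", "dbo:starring", "?a")]
new_query, unaligned = rewrite_terms(query, ALIGNMENTS)
```

Here `dbo:starring` has no alignment, so the rewriting is only approximate; this is exactly the situation where the approach falls back to approximation rather than returning nothing.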

When analyzing large-scale streaming data to resolve classification problems, it is often assumed that the true labels of the incoming data become available right after prediction. This assumption allows online learning models to efficiently detect and accommodate non-stationarities in the distribution of the arriving data (concept drift). However, it does not hold in many practical scenarios where a delay exists between predictions and the arrival of the class labels, to the point of lacking this supervision for an infinite period of time (extreme verification latency). In this case, the development of learning algorithms capable of adapting to drifting environments without any external supervision remains a challenging research area to date. In this context, this work proposes a simple yet effective learning technique to classify non-stationary data streams under extreme verification latency. The intuition motivating the design of our technique is to predict the trajectory of concept...
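The "predict the trajectory of the concept" intuition can be sketched minimally: track each class centroid over time, extrapolate where it will be next, and label incoming unlabeled points by the nearest predicted centroid. This is a hedged illustration of the intuition only, not the paper's algorithm; the linear extrapolation, the two-step history, and the data are all assumptions.

```python
def predict_centroid(prev, curr):
    """Linear extrapolation per dimension: next ~ curr + (curr - prev)."""
    return tuple(c + (c - p) for p, c in zip(prev, curr))

def nearest_class(point, predicted_centroids):
    """Assign the class whose predicted centroid is closest (no labels needed)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(predicted_centroids, key=lambda cls: dist2(point, predicted_centroids[cls]))

# Two drifting classes; centroids estimated at two consecutive time steps.
history = {
    "A": [(0.0, 0.0), (1.0, 0.0)],   # drifting right
    "B": [(5.0, 5.0), (5.0, 4.0)],   # drifting down
}
predicted = {cls: predict_centroid(*h) for cls, h in history.items()}
label = nearest_class((2.2, 0.1), predicted)   # near A's predicted position
```

The point is that classification keeps tracking the drift even though no true labels ever arrive, which is precisely the extreme-verification-latency setting.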

Concurrency and Computation: Practice and Experience, 2015
The proliferation of social networks and their usage by a wide spectrum of user profiles has been especially notable in the last decade. A social network is frequently conceived as a strongly interlinked community of users, each featuring a compact neighborhood tightly and actively connected through different communication flows. This realm unleashes a rich substrate for a myriad of malicious activities that aim to profit, without authorization, from the user or from his/her social circle. This manuscript elaborates on a practical approach for the detection of identity theft in social networks, by which the credentials of a certain user are stolen and used without permission by the attacker for his/her own benefit. The proposed scheme detects identity thefts by exclusively analyzing connection time traces of the account being tested, in a nonintrusive manner. The manuscript formulates the detection of this attack as a binary classification problem, which is tackled by means of a support vector classifier applied over features inferred from the original connection time traces of the user. Simulation results are discussed in depth toward elucidating the potential of the proposed system as the first step of a more involved impersonation detection framework, also relying on connectivity patterns and elements from language processing. Copyright © 2015 John Wiley & Sons, Ltd.
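To fix the idea of "features inferred from connection time traces", the sketch below derives a few simple statistics from a trace of login hours, the kind of vector a binary support vector classifier could consume. The feature set is invented for illustration and is not the paper's.

```python
from statistics import mean, pstdev

def trace_features(login_hours):
    """Summarise a trace of login hours-of-day into a small feature vector.
    (Hypothetical features: the paper's actual feature set differs.)"""
    night = sum(1 for h in login_hours if h < 6 or h >= 23)
    return {
        "n_sessions": len(login_hours),
        "mean_hour": mean(login_hours),
        "std_hour": pstdev(login_hours),
        "night_ratio": night / len(login_hours),   # share of late-night logins
    }

# A habitual daytime user vs. an anomalous night-time trace: a large shift
# in such features is the sort of signal a classifier could flag as theft.
usual = trace_features([9, 10, 11, 14, 15])
odd = trace_features([1, 2, 3, 23, 2])
```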
El Profesional de la Información, 2015
Ana-Isabel Torre-Bastida holds a degree in computer engineering from the Universidad de Deusto and a master's degree in advanced computer systems from the Universidad del País Vasco. A doctoral candidate, she focuses her research on the Semantic Web, especially on Linked Open Data. She is a collaborating researcher at the Tecnalia technology centre, where she carries out work applying semantic and big data technologies to several sectors.

Lecture Notes in Computer Science, 2013
The number of linked data sources available on the Web is growing at a rapid rate. Moreover, users are showing an interest in any framework that allows them to obtain answers to a formulated query by accessing heterogeneous data sources without the need to explicitly specify which sources should answer the query. Our proposal focuses on that interest, and its goal is to build a system capable of answering user queries in an incremental way: each time a different data source is accessed, the previous answer is eventually enriched. Brokering across the data sources is enabled by using source mapping relationships. User queries are rewritten using those mappings in order to obtain translations of the original query across data sources. Semantically equivalent translations are looked for first, but semantically approximated ones are generated if equivalence is not achieved. Well-defined metrics are considered to estimate the information loss, if any.
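An information-loss estimate of the kind mentioned above can be sketched as a weighted share of query terms that could not be translated exactly. The weighting scheme and categories below are purely illustrative assumptions, not the metrics defined in the paper.

```python
def information_loss(term_translations):
    """Hypothetical loss score in [0, 1].
    term_translations maps each query term to how it was translated:
    'exact' (no loss), 'approx' (partial loss), or 'dropped' (full loss)."""
    weights = {"exact": 0.0, "approx": 0.5, "dropped": 1.0}
    if not term_translations:
        return 0.0
    return sum(weights[k] for k in term_translations.values()) / len(term_translations)

# One term translated exactly, one approximated, one dropped.
loss = information_loss({
    "dbo:Film": "exact",
    "dbo:gross": "approx",
    "dbo:cinematography": "dropped",
})
```

A score of 0 would mean a semantically equivalent translation; higher values quantify how approximate the rewriting had to be.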

Lecture Notes in Computer Science, 2013
Nowadays, the number of linked data sources available on the Web is considerable. In this scenario, users are interested in frameworks that help them query those heterogeneous data sources in a friendly way, sparing them the technical details related to the heterogeneity and variety of the data sources. With this aim, we present a system that implements an innovative query approach that obtains results to user queries in an incremental way. It sequentially accesses different datasets, expressed with possibly different vocabularies. Our approach enriches previous answers each time a different dataset is accessed. Mapping axioms between datasets are used to rewrite the original query and so obtain new queries expressed with terms in the vocabulary of the target dataset. These rewritten queries may be semantically equivalent, or they may incur a certain semantic loss; in the latter case, an estimation of the loss of information incurred is presented.

Intelligent Data Engineering and Automated Learning – IDEAL 2014, 2014
Interlinking different data sources has become a crucial task due to the explosion of diverse, heterogeneous information repositories in the so-called Web of Data. In this paper, an approach to extract relationships between entities existing in huge Linked Data sources is presented. Our approach hinges on the Map-Reduce processing framework and context-based ontology matching techniques so as to discover the maximum number of possible relationships between entities within different data sources in a computationally efficient fashion. To this end, the processing flow is composed of three Map-Reduce jobs in charge of (1) the collection of linksets between datasets; (2) context generation; and (3) the construction of entity pairs and similarity computation. In order to assess the performance of the proposed scheme, an exemplifying prototype is implemented between the DBpedia and LinkedMDB datasets. The obtained results are promising and pave the way towards benchmarking the proposed interlinking procedure against other ontology matching systems.
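The third job (entity pairs and similarity computation) can be sketched in-process with a map-style grouping followed by a set-overlap similarity. The actual system runs as distributed Map-Reduce jobs; the grouping helper, the Jaccard choice of similarity, and the toy DBpedia/LinkedMDB contexts below are illustrative assumptions.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Group mapper-emitted (key, value) pairs by key, as a shuffle would."""
    grouped = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            grouped[key].append(value)
    return grouped

def jaccard(a, b):
    """Set-overlap similarity between two term contexts."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Candidate entity pair with a context (set of descriptive terms) per dataset.
pairs = [
    (("dbp:Alien", "mdb:alien_1979"),
     (["scifi", "horror", "1979"], ["scifi", "1979"])),
]
grouped = map_phase(pairs, lambda rec: [(rec[0], rec[1])])
scores = {pair: jaccard(*ctxs[0]) for pair, ctxs in grouped.items()}
```

In a real deployment each phase would be a separate distributed job over the full linksets, but the dataflow shape is the same.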

ACM Transactions on the Web, 2018
A growing number of Linked Open Data sources (from diverse provenances and about different domains) that can be freely browsed and searched to find and extract useful information have been made available. However, access to them is difficult for different reasons. This study addresses access issues concerning heterogeneity: it is common for datasets to describe the same or overlapping domains while using different vocabularies. Our study presents a transducer that transforms a SPARQL query suitably expressed in terms of the vocabularies used in a source dataset into another SPARQL query suitably expressed for a target dataset involving different vocabularies. The transformation is based on existing alignments between terms in different datasets. Whenever the transducer is unable to produce a semantically equivalent query because of the scarcity of term alignments, it produces a semantic approximation of the query to avoid returning an empty answer to the user. Transform...

Sensors
Development and operations (DevOps), artificial intelligence (AI), big data and edge–fog–cloud are disruptive technologies that may produce a radical transformation of industry. Nevertheless, there are still major challenges to applying them efficiently in order to optimise productivity. Some of them are addressed in this article, concretely with respect to the adequate management of information technology (IT) infrastructures for automated analysis processes in critical fields such as the mining industry. In this area, this paper presents a tool called Pangea aimed at automatically generating suitable execution environments for deploying analytic pipelines. These pipelines are decomposed into various steps so as to execute each one in the most suitable environment (edge, fog, cloud or on-premise), minimising latency and optimising the use of both hardware and software resources. Pangea is focused on three distinct objectives: (1) generating the required infrastructure if it does not ...
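The step-placement idea (run each pipeline step in the environment that minimises latency while respecting resource needs) can be sketched as a greedy chooser. The environment catalogue, the latency and CPU figures, and the greedy policy below are invented for illustration; they are not Pangea's actual placement logic.

```python
# Hypothetical environment catalogue: lower latency, fewer resources at the edge.
ENVIRONMENTS = {
    "edge":       {"latency_ms": 5,  "cpu": 2},
    "fog":        {"latency_ms": 20, "cpu": 8},
    "on-premise": {"latency_ms": 40, "cpu": 16},
    "cloud":      {"latency_ms": 80, "cpu": 64},
}

def place(steps, environments):
    """Greedy placement: the lowest-latency environment with enough CPU."""
    plan = {}
    for step, needed_cpu in steps.items():
        feasible = [e for e, spec in environments.items() if spec["cpu"] >= needed_cpu]
        plan[step] = min(feasible, key=lambda e: environments[e]["latency_ms"])
    return plan

# Three pipeline steps with different CPU demands.
plan = place({"ingest": 1, "train": 32, "serve": 4}, ENVIRONMENTS)
```

Light steps land at the edge, heavy ones in the cloud; a real scheduler would also weigh software dependencies, data locality and cost.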

Semantic Web
Nowadays, it is becoming increasingly necessary to query data stored in different publicly accessible datasets, such as those included in the Linked Data environment, in order to get as much information as possible on distinct topics. However, users have difficulty querying datasets that use different vocabularies and data structures. For this reason, it is interesting to develop systems that can produce on-demand rewritings of queries. Moreover, a semantics-preserving rewriting often cannot be guaranteed by those systems due to the heterogeneity of the vocabularies. It is at this point where the quality estimation of the produced rewriting becomes crucial. In this paper, we present a novel framework that, given a query written in the vocabulary the user is most familiar with, rewrites the query in terms of the vocabulary of a target dataset. Moreover, it also reports the quality of the rewritten query with two scores: first, a similarity factor based on the rewriting process itself, and second, a quality score offered by a predictive model. This model is constructed by a machine learning algorithm that learns from a set of queries and their intended (gold-standard) rewritings. The feasibility of the framework has been validated in a real scenario.
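A rewriting-based similarity factor of the kind described can be sketched as a set overlap between the alignment-mapped terms of the original query and the terms of the rewritten one. Both the Jaccard formula and the toy terms below are assumptions for illustration; the paper pairs its own similarity factor with a learned quality predictor.

```python
def similarity_factor(original_terms, rewritten_terms, alignments):
    """Hypothetical score in [0, 1]: overlap between what the alignments
    say the query *should* become and what the rewriting produced."""
    mapped = {alignments.get(t, t) for t in original_terms}
    target = set(rewritten_terms)
    return len(mapped & target) / len(mapped | target) if mapped | target else 1.0

score = similarity_factor(
    ["dbo:Film", "dbo:director"],
    ["movie:film", "movie:director_name"],
    {"dbo:Film": "movie:film"},   # only one of the two terms is aligned
)
```

With only one aligned term, the factor drops below 1, signalling to the user that part of the rewriting is approximate; the second, learned score would complement this rewriting-internal view.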