1 s2.0 S0957417416303141 Main
Article info

Article history:
Received 15 December 2015
Revised 15 June 2016
Accepted 16 June 2016
Available online 23 June 2016

Keywords:
Semantic model
Ontology
E-commerce
Web analytics

Abstract

Web analytics has emerged as one of the most important activities in e-commerce, since it allows companies and e-merchants to track the behavior of customers when visiting their web sites. A range of web analytics tools exists that are used not only for tracking and measuring web traffic, but also for analyzing the commercial activity. However, most of these tools focus on low level web attributes and metrics, making other sophisticated functionalities and analyses only available for commercial (non-free) versions.
In this context, the SME-Ecompass European initiative aims at providing e-commerce SMEs with accessible tools for high level web analytics. These software facilities should use different sources of data coming from digital footprints allocated in e-shops, fuse them together in a coherent way, and make them available for advanced data mining procedures. This motivated us to propose in this work an ontology-based approach to collect, integrate and store web analytics data from many sources of popular and commercial digital footprints. As the main impact of this article, we obtain enriched and semantically annotated data that is used to properly train an intelligent system, involving data mining procedures, for the analysis of customer behavior in real e-commerce sites. Specifically, for the validation of our semantic approach, we have captured and integrated data from Google Analytics and Piwik digital footprints allocated in 15 e-shops of different commercial sectors and countries (UK, Spain, Greece and Germany), throughout several months of activity. The obtained results show different perspectives in customer behavior analysis that go one step beyond the most popular web analytics tools in the current market.
© 2016 Elsevier Ltd. All rights reserved.
1. Introduction

In the last few years, web analytics has emerged as one of the most important activities in e-commerce, since it allows companies and e-merchants to track the behavior of customers when visiting their e-shop sites. Web analytic applications can also help companies to measure the results of traditional print or broadcast advertising campaigns. The web analytics procedure is based on measuring a visitor's behavior once on a given e-shop site, which includes its drivers and conversions (to actual customer). These data are typically compared against key performance indicators and used to improve a website or marketing campaign's audience response.
In the current market, there is a range of tools for web analytics, such as Google Analytics, Piwik, Clicky, and StatCounter, that are widely used not only for tracking and measuring web traffic, but also for analyzing the commercial activity, and hence for improving the effectiveness of a website. However, these tools often focus on low level and limited sets of web metrics and attributes, without the possibility of providing specialized analyses. In most cases, high level web metrics and sophisticated functionalities are available only for commercial (non-free) versions, which are rarely accessible by SMEs or individual e-merchants.
In this context, the SME-Ecompass European initiative2 aims at providing e-commerce SMEs with accessible tools for high level web analytics. These software facilities use the different sources of data coming from different digital footprints allocated in e-shops. However, integrating data from multiple heterogeneous sources entails dealing with different data models, schemas and query languages. Therefore, there is a clear demand for integrative procedures that provide the advanced data mining algorithms with uniform access to multiple heterogeneous web data sources.
The main hypothesis in this work is: (H1) an ontology-based integration approach will help us to collect web analytics data from many sources of popular and commercial digital footprints, fuse the data together in a coherent way, and store them. As a result, (H2) we will obtain enriched and semantically annotated data that can be used to train data mining procedures for advanced analysis of customer behavior in real e-commerce sites.
This motivated us to propose a semantic approach that uses an ontology as a mediated schema for the representation and consolidation in a knowledge base of the tracking data from web source's semantics. Semantic web ontologies have become a key technology for intelligent knowledge processing, providing a framework for sharing conceptual models about a domain. Semantic mappings between the source schema and the ontology are then defined and used to transform the original data to RDF (Resource Description Framework)3. This way, data from heterogeneous sources are stored and integrated inside a single RDF repository, which can now be easily queried by high level algorithms. The goal is to properly feed artificial intelligence procedures capable of deciding how to perform marketing activities, such as displaying a given advertisement targeted to a certain category of clients, or decreasing the price of a product in a given region, thereby giving rise to sophisticated expert systems for e-commerce applications.
The main contributions of this study are summarized as follows:

– We have developed a semantic approach for the data integration and consolidation of multiple web analytics data sources. These data are accumulated daily from many heterogeneous digital footprints allocated on actual e-shops.
– We have designed and implemented for the first time an OWL (Web Ontology Language) ontology (Dean & Schreiber, 2004) for web analytics. This ontology considers a large and complementary set of attributes and metrics, which have been taken from several representative web analytics tools in the market.
– To test hypothesis H1, we have captured and integrated data from Google Analytics and Piwik digital footprints allocated in 15 e-shops of different commercial sectors (retail, tourism, electronics, pharmacy, etc.) and countries (UK, Spain, Greece and Germany), throughout several months of activity. The data are integrated following the same (standard) format and stored in a common RDF repository.
– To test hypothesis H2, the obtained “semantized” data are used to train advanced data mining algorithms to perform customer profile analyses. In particular, these algorithms are tested with success in two case studies to classify the visitor's behavior and product preference.

The remainder of this article is organized as follows. In Section 2, background and literature overview are presented. Section 3 presents the current state and practices in web analytics for e-commerce. In Section 4, the semantic approach is described, giving details of the service architecture and the OWL ontology. After this, the validation procedure is reported in Section 5. Finally, main conclusions and future work are given in Section 6.

∗ Corresponding author.
E-mail addresses: [email protected] (M.d.M.R. García), [email protected], [email protected] (J. García-Nieto), [email protected] (J.F. Aldana-Montes).
1 This work is partially funded by FP7 EU project SME E-COMPASS under Grant No: 315637. It is also partially funded by Grants TIN2014-58304 (Spanish Ministry of Sciences and Innovation) and Regional projects P11-TIC-7529/P12-TIC-1519. The authors thank the involved e-shops for kindly offering web tracking data for testing and validation.
2 SME-Ecompass FP7 European initiative http://www.sme-ecompass.eu/
3 RDF in W3C https://www.w3.org/RDF/
http://dx.doi.org/10.1016/j.eswa.2016.06.034
0957-4174/© 2016 Elsevier Ltd. All rights reserved.
M.d.M.R. García et al. / Expert Systems With Applications 63 (2016) 20–34

2. Background and related work

This section describes the main background concepts. A review of current related works in the specialized literature is carried out to point out their main differences with regards to our approach.

2.1. Background concepts

- Ontology. Ontologies provide a formal representation of the real world, shared by a sufficient number of users, by defining concepts and relationships between them (Gruber, 1993). In computer and information sciences, an ontology defines a set of representational primitives with which to model a domain of knowledge or discourse. These primitives are typically concepts (classes), attributes (properties), class members (class instances) and relationships (property instances). The definitions of the representational primitives include information about their meaning and constraints on their logically consistent application.
Ontologies are part of the W3C standards stack for the semantic web, in which they are used to specify standard conceptual vocabularies in which to exchange data between systems, provide services for answering queries, publish reusable knowledge bases, and offer services to facilitate interoperability across multiple, heterogeneous systems and databases.
- RDF. Resource Description Framework is a basic ontology language used for representing information about resources on the web (Staab & Studer, 2009). Resources are described in terms of properties and property values using RDF statements. Statements are represented as triples, consisting of a subject, predicate and object. RDF Schema (RDFS) (Staab & Studer, 2009) “semantically extends” RDF to enable us to talk about classes of resources, and the properties that will be used with them. It does this by giving particular meanings to certain RDF properties and resources. RDFS provides the means to describe application specific RDF vocabularies. RDF and RDFS provide basic capabilities for describing vocabularies that describe resources, metadata and ontologies.
- SPARQL. It is an RDF query language for ontology models and databases, capable of extracting and manipulating information stored in RDF format. Essentially, SPARQL is a graph-matching query language that can be used to extract knowledge from a model such as the one proposed in this article. Given a data source D, a query consists of a pattern, which is matched against D. The combinations of values resulting from this matching constitute the result of the query (Pérez, Arenas, & Gutierrez, 2009). SPARQL has strong support for querying semi-structured and tagged data, i.e. data with an unpredictable and unreliable structure. SPARQL supports queries to networked web data sources identified by URIs. In fact, it is a W3C recommendation for RDF data.
- OWL. In 2004, the W3C ontology working group (Dean & Schreiber, 2004) proposed OWL as a semantic markup language for publishing and sharing ontologies on the World Wide Web. From a formal point of view, OWL is equivalent to a very expressive description logic where an ontology corresponds to a Tbox (Gruber, 1993). This equivalence allows the language to exploit description logic research results. OWL extends RDF and RDFS. When compared to RDF models, OWL adds more vocabulary for describing properties and classes: relations between classes (e.g. disjointness), cardinality (e.g. “exactly one”), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes (McGuinness & Harmelen, 2004).
- OWL-DL. Syntactic variant of the SHOIN(D) description logic (Haase & Stojanovic, 2005) with a different terminology to OWL, which is based on RDFS, hence the support for data values, data types and data type properties. OWL-DL restricts OWL in two distinct ways (Horrocks & Patel-Schneider, 2003): first, some syntactic constructs like recursive descriptions are not allowed; second, classes, individuals and properties (respectively concepts, individuals and roles in description logics) must all be disjoint. In this work, we use OWL-DL syntax to formalize the proposed ontology for our semantic model. A summarized description of basic OWL-DL semantics syntax is shown in Table 1, where an informal logic syntax is represented (left
Table 2
Related approaches in the state of the art. The target area of application, the used ontology/vocabulary and the post-processing analysis, and the validation procedure are reported for each work.

Fig. 1. Current practices in surveyed e-commerce SMEs with regards to the use of automatic tools for web analytics.
e-commerce sites. For this reason, we opted to design a semantic approach for sharing and reconciliation, whereby an agreed ontology model is used to achieve a common understanding of the domain in which the system operates. Specifically, we have developed an OWL ontology to describe the main features of e-shops by following the standard Ontology 101 development process (Natalya, McGuinness, & Deborah, 2001) of seven steps:

(i) Determine the domain and scope of the ontology. As the starting point, to limit the scope of the ontology, we selected the kind of variables that the data mining algorithms need from Google Analytics and Piwik and also from competitors' e-shops, for instance: visitors' origin, visitors' attributes, purchasing behavior, product and customer details, etc.
(ii) Consider reusing existing ontologies. As we examined in Section 2.2, no similar ontologies have been previously proposed for modeling web tracking data in e-commerce. However, we partially considered two related ontologies: GoodRelations (Hepp, 2008), which is a standardized vocabulary for e-commerce, and the Product Ontology (Hepp, 2008), which contextualizes product types based on Wikipedia.
(iii) Enumerate important terms in the ontology. Important terms in the ontology were extracted in a previous phase of requirements specification (García-Nieto & Roldán, 2014) from the minimum set of variables that are needed. Examples of such terms are: address, visitor, customer, device, browser, Geographical_origin, Number_of_visitors, Conversion_rate, etc.
(iv) Define classes and the class hierarchy. From the list of terms, we obtained the ontology classes. Fig. 2 shows the main set of classes in the hierarchy starting from the top class Thing (⊤). These main classes are related to other classes and some of them have subclasses. For instance, Analytic_parameters has a series of subclasses, such as Bounce_rate, Total_revenue, Number_of_returning_visitors, and Number_of_transactions.
(v) Define the properties of classes and slots. In order to relate classes and to define attributes, we identified object and data type properties based on the minimum set of variables previously defined. Examples of object properties are: an e-shop owner is owner of an e-shop, a visitor makes visits, a device has a browser, an IP address belongs to an organization, etc. Examples of data type properties are the title and URL of a page, the first and last name of an e-shop owner, the version of the operating system, the duration of a visit, etc. An object property is defined for each subclass to establish the correct relationship. For example, Page is related to Bounce_rate and Date_of_last_visit; E-shop is related to Number_of_customers. Tables 3–8 describe in OWL-DL representative subsets of object and data properties of a selection of the main classes.
(vi) Define the facets of the slots. This step includes the definition of cardinality constraints and value restrictions. Value restrictions are used in our ontology to specify the data type for the value in each subclass of the Analytic_parameters class. For example, the range of the property hasValue is restricted to float when the class Bounce_rate is its domain; the range of the property hasValue is restricted to date when the class Date_of_last_visit is its domain.
(vii) Create instances. Instances (individuals in OWL) correspond to the specific data obtained from a specific e-shop. Individuals will be obtained by mapping the data from Google Analytics, Piwik or competitors' e-shops to RDF according to the ontology. Individuals can also be used in the ontology to define the exact members of a class. The range of the property hasType is restricted to the values “ASIN”, “EAN” or “ISBN” when its domain is Article_number. ASIN, EAN and ISBN are then ontology individuals (see Table A.5 for further explanations).

4.1. Ontology model

The proposed ontology, called “wao.owl” (Web Analytics Ontology), resulting from the development process described above, has a total of 62 classes (groups of individuals sharing the same attributes), 61 object properties (binary relationships between individuals), 67 data properties (individual attributes), 33 restriction axioms and 3 individuals. The complete ontology is available in the WebProtégé repository.4

4 URL link http://stanford.io/1XHhHzr
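Steps (iv), (vi) and (vii) of the development process can be illustrated with a minimal Python sketch. It is not the authors' implementation (the ontology itself is expressed in OWL): plain dictionaries only mimic a fragment of the subclass hierarchy, the value restriction on hasValue, and the enumerated range of hasType, using class and property names quoted in the text. The helper names `SUBCLASS_OF`, `is_subclass_of` and `check_has_value` are hypothetical.

```python
from datetime import date

# Fragment of the class hierarchy named in step (iv): child -> parent.
SUBCLASS_OF = {
    "Bounce_rate": "Analytic_parameters",
    "Total_revenue": "Analytic_parameters",
    "Number_of_transactions": "Analytic_parameters",
    "Date_of_last_visit": "Analytic_parameters",
    "Analytic_parameters": "Thing",
}

# Value restrictions from step (vi): range of hasValue per domain class.
HAS_VALUE_RANGE = {
    "Bounce_rate": float,
    "Date_of_last_visit": date,
}

# Enumerated range of hasType when its domain is Article_number (step vii).
ARTICLE_NUMBER_TYPES = {"ASIN", "EAN", "ISBN"}


def is_subclass_of(cls, ancestor):
    """Follow the child -> parent links until the ancestor or the top is reached."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def check_has_value(cls, value):
    """Return True if value respects the hasValue restriction sketched for cls."""
    expected = HAS_VALUE_RANGE.get(cls)
    if expected is None:
        raise KeyError(f"no hasValue restriction sketched for {cls}")
    return isinstance(value, expected)


print(is_subclass_of("Bounce_rate", "Thing"))
print(check_has_value("Bounce_rate", 52.6))
print(check_has_value("Date_of_last_visit", date(2015, 10, 23)))
```

In an OWL reasoner these checks would follow from the subclass and range axioms themselves; the dictionaries are only a stand-in to make the facets concrete.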
Fig. 2. General overview of the WAO ontology. Continuous arrows refer to sub class of. Dotted arrows refer to specific properties.
Table 3
Analytics_parameters group: object and data properties.
For simplicity, we describe here a representative subset of main classes including some of their most interesting object and data properties. These classes are: Analytics_parameters, E-shop, Visitor, Page, and Item. Each class requires a set of properties or conditions in order to be conceptualized. That is, an individual that satisfies those properties is considered to be a member of that class.
- Analytics_parameters. Those attributes provided by Google Analytics and Piwik that depend on time. Each analytic parameter has a value (hasValue in Table 3), which corresponds to the data provided by the analytic tool, and a date (hasDate), which corresponds to the date when the data was obtained. Subclasses in the ontology (⊑ Analytic_parameters) are, among others: Average_order_value, Average_pages_visited_per_session, Average_session_duration, Average_time_on_site, Bounce_rate, Conversion_rate, Number_of_transactions, Number_of_landings, Number_of_new_visitors, Number_of_page_views, Revenue_per_session and Total_revenue. Table 3 shows some representative object and data properties of Analytics_parameters. Each analytic parameter belongs to a data type. For instance, the value of Number_of_transactions is a non-negative integer and the value of Conversion_rate is a float. Data type restrictions are included in the ontology by means of data properties.
- E-shop. An e-shop has one or several pages and also an e-shop owner. Each e-shop owner has an address.
Table 4
E-shop group: object and data properties.

Object properties       Description logic
hasVisitor              hasVisitor ≡ makesVisit⁻
                        ∃ hasVisitor.Thing ⊑ E-shop
                        ⊤ ⊑ ∀ hasVisitor.Visitor
hasNumberOfVisitors     ∃ hasNumberOfVisitors.Thing ⊑ E-shop ⊔ Page
                        ⊤ ⊑ ∀ hasNumberOfVisitors.Number_of_visitors
hasNumberOfVisits       ∃ hasNumberOfVisits.Thing ⊑ E-shop ⊔ Page ⊔ Visitor
                        ⊤ ⊑ ∀ hasNumberOfVisits.Number_of_visits
isOwnerOf               ∃ isOwnerOf.Thing ⊑ E-shop_owner
                        ⊤ ⊑ ∀ isOwnerOf.E-shop

Data properties         Description logic
hasName                 ∃ hasName.DatatypeLiteral ⊑ Browser ⊔ Competitor ⊔ E-shop ⊔ Goal ⊔ Item ⊔ Operating_system ⊔ Page ⊔ Product
                        ⊤ ⊑ ∀ hasName.Datatypestring
hasURL                  ∃ hasURL.DatatypeLiteral ⊑ Competitor ⊔ E-shop ⊔ Page ⊔ Price
                        ⊤ ⊑ ∀ hasURL.Datatypestring
Table 5
Visitor group: object and data properties.
Table 7
Page group: object and data properties.
Table 8
Item group: object and data properties.
attribute number. Tables 5 and 6 show the properties with classes in the visitor and visit group as domain, respectively.
- Page. Pages contain items, i.e. products and/or services to be sold. The analytic parameters for Page are: average_order_value, average_time_on_page, bounce_rate, date_of_last_visit, number_of_exits, number_of_landings, number_of_new_visitors, number_of_page_views, number_of_returning_visitors, number_of_sessions_by_medium (mediums are direct link, social media and search engine), number_of_sessions, number_of_unique_page_views, number_of_unique_visitors, number_of_units_sold, number_of_visitors, number_of_visits, revenue_per_session and total_revenue. Attributes of page are title and URL. A series of representative properties whose domain is page are shown in Table 7. Interestingly, we can observe in this table that the property hasTotalRevenue is related to the Page, as well as to the whole e-shop, as this value can be calculated for both classes.
- Item. As commented before, an Item is a product or a service which is sold in an e-shop. Specific items of an e-shop are modeled by defining a domain ontology for a specific domain, e.g., travel, books, music, etc. Table 8 contains some representative object and data properties of class item. According to this, Items have a price (hasPrice). Prices are valid on a certain date. Therefore, attributes for prices are value, currency and the date for the price validity. The attributes of Items are category and whether or not it has been deleted. Products have a manufacturer. The attributes for products are: name, type, availability on a specific date and article number. The article number can be “ASIN”, “EAN” or “ISBN”.

4.2. Data sources: mapping and querying

As we explained in Section 3, we have focused on three main sources of data coming from different web tracking methods, namely: Google Analytics, Piwik, and specific web scraping methods in the scope of the SME E-Compass project.
The process of translating the collected data from different sources to RDF is carried out by means of mapping functions. Each data source has a different set of methods to gather, harmonize, store and provide access to the analytical data. Therefore, a different set of mapping functions is required to parse the information coming from each data source to RDF, according to the ontology. Fig. 3 illustrates a general overview of the mapping process to store data from different sources in a common RDF repository. Each set of mappings is then composed of functions that translate the attributes with their values into their corresponding triple form in RDF. In fact, for most of the attributes, a corresponding mapping function has been developed, involving its corresponding class in the ontology. Nevertheless, as a number of analytic attributes share a common structure in the ontology, they have been mapped by using generic functions, hence taking advantage of the ontology's design.
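A generic mapping function of the kind described above could look like the following Python sketch. It relies on the statement in Section 4.1 that every analytic parameter carries a value (hasValue) and a date (hasDate). Everything else is an assumption made for illustration: the namespace `WAO`, the rule that derives the linking property from the class name (chosen because it reproduces names such as hasTotalRevenue and hasNumberOfVisits that appear in the tables), the instance identifiers, and the use of plain tuples instead of an RDF library.

```python
WAO = "http://example.org/wao#"  # hypothetical namespace, not the real ontology IRI


def link_property(class_name):
    """Derive a property name such as hasTotalRevenue from Total_revenue (assumed rule)."""
    return "has" + "".join(part.capitalize() for part in class_name.split("_"))


def map_analytic_attribute(subject_id, class_name, value, day):
    """Map one (attribute, value, date) record to subject-predicate-object tuples."""
    node = f"{WAO}{class_name}_{subject_id}_{day}"
    return [
        (f"{WAO}{subject_id}", WAO + link_property(class_name), node),
        (node, "rdf:type", WAO + class_name),
        (node, WAO + "hasValue", value),
        (node, WAO + "hasDate", day),
    ]


# The same function serves any analytic parameter with this shared structure.
triples = map_analytic_attribute("eshop1", "Total_revenue", 1250.0, "2015-10-23")
triples += map_analytic_attribute("eshop1", "Number_of_visits", 211, "2015-10-23")
for triple in triples:
    print(triple)
```

Because only the class name changes between calls, one function covers all attributes that share the value/date structure, which is the advantage of the ontology's design that the text points to.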
Fig. 3. General overview of the mapping process injecting data from different sources into the RDF repository.
4.2.1. Google Analytics

Google Analytics5 is a partially free web analytics service that provides statistics and basic analytical tools for Search Engine Optimization (SEO) and marketing purposes. The service is available to anyone with a Google account, although advanced e-commerce functionalities are only available for restricted users. Google Analytics is geared toward small and medium-sized retail websites.
The web tracking procedure in Google Analytics is performed by a “snippet” or digital footprint component that provides the developer with an API of functions for accessing each attribute value. This digital footprint is a small piece of JavaScript code that is pasted into the e-shop's HTML source code and deployed in the web server where the e-shop is hosted. It activates Google Analytics tracking by inserting the JavaScript ga.js/analytics.js into the page. As illustrated in Fig. 3, the JavaScript component is then instantiated by our mapping functions by means of a series of Java classes to generate RDF triples.
Table A.1 in Appendix A contains the set of Google Analytics attributes that are currently tracked by our semantic approach. In this table, each attribute is listed with regards to its corresponding ontology class, data type, and description. This is a representative subset of the whole set of possible attributes (and their combinations) in the Google Analytics API specification, which covers all our preliminary requirements for the analysis of visitor behavior and products. However, it is worth mentioning that the proposed ontology can be easily extended to consider any of the attributes provided by Google Analytics.

sion tracking, event tracking, geolocation, pages transitions views and page overlay.
Similarly to Google Analytics, the web tracking procedure in Piwik is also performed by a digital footprint script that is allocated in the e-shop's HTML source code. In the case of Piwik, the analytical data is automatically stored in a relational database (SQL). Therefore, as we have access to this relational database, we have developed the mapping functions to directly query the analytic attributes. These attributes are described in Tables A.2–A.4 of Appendix A with regards to their corresponding ontology classes. The obtained data is then translated to RDF according to the ontology by means of specific mapping methods, as shown in Fig. 3.

4.2.3. Web scraping methods

In the scope of the SME E-Compass project, there is a series of methods for scraping product and price data from the competitors' e-shop websites. This way, a given e-shop owner is able to compare their products' prices with those of their direct competitors automatically.
This specific functionality provides a REST API service from which we can obtain attributes of a competitor's profile in JSON7 format, which is a compact and easily readable data format for the purpose of data exchange. Table A.5 contains the competitors' attributes that are mapped to RDF in our semantic model (see Fig. 3), with regards to the corresponding ontology classes.

5 http://www.google.com/analytics/
6 http://piwik.org/
7 http://json.org
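The JSON-to-RDF step for scraped competitor data can be sketched in Python as follows. The JSON layout and its field names are hypothetical, since the response format of the REST API is not given in the text; only the attribute groups (product name, article number type and value, price value, currency and date) follow Table A.5. The properties hasName, hasType, hasValue, hasPrice and hasDate are named in the article, whereas `hasArticleNumber`, `hasCurrency` and the `WAO` namespace are assumptions.

```python
import json

WAO = "http://example.org/wao#"  # hypothetical namespace

# Hypothetical payload; the real REST API schema is not specified in the article.
payload = """
{"product_id": 17, "name": "Example book",
 "article_number": {"type": "ISBN", "value": "9780000000002"},
 "price": {"value": 19.9, "currency": "EUR", "date": "2015-10-23"}}
"""


def product_to_triples(record):
    """Translate one scraped product record into subject-predicate-object tuples."""
    product = f"{WAO}Product_{record['product_id']}"
    number = f"{product}_article_number"
    price = f"{product}_price_{record['price']['date']}"
    return [
        (product, WAO + "hasName", record["name"]),
        (product, WAO + "hasArticleNumber", number),  # assumed property name
        (number, WAO + "hasType", record["article_number"]["type"]),
        (number, WAO + "hasValue", record["article_number"]["value"]),
        (product, WAO + "hasPrice", price),
        (price, WAO + "hasValue", record["price"]["value"]),
        (price, WAO + "hasCurrency", record["price"]["currency"]),  # assumed name
        (price, WAO + "hasDate", record["price"]["date"]),
    ]


triples = product_to_triples(json.loads(payload))
for triple in triples:
    print(triple)
```

Keying the price node by its scraping date reflects the statement that prices are valid on a certain date, so successive scrapes produce distinct price instances for the same product.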
Fig. 4. Example of SPARQL query that returns disaggregated data attributes, as the ones provided by Piwik, as well as calculated metrics, as those obtained from Google
Analytics.
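As a rough illustration of the graph matching that a SPARQL query such as the one in Fig. 4 performs, the following Python sketch matches a list of triple patterns against a small set of triples and returns the resulting variable bindings. It is a toy matcher, not a SPARQL engine, and it does not reproduce the query of Fig. 4. The visit identifiers, date and durations are taken from Table 9; the extra visit `visit70001` and the property name `hasDuration` are assumptions.

```python
def is_var(term):
    """Variables are strings starting with '?', as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None if impossible."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None  # variable already bound to a different value
        elif p != t:
            return None  # constant term does not match
    return new


def query(patterns, data):
    """Match all patterns against data and return every consistent binding."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in data:
                candidate = match_pattern(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings


data = [
    ("visit75688", "hasDate", "2015-10-23"),
    ("visit75688", "hasDuration", 2071),
    ("visit75692", "hasDate", "2015-10-23"),
    ("visit75692", "hasDuration", 12),
    ("visit70001", "hasDate", "2015-10-22"),  # hypothetical visit on another day
    ("visit70001", "hasDuration", 300),
]

# Visits on 2015-10-23 together with their duration.
results = query([("?v", "hasDate", "2015-10-23"), ("?v", "hasDuration", "?d")], data)
for row in results:
    print(row)
```

The shared variable `?v` is what joins the two patterns, which is the same mechanism a SPARQL basic graph pattern uses to combine attributes of one visit.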
Table 9
Two samples of the query result (Fig. 4) of a certain time slot (day 2015-10-23) of a real e-shop.

Attribute/metric                Visit75688    Visit75692
timestamp                       14:19:44      14:21:41
visit_total_searches            0             0
visit_total_events              0             0
visit_total_duration            2071          12
visit_total_goal_converted      1             0
total_bounce_rate               52.6066
total_conversion_rate           34.1232
total_number_of_entries         211
total_number_of_new_visitors    145

visits of a given e-shop, in a certain date or period of time. The required information of visits should consist of both: disaggregated data attributes, as the ones provided by Piwik, and calculated metrics, as those obtained from Google Analytics.
The SPARQL query represented in Fig. 4 unifies the encoding of such logic, for which a couple of result samples are displayed in Table 9. Specifically, these results correspond to two consecutive visits to the e-shop with ID <eshop-id> that were performed on 2015-10-23. The visit IDs are 75688 and 75692, and they were captured at timestamps 14:19:44 and 14:21:41, respectively. As shown in this table, the visit with a prolonged duration led to one goal conversion (usually a successful sale), whereas the visit with a short duration finished without any conversion, which represents a visitor that leaves the site prematurely.
In the case of aggregative attributes, they are calculated for all the visits in the time period of the SPARQL query. Therefore, as shown in the second half of Table 9, the e-shop registered a bounce rate close to 53% with a conversion rate8 of 34.12%, which corresponds to all visits, bounces and purchases of the query's time period.
Another important attribute is the number of new visitors, which for this e-shop and for this date is 145, i.e., 68.72% of total entries. This information could now be used to feed predictive algorithms that help the e-merchant to adopt a given marketing strategy to attract clients.
In order to automate and simplify access to the stored data, our semantic approach includes a specific REST API service with methods that implement predefined SPARQL queries. These methods are used as input of the data mining algorithms as described in the following step of validation.
As an additional advantage of this semantic approach, it is possible to connect our RDF repository with other external open linked data repositories. In this regard, a minimum adaptation has to be done in terms of deciding which classes with similar semantic meaning are directly linked between the two repositories. In fact, this is one of the most powerful features when using the semantic structure induced by the ontology.

8 Conversion rate: proportion of visitors converted into paying customers.
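The aggregated figures of Table 9 can be checked with a few lines of Python. The number of entries (211) and of new visitors (145) come from the table; the counts of bounces (111) and conversions (72) are not reported in the article and are assumptions back-computed from the published percentages, as is the choice of total entries as the common denominator.

```python
def percentage(part, total):
    """Share of part in total, as a percentage rounded to four decimals."""
    return round(100.0 * part / total, 4)


entries = 211        # total_number_of_entries (Table 9)
new_visitors = 145   # total_number_of_new_visitors (Table 9)
bounces = 111        # assumption: back-computed from the 52.6066% bounce rate
conversions = 72     # assumption: back-computed from the 34.1232% conversion rate

bounce_rate = percentage(bounces, entries)
conversion_rate = percentage(conversions, entries)
new_visitor_share = percentage(new_visitors, entries)

print(bounce_rate, conversion_rate, new_visitor_share)
```

With these counts the three values agree with the 52.6066, 34.1232 and 68.72% quoted in the text, which suggests that all three metrics are ratios over the same number of entries.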
Fig. 5. Percentage of visitors classified by typologies (misplaced, loyal and wandering) for country of origin. The percentages are relative to each country.
Fig. 7. Visitor typologies over time. The plot below shows the activity in a range of three months from 2015-06-30 to 2015-10-01, whereas the plot above is a specific
timeframe selected for one month from 2015-08-15 to 2015-09-15.
6. Conclusions

In this work, we propose a semantic approach that uses an ontology as a mediated schema for the representation and consolidation of the tracking data from web source's semantics. Semantic mappings between the source schema and the ontology are then defined and used to transform the original data to RDF. In this way,

Appendix A. Analytic metrics and attributes

The complete set of attributes and metrics used from Google Analytics, Piwik and the web scraping methods is described in Tables A.1–A.5. The corresponding ontology class of each attribute is located in the first column of these tables.
Table A.1
Google Analytics used metrics and attributes in the ontology model.

Ontology class   Attribute                       Data type   Description
E-shop           transactions                    int∗        Number of e-commerce transactions
E-shop, Page     itemRevenue                     float∗      Total e-commerce revenue
E-shop, Page     itemQuantity                    int∗        Total number of units sold
E-shop, Page     transactionRevenuePerSession    float∗      E-commerce revenue per session
E-shop, Page     revenuePerTransaction           float∗      Average order value
E-shop, Page     bounces                         int∗        Total number of single page (or single engagement hit) sessions
Visit            uniquePurchases                 int∗        Number of product sets purchased

All these metrics are combined with dimensions: date, hour, city, region, browser, networkDomain, and source.
Besides, sessions are combined with dimensions: city, region, country and continent.
Table A.2
Piwik used metrics and attributes in the ontology model.
Table A.3
Piwik used metrics and attributes in the ontology model.
Table A.4
Piwik used metrics and attributes in the ontology model.
∗
If this conversion is for an e-commerce order or abandoned cart.
Table A.5
Attributes of the web scraping methods in the ontology model.
E-shop owner E-shop ID Integer ID of E-shop owner given by E-COMPASS Cockpit (user management)
Last name String Name of the person in charge (employee of the e-shop)
First name String Name of a person (employee of the e-shop)
E-Mail address String E-Mail address of the person in charge (employee of the e-shop)
E-shop URL String Start page of the e-shop
E-shop owner ID Integer ID of E-shop owner given by E-COMPASS Cockpit (user management)
Competitor ID E-shopID E-shop ID of all competitors
Product Product ID Integer Product ID of the E-COMPASS System
Name String Product Name given by E-Shop owner (e.g. as a search query)
Article Number Type String Type of article number (ASIN, EAN and/or ISBN)∗
Value String The value of product (ASIN, EAN and/or ISBN)∗
Price Value Double Price value on scraping date
Currency String Currency of Price
Date Date Scraping date of product price
Availability Value String Availability of product: “available” or “not available”
Date Date Scraping date of availability
∗
ASIN: Amazon Standard Identification Number, a ten-digit alpha-numerical product code;
EAN: European Article Number, 8-digit or 13-digit number for product identification;
ISBN: International Standard Book Number, 10-digit or 13-digit number for book identification.
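The article number formats listed in the footnote of Table A.5 can be turned into a simple format check, sketched here in Python. The patterns encode only the lengths stated in the footnote; restricting ASIN to upper-case letters and digits is an assumption, check digits are not verified, and real ISBN-10 codes may end in the letter X, which this sketch does not accept. The names `PATTERNS` and `has_valid_format` are hypothetical.

```python
import re

# Lengths follow the footnote of Table A.5; the character classes are assumptions.
PATTERNS = {
    "ASIN": re.compile(r"[A-Z0-9]{10}"),  # ten alphanumeric characters
    "EAN": re.compile(r"\d{8}|\d{13}"),   # 8 or 13 digits
    "ISBN": re.compile(r"\d{10}|\d{13}"), # 10 or 13 digits
}


def has_valid_format(number_type, value):
    """Return True if value has the length and alphabet expected for number_type."""
    pattern = PATTERNS.get(number_type)
    return pattern is not None and pattern.fullmatch(value) is not None


print(has_valid_format("EAN", "4006381333931"))
print(has_valid_format("ISBN", "12345"))
print(has_valid_format("ASIN", "B000123456"))
```

Such a check could run before a scraped value is mapped to an Article_number individual, so that malformed identifiers are not stored under one of the three allowed hasType values.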
References

Akanbi, A. K. (2014). Lb2co: a semantic ontology framework for b2c ecommerce transaction on the internet. International Journal of Research in Computer Science, 4(1), 1–9.
Dean, M., & Schreiber, G. (2004). OWL web ontology language reference. Technical Report. W3C Recommendation, 10 February 2004.
García-Nieto, J., & Roldán, M. (2014). D2.1 SME-E-COMPASS requirements analysis. Technical Report. Public Deliverable.
Gatchalee, P., Li, Z., & Supnithi, T. (2013). Ontology development for smes e-commerce website based on content analysis and its recommendation system. In Computer science and engineering conference (icsec), 2013 international (pp. 7–12).
Gruber, T. R. (1993). A translation approach to portable ontologies. Knowledge Acquisition, 5(2), 199–220.
Haase, P., & Stojanovic, L. (2005). Consistent evolution of owl ontologies. In A. Gómez-Pérez, & J. Euzenat (Eds.), The semantic web: research and applications. In Lecture Notes in Computer Science: 3532 (pp. 182–197). Springer Berlin Heidelberg.
Hepp, M. (2008). Goodrelations: an ontology for describing products and services offers on the web. In Proceedings of the 16th international conference on knowledge engineering and knowledge management (ekaw2008) (pp. 332–347). Springer LNCS, Vol 5268.
Horrocks, I., & Patel-Schneider, P. (2003). Reducing owl entailment to description logic satisfiability. In The semantic web - iswc 2003. In Lecture Notes in Computer Science: 2870 (pp. 17–29). Springer Berlin.
McGuinness, D., & Harmelen, F. (2004). OWL web ontology language overview. Technical Report. W3C Recommendation.
Natalya, N., McGuinness, F., & Deborah, L. (2001). Ontology Development 101: A Guide to Creating Your First Ontology. Technical Report. Stanford University Knowledge Systems Laboratory Technical Report KSL-01-05.
Pérez, J., Arenas, M., & Gutierrez, C. (2009). Semantics and complexity of sparql. ACM Transactions on Database Systems, 34(3), 1–45.
Staab, S., & Studer, R. (2009). Handbook on Ontologies. International Handbooks on Information Systems. Springer.
Tamma, V., Phelps, S., Dickinson, I., & Wooldridge, M. (2005). Ontologies for supporting negotiation in e-commerce. Engineering Applications of Artificial Intelligence, 18(2), 223–236.
Trastour, D., Bartolini, C., & Preist, C. (2003). Semantic web support for the business-to-business e-commerce pre-contractual lifecycle. Computer Networks, 42(5), 661–673.
Waralakv, S. (2008). Learning semantic web from e-tourism. In N. Nguyen, G. Jo, R. Howlett, & L. Jain (Eds.), Agent and multi-agent systems: Technologies and applications. In Lecture Notes in Computer Science: 4953 (pp. 516–525). Springer Berlin Heidelberg.