
DATA ANALYTICS IN LIBRARIES AND INFORMATION CENTRES

Amit Tiwari
Documentation Research and Training Centre,
Indian Statistical Institute, Bangalore, 560059
Email: [email protected]

Abstract

Data are like crude oil: to generate value from them, they must undergo further processing.
That further processing is analytics, which adds value to the crude data. Libraries and
information centres generate data through their usual activities. Library data analytics is a
scientific way to make decisions from that data.
Libraries and information centres have a huge opportunity in the era of data and information.
Big Data, Research Data Management (RDM), the Internet of Things (IoT) and Geographical
Information Systems (GIS) are some of the areas where libraries and information centres can
contribute substantially if they integrate them with data analytics. Tools like R can be very
handy for information professionals for data analytics and visualization.
This paper briefly explains data analytics, the need for it, its workflow in the library context,
the issues and challenges libraries face, and the opportunities libraries have to use it. It closes
with a discussion and suggestions for integrating data analytics with libraries and information
centres.

Keywords: Data Analytics and Libraries, RDM, GIS, IoT, R, Data Analytics Workflow.

1. Introduction

It is a prime obligation of any organization to prove its relevance to its users and its parent
organization. Organizations need to present their activities to users to promote their products
and to parent organizations to secure funding and other support. To justify their activities,
organizations are required to give the necessary evidence to both users and parent organizations
(Egghe & Rousseau, 2001).

Libraries and information centres deal with the collection, processing, and dissemination of
information. They generate a huge amount of data through these activities. This data can be
used to improve library services for users and to justify the relevance of the
organization to the parent organization.

Data and information are growing rapidly. The rate of growth is so high that data scientists
regularly have to introduce new units of measurement to express the volume of data or
information. This increases the workload of information managers: libraries and information
centres have to devise a strategy to deal with this huge volume of data and information. Even
at the level of a single organization, the data generated by users, staff, and the collection is
huge. If this data is not used effectively, it is just a heap of garbage.

A data-driven culture is the mother of data analytics. Many organizations use data analytics
according to their needs. A data analytics framework can be integrated with libraries and
information centres to make optimal use of the available data.

2. Data Analytics
Data analytics is the pursuit of extracting meaning from raw data. Analytics is performed
using statistical or computer systems. These systems transform, organize, and model the
data to draw conclusions and identify patterns. The term data analytics is often used
interchangeably with data analysis.

Malika Sanon defines it as follows:

“Data Analytics refers to qualitative and quantitative techniques and processes used to
enhance productivity and business gain.” (Sanon, 2018)

From the above definition, it is clear that data analytics covers the qualitative or
quantitative statistical techniques used to carry out an analysis. The motive of this analysis
is to enhance the productivity of the organization. Data analytics is therefore the broader
concept, and data analysis is one part of it.

3. Why Data Analytics?

Organizations collect data from their users, businesses, the economy, and practical
experience. The data extracted from these activities is voluminous, but it is only useful if
analyzed. Data analysis techniques identify trends, patterns, and useful information in a set
of existing data. The importance of data analytics in an organization is shown in [Figure 1].

As shown in [Figure 1], data analytics gives a clear picture of gross income and
expenditure, from which the administration can identify leakage in the system and reduce
costs where required. It also enables the administration to make faster and better decisions.
Data analytics enables managers to create new products and services for users. Analysis of
the data about staff and other components helps any organization function better.
Importance of Data Analytics in an Organization. [Figure 1]
4. Data Analytics Workflow

A data analytics workflow is a coordinated framework for conducting data analysis (Scotlong,
n.d.). The workflow of data analytics is shown in [Figure 2].

Data Analytics Workflow. [Figure 2]

The first step of data analytics is planning, organizing, and data collection. The collected
data is cleaned in the second step. In the third step, the data is analyzed using statistical
methods, either manually or with tools; for large datasets, the analysis also needs
visualization to be understood. In the last step, the report is generated and predictions are
made.

5. Data Analytics and Libraries

Preparing reports is a common practice in organizations, both for developing services and for
justification. Libraries prepare annual reports from their data. An annual report is an inclusive
report on library activities throughout the preceding year. It contains comprehensive data
about the administrative, financial, and structural aspects of the library. Such a report holds a
huge amount of data which, if analyzed properly, can be powerful evidence of the library's
activities. The data analytics workflow in the context of libraries is explained as follows:

5.1 Data Collection

The goal of collecting library data is to have at one's disposal useful information about past
and present activities in order to prepare for future ones (Egghe & Rousseau, 2001). In
libraries, collected data are stored in annual reports. They are gathered in both the electronic
and the print environment. Emerging technologies are a boon for libraries and librarians, as
they can justify their value and contributions effectively using easy and powerful tools for
collection and analysis (Chen et al., 2016).
Data collection is becoming easier for librarians in the era of emerging tools (e.g.,
analytics software), as libraries offer more online resources and services.
Data is collected from the following library areas:
5.1.1 Collection
Libraries collect data related to their collections (books, journals, multimedia items, etc.).
For example:
• The number of books ordered, cataloged, and invoiced, and the related budget.
• The number of series, the budget spent, binding issues, etc.
• Data about the purchase, loan, and use of multimedia items.
5.1.2 User Services
User services vary from library to library. Data is collected from general library services
such as:
• Circulation, including renewals, new borrowings, and reserved books.
• Interlibrary services, including incoming and outgoing requests.
• Reprographic service.
• OPAC use, by session and by person.
• Library facilities and maintenance.
5.1.3 Automation and Catalog Aspects
Some points of automation and catalog are also stored in library reports:
• Partners with whom the automation is done.
• Total use of the automated system.
5.1.4 Staff Information
In this section, data related to professional and non-professional staff is collected.
5.1.5 External Relations
This section deals with data related to interlibrary meetings, consortia, publishers, etc.
5.1.6 Budget
The general budget for maintenance and administration comes under this section, while
specific budgetary data goes in the respective sections.
5.2 Data Curation
According to R. J. Miller,
“Data curation includes all the processes needed for principled and controlled data creation,
maintenance, and management, together with the capacity to add value to data” (Miller,
n.d.).
This step is complicated but critical: without it, the analysis cannot be trusted to be
error-free. Library professionals are in the best position to clean and curate data.
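A minimal sketch of what cleaning might look like in practice is given below. It is written in Python with only the standard library, purely for illustration. The circulation export, its column names, and the cleaning rules (trim whitespace, normalise title case, drop duplicates and incomplete rows) are all assumptions, not part of any particular library system.

```python
import csv
import io

# Hypothetical raw circulation export; column names and values are invented.
RAW = """member_id,title,loan_days
 M001 ,Data Science Basics,14
M002,data science basics ,7
M001,Data Science Basics,14
M003,Library Statistics,
"""


def clean_rows(text):
    """Trim whitespace, normalise title case, drop duplicates and incomplete rows."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        member = row["member_id"].strip()
        title = row["title"].strip().title()
        days = row["loan_days"].strip()
        if not days:
            # Skip records with a missing loan period.
            continue
        record = (member, title, int(days))
        if record in seen:
            # Skip exact duplicates of a record already kept.
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


rows = clean_rows(RAW)
for record in rows:
    print(record)
```

Which rules are appropriate depends on the data. Dropping incomplete rows, for instance, is only one of several ways to handle missing values.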
5.3 Data Analysis
Data analysis is an approach to analyzing data sets in order to describe and summarize their
main characteristics and to identify relationships and differences between or among variables.
The data may be qualitative or quantitative. Quantitative data can be analyzed directly with
whatever statistical tools the need calls for, whereas qualitative data must first be
quantified.
Statistics has two branches, descriptive and inferential, and both are used in data
analysis:
5.3.1 Descriptive Statistics
Descriptive statistics describes, shows, or summarizes data in a meaningful way so that
patterns can emerge from it. It is used to measure the central tendency and the spread
of data.
5.3.1.1 Measures of Central Tendency
The purpose of measuring central tendency is to find a representative value for the dataset.
The main measures of central tendency are the mean, median, and mode.
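As an illustration, the three measures can be computed for a small dataset as follows. This is a sketch in Python using the standard `statistics` module, and the daily loan counts are invented for the example.

```python
import statistics

# Hypothetical number of loans per day over ten days (invented values).
daily_loans = [12, 15, 15, 18, 20, 15, 22, 30, 12, 15]

mean_loans = statistics.mean(daily_loans)      # arithmetic average
median_loans = statistics.median(daily_loans)  # middle value of the sorted data
mode_loans = statistics.mode(daily_loans)      # most frequent value

print(mean_loans, median_loans, mode_loans)
```

Here the mean (17.4) is pulled upward by the single busy day with 30 loans, while the median and mode both stay at 15. This is why the three measures are usually reported together.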
5.3.1.2 Measures of Spread
The purpose of measuring spread is to see how scattered the values in the dataset are. It also
shows how far the data lie from the mean value of the dataset. The main measures of
spread are the range, quartiles, mean deviation, variance, and standard deviation.
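The measures of spread listed above can be sketched in the same way. The example is in Python with the standard `statistics` module. The loan counts are invented, and the data is treated as a whole population, so the population variance and standard deviation are used.

```python
import statistics

# Hypothetical number of loans per day over ten days (invented values).
daily_loans = [12, 15, 15, 18, 20, 15, 22, 30, 12, 15]

# Range: difference between the largest and smallest value.
data_range = max(daily_loans) - min(daily_loans)

# Quartiles: the three cut points that split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(daily_loans, n=4)

# Mean deviation: average absolute distance from the mean.
mean_value = statistics.mean(daily_loans)
mean_deviation = sum(abs(x - mean_value) for x in daily_loans) / len(daily_loans)

# Population variance and standard deviation.
variance = statistics.pvariance(daily_loans)
std_dev = statistics.pstdev(daily_loans)

print(data_range, (q1, q2, q3), mean_deviation, variance, std_dev)
```

If the values were a sample from a larger population, `statistics.variance` and `statistics.stdev` would be the appropriate choice instead.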
5.3.2 Inferential Statistics
Inferential statistics is used to draw conclusions about a population from a sample taken
from it. There are two methods of inferential statistics: estimation of parameters and
hypothesis testing. The main tools used in inferential statistics include regression analysis,
the t-test, ANOVA, and ANCOVA.
Statistical Analysis Tools. [Figure 3]
Descriptive and inferential statistics complement each other, and decision-making
requires both. Some of the statistical analysis tools (Statistical
Analysis of Medical Data, n.d.) are shown in [Figure 3].
Libraries need to use both types of statistical tools for analysis. In the real world, libraries can
use these tools as follows:
• Regression models can reveal the relationship between library variables, such as incoming
and outgoing interlibrary requests, the number of patrons and the distance from public
libraries, or the height of a shelf and the number of misplaced books.
• Inferences about lost books, borrowed books, and user subscriptions to library
services like CAS and SDI can be made using statistical techniques such as the mean and
the confidence interval.
• Retrieval systems can be compared using the z-value, which is obtained from the mean and
standard deviation of a population.
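The three uses above can be sketched as follows, in Python with the standard library only. All figures are invented, and the shelf-height data is deliberately exactly linear so that the fitted line is easy to check by hand. The confidence interval uses a normal approximation, which is an assumption; for small samples a t-based interval would normally be preferred.

```python
import math
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * statistics.stdev(sample) / math.sqrt(len(sample))
    centre = statistics.mean(sample)
    return centre - margin, centre + margin


def z_value(x, mean, std_dev):
    """Number of standard deviations by which x lies above the mean."""
    return (x - mean) / std_dev


# Regression: hypothetical shelf heights (cm) and misplaced books per shelf.
shelf_height_cm = [30, 60, 90, 120, 150, 180]
misplaced_books = [2, 3, 4, 5, 6, 7]
slope, intercept = fit_line(shelf_height_cm, misplaced_books)

# Estimation: hypothetical number of lost books per month.
lost_per_month = [4, 6, 5, 7, 3, 5, 6, 4]
low, high = mean_confidence_interval(lost_per_month)

# Z-value: a hypothetical retrieval score of 80 against a population
# with mean 70 and standard deviation 5.
z = z_value(80, 70, 5)

print(slope, intercept, (low, high), z)
```

With real data, the fitted slope would only suggest an association between the variables; it would not by itself show that taller shelves cause more misplacement.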
5.4 Data Visualization
Analysis alone is not enough when dealing with a huge amount of data, so data visualization is
necessary. Visualization can be done using various graphical or other tools. Some
common visualization tools are the boxplot, histogram, multi-vari chart, run chart, Pareto chart,
scatter plot, stem-and-leaf plot, and parallel coordinates.
Libraries deal with both technical and non-technical users and funders, and when it comes
to presenting results to a non-technical audience, a good visualization is worth hundreds of
words.
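To show the idea behind one of these tools, the sketch below builds a histogram as plain text, in Python with the standard library only. The loan counts and the bin width of 5 are invented for the example. A real report would normally use a plotting library rather than text bars.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return one line per bin, with one '#' for each value in that bin."""
    # Map every value to the lower edge of its bin and count per bin.
    bins = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in range(min(bins), max(bins) + bin_width, bin_width):
        bar = "#" * bins.get(start, 0)
        lines.append(f"{start:>3}-{start + bin_width - 1:<3} | {bar}")
    return lines


# Hypothetical number of loans per day over ten days (invented values).
daily_loans = [12, 15, 15, 18, 20, 15, 22, 30, 12, 15]
chart = text_histogram(daily_loans, bin_width=5)
print("\n".join(chart))
```

Even in this crude form, the chart makes it obvious that most days fall in the 15 to 19 range and that the day with 30 loans is an outlier.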
5.5 Report Generation and Prediction
The final step in data analysis is report generation and prediction. This step records all the
previous findings and makes predictions based on them. These reports are the primary
sources of library data, and library staff or funders can consult them whenever needed.
The very purpose of report generation is to make the work reusable. Report generation also
helps to justify the value of one's work to higher authorities.

6. Issues and Challenges

21st century libraries are technology driven. All the activities of a library can be carried out
using certain tools and technologies. As a result, libraries have more complex data today. With
such huge and complex data, library administration faces two major challenges:
1. How to analyze the data?
2. How to make the analysis understandable for all?

Data analysis requires technical and statistical skill, so it is a huge challenge to educate
and train professionals for the analysis. It also requires a tool that makes
the analyzed data understandable.

7. Opportunities to Libraries

It is high time for libraries to use analytics in their regular activities. Libraries and information
centres have a golden opportunity to work in the following areas, in which data analytics is an
integral part.

7.1 Research Data Management (RDM)

RDM is the process of controlling and managing research data. Academic libraries are in the best
position to work in the area of RDM. RDM is very useful as it avoids duplication of
research, validates further results, and avoids the appearance of research misconduct
(Data Management Basics, n.d.). As shown in [Figure 4], the RDM life cycle contains the data life
cycle, in which data collection and data analysis are major activities. Therefore, data
analytics is required for RDM.

RDM Life Cycle. [Figure 4]


(Source: http://guides.osu.edu/IntroDataManagement)

7.2 Big Data

The concept of big data is not new, but it has become very popular in the last decade.
Libraries and information centres generate and use a huge amount of data. In recent years the
volume and velocity of data and information have increased rapidly. Libraries have a huge
opportunity to use their potential and data analytics to draw inferences from big data.

7.3 Geographical Information Systems (GIS)

GIS adds a layer of location information, expressed as longitude and latitude, on top of the
actual information. The main function of GIS is to provide a visual representation of
spatial data. GIS deals with the mapping of geospatial data, proximity analysis using
interpolation or extrapolation techniques, and buffering using the influence zone technique
(Contributor, 2013).
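A simple form of buffering can be sketched as follows: given a library's location, find which patrons fall within a circular zone around it. The example is in Python with the standard library and uses the haversine formula for great-circle distance. The coordinates, patron identifiers, and the 5 km radius are all invented, and treating the influence zone as a plain circle is a simplifying assumption.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_buffer(centre, points, radius_km):
    """Return the names of the points that fall inside the circular buffer."""
    lat0, lon0 = centre
    return [name for name, (lat, lon) in points.items()
            if haversine_km(lat0, lon0, lat, lon) <= radius_km]


# Hypothetical library and patron locations as (latitude, longitude).
library = (12.97, 77.59)
patrons = {"P1": (12.98, 77.60), "P2": (13.50, 77.59)}

nearby = within_buffer(library, patrons, radius_km=5)
print(nearby)
```

Dedicated GIS software offers far richer buffering and proximity operations. The sketch only shows the kind of calculation that lies behind them.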

Using GIS in libraries is a long-term strategy. Libraries are champions in the collection,
preservation, and processing of spatial resources and in providing access to them, and the
emergence of GIS has opened new opportunities for many library functions (Adler & Larsgaard,
2005). Spatial data analytics is essential for the effective use of GIS techniques.

7.4 Internet of Things (IoT)

The IoT comprises billions of connected devices (Hahn, n.d.). It permits any device with
a power source to collect data from its environment. Some examples of IoT are smart
appliances like mobile apps, smart clothing and smart accessories (wearables), hobbyist
projects like Raspberry Pi, beacons, etc. In libraries, IoT can improve access to documents
or services and create better learning experiences without compromising patron privacy.
IoT devices generate a huge volume of data, whose analysis helps achieve optimum results.

8. Discussion and Suggestions

The two challenges in integrating data analytics with libraries were discussed in the
issues and challenges section of this paper. Data analytics can be manual or tool-supported.
With huge data, manual processes are neither effective nor efficient. Therefore, a powerful
and efficient tool for analyzing the data is needed.
Visualization helps in understanding the analysis: it presents the data graphically, which is
simpler and more understandable.
Many statistical and visualization tools can be integrated with libraries, for
example Tableau, OpenRefine, KNIME, RapidMiner, and R. These tools can help address
both the challenges mentioned above.
All these tools have their strengths and limitations. R is a simple, powerful, and open source
tool for data analytics and visualization. Therefore, it is recommended that libraries use R for
data analytics and visualization.
R was developed by Ross Ihaka and Robert Gentleman at the University of Auckland in 1991.
The workflow of R is similar to the workflow of data analytics and is shown
in [Figure 5].
Workflow of R. [Figure 5]

The R workflow for data analytics starts with importing data. In an R session,
data can be imported from the web, from databases, or from local files. The second step is data
manipulation, in which the imported data is curated and cleaned. In the next step, the data is
analyzed using various descriptive, inferential, or graphical statistical tools. The analyzed data
can then be used for forecasting, since the data analysis process ultimately aims at
making an inference or a forecast. In the final step, the data is visualized or
reported for further use.
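The sequence of steps can be sketched end to end as below. The sketch is in Python with the standard library rather than in R, purely to illustrate the flow from import to report. The yearly loan figures, the function names, and the choice of a straight-line trend for the forecast are all assumptions.

```python
import csv
import io
import statistics

# Hypothetical yearly circulation figures; one value is missing on purpose.
RAW_CSV = """year,loans
2014,1000
2015,1100
2016,
2017,1300
2018,1400
"""


def import_data(text):
    """Step 1: import the raw rows (here from an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def manipulate(rows):
    """Step 2: keep complete rows only and convert the fields to integers."""
    return [(int(r["year"]), int(r["loans"])) for r in rows if r["loans"].strip()]


def analyse(records):
    """Step 3: summarise the data with descriptive statistics."""
    loans = [n for _, n in records]
    return {"mean": statistics.mean(loans), "median": statistics.median(loans)}


def forecast(records, year):
    """Step 4: extrapolate a least-squares linear trend to the given year."""
    xs = [y for y, _ in records]
    ys = [n for _, n in records]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y + slope * (year - mean_x)


def report(summary, year, predicted):
    """Step 5: turn the findings into a short textual report."""
    return (f"Mean loans: {summary['mean']:.0f}; median: {summary['median']:.0f}; "
            f"forecast for {year}: {predicted:.0f}")


records = manipulate(import_data(RAW_CSV))
summary = analyse(records)
predicted = forecast(records, 2019)
summary_text = report(summary, 2019, predicted)
print(summary_text)
```

Each function corresponds to one stage of the workflow, so any stage can be replaced, for example by reading from a database instead of a string, without touching the others.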

In R, a package is a collection of R functions, data, and compiled code. The location where
packages are stored is called the library. Packages can be installed from within the R
environment. Some of the R packages used for data analytics are shown in [Table 1].

Sr. No.   Process              Packages
1.        Data Importing       read.csv() (base R function), Hmisc
2.        Data Manipulation    mice, dplyr
3.        Data Analysis        pastecs, doBy
4.        Forecasting          forecast, randomForest
5.        Data Visualization   ggplot2, googleVis
6.        Data Reporting       shiny, rmarkdown
Common R packages. [Table 1]

9. Conclusion

A huge amount of data is generated by library activities. This data is an asset for institutions
when it is used for analytics. Analytics is the key to understanding users and improving the
systems and services offered by libraries and other cultural institutions. Using
analytics, new services can also be developed in libraries.
Libraries are complex organizations that deal with huge amounts of data. Data are not
exhaustive in nature. Librarians are not statistical experts, so dealing with such huge
amounts of data is a complex task. They can take the help of statistical tools like R, which can
help them with all the analysis tasks. Librarians are good at data collection and curation, which
is what R requires to function well. Together, the practical expertise of librarians in data
collection and curation and the powerful analysis techniques of R can enable librarians to
make better decisions.

References

1. Egghe, L., & Rousseau, R. (2001). Elementary statistics for effective library and
information service management. London: Aslib-IMI.
2. What is Data Analytics: Definition | Informatica India. (n.d.). Retrieved from
https://www.informatica.com/in/services-and-training/glossary-of-terms/data-analytics-
definition.html#fbid=L7pGwWhg6W7
3. Efron, B., & Tibshirani, R. (1991). Statistical Data Analysis in the Computer Age. Science,
253(5018), 390-395. doi:10.1126/science.253.5018.390
4. Data Analysis Workflow with R Packages. (n.d.). Retrieved from
https://www.dezyre.com/article/data-analysis-workflow-with-r-packages/259
5. Chen, H., Doty, P., Mollman, C., Niu, X., Yu, J., & Zhang, T. (2016, February 24). Library
assessment and data analytics in the big data era: Practice and policies. Retrieved from
http://onlinelibrary.wiley.com/doi/10.1002/pra2.2015.14505201002/full
6. Peng, R. D. (2014). R Programming for data science. Victoria, British Columbia, Canada:
Leanpub.

7. Russell, F. (2016). Library analytics and metrics: Using data to drive decisions and
services. Australian Academic & Research Libraries, 47(2), 117-118.
doi:10.1080/00048623.2016.1207272
8. Data curation. (2018, May 12). Retrieved from https://en.wikipedia.org/wiki/Data_curation
9. Exploratory data analysis. (2018, Feb 10). Retrieved from
https://en.wikipedia.org/wiki/Exploratory_data_analysis
10. Descriptive and Inferential Statistics. (n.d.). Retrieved from
https://statistics.laerd.com/statistical-guides/descriptive-inferential-statistics.php
11. Sanon, M. (2018, May 08). 4 Reasons Why Data Analytics is Important. Retrieved from
https://www.digitalvidya.com/blog/reasons-data-analytics-important/
12. Statistical Analysis of Medical Data - Doing it right. (n.d.). Retrieved from
http://www.akspublication.com/editorial_july2009.html
13. Introductory Research Data Management: Data Management Basics. (n.d.). Retrieved
from http://guides.osu.edu/IntroDataManagement
14. Hahn, J. (n.d.). Chapter 1. The Internet of Things (IoT) and Libraries. Retrieved from
https://journals.ala.org/index.php/ltr/article/view/6175/8001
15. Adler, P., & Larsgaard, M. (2005). Applying GIS in libraries. In Geographical
information system (2nd ed., Vol. 2). Retrieved from
https://www.geos.ed.ac.uk/~gisteac/gis_book_abridged
16. Contributor, G. (2013, July 19). Basic Uses of GIS ~ GIS Lounge. Retrieved from
https://www.gislounge.com/basic-uses-of-gis/
