2020 International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS)

Data Visualization for Making Sense of Scientific Literature

Ji Huanqin, Gan Wei*


School of Art and Design, Guangdong University of Technology, Guangzhou, Guangdong, 510090, China

978-1-7281-6698-8/20/$31.00 ©2020 IEEE 870
DOI 10.1109/ICITBS49701.2020.00191
Authorized licensed use limited to: University of Exeter. Downloaded on June 13,2020 at 18:55:26 UTC from IEEE Xplore. Restrictions apply.

Abstract-Beginners find it difficult to obtain information effectively from visual designs of scientific literature. To address this problem, this paper puts forward a visual design and analysis framework for scientific literature. By analyzing the attributes of existing scientific literature visualization designs, such as chart type, interactivity, layout and labels, the framework identifies the potential impact of scientific literature visualization design on audience understanding. Drawing on the properties of visual design tools and on the current state of literature co-citation network visualization design, the method also puts forward corresponding design strategies. The results show that the readability and interpretability of existing visual designs of scientific literature can be improved through interactive visualization and adaptive layout adjustment.

Keywords-Scientific Literature; Visualization; Visualization Tools; Data Analysis

I. INTRODUCTION

Scientific information has grown rapidly in recent years, and databases and libraries hold a large amount of scientific information resources. Elsevier and Scopus offer 16 million articles from 2,500 journals, and the Web of Science Group has a total of 74.8 million articles from more than 21,100 journals in its core collection alone. Meanwhile, the amount of scientific information available online is growing rapidly, and more and more papers are published online as "open-access" content [1]. However, as Chaomei Chen has noted, the explosion of information makes it difficult for the traditional methods adopted by researchers to keep pace with the growth of information, since investigating a field by directly reading a large number of publications is time-consuming, hard to repeat and subjective. Moreover, the continuous flow of new scientific literature makes the complex network of scientific literature constantly develop and evolve, which makes research in the field of science more difficult [2].

In this situation, researchers are constrained by the continuous flow of scientific output, and it is difficult to extract useful scientific information from such a large volume of information. Visualization of the scientific literature developed against this background. Visualization, as a carrier of information, value and experience, has long been a way for people to gain insight into the world, as well as an aid for understanding complex things. Some studies have confirmed that visualization reduces cognitive load by externalizing cognitive processes and letting the perceptual system recognize patterns [3], and that external representations formed by visualization enhance people's cognitive ability [4]. Visualization analysis of scientific literature can help researchers and other interested people analyze the literature and extract the information they need.

In this paper, we use empirical methods to study the challenges of scientific text visualization through case studies and literature analysis. After reviewing relevant tools and screening the designs of data visualization tools for scientific literature analysis, we present six tools that researchers commonly use to address the issues in the visualization of scientific literature. Based on literature studies and cases, we summarize general strategies by comparing and analyzing the advantages and disadvantages of the visualizations produced by the various tools.

II. LITERATURE REVIEW

In the early 1960s, the first visualization study of a field based on citation data was a historical map of DNA research that was created manually. Although it was time-consuming, laborious and difficult to reproduce, it was the first step in the visualization of scientific literature [5].

In 1973, the American information scientist Henry Small first proposed the concept of co-citation analysis. Two documents are co-cited when both appear in the reference list of a third document; this establishes a co-citation relationship between the two [6].

In 1994, Garfield created maps with co-citation clustering and ranked the scientific literature chronologically. By visualizing the authors, publications, institutions, countries and regions that currently dominate the scientific or academic fields being studied, researchers can gain insight into the development of knowledge. Through visualization, even a layperson can identify the key articles and books in a field [7].

The SCI-Map software developed by ISI is widely used in physics, chemistry, quantum systems and other fields. For example, in 1994 Henry Small used the SCI-Map system to investigate the structure of science and the social sciences. Starting from one author, paper or keyword, SCI-Map can control the formation and development of a literature network through different co-citation strengths or distances [8].

III. CHALLENGES

When faced with continually updated scientific literature and databases, traditional visualization tools may not fully meet the needs of current researchers, since it is difficult for them to break through technical bottlenecks and achieve better cognitive effects. Visualization tools should be able to explore as many interrelationships as possible, eliminate distracting information, present clear visualizations and, on top of that, provide low latency. A common problem for science maps is adopting appropriate methods to reduce the number of data items and to compute a good layout. These visualizations are designed to provide a meaningful overview, preserve the overall structure of the network and convey information [9]. According to the literature analysis, the visualization of scientific literature mainly faces the following problems. The first is the overload of scientific information and the sheer volume of scientific literature. Second, it is difficult to obtain accurate samples: terms in the scientific literature are so similar that simple semantic analysis cannot easily separate seemingly similar terms used in studies from different fields. Last but not least, visualization is limited by physical perception, since how much information can be perceived from a visualization depends on the devices used. Meanwhile, the introduction of interactive and dynamic visualization places higher demands on tools. Based on these three challenges of literature analysis, we analyze the visualization effects of specific visualization tools.

IV. VISUALIZATION TOOLS FOR SCIENTIFIC LITERATURE

Nowadays, many tools try to solve the above problems. To visualize scientific literature effectively, it is necessary to extract samples accurately through bibliometric measurement and to analyze both the social network relations between articles and the resulting visualization effects. We reviewed six popular visualization software packages (Table I), listed their attributes and visualizations, and analyzed the visual design of their co-citation networks.

TABLE I. VISUALIZATION SOFTWARE FOR SCIENTIFIC LITERATURE

A. CiteSpace

CiteSpace is a Java application for visualizing co-citation networks, developed by Chaomei Chen with the goal of facilitating research and trend analysis in a knowledge domain. CiteSpace provides functions that support structural and temporal analysis of various networks of scientific literature, such as collaboration networks, author co-citation networks and document co-citation networks, enabling researchers to identify rapidly growing subject areas and to search for citation hotspots in a publication field [11]. CiteSpace mainly supports scientific literature data from the Web of Science, but it can also convert and analyze data from databases such as Scopus, CNKI and CSSCI.

CiteSpace II differs from earlier versions in that it improves the visual attributes of the visualization and reduces the cognitive burden on users. In CiteSpace I, users can identify key points only by visually inspecting the nodes that link different clusters in the network. In CiteSpace II, nodes with high betweenness centrality are marked with a purple circle in the network, a striking visual cue that improves the interpretability of the network.

Figure 1. A 515-node network of co-cited articles on mass extinction (1981–2004) [10]

B. CitNetExplorer

CitNetExplorer is a tool developed by Leiden University to visualize and analyze citation networks of scientific publications. It allows direct import from the Web of Science database and interactive exploration of the citation network. With CitNetExplorer, we can analyze the development of a research field over a period of time, identify the scientific literature on relevant topics and explore the publications of specific researchers. The software generates a visual network directly from data exported from the Web of Science; the network can be browsed by zooming and scrolling, and an intelligent labeling algorithm ensures that labels do not overlap.

Figure 2. Citation network of the literature on science mapping [12]
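The co-citation networks that CiteSpace and CitNetExplorer draw rest on the co-citation relation Henry Small proposed in 1973: two documents gain one co-citation for every citing paper whose reference list contains both. The minimal Python sketch below counts co-citations and then drops weak links, a simple form of the data-item reduction discussed under the challenges. It is an illustration under stated assumptions, not code from any of the tools reviewed: it assumes each citing paper has already been reduced to a list of cited-document identifiers (parsing Web of Science export files is not shown), and the function names, threshold and sample identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of documents is cited together.

    reference_lists: iterable of lists, one per citing paper, holding the
    identifiers of the documents that paper cites (hypothetical input format).
    Returns a Counter mapping a sorted (doc_a, doc_b) tuple to the number of
    citing papers that cite both documents.
    """
    counts = Counter()
    for refs in reference_lists:
        # De-duplicate so a reference listed twice counts once per citing paper;
        # sorting gives every unordered pair a single canonical key.
        unique_refs = sorted(set(refs))
        for pair in combinations(unique_refs, 2):
            counts[pair] += 1
    return counts


def prune_weak_links(counts, min_strength):
    """Keep only pairs co-cited at least `min_strength` times (hypothetical threshold)."""
    return {pair: n for pair, n in counts.items() if n >= min_strength}


if __name__ == "__main__":
    # Hypothetical citing papers and the documents each of them cites.
    citing_papers = [
        ["Small1973", "Garfield1964", "Chen2006"],
        ["Small1973", "Chen2006"],
        ["Small1973", "Garfield1964"],
        ["VanEck2014", "Chen2006", "Small1973"],
    ]
    counts = cocitation_counts(citing_papers)
    print(counts[("Chen2006", "Small1973")])  # 3
    print(prune_weak_links(counts, 2))
    # {('Chen2006', 'Small1973'): 3, ('Garfield1964', 'Small1973'): 2}
```

Using sorted tuples as keys treats co-citation as symmetric, which matches the definition. Tools may also normalize raw counts before computing a layout; that step is omitted here.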

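CiteSpace II marks nodes with high betweenness centrality with a purple circle. Betweenness centrality measures how many shortest paths between other pairs of nodes pass through a node, so documents that bridge clusters score highly. The minimal Python sketch below implements Brandes' algorithm for an unweighted, undirected network and flags nodes above a cut-off. It is not CiteSpace's implementation: it ignores link weights, and the 0.1 cut-off, the function names and the toy network are illustrative assumptions.

```python
from collections import deque


def build_adjacency(edges):
    """Turn (u, v) pairs into an undirected adjacency dict: node -> set of neighbours."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def betweenness_centrality(adjacency):
    """Normalized betweenness centrality via Brandes' algorithm (unweighted, undirected)."""
    nodes = list(adjacency)
    score = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        # Breadth-first search from `source`, recording the number of shortest
        # paths (sigma) to each node and each node's predecessors on those paths.
        order = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[source] = 1
        dist = dict.fromkeys(nodes, -1)
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, processing the farthest nodes first.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    n = len(nodes)
    # Each undirected path is counted from both endpoints (factor 1/2), and the
    # largest possible value is (n - 1)(n - 2) / 2, so the combined scale is
    # 1 / ((n - 1)(n - 2)).
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: s * scale for v, s in score.items()}


def flag_bridging_nodes(adjacency, cutoff=0.1):
    """Return nodes whose centrality reaches `cutoff` (illustrative value), highest first."""
    scores = betweenness_centrality(adjacency)
    flagged = [v for v, s in scores.items() if s >= cutoff]
    return sorted(flagged, key=lambda v: (-scores[v], v))


if __name__ == "__main__":
    # Two hypothetical clusters of co-cited documents joined through document "x".
    edges = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"),
             ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),
             ("a1", "x"), ("x", "b1")]
    adjacency = build_adjacency(edges)
    print(flag_bridging_nodes(adjacency))  # ['x', 'a1', 'b1']
```

On this toy network the bridging document x scores 0.6, the two documents it connects score about 0.53, and every other node scores 0. A weighted variant (shortest paths computed from distances derived from co-citation strength) would be closer to a weighted co-citation network, but it is not shown here.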
C. Gephi

Gephi is powerful network visualization software that can import data processed by other software, such as HAMMER, Bibexcel and CiteSpace, to generate co-citation networks. For network layout, Gephi provides state-of-the-art layout algorithms that improve efficiency and quality. Its layout palette lets users change layout settings at runtime, which greatly improves user feedback and the user experience. With its force-based layout algorithms and readability optimizations, Gephi produces co-citation networks that are highly interpretable. For nodes, ranking or partition data can be mapped onto the visual representation, and colors, sizes or labels can be customized to make the network presentation more meaningful. The vector preview module lets users add final touches and focus on aesthetics before exporting to SVG or PDF.

D. ScienceScape

ScienceScape, developed by Mathieu Jacomy, is online science visualization software that lets users obtain the charts they need by uploading scientific literature data to a web page. ScienceScape supports data exported from the Scopus and Web of Science databases and can export 10 kinds of visualization graphics, such as keyword networks, author networks, journal networks, the number of papers published each year, and keywords changing over time. Its co-citation network is represented as a Sankey diagram of the main authors, keywords and journals and their correlations. ScienceScape generates highly interpretable diagrams entirely automatically, but users cannot adjust the network layout or interact with it.

E. SciMAT

SciMAT is an open-source (GPLv3) software tool developed by M. J. Cobo, A. G. López-Herrera et al. It supports the ISI Web of Knowledge and RIS formats and is designed to perform science mapping analysis within a longitudinal framework that combines the methods, algorithms and measures of every step of the science mapping workflow, from preprocessing to the visualization of results. Users can base their research on several bibliometric networks, apply different normalization and similarity measures to the data, and choose among several clustering algorithms to segment the data. In the visualization module, three representations (strategic diagram, cluster network and evolution areas) are used in combination to help users understand the results. SciMAT also supports analysis of networks, performance, quality and time [13].

F. VOSviewer

VOSviewer is a software tool developed by N. J. van Eck et al. for constructing and visualizing bibliometric networks. These networks may include, for example, journals, researchers or individual publications, and they can be constructed from citation, bibliographic coupling, co-citation or co-authorship relations. VOSviewer supports Web of Science, Scopus, Dimensions and PubMed files, and it also provides text mining functionality for constructing and visualizing co-occurrence networks of important terms extracted from the scientific literature. VOSviewer can filter its network by the number of citations to prune it, removing visual clutter as far as possible, and it automatically labels important nodes, producing a clear and intuitive presentation.

G. Tools Comparison

TABLE II. COMPARISON OF THE VISUALIZATIONS OF VISUALIZATION TOOLS FOR SCIENTIFIC LITERATURE

Comparing the main visual algorithms and visual designs used for citation network analysis in the six tools above shows that all of them except ScienceScape use networks to present citation relationships among authors. ScienceScape emphasizes the correspondence between the relevant journals, the literature and the authors, so it adopts the form of a Sankey diagram. The network form appears to fit users' mental models better. Moreover, most tool-generated co-citation networks support interaction, and interactive diagrams make it easier to extract relevant data and are more interpretable when the data being visualized are complex. Gephi supports interaction while the network is being generated, but not in the final exported output. ScienceScape and SciMAT show the final graphics once the data are entered and support neither layout tweaking nor interaction.

In terms of layout, CiteSpace automatically adjusts the network distribution, while the other software requires users to adjust it manually. ScienceScape shows the final graphics directly, so it does not support layout adjustments, but it has the advantage that its graphics can be read directly.

In terms of labels, CiteSpace automatically adjusts font size, avoids overlaps and even adds a white stroke to the text, which makes it the most capable tool in this respect. Gephi's labels can be customized with more features, such as adjusting the font and font size, adding a background color, changing colors and adding strokes; its disadvantage is that it cannot adaptively avoid overlaps. CitNetExplorer and SciMAT do not support label adjustment.

In terms of visualizing key-node documents, CiteSpace surrounds key nodes with a purple circle, which is clear and readable. Gephi is inconvenient because marking key-node documents requires manual adjustment. The other co-citation network visualization tools all support automatic labeling of key-node documents.

Overall, the six tools we discuss can be divided into those that rely on user self-tuning and those that rely on software auto-tuning. Tools such as CiteSpace and ScienceScape tend to rely on automated algorithms to obtain the best visualizations, while tools represented by Gephi tend to rely on customization to obtain visual designs that meet different requirements. However, in general, the design
purpose and principle of scientific literature visualization is to clearly delineate the relationships between documents and the number of cited documents, so as to assist researchers in understanding and analysis. Whichever method is adopted, the design should be built around this goal.

V. CONCLUSION

We believe that the visual design of scientific literature should show clear literature network relationships and present key node documents clearly for researchers to analyze. Of course, visualization of scientific literature is not only about visualization design, but also about sample selection and database selection. The purpose of scientific literature visualization is to use visualization tools to reveal research trends and research contexts that cannot be found through traditional literature reading. To achieve this goal, it is very important to select a database that contains enough relevant literature, use appropriate retrieval to exclude irrelevant research, choose suitable visual analysis tools, and handle every link in visual design and analysis with care. Only careful treatment of every step makes it possible to explore research trends. Our future work includes measuring the visual design of co-citation networks to probe the readability and interpretability of the relevant designs. We discuss only the visual design of co-citation networks for the visualization of scientific literature, hoping to provide some reference for researchers.

REFERENCES

[1] B.-C. Björk, M. Laakso, P. Welling, P. Paetau, "Anatomy of green open access," Journal of the Association for Information Science and Technology, vol. 65, no. 2, pp. 237-250, 2014.
[2] K. Börner, C. Chen, K. W. Boyack, "Visualizing knowledge domains," Annual Review of Information Science and Technology, vol. 37, no. 1, pp. 179-255, 2003.
[3] D. Norman, Things That Make Us Smart: Defending Human Attributes in the Age of the Machine. Diversion Books, 2014.
[4] D. Kirsh, "Thinking with external representations," AI & Society, vol. 25, no. 4, pp. 441-454, 2010.
[5] H. Small, "Co-citation in the scientific literature: A new measure of the relationship between two documents," Journal of the American Society for Information Science, vol. 24, no. 4, pp. 265-269, 1973.
[6] E. Garfield, I. H. Sher, R. J. Torpie, The Use of Citation Data in Writing the History of Science. Philadelphia, PA: Institute for Scientific Information Inc., 1964.
[7] H. Small, H. Rothman, "Investigations into the structure of science and social science using the SCI-Map system," in Identifying Innovation in Social Science: Some Bibliometric Approaches (SPSG Review Paper No. 8). London: Science Policy Support Group, 1994.
[8] E. Garfield, I. H. Sher, R. J. Torpie, The Use of Citation Data in Writing the History of Science. Philadelphia, PA: Institute for Scientific Information Inc., 1964.
[9] P. Federico, F. Heimerl, S. Koch, et al., "A survey on visual approaches for analyzing scientific literature and patents," IEEE Transactions on Visualization and Computer Graphics, vol. 23, no. 9, pp. 2179-2198, 2016.
[10] C. Chen, "CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature," Journal of the American Society for Information Science and Technology, vol. 57, no. 3, pp. 359-377, 2006.
[11] C. Chen, "Searching for intellectual turning points: Progressive knowledge domain visualization," Proceedings of the National Academy of Sciences, vol. 101, suppl. 1, pp. 5303-5310, 2004.
[12] N. J. van Eck, L. Waltman, "CitNetExplorer: A new software tool for analyzing and visualizing citation networks," Journal of Informetrics, vol. 8, no. 4, pp. 802-823, 2014.
[13] M. J. Cobo, A. G. López-Herrera, E. Herrera-Viedma, F. Herrera, "SciMAT: A new science mapping analysis software tool," Journal of the American Society for Information Science and Technology, vol. 63, no. 8, pp. 1609-1630, 2012, doi: 10.1002/asi.22688.
