
Statistics Science Vs Data Science

Technological Forecasting & Social Change 173 (2021) 121111

The science of statistics versus data science: What is the future?


Hossein Hassani a, Christina Beneki b, Emmanuel Sirimal Silva c, *, Nicolas Vandeput d,
Dag Øivind Madsen e
a Research Institute for Energy Management and Planning, University of Tehran, Tehran, Iran
b Department of Tourism, Ionian University, P. Vraila Armeni 4, 49100 Corfu, Greece
c Fashion Business Research Centre, University of the Arts London, 272 High Holborn, London WC1V 7EY, UK
d Université Paris-Saclay, CentraleSupélec, Laboratoire Génie Industriel, 3 rue Joliot-Curie, 91190 Gif-sur-Yvette, France
e University of South-Eastern Norway, School of Business, Bredalsveien 11, 3511 Hønefoss, Norway

ARTICLE INFO

Keywords: Perspective; Science; Statistics; Data science; Similarities; Differences; Pragmatism

ABSTRACT

The importance and relevance of the discipline of statistics with the merits of the evolving field of data science continues to be debated in academia and industry. Following a narrative literature review with over 100 scholarly and practitioner-oriented publications from statistics and data science, this article generates a pragmatic perspective on the relationships and differences between statistics and data science. Some data scientists argue that statistics is not necessary for data science as statistics delivers simple explanations and data science delivers results. Therefore, this article aims to stimulate debate and discourse among both academics and practitioners in these fields. The findings reveal the need for stakeholders to accept the inherent advantages and disadvantages within the science of statistics and data science. The science of statistics enables data science (aiding its reliability and validity), and data science expands the application of statistics to Big Data. Data scientists should accept the contribution and importance of statistics and statisticians must humbly acknowledge the novel capabilities made possible through data science and support this field of study with their theoretical and pragmatic expertise. Indeed, the emergence of data science does pose a threat to statisticians, but the opportunities for synergies are far greater.

1. Introduction

In recent years, a growing debate in academia and industry has compared the importance and relevance of the discipline of statistics with the merits of the evolving field of data science (MacGillivray, 2021; Nachtsheim and Stufken, 2019; Ben-Zvi et al., 2018; Ribeiro et al., 2017; Davenport & Patil, 2012; Wickham, 2014; Wu, 1997). These debates have also extended to comparisons of software tools used for statistics and data science (Sardareh et al., 2021). While the discipline of statistics has a long history and is well established (Marquardt, 1987; Stigler, 1986), traditional statisticians have recently been overshadowed by the emergence of a new class of number crunchers: data scientists. Today, statistics is a profession that is both invaluable and invisible (Rodriquez, 2015) with data science being considered one of the hottest professions, and in the words of Davenport and Patil (2012) “the sexiest job of the 21st century.”

Whilst the contribution of statistics to the progression of scientific knowledge across many disciplines continues to be acknowledged in this era of Big Data (McNutt, 2014), many organisations are actively seeking to employ data scientists. In fact, Baškarada and Koronios (2017) note that many organisations often seek “unicorn data scientists”, a rare breed, almost mythical creatures that are experts in multiple specialties, from mathematics to computer science and artificial intelligence (AI). There are, however, commentators who remain critical and skeptical of these broad-based portrayals of data scientists as corporate saviours. For example, some researchers criticize data science and largely see it as a myth, suggesting instead a return to conventional scientific approaches where data and methodology are just processual components (Learner and Phillips, 1993; Phillips, 2017).

Through a review of the relevant literature, this article aims to take stock of these differing views on the fields of statistics and data science. The goal is to generate a pragmatic perspective on the relationship and differences between statistics and data science.

Like Phillips (2017) who presents a perspective on Big Data, this article seeks to present a balanced and pragmatic perspective on the science of statistics and data science. Thus, this article can arguably be

* Corresponding author.
E-mail address: [Link]@[Link] (E.S. Silva).

Received 30 June 2020; Received in revised form 29 July 2021; Accepted 10 August 2021
Available online 20 August 2021
0040-1625/© 2021 Elsevier Inc. All rights reserved.

described as a “perspective article” since the overall aim is to discuss current debates and advances in these two fields and identify future directions both in academic research and in practice. Through critical analysis and discussion, the article holds the potential to stimulate the academic discourse about the pragmatic relationship that exists between these two fields, which goes far beyond semantic considerations. We subscribe to MacGillivray’s (2021, p. S5) view that analysis and well-researched cautionary commentary by statisticians [and data scientists] can be extraordinarily valuable for both statistics and data science.

The article is conceptual and builds heavily on in-depth examinations of more than 100 scholarly and practitioner-oriented publications from statistics and data science. The literature is reviewed following a narrative review approach (Baumeister & Leary, 1997; Ferrari, 2015). As Ferrari (2015) notes, a narrative approach is particularly useful for appraising the current state of knowledge and for contributing to general debates in the research literature. That said, narrative reviews also suffer from some limitations, such as not being very explicit about the researchers’ assumptions and biases related to selection and sampling of studies. While the authors of this article recognise the potential subjectivity of following such an approach, steps have been taken to remedy these biases. For example, the authors hail from both the statistics and data science communities, which means that each field’s underlying assumptions and viewpoints have been challenged during the research process.

The remainder of this article is structured as follows. Section 2 provides a background to statistics and data science and identifies some of the main distinguishing features of these fields. Section 3 discusses the challenges posed by the evolution of Big Data, while Section 4 discusses the limitations of data science. Section 5 follows with a discussion of the distinctions between the two fields. Lastly, Section 6 concludes the article.

2. Background: statistics and data science

The field of statistics has a rich and complex genealogy (Stigler, 1986). The term statistics is said to have originated around the year 1749 (Walker, 1929; Ribeiro et al., 2017). For example, Norton (1978) traces the development of the modern field of statistics to Karl Pearson and his work in mathematical biology and biometry. Today, the American Statistical Association defines statistics as “the science of learning from data, and of measuring, controlling and communicating uncertainty” (Ben-Zvi et al., 2018, p. 6). Statistical methods were developed for a world with scarce data where the lack of information called for models based on simplified assumptions to enable drawing conclusions from small datasets (Galeano and Pena, 2019). However, as Google’s Chief Economist Hal Varian emphasises, the complexities inherent in modern world problems demand something more than statistics for understanding and extracting value from data (McKinsey Quarterly, 2009; Dayal, 2020). In Google searches worldwide (Fig. 1, top graph), ‘big data’ is the most popular search term, followed by ‘data scientist’ and ‘statistician’ respectively, thereby giving an indication of the interest or demand for the data scientist role in the modern world. However, it is noteworthy that Google Trends for the fields of ‘statistics’ and ‘data science’ shows the popularity of statistics over data science (Fig. 1, bottom graph), perhaps fuelled by the greater number of statisticians than data scientists in the world. The trends indicate that whilst the role of a data scientist is increasingly popular, as a field, the popularity of statistics continues to dominate over data science.

Over the years, the debate on the superiority of statistics and data science has resulted in varied views. Prof. Jeff Wu (1997) argued that “statistics” should be renamed “data science,” but as Wickham (2014) explained, statistics is only part of data science, albeit a crucial part. The Johns Hopkins Data Science Specialisation² gives prominence to hypothesis testing, statistical model development and statistical inference as essential to the development of a data scientist (Ben-Zvi et al., 2018). Dunson (2018) noted that a significant portion of data science is not statistics. Even so, statistics not only supports but is also directly associated with data science, and statistical skills are very important for data scientists (Ribeiro et al., 2017); data science is closely connected to mathematics, statistics, and computer science (Saltz and Stanton, 2017).

The rise of ‘Big Data’ and ‘data science’ has given statistics a wake-up call (Breiman, 2001; Galeano and Pena, 2019) because the expansion of data science through the increasing availability of data and user-friendly software could result in the marginalisation of statistics (Ben-Zvi et al., 2018). Interestingly, as these two disciplines rely on a set of skills that often overlap (Diggle, 2015), data science and statistics frequently share distinguishing qualities. Nevertheless, some advocates of data science believe that you can be a good data scientist without a background in statistical theory (Granville, 2014; Davison, 2018). Others suggest that science is the only reality, and that data science is a myth as data and methodology are simply two of the four components that make up science (Phillips, 2017; Learner and Phillips, 1993). Some even questioned the longevity of the ‘buzzword’ that is data science (Press, 2013). However, data science is not merely a ‘buzzword’ and instead represents significant advances in capabilities for tackling highly complex challenges (MacGillivray, 2021), challenges that statisticians alone would struggle to overcome given the growing volume, velocity, and variety of data.

It is likely that if data science were to proceed without statistics, it would diminish both statistics and data science and worsen data-based decision-making in society (Ben-Zvi et al., 2018). Furthermore, in contrast to Granville and other advocates, Huang’s (2019) view is that statistics is one of the three main data science skill sets (in addition to programming and business knowledge). For Huang (2019), the ability to use statistics to infer insights from smaller data sets onto larger populations is a fundamental law of data science. Patil (2018) supports this view as he notes the critical importance of understanding statistics, especially Bayesian probabilities for machine learning. Galeano and Pena (2019) note that machine learning has been successful through the integration of some methods developed for large data analysis with the methods in operations research, applied mathematics and statistics (for example, support vector machines, regularization methods, and network analysis). Rodriguez (2013) asserts that data scientists are in demand not only for their expertise in programming, machine learning, and strong communication skills, but also for statistical modelling.

Some authors argue that most statisticians have not contributed much to recent progress in data science (van der Aalst, 2016). In fact, over the years, statisticians have been criticised for being too reliant on theory at the expense of computation (Carmichael and Marron, 2018). However, it is important to bear in mind that statisticians too have contributed to computation through software such as the R-Project (Members, 2017). Also, as Phillips (2017, p. 731) explains, “theory is a necessary overlay for making sense of big data”. As such, the incorporation of statistical theory within data science processes can bring some added reliability and validity to the findings and analysis. Furthermore, statisticians are also infamous for creating complex models that solve well-defined problems under assumptions that do not materialise in the real world, an argument that resonates somewhat with the famous quote by George Box that “all models are wrong; some models are useful” (Box et al., 2005, p. 440).

For example, in the field of inventory optimization, statisticians are infamous for creating extremely precise models that work under very restrictive assumptions (n.b., the existence of non-parametric methods

2
[Link]
html


Fig. 1. Worldwide search interest for ‘big data’, fields of ‘statistics’, and ‘data science’, and roles of ‘statistician’ and ‘data scientist’ over the last 90 days
[accessed: 28.07.2021].

that do not rely on assumptions) that fail when faced with ‘real’ supply chains. Nahmias (1979) summarises this aptly as he asserts that in the past, research has focused primarily on providing rigorous analysis of optimal policies for very simple problems, rather than on developing practical solutions to realistic problems.

Thus, as the emergence of data science has created a balance between theory and computation, the distinction between statisticians and non-statisticians has blurred (Cleveland, 2001). Historically, data analysis was associated with statisticians. However, the emergence of data science with its automation and machine learning has broken this barrier, enabling those who do not necessarily possess a background in statistics to also engage in meaningful data analysis. Even though automated software enables engagement with data analysis, the concept of garbage in, garbage out is very important to consider. Therefore, the rigour inherent within statisticians and statistical methods can provide a solid foundation for data scientists to ensure their output remains reliable and valid in practice.

In contrast to statistics, the story underlying data science began comparatively recently, over fifty years ago, when the American mathematician John Tukey referred to a novel science focused on learning from data (Tukey, 1962). Raban and Gordon (2020) carried out a search of Web of Science and found that there were more than 50 publications dealing with aspects of data science published during the 1960s, although it should be noted that the words “data” and “science” were not used conjointly in the titles of these articles. Moreover, Raban and Gordon (2020) found that the focus of the research of this era was on “data collation in the social sciences, and not in a sense of extracting knowledge from data as referred to this area today” (p. 1565).

It was not until many years later that data science was formed into a field, when authors such as Cleveland (2001) and Wu (1997) started referring to the practices of Tukey and others as data science (Donoho, 2017; Raban and Gordon, 2020). The Data Science Association³ defines data science as “the scientific study of the creation, validation and transformation of data to create meaning” and statistics as “the practice or science of collecting and analysing numerical data in large quantities.” Herein lies the first hint of the relationship between statistics and data science, as the former definition appears to encompass the bread and butter of an applied statistician’s daily routine (i.e., use methodology to make inferences from data) (Donoho, 2017). Nevertheless, data

3
[Link]


scientists tend to downplay the importance of the discipline of traditional statistics and intentionally obscure the evident overlap of the two fields (see for example, Granville 2014; Matteson 2020). For that reason, some statisticians feel that data science marginalises statistics (Donoho, 2017).

By contrast, De Veaux et al. (2017) noted that statistics forms part of the primary theoretical foundations of data science; Weihs and Ickstadt (2018) and Cao (2017) noted that as a scientific discipline, data science is influenced by statistics. Professor Broman (2013) took a more authoritative stance in explicitly stating that data science is statistics, and anyone who analyses data is doing statistics. Carmichael and Marron (2018) added to the above ideology by pointing out that data science has its origins in statistics and is all about learning from data, which is traditionally the business of statistics.

However, one key difference is that statisticians are interested in developing models that are then confirmed by data. In contrast, data scientists are more interested in the application of machine learning and data mining without being restricted by models. Intellectuals such as Professor Andrew Gelman (2013), however, opine that statistics is the least important component of data science, and Hardin et al. (2015) asserted incorrectly (to the best of our knowledge) that the profession of statistics changed its name to data science! Overall, it is evident that there is disagreement between academics over the terminology, value, and contribution of both disciplines. It is our opinion that such extreme views from intellectuals in both disciplines are not aiding the collaborative advancement that is required for the benefit of statisticians and data scientists.

Whilst practitioners are more interested in user acceptance, results, and reliability, the M4-competition highlighted the need for statisticians and data scientists to work collaboratively. In particular, the M4-competition saw statisticians and data scientists compete at forecasting 100,000 time series. Interestingly, pure machine learning methods and pure statistical methods reported poor accuracy in relation to hybrid models that utilised both statistical and machine learning features (Makridakis et al., 2020). Whilst these findings were consistent with those in Makridakis et al. (2018), they differ from those in other machine learning studies such as Salaken et al. (2017). However, the results from machine learning studies claiming superior forecasting capabilities cannot be replicated or reproduced as the data and algorithms are not publicly available (Makridakis et al., 2020), thereby hindering the reliability and validity of the claims within machine learning studies.

Over a decade ago, Hal Varian predicted that the ‘sexy’ job in demand between 2009 and 2019 would be that of a statistician (Lohr, 2009; Davenport & Patil, 2012). However, a quick search of job opportunities on various platforms indicates that the number of roles for data scientists exceeds the number for statisticians (in line with the Google Trends findings in Fig. 1 where the data scientist role was more popular than that of a statistician). Thus, Hal Varian’s prediction would appear flawed to those who do not see the complementary nature of statistics and data science. A closer look at the job adverts (Table 1) does uncover the continuing importance of statistics within the field of data science as evident in several roles that were advertised by high-profile companies. The authors of the current article do not suggest that all data scientist jobs in the market would follow a similar pattern, but below are a few examples of excerpts from job adverts:

Table 1. Selected data scientist job adverts and excerpts from their descriptions.

Company | Job Title | Excerpts from Job Descriptions
Google | Data Scientist Engineering | Master’s degree in a quantitative discipline (e.g., Statistics, …), expertise with statistical data analysis such as linear models, skills in selecting the right statistical tools given a data analysis problem.
Dyson | Head of Data Science | Solid foundations on statistical and scientific methods.
Revolut | Head of Data Science | Deep understanding of fundamentals of probability and statistics.
Farfetch | Data Scientist | Master’s degree, or higher, in a quantitative domain such as Mathematics, Operational Research, Statistics, or similar.
Facebook | Data Science Manager, Ads | Understanding of statistical analysis.
Playstation | Data Scientist | Solid theoretical and practical understanding of Statistics (e.g., hypothesis testing, experimentation, regressions).
Warner Bros. Entertainment | Data Scientist | Strong knowledge of statistical techniques.
Amazon | Data Scientist | Outstanding quantitative modelling and statistical analysis skills.
Deloitte UK | Data Scientist | Strong statistics skills including distributions, statistical testing, regression, etc.

Note: These job adverts appeared on various online platforms during the year 2020.

The debate about the relationship between statistics and data science is grounded in anecdote and is occasionally viewed as pointless or even non-sensical. The contentions have even given rise to the definition of a data scientist as someone who is better at statistics than any software engineer and better at software engineering than any statistician (Donoho, 2017). This attitude underlines the mounting issues regarding appropriate definitions, assignments, and applications of these disciplines and problems around the incomplete understanding of what they involve (see for example, Carmichael and Marron (2018) and references therein). Importantly, this lack of clarity (Nantais, 2019) has arisen within the context of a world that, over the last decade, has placed data science and statistics at the heart of due process, research, and decision-making.

Some view the growth of data science as a threat to the long-term status of the discipline of statistics (Diggle, 2015) while others view data science as a challenging opportunity for statisticians (Ridgway, 2015). Barber (2018), for example, praises the relationship between

Fig. 2. Data science ingredients (van der Aalst 2016, p. 12).


statistics and data science. Van der Aalst (2016) included statistics as an ingredient contributing to data science (Fig. 2), producing an outlook that supports the conclusion of Nantais (2019) that “data science is something more than statistics” and going so far as to claim that statistics can benefit from the emergence of data science. Van der Aalst (2016) also wrote that the discipline of data science is an amalgamation of classical disciplines such as statistics, data mining, databases, and distributed systems.

The modern world has undoubtedly come to embrace the importance of data (Bean, 2020), including Big Data (Mills, 2019), whatever the issue under investigation, and this has led to an increased need for adequate data interpretation. In search of information, governments, corporations, academics, and other organisations have not only pushed to expand the pervasiveness of data-driven decision-making but have also allocated significant resources to data mining and the capture of more information (see for example, Hassani et al., 2014; Geum et al., 2015; Li et al., 2019; Chen et al., 2018; Iqbal et al., 2020; McKinsey Analytics, 2018). It is unsurprising, then, that technology companies are pushing technologies that can serve as a data source, notably ambient intelligence, ‘everyware’, and the Internet of Things (see for example, Lo and Campos, 2018; Islam et al., 2020; Carayannis et al., 2018).

As technology evolves, so do the role and importance of the data scientist (Biswal, 2019). Data science specialists are in high demand throughout the public and private sectors, across government departments and tech start-ups (Holak, 2019; Teichmann, 2019; Trivedi, 2018). Big Data has opened the door to creativity and innovation as well as scientific advances achieved through applied statistics and data science. As the variety, volume, and value of big data increase (Hassani et al., 2020), it becomes vital to identify the skills and qualities that data science and statistics should provide and re-think how we use and work with data. Moreover, this shift in thinking offers university systems worldwide an opportunity to update their current statistics curriculums and place a greater emphasis on computation skills, which are vital in the era of Big Data. The importance of ensuring that graduates are skilled at assessing data quality, finding meaning in data patterns, and understanding its business/social implications (Phillips, 2017) should not be underestimated.

3. The challenges posed by the evolution of big data

Evidence-based decision-making processes have always been heavily reliant on data collection. However, the development of more data collection procedures, mainly through ambient intelligence, is having different effects on the value, veracity, velocity, and volume of Big Data, which comprise Big Data’s four distinguishing features. Typically, each of these features is supposed to be robust, and volume, value, veracity, and velocity are all expected to be high. The use of ambient intelligence in general or multi-disciplinary contexts creates a great deal of noise and little clarity. The data, information, knowledge, wisdom (DIKW) pyramid (Rowley, 2007) is one framework that is widely used to explain the inherent relationships within data. The application of this framework reveals that the rise in ambient intelligence does not result in greater knowledge or wisdom but simply more data, as converting data into information becomes more and more difficult. It is up to the analyst or observer, that is, the person who utilises data to achieve a certain outcome, to address this challenge. The question of whether addressing this issue should fall under the purview of statistics, data science, or both may itself become a problem.

Academic institutions offering master’s degrees often connect the importance of data science with the growth of Big Data (Donoho, 2017) instead of referring to the importance of statistics. However, it is noteworthy that statisticians (like data scientists are today) have been dealing with Big Data in the form of census data since the beginning and are comfortable with large datasets (Donoho, 2017; Carmichael and Marron, 2018). Nevertheless, the learning capabilities through data science are higher than with statistics as machine learning ‘learns’ more from data. Statisticians should appreciate the value generated by data science in scaling the application of statistics to Big Data via technology (Donoho, 2017; Greenhouse, 2013; Hardin et al., 2015). Such an appreciation would ensure that the next generation of Big Data analysts will be born out of the combination of sound statistical knowledge and data science skills that are mandatory for future employability and longevity of both disciplines.

In line with this argument, universities should give equal prominence to data science and statistics for two main reasons. First, Big Data can lead to inaccurate results by providing false positives at the hypothesis testing stage during statistical data analysis (McFarland and McFarland, 2015; Hassani and Silva, 2015). As such, a data scientist with mastery of machine learning, statistics, and analytics (Kozyrkov, 2018) can be extremely beneficial for businesses in such scenarios and can still help ensure that the data analytics performed appreciates that common intuition does not always equate to mathematical correctness (Leetaru, 2019). Kozyrkov (2018) argues that a statistician is ‘your best protection against fooling yourself in an uncertain world’ as they emphasise determining whether the methods applied are apt for the problem and agonize over which inferences are valid from the information at hand.

The automation offered by machine learning generally comes at the expense of the rigorous process offered by statistics for data analysis through crucial steps such as sampling, exploratory and descriptive analysis, inference, prediction, measurement of uncertainty, and interpretation (Galeano and Pena, 2019). Here, it is important to note that data scientists would argue that they too provide careful consideration to inferences by relying on training, test and validation sets that do not require specific statistical or mathematical knowledge to identify issues. However, Kozyrkov’s (2018) emphasis is on the notion of combining statistics, machine learning and analytics to create a well-rounded data scientist. Such data scientists can help stakeholders take prudent risks by using the rigor of statistics to minimise the chance of unwise conclusions, be able to automate tricky tasks to pass the pure statistician’s strict controls and have the necessary coding skills to visualise and mine Big Data with speed to uncover insights worthy of further investigation (Kozyrkov, 2018). In addition, business knowledge (as relevant to a field) is mandatory to avoid being fooled by data. Phillips (2017) asserts that the importance of context and motive for analytics will continue to remain mandatory even as data science matures. Galeano and Pena (2019) believe in the wider benefits from the convergence of machine learning and statistical approaches of data analysis under the data science umbrella.

Second, even though we live in an age of Big Data, not all problems are “Big Data problems” (for example, there are many small data problems in supply chain) where we continue to need statistics as a core foundation (Nantais, 2019). As an example of the importance of foundational statistics, Fig. 3 shows a graph from an unpublished consultancy report on the ongoing COVID-19 pandemic. Many analyses of this crisis must rely on small sample sizes, and “machine learning often performs poorly on small datasets” (Faraway and Augustin 2018, p. 144). In this case, a basic exponential model (with a series of additional assumptions, as statisticians do) was used to predict the emergence of COVID-19 cases in Sri Lanka over a 30-day period. The model was built on 28th March 2020 using only 18 data points, and it provided a considerably accurate forecast using simple foundational statistics. This example showcases that in some situations, simple statistical models are still useful (Breiman, 2001; Koehrsen, 2019). This example does not intend to undermine the simultaneous contributions of data science during the pandemic (see for example, Marr, 2020); instead, it offers evidence of the valuable role of statistics in an age of data science. In short, the authors believe that the science of statistics enables data science, and data science expands the application of statistics.

This section ends with a strengths-weaknesses-opportunities-threats (SWOT) analysis matrix (see e.g., Helms & Nixon, 2010). The SWOT matrix will be used as an organising framework to analyse the challenges


Fig. 3. Daily forecast for COVID-19 cases in Sri Lanka.¹
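As a rough illustration of how a basic exponential model can be fitted to a very small sample, the sketch below estimates cases ≈ a·exp(b·day) by ordinary least squares on the log scale and extrapolates 30 days ahead. It is not the authors' model: the additional assumptions they mention are not reproduced, the function names are hypothetical, and the 18 counts are synthetic values generated purely for illustration (they are not Sri Lankan case data).

```python
import math

def fit_exponential(days, cases):
    """Fit cases ~ a * exp(b * day) by least squares on log(cases)."""
    logs = [math.log(c) for c in cases]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    b = sxy / sxx                       # estimated daily growth rate
    a = math.exp(mean_y - b * mean_x)   # estimated level at day 0
    return a, b

def forecast(a, b, day):
    """Point forecast of the fitted curve at a given day index."""
    return a * math.exp(b * day)

# Synthetic counts for 18 days (illustrative only, NOT real case data).
days = list(range(18))
cases = [round(5 * math.exp(0.15 * d)) for d in days]

a, b = fit_exponential(days, cases)
horizon = [forecast(a, b, d) for d in range(18, 48)]  # the next 30 days
doubling_time = math.log(2) / b
print(f"growth rate per day: {b:.3f}, doubling time: {doubling_time:.1f} days")
```

With so few observations the fit has only two parameters to estimate, which is one reason a simple model of this kind can remain usable where a data-hungry method would not.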

posed by the evolution of Big Data. Here, the SWOT analysis is done from the perspective of a data scientist (Fig. 4). A similar analysis from the perspective of a statistician is performed towards the end of Section 4.

As Fig. 4 indicates, the field of data science has several strengths. First, data scientists are generally willing to adopt and apply new technologies. This is increasingly important in a world where the volume, velocity and veracity of data are expanding exponentially. Second, data science has expanded the application of statistics. Data scientists are also perceived as “Jacks of all trades” due to their skills across several areas, such as machine learning, statistics, and analytics. The flip side of being versatile and having knowledge in many areas is, of course, that one may be “Master of none”. Another weakness of data science is that it is less rigorous than statistics. Since the field has not reached maturity, this might change over time. Finally, it may be too focused on data, and it is well known that converting data into meaningful and actionable insights is often difficult.

There are, however, several opportunities and threats for the field of data science. One significant opportunity relates to the convergence of fields and technologies. For example, as discussed earlier, there is an ongoing convergence of statistical analysis and machine learning methods. The general well-roundedness of data scientists may also position them to seize opportunities that may arise in the near future. Since technologies are developed at a fast pace, it is difficult to forecast where these fields will be some years from now. In terms of threats, there is a general realisation that not all problems are “Big Data problems” and that basic and foundational statistics may be sufficient to tackle the myriad of small(er) data problems. There is a risk that many will ask whether it is time to go “back to basics” (i.e., statistical analysis).

4. The limitations of data science

At first glance, data scientists appear to be the ideal specialists to come to grips with Big Data and its constantly evolving dynamics. However, they face barriers in terms of using datasets to model forecasts and achieve optimisation under certain conditions.

Machine learning experts have evaluated several options to overcome the barriers. For example, k-fold cross validation (Vandeput, 2020) is popular at present, but there is much potential through the development of ‘AutoML’ for overcoming the optimisation problem (Cronin, 2018). The idea underlying ‘AutoML’ is the generation of a machine learning algorithm that is then used to optimize the parameters of a second algorithm. Typically, data collection procedures are customised to respond to a set problem. Analytical and ethical reasoning skills, as well as a knowledge of data acquisition, archiving, and architecture, are prerequisite tools that data science ultimately exploits to evaluate and present data. Data scientists often develop the necessary tools to interpret data or, alternately, if asked, may use pre-existing tools. Importantly, quite often it is not the processes used by data scientists that cause errors but the data itself.

However, there are situations when the processes fail, resulting in data leakage (Pierre, 2018), which is one of the top ten machine learning mistakes (Nisbet et al., 2009) that can lead to flawed conclusions. In addition, overfitting (models with extremely low training errors but high testing errors) and underfitting (models with high training and testing errors) are also concerns for data scientists (Vandeput, 2020). Overfitting was identified as one of the shortcomings of pure machine learning methods during the M4 competition (Makridakis et al., 2020). Consequently, ensuring the quality of data is among a data scientist’s challenges (Hazen et al., 2014; Ardagna et al., 2018; Alaoui and Gahi, 2019; Ghasemaghaei and Calic, 2019).
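The definitions of overfitting and underfitting given above can be made concrete with a small k-fold cross-validation sketch. It is a minimal pure-Python illustration, not the procedure of Vandeput (2020): the three toy models, the helper names, and the synthetic data (a linear trend with a deterministic alternating disturbance, chosen so the run is reproducible) are all assumptions made for the example.

```python
def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_mean(xs, ys):
    """Underfits: ignores x and always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def fit_1nn(xs, ys):
    """Overfits: memorises the training set (1-nearest neighbour)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def k_fold_cv(fit, xs, ys, k=5):
    """Return (mean training MSE, mean held-out MSE) over k folds."""
    train_err, test_err = [], []
    for fold in range(k):
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        tr_x, tr_y = [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        te_x, te_y = [xs[i] for i in test_idx], [ys[i] for i in test_idx]
        model = fit(tr_x, tr_y)
        train_err.append(mse(model, tr_x, tr_y))
        test_err.append(mse(model, te_x, te_y))
    return sum(train_err) / k, sum(test_err) / k

# Synthetic data: linear trend plus a deterministic alternating disturbance.
xs = list(range(20))
ys = [2 * x + (1 if x % 2 == 0 else -1) for x in xs]

results = {name: k_fold_cv(fit, xs, ys)
           for name, fit in [("mean", fit_mean), ("line", fit_line), ("1-NN", fit_1nn)]}
for name, (train, test) in results.items():
    print(f"{name:>5}: train MSE = {train:.2f}, held-out MSE = {test:.2f}")
```

In this toy setting the mean-only model shows high error on both training and held-out folds (underfitting), while the memorising model shows zero training error but a clearly worse held-out error than the straight line (overfitting).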
To determine the quality and thus suitability of data, it is vital to possess a full understanding of the origins of data and the data collection processes used. Statistics too is affected by data quality, but the rigor and accepted processes in place demand that statisticians pay careful attention to the design of data collection instruments, sampling bias, and the reliability and validity of the methods and data collected for analysis. Important information may be lost if it is inconsistently collected, or it may be overlooked entirely. This incomplete set of data will lack predictive authority and will be unable to provide insights into the issue under investigation.
Understanding the trade-off between accuracy and interpretability is
Fig. 4. SWOT analysis from the perspective of data science.


important for data analytics (Rane, 2018) and crucial when discussing the failures of data science. Interpretability is important for human curiosity and learning, finding meaning in the world, gaining knowledge, detecting bias, social acceptance, managing social interactions, and debugging and auditing (Molnar, 2020). Many of the complex data science led applications lack interpretability even though they produce highly accurate results (Rodriguez, 2018). See for example Fig. 5.

Statistical methods such as linear regression are highly interpretable but comparatively low in terms of predictive power, whilst neural networks are an example of machine learning techniques that are highly accurate but lack interpretability. Depending on the problem at hand, one needs to determine whether one is interested in obtaining the best results or in understanding how those results were produced (Rodriguez, 2018). The application of statistics is very useful where the objective calls for understanding relationships and deriving models that can interpret problems and generate forecasts (Galeano and Pena, 2019). Yet, relying on interpretability alone (for example using purely statistical models) can provide us with fairness, privacy, reliability, causality, and trust, at the expense of accuracy (Molnar, 2020).

However, it is important to remember that accuracy does not always give the full picture (Rawat, 2019) and interpretability is crucial unless the model has no significant impact, relates to a problem that has been well studied, or is applied to a situation where we are not concerned with potential manipulation of the system (Molnar, 2020). Thus, data scientists who are also knowledgeable in statistics and statisticians who are equipped with data science skills are better placed to navigate the trade-off between accuracy and interpretability in practice. It is also noteworthy that data scientists are now focusing on adding interpretability to machine learning to demystify the black box (Srinivasan, 2019). Murdoch et al. (2019) proposed the predictive, descriptive, relevant (PDR) framework for discussing interpretations, whilst model-agnostic interpretation methods are another example of such efforts (Molnar, 2020).

Data scientists and statisticians are distinguished primarily by their different interests and approaches to problem solving, but aligned by their end goal, that is, data analysis and prediction. It is expected that statistical analysis will continue to remain core to scientific modelling with well-structured data whilst machine learning and AI will succeed where relationships in data are not well understood (Galeano and Pena, 2019). However, it is no secret that AI is not yet advanced enough to identify anomalies in data that would require the expertise of ‘data preparers’, a role that is increasingly important for ensuring ‘validity’ of big data (Phillips, 2017). As such, in a world where data analysis is becoming more valuable, existing statistical theory and methodology also become more valuable (Carmichael and Marron, 2018). Assuring the integrity of data is becoming more and more important as ambient intelligence becomes increasingly pervasive as a new methodology for data collection is used.

The human brain is prone to finding patterns in random noise and this problem is even more prevalent in Big Data where the signal-to-noise ratio tends to be low (Silver, 2012; Hassani et al., 2015). Noise distracts us from the truth (Silver, 2012) and therefore data scientists can be misled if they fail to account for the signal and noise problem in Big Data. Here, statistical techniques such as decomposition and filtering can add value to a data scientist’s tool kit when dealing with Big Data that requires differentiation between signal and noise (Galeano and Pena, 2019).

One might argue that the reiterative character of the statistical analysis process used to quantify data uncertainty puts statisticians in a stronger position to deal with the issues posed by Big Data. However, it is no secret that data scientists are better positioned to mine Big Data, and thus, when their skills in machine learning are topped up with knowledge in statistics and business, they will be better positioned to deal with the issues posed by Big Data. When building models, statisticians focus on the examination of the correlations, causality between variables, theory, and predictors, and emphasise the certainty of the applied parameters, as illustrated through margins of error or confidence intervals. In contrast, data scientists would be more interested in the prediction errors on a test set and the identification of which features to use.

Data science, like other scientific fields, should be precise in its identification and application of the correct tools to a problem. Sometimes the best tool for a use-case is statistics, exploratory data analysis or a simple yet understandable visualization of descriptive statistics. While understanding how AI and machine learning systems work is vital to a career in data science, these professionals often overlook the basics. For example, data science focuses on comparing many methods to create the best machine learning model while statistics instead seeks to improve a single, simple model to best suit the data. Overall, the main limitations of data science relate to small samples (Faraway and Augustin, 2018) and to predicting black swans (Taleb, 2007; Rodriguez, 2017), as the entire premise of machine learning is about learning from data. Furthermore, Dunson (2018) notes that automated methods (such as those increasingly used by data scientists) present a lack of consideration of interpretability, quantification of uncertainty (or hypothesis testing), applications with limited training data, and selection bias.

Lastly, this section ends with a SWOT analysis from a statistician’s perspective addressing the limitations of data science (Fig. 6). The SWOT framework is used to examine the impact of the emergence of data science and Big Data on the future of the field of statistics, as well as the current status and future position of the statistics field.

As Fig. 6 shows, the field of statistics is useful because it provides the foundation, and oftentimes the basics are all that are needed, especially when it comes to the analysis of structured data. Statistics also scores high in terms of interpretability. The flip side of this is of course that it is less useful for data mining and analysing unstructured data.

There are also opportunities as well as some threats looming on the horizon. For example, there is untapped potential to better apply statistics to Big Data via the use of technology. Data science also has the potential to expand the application of statistics. From a statistics point of

Fig. 5. Interpretability vs predictive power.
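The contrast drawn in this section between a statistician's emphasis on parameter certainty (confidence intervals) and a data scientist's emphasis on test-set prediction error can be sketched on a single interpretable model. This is a minimal illustration under stated assumptions: the data are synthetic with a known slope of 0.5, the function name is hypothetical, and the interval uses a normal quantile for simplicity where a Student-t quantile would be the textbook choice for small samples.

```python
import math
from statistics import NormalDist

def ols_with_interval(xs, ys, level=0.95):
    """Fit y = a + b*x; return (a, b, (low, high)) with an approximate CI for b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    # Residual variance with n - 2 degrees of freedom.
    resid_var = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    se_b = math.sqrt(resid_var / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation
    return a, b, (b - z * se_b, b + z * se_b)

# Synthetic data: true slope 0.5 plus a deterministic alternating disturbance.
xs = list(range(30))
ys = [3 + 0.5 * x + (0.5 if x % 2 == 0 else -0.5) for x in xs]
train_x, train_y = xs[:20], ys[:20]
test_x, test_y = xs[20:], ys[20:]

a, b, (low, high) = ols_with_interval(train_x, train_y)
test_mse = sum((a + b * x - y) ** 2 for x, y in zip(test_x, test_y)) / len(test_x)

print(f"slope = {b:.3f}, approx. 95% CI = ({low:.3f}, {high:.3f})")  # inferential view
print(f"held-out MSE = {test_mse:.3f}")                              # predictive view
```

The same fitted line supports both readings: the interval describes how certain the slope estimate is, while the held-out error describes how well the model predicts unseen points.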

Fig. 6. SWOT analysis from the perspective of statistics.

¹ Figure 3 was extracted from an unpublished consultancy report prepared by Emmanuel Sirimal Silva and RemediumOne ([Link] to provide insights into potential resource constraints during the COVID-19 pandemic.


view, the emergence and growth of the data science field is a threat, but one that has been around for a very long time (Diggle, 2015). For example, the new field of data science is generally perceived as being more relevant and having more real-world applications. In general, data scientists also seem to be more up to date and ready to adopt new technologies such as AI and machine learning. For the field of statistics, it becomes important to be more adaptable and receptive to new technologies.

5. Discussion

This section discusses key distinctions between the fields of statistics and data science. These distinctions are summarised in Table 2. As reflected by the views of the various authors cited in the previous sections, the dividing line between the two fields is not clear. The emergence and evolution of Big Data has further blurred the distinction between these two fields.

Table 2
Key differences between statistics and data science.

| | Statistics | Data science |
|---|---|---|
| Theoretical origins | Mathematical biology and biometry (Norton, 1978) | Statistics (De Veaux et al., 2017; Weihs and Ickstadt, 2018; Cao, 2017) and probability (Tayo, 2019) |
| Main focus | Theoretical sophistication (Carmichael and Marron, 2018; Olhede and Wolfe, 2018; van der Aalst, 2016) | Practical solutions to real problems (Cleveland, 2001; van der Aalst, 2016) |
| Main approach | Methodology/model development and confirmation (Wild et al., 2018); precise models with strict assumptions (Olhede and Wolfe, 2018) | Application of machine learning and data mining (avoid being restricted by models) (Ribeiro et al., 2017; Gorunescu, 2011) |
| Focus of model building | Examination of correlations, causality between variables, theory, and predictors (Galeano and Pena, 2019) | Hyperparameter optimization and feature selection (Nantasenamat, 2020; Shaikh, 2018) |
| Interpretability vs. accuracy | High interpretability, low accuracy (Molnar, 2020; Olhede and Wolfe, 2018) | High accuracy, low interpretability (Olhede and Wolfe, 2018; Donoho, 2017; Hall, 2016; Rane, 2018; Rodriguez, 2018) |
| Preferred type of data | Well-structured data (Galeano and Pena, 2019; Olhede and Wolfe, 2018) | Unstructured data (Galeano and Pena, 2019; Olhede and Wolfe, 2018) |
| End goal | Data analysis and prediction | Data analysis and prediction |

In terms of theoretical origins, the field of statistics has deep roots that go back to the early work in mathematical biology and biometry (Norton, 1978), while data science is a relatively new field that builds directly on statistics and probability (Tayo, 2019). There are greater differences when it comes to the main foci of the two fields. Statistics strives for theoretical sophistication, while data science aims to provide practical solutions to real-world problems (van der Aalst, 2016). However, it is important to reflect on whether practical solutions that cannot be theoretically supported add value to managerial decision making. This is because, as humans, we strive to obtain a deep understanding of phenomena and tend to prefer outcomes that can be explained.

This difference is also reflected in their main approaches. Statisticians focus on methodology/model development (Wild et al., 2018) and confirmation and prefer precise models with strict assumptions (Olhede and Wolfe, 2018). Data scientists, on the other hand, apply new techniques such as machine learning and try to avoid being restricted by models. Given that assumptions rarely hold in the real world, one would argue in favour of the data scientist’s approach to analytics. However, statisticians too can rely on non-parametric approaches when assumptions are violated. The focus of model building is also different. Data scientists focus on prediction errors and identification of features whilst statisticians emphasise the examination of correlations, causality between variables, theory, and predictors.

As noted earlier, another distinguishing feature is their relative emphasis on interpretability versus accuracy. The strength of statistics is interpretability, while for data science it is accuracy. The challenge and opportunity for statisticians and data scientists would be to collaborate on efforts at creating machine learning algorithms that are both interpretable and highly accurate. Another key difference is their preferred type of data for analysis. Statisticians prefer to work with well-structured data, while data science shines when the data are highly unstructured. Given that Big Data continues to generate large amounts of unstructured data that needs analysis, it is evident that data science skills are increasingly important on a day-to-day basis. The data wrangling skills which are a core focus of data science courses are of much value in the modern world. Statisticians can benefit by incorporating such skills and expertise into their own courses to ensure the skill set is developed. However, the application of machine learning is also becoming more accessible with automated algorithms enabling non-experts to benefit from its application and generation of output. Yet, interpretation of this output and the ability to make business sense of the data is key for analytics to be value adding.

The fact that statistics and data science can be complementary should be emphasised. Data science, which relies on data mining and machine learning techniques, is a mixture of statistics, AI, and searches in databases (Ribeiro et al., 2017; Gorunescu, 2011). The Cross Industry Standard Process for Data Mining (CRISP-DM) methodology is a sound example which demonstrates the complementary nature of the two fields (Ribeiro et al., 2017). Przybyla (2020) notes several similarities between data scientists and statisticians, from the understanding of mathematics, investigating problems, exploratory data analysis, analysing trends, creating forecasts, visualisations, to reporting findings to non-technical users. Furthermore, there are two types of statistics called descriptive and inferential statistics (Singpurwalla, 2013; van der Aalst, 2016), and similarly, data mining is composed of two types of techniques called descriptive and predictive (Ribeiro et al., 2017). Saltz and Stanton (2017) present the “four A’s” of data science (i.e., data architecture, data acquisition, data analysis and data archiving) and note that the analysis phase requires statistical aspects (Ribeiro et al., 2017).

Therefore, despite the differences, the two fields share a common end goal. As such, it makes sense for experts in both fields to work collaboratively to develop data analysis and prediction capability further for the benefit of society. By working collaboratively with data scientists, statisticians can ensure statistics is placed firmly at the heart of data science and aid in protecting the identity of data science as an ongoing true science (MacGillivray, 2021). The complementary nature of data science and statistics was further emphasised by Weihs and Ickstadt:

“Statistics is one of the most important disciplines to provide tools and methods to find structure in and to give deeper insight into data, and the most important discipline to analyse and quantify uncertainty…Finding structure in data and making predictions are the most important steps in Data Science. Here, in particular, statistical methods are essential since they are able to handle many different analytical tasks.” (Weihs and Ickstadt 2018, p.189-191)

6. Conclusions

Through a review of over 100 sources representing both fields of statistics and data science, this article developed a pragmatic perspective on the importance and relevance of the science of statistics in an age


of data science. The research uncovered continuing debate and disagreement (in most cases) between statisticians and data scientists regarding the superiority of their respective disciplines. However, the evidence presented herewith clearly shows the growing need for and importance of positive collaborative efforts between data scientists and statisticians, as the science of statistics enables data science, and data science expands the application of statistics. The SWOT analysis and discussions around the advantages and limitations of both disciplines further highlight the potential for synergies.

In summary, statisticians should embrace data science, approaching the collaboration with equal parts confidence in what statisticians can offer and humility to learn from the newer field (Diggle, 2015). Furthermore, the era of Big Data demands that statisticians broaden their understanding of statistical practice to be inclusive of all those who learn from data (Rodriguez, 2015). This is vital as data science has helped improve the reproducibility and communication of statistical outcomes, thereby adding to the reliability and validity of scientific studies (Carmichael and Marron, 2018). There is no doubt that in the absence of data science, statistics (as asserted by Diggle 2015) would make an essential but incomplete contribution in this age of Big Data. Likewise, data scientists should understand that statistics teaches the scientific method (Carmichael and Marron, 2018) that underlies data science. In addition, statisticians can develop new theories and methods to meet the upcoming challenges of data science (Olhede and Wolfe, 2018).

Furthermore, it is important to acknowledge that the frontier between the fields of statistics and data science is blurred and not easy to demarcate. In some cases, the two fields are indistinguishable from each other and therefore share a close association. For example, data scientists are expected to master statistics, machine learning and analytics (Kozyrkov, 2018), but statisticians themselves must align with data science or risk being left behind (Ben-Zvi et al., 2018). However, it is misleading to argue that data science is a rebranding of statistics (Carmichael and Marron, 2018), or vice versa. There is more to data science than statistics alone and vice versa (as there are problems that demand the application of statistics over data science). The need of the hour is for data scientists to genuinely appreciate statistics as an important element of data science, and for statisticians to celebrate the emergence of data science for making statistics more applicable and accessible across the globe (Carmichael and Marron, 2018).

At the end of the M4 forecasting competition, stakeholders concluded that the way forward was to exploit the advantages of both machine learning and statistical methods (Makridakis et al., 2020). He and Lin (2020) outlined 10 research challenge areas that have piqued the interest of statisticians from a data science perspective, and vice versa. Thus, universities have an active role to play in facilitating the exchange of ideas between statisticians and those aspiring to be data scientists to stimulate advances in all areas of knowledge (Galeano and Pena, 2019; Faraway and Augustin, 2018). The emergence and growth of Big Data ensures that data science will remain an extremely important and comparatively more popular field of study in relation to statistics.

There are also interesting developments currently taking place in the field of data science. One notable example is the EDISON Data Science Framework (Manieri et al., 2015; Demchenko et al., 2016), which attempts to lay the foundation for the professionalisation of the data science field. This framework explicitly recognises the importance of statistics for data scientists.

Therefore, in the era of data science, statistics, “the most unselfish of science” (Rodriguez, 2013), is far from dead. Statistics lays the foundation for data science and adds to its reliability and validity whilst data science powers the application of statistics to Big Data through its incorporation of technology and AI. It is the authors’ view that a good data scientist could benefit from a solid (if not, at least foundational) theoretical and practical understanding of statistics, in addition to expertise in machine learning, analytics, and business knowledge (which differentiates a data scientist from a statistician). As Brad Efron (2019) eloquently said whilst accepting the 2019 international prize in statistics:

“I tell my fretful friends that we have a strong positive regression coefficient with data science, as long as we remember not to let the inferential side of statistical thinking get lost in the excitement over new technology”.

So, what is the future? Based on the review, we foresee a future where the synergies made possible through the collaboration between statisticians and data scientists will drive reliable and valid data analytics and empower the continued relevance of both disciplines in a world where ‘big’ and ‘small’ data problems will continue to emerge.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors would like to acknowledge and thank the Editor and the two anonymous referees for their constructive comments, guidance, and patience throughout the review process. The usual disclaimer applies. Dr. Hassani would also like to express his sincere gratitude to Professors Vahidi Asl, Faghihi, and Taheriyoun from Shahid Beheshti University (SBU), Iran for their valuable insights, comments, suggestions, and experiences shared with him on this topic.

References

Ardagna, D., Cappiello, C., Samá, W., Vitali, M., 2018. Context-aware data quality assessment for big data. Future Gener. Comput. Syst. 89, 548–562.
Alaoui, I.E., Gahi, Y., 2019. The impact of big data quality on sentiment analysis approaches. Proc. Comput. Sci. 160, 803–810.
Barber, M., 2018. Data science concepts you need to know! Part 1. Towards Data Sci. Available via [Link]818745 [Accessed: 09.05.2020].
Baškarada, S., Koronios, A., 2017. Unicorn data scientist: the rarest of breeds. Program: electronic library Informat. Syst. 51 (1), 65–74.
Baumeister, R.F., Leary, M.R., 1997. Writing narrative literature reviews. Rev. Gen. Psychol. 1 (3), 311–320.
Bean, R., 2020. Now more than ever! – the necessity of data, analytics, and expertise. Forbes. Available via. [Link]entral/2020/04/17/now-more-than-ever–the-necessity-of-data-analytics-and-expertise/#5dccc25a20f4 [Accessed: 10.05.2020].
Ben-Zvi, D., Makar, K., Garfield, J., 2018. International Handbook of Research in Statistics Education. Springer International Publishing.
Biswal, M., 2019. Medium.
Box, G., Hunter, J.S., Hunter, W.G., 2005. Statistics for Experimenters, 2nd Ed. John Wiley & Sons.
Breiman, L., 2001. Statistical modeling: the two cultures. Statistic. Sci. 16 (3), 199–231.
Broman, K., 2013. Data science is statistics. Blog post. Available via. [Link][Link]/2013/04/05/data-science-is-statistics/ [Accessed: 09.05.2020].
Cao, L., 2017. Data Science: A Comprehensive Overview. ACM Comput. Surv. 50 (3), 43:1-43:42.
Carayannis, E.G., Del Guidice, M., Soto-Acosta, P., 2018. Disruptive technological change within knowledge-driven economies: the future of the Internet of Things (IoT). Technol. Forecast. Soc. Change 136, 265–267.
Carmichael, I., Marron, J.S., 2018. Data science vs. statistics: two cultures? Japan. J. Statistic. Data Sci. 1, 117–138.
Chen, C.P., Weng, J.-Y., Yang, C.-S., Tseng, F.-M., 2018. Employing a data mining approach for identification of mobile opinion leaders and their content usage patterns in large telecommunications datasets. Technol. Forecast. Soc. Change 130, 88–98.
Cleveland, W., 2001. Data science: an action plan for expanding the technical areas of the field of statistics. Int. Statis. Rev. 69, 21–26.
Cronin, S. K. (2018). What’s auto ML? Available via: [Link]whats-auto-ml-b457d2710f9d [Accessed: 23.05.2020].
Davenport, T.H., Patil, D.J., 2012. Data Scientist: The Sexiest Job of the 21st Century. Harv. Bus. Rev. October. Available via: [Link]ist-the-sexiest-job-of-the-21st-century [Accessed: 22.05.2020].
Davison, J. (2018). No, Machine Learning is not just glorified Statistics. Available via: https://[Link]/no-machine-learning-is-not-just-glorified-statistics-26d3952234e3 [Accessed: 22.05.2020].
Dayal, V., 2020. Quantitative Economics with R: A Data Science Approach. Springer Nature, Singapore.


Demchenko, Y., Belloum, A., Los, W., Wiktorski, T., Manieri, A., Brocks, H., ... & Brewer, S., 2016. EDISON data science framework: a foundation for building data science profession for research and industry. 2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom). IEEE, pp. 620–626.
De Veaux, et al., 2017. Curriculum Guidelines for Undergraduate Programs in Data Science. Ann. Rev. Statis. Appl. 4, 15–30.
Diggle, P.J, 2015. Statistics: a data science for the 21st century. J. R. Statis. Soc. (Statistics in Society: Series A) 178, 793–813. Part 4.
Donoho, D., 2017. 50 years of data science. J. Comput. Graph. Statist. 26 (4), 745–766.
Dunson, D.B., 2018. Statistics in the big data era: Failures of the machine. Statis. Prob. Lett. 136, 4–9.
Efron, B. (2019) Acceptance speech, 2019. Available via: [Link] [Link] (Accessed: 22.07.2021).
Faraway, J.J., Augustin, N.H, 2018. When small data beats big data. Statis. Prob. Lett. 136, 142–145.
Galeano, P., Pena, D, 2019. Data science, big data and statistics. TEST 28, 289–329.
Gelman, A., 2013. Statistics is the least important part of data science. Blog post. Available via. [Link]-important-part-data-science/ [Accessed: 09.05.2020].
Geum, Y., Lee, H., Lee, Y., Park, Y., 2015. Development of data-driven technology roadmap considering dependency: an ARM-based technology roadmapping. Technol. Forecast. Soc. Change 91, 264–279.
Ghasemaghaei, M., Calic, G., 2019. Can big data improve firm decision quality? The role of data quality and data diagnosticity. Decision Support Syst. 120, 38–49.
Gorunescu, F., 2011. Data Mining: Concepts, Models and Techniques. Springer-Verlag, Berlin Heidelberg.
Granville, V., 2014. Data science without statistics is possible, even desirable. Data Science Central. Available via: [Link]/data-science-without-statistics-is-possible-even-desirable [Accessed: 09.05.2020].
Greenhouse, J.B., 2013. Statistical thinking: the bedrock of data science. Huffpost. Available via. [Link]k-of-data-science_b_3651121 [Accessed: 09.05.2020].
Hall, P. (2016). Predictive modeling: striking a balance between accuracy and interpretability. Available via: [Link]ling-striking-a-balance-between-accuracy-and-interpretability/ [Accessed: 26.07.2021].
Hassani, H., Saporta, G., Silva, E.S., 2014. Data mining and official statistics: the past, the present and the future. Big Data 2 (1), 34–43.
Hassani, H., Silva, E.S., 2015. Forecasting with big data: a review. Ann. Data Sci. 2 (1), 5–19.
Hassani, H., Silva, E.S., Unger, S., TajMazinani, M., Mac Feely, S, 2020. Artificial Intelligence (AI) or Intelligence Augmentation (IA): What Is the Future?, 1. AI,
Lohr, S., 2009. For Today’s Graduate, Just One Word: Statistics. The New York Times. Available via. [Link]_r=1& [Accessed: 09.05.2020].
Makridakis, S., Spiliotis, E., Assimakopoulos, V, 2020. The M4 competition: 100,000 time series and 61 forecasting methods. Int. J. Forecast. 36 (1), 54–74.
Makridakis, S., Spiliotis, E., Assimakopoulos, V, 2018. Statistical and machine learning forecasting methods: concerns and ways forward. PLoS One 13 (3), 1–26.
Manieri, A., Brewer, S., Riestra, R., Demchenko, Y., Hemmje, M., Wiktorski, T., ... & Frey, J., 2015. Data Science Professional uncovered: How the EDISON Project will contribute to a widely accepted profile for Data Scientists. 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CloudCom). IEEE, pp. 588–593.
Marr, B., 2020. Coronavirus: how artificial intelligence, data science and technology is used to fight the pandemic. Forbes. Available via. [Link]com/sites/bernardmarr/2020/03/13/coronavirus-how-artificial-intelligence-data-science-and-technology-is-used-to-fight-the-pandemic/#1aa797915f5f [Accessed: 10.05.2020].
Marquardt, D.W., 1987. The importance of statisticians. J. Am. Statist. Assoc. 82 (397), 1–7.
Matteson, S., 2020. How to become a data scientist without getting a Ph.D. TechRepublic. Available via [Link]-a-data-scientist-without-getting-a-ph-d/ [Accessed: 09.05.2020].
McFarland, D.A., MacFarland, H.R., 2015. Big Data and the danger of being precisely inaccurate. Big Data Society, (July – December) 1–4.
MacGillivray, H., 2021. Statistics and data science must speak together. Teach. Statistics 43, S5–S10.
McKinsey Analytics, 2018. Analytics comes of age. McKinsey & Company. Available via. [Link]%20Analytics/Our%20Insights/Analytics%20comes%20of%20age/Analytics-come[Link] [Accessed: 10.05.2020].
McKinsey Quarterly. (2009). Hal Varian on how the Web challenges managers. Available via: [Link]tions/our-insights/hal-varian-on-how-the-web-challenges-managers [Accessed: 22.05.2020].
McNutt, M., 2014. Raising the Bar. Science 345 (6192), 9.
Members, R. P. (2017). The r project for statistical computing. Available via: https://[Link]/ [Accessed: 22.05.2020].
Mills, T., 2019. Why Big Data And Machine Learning Are Important In Our Society. Forbes. Available via [Link]council/2019/01/07/why-big-data-and-machine-learning-are-important-in-our-society/#6eeec6097aa2 [Accessed: 10.05.2020].
Molnar, C. (2020). Interpretable machine learning: A guide for making black box models
pp. 143–155. explainable. Available via: [Link]
Hardin, J., Hoerl, R., Horton, N.J., Nolan, D., Baumer, B., Hall-Holt, O., Murrell, P., [Link] [Accessed: 22.05.2020].
Peng, R., Roback, P., Temple Lang, D., Ward, M.D., 2015. Data science in statistics Murdoch, W.J., Singh, C., Kumbier, K., Abbasi-Asl, R., Yu, B, 2019. Definitions, Methods,
curricula: preparing students to “think with data”. Am. Statistic. 69 (4), 343–353. And Applications In Interpretable Machine Learning, 116. PNAS, pp. 22071–22080.
Hazen, B.T., Boone, C.A., Ezell, J.D., Jones-Farmer, L.A., 2014. Data quality for data Nachtsheim, A.C., Stufken, J., 2019. Comments on: Data science, big data and statistics.
science, predictive analytics, and big data in supply chain management: An TEST 28, 345–348.
introduction to the problem and suggestions for research and applications. Int. J. Nahmias, S., 1979. Simple approximations for a variety of dynamic leadtime lost-sales
Prod. Econ. 154, 72–80. inventory models. Oper. Res. 27 (5), 857–1066.
He, X., Lin, X, 2020. Challenges and opportunities in statistics and data science: ten Nantais, J., 2019. Data Science or Statistics? Towards Data Sci. Available via [Link]
research areas. Harvard Data Scie. Rev. 2, 3. [Link] [Link]/data-science-or-statistics-9e826ebf7fe2 [Accessed:
99608f92.95388fcb. 09.05.2020].
Helms, M.M., Nixon, J., 2010. Exploring SWOT analysis–where are we now? J. Strat. Nantasenamat, C. (2020). How to build a machine learning model: a visual guide to
Manag. 3 (3), 215–251. learning data science. Available via: [Link]
Holak, B., 2019. Demand for data scientists is booming and will only increase. d-a-machine-learning-model-439ab8fb3fb1 [Accessed: 28.07.2021].
SearchBusinessAnalytics. Available via. [Link] Nisbet, R., Elder, J., Miner, G., 2009. Handbook of Statistical Analysis and Data Mining
com/feature/Demand-for-data-scientists-is-booming-and-will-increase [Accessed: Applications. Academic Press.
09.05.2020]. Norton, B.J., 1978. Karl pearson and statistics: the social origins of scientific innovation.
Huang, R. (2019). How to Learn Data Science Without a Degree. Available via: https:// Soc. Stud. Sci. 8 (1), 3–34.
[Link]/blog/learn-data-science-without-degree/ [Accessed: Olhede, S.C., Wolfe, P.J., 2018. The future of statistics and data science. Statis. Probab.
22.05.2020]. Lett. 136, 46–50.
Islam, N., Marinakis, Y., Majadillas, M.A., Fink, M., Walsh, S.T., 2020. Here there be Patil, A. (2018). How to self-learn statistics of data science. Available via: [Link]
dragons, a pre-roadmap construct for IoT service infrastructure. Technol. Forecast. [Link]/ml-research-lab/how-to-self-learn-statistics-of-data-science-c05db1f7cfc3
Soc. Change 155, 119073. [Accessed: 22.05.2020].
Iqbal, R., Doctor, F., More, B., Mahmud, S., Yousuf, U., 2020. Big data analytics: Phillips, F., 2017. A perspective on ‘Big Data. Science and Public Policy 44 (5), 730–737.
Computational intelligence techniques and application areas. Technol. Forecast. Soc. Pierre, R. (2018). Data Leakage, Part I: Think You Have a Great Machine Learning
Change 153, 119253. Model? Think Again. Available via: [Link]
Koehrsen, W. (2019). Thoughts on the two cultures of statistical modeling. Available via: part-i-think-you-have-a-great-machine-learning-model-think-again-ad44921fbf34
[Link] [Accessed: 22.05.2020].
eling-72d75a9e06c2 [Accessed: 22.05.2020]. Press, G. (2013). Data Science: What’s The Half-Life Of A Buzzword? Available via: htt
Kozyrkov, C., 2018. What great data analysts do — and why every organization needs ps://[Link]/sites/gilpress/2013/08/19/data-science-whats-the-half-lif
them. Harv. Bus. Rev. Available via [Link] e-of-a-buzzword/[Accessed: 22.07.2021].
lysts-do-and-why-every-organization-needs-them [Accessed: 09.05.2020]. Przybyla, M. (2020). The difference between data science and statistics: which role are
Learner, D.B., Phillips, F.Y., 1993. Method and progress in management science. you, should you change careers? Available via: [Link]
Socioecon. Plann. Sci. 27 (1), 9–24. com/the-difference-between-data-science-and-statistics-168c7062c201 [accessed:
Leetaru, K., 2019. How data scientists turned against statistics. Forbes. Available via: 26.02.2021].
[Link] Raban, D.R., Gordon, A., 2020. The evolution of data science and big data research: a
/how-data-scientists-turned-against-statistics/#15777d91257c [Accessed: bibliometric analysis. Scientometrics 122 (3), 1563–1581.
09.05.2020]. Rawat, S. (2019). Is accuracy EVERYTHING? Available via: [Link]
Li, X., Xie, Q., Jiang, J., Zhou, Y., Huang, L., 2019. Identifying and monitoring the com/is-accuracy-everything-96da9afd540d [Accessed: 22.05.2020].
development trends of emerging technologies using patent analysis and Twitter data Rane, S. (2018). The balance: accuracy vs. Interpretability. Available via: [Link]
mining: the case of perovskite solar cell technology. Technol. Forecast. Soc. Change [Link]/the-balance-accuracy-vs-interpretability-1b3861408062
146, 687–705. [Accessed: 22.05.2020].
Lo, F.-Y., Campos, N., 2018. Blending internet-of-things (IoT) solutions into relationship Ribeiro, V., Rocha, A., Peixoto, R., Portela, F., Santos, M.F., 2017. Importance of statistics
marketing strategies. Technol. Forecast. Soc. Change 137, 10–18. for data mining and data science. In: 5th International Conference on Future Internet

10
H. Hassani et al. Technological Forecasting & Social Change 173 (2021) 121111

of Things and Cloud Workshops (FiCloudW), pp. 156–163. [Link] Teichmann, J., 2019. The increasing demand for data scientists. An interview. Towards
FiCloudW.2017.86, 2017. Data Science. Available via. [Link]
Ridgway, J., 2015. Implications of the data revolution for statistics education. Int. Statis. nd-for-data-scientists-an-interview-6d74d98afba0 [Accessed: 09.05.2020].
Rev. 84 (3), 528–549. Trivedi, A., 2018. Why data science jobs are in high demand? Medium. Available via. htt
Rowley, J., 2007. The wisdom hierarchy: representations of the DIKW hierarchy. J. Inf. ps://[Link]/cutshort/why-data-science-jobs-are-in-high-demand-c1b
Sci. 33 (2), 163–180. 5614d3083 [Accessed: 09.05.2020].
Rodriguez, R.N., 2013. The 2012 ASA presidential address: building the big tent for Tukey, J., 1962. The future of data analysis. Ann. Math. Statis. 33, 1–67.
statistics. J. Am. Statist. Assoc. 108 (501), 1–6. Vandeput, N., 2020. Data Science for Supply Chain Forecasting, De Gruyter, 2nd Ed.
Rodriguez, R.N., 2015. Who will celebrate our 200th anniversary? Growing the next van der Aalst, W., 2016. Data Science in Action. Process Mining. Springer, Berlin,
generation of ASA members. Am. Statis. 69, 91–95. Heidelberg.
Rodriguez, J. (2017). The Black Swan Problem in Artificial Intelligence: Part I. Available Walker, H.M., 1929. Studies in the History of Statistical Method: With Special Reference
via: [Link] To Certain Educational Problems. Williams & Wilkins Co.
elligence-part-i-74306aee0156 [Accessed: 23.05.2020]. Weihs, C., Ickstadt, K., 2018. Data science: the impact of statistic. Int J Data Sci Anal 6,
Rodriguez. J. (2018). Interpretability vs. Accuracy: The Friction that Defines Deep 189–194.
Learning. Available via: [Link] Wickham, H., 2014. Data Science: How is it Different to Statistics? Institute of
y-the-friction-that-defines-deep-learning-dae16c84db5c [Accessed: 22.05.2020]. Mathematical Statistics. Available via: [Link]
Salaken, S.M., Khosravi, A., Nguyen, T., Nahavandi, S, 2017. Extreme learning machine -how-is-it-different-to-statistics%E2%80%89/ [Accessed: 09.05.2020].
based transfer learning algorithms: a survey. Neurocomputing 267, 516–524. Wild, C.J., Utts, J.M., Horton, N.J., 2018. What IS STAtistics? In: Ben-Zvi, D., Makar, K.,
Saltz, J.S., Stanton, J.M., 2017. An Introduction to Data Science. SAGE Publications. Garfield, J. (Eds.), International Handbook of Research in Statistics Education.
Sardareh, S.A., Brown, G.T.L., Denny, P., 2021. Comparing four contemporary statistical Springer International Handbooks of Education. Springer, Cham.
software tools for introductory data science and statistics in the social sciences. Wu, J. (1997). Statistics = Data Science? Inaugural lecture for the Carver Chair.
Teach. Statis. 43, S157–S172. Available via: [Link]
Shaikh, R. (2018). Feature selection techniques in machine learning with python. pdf [Accessed: 09.05.2020].
Available via: [Link]
machine-learning-with-python-f24e7da3f36e [Accessed: 28.07.2021].
Dr. Hossein Hassani is a member of the faculty affiliated with the Research Institute for
Silver, N., 2012. The Signal and the Noise: Why So Many Predictions Fail – but Some
Energy Management and Planning at University of Tehran, Iran.
Don’t. The Penguin Press, New York.
Singpurwalla, D., 2013. A Handbook of Statistics: An Overview of Statistical Methods.
Bookboon. Dr. Christina Beneki is an Associate Professor at the Department of Tourism, Ionian Uni­
Stigler, S.M., 1986. The History of Statistics: The Measurement of Uncertainty Before versity in Greece.
1900. Harvard University Press.
Srinivasan, P. (2019). Interpretable Machine Learning: An attempt to demystify the
Dr. Emmanuel Sirimal Silva is Head of Research Coordination: Fashion Business School at
black-box. Available via: [Link]
London College of Fashion, University of the Arts London.
bility-paradox-382803f6a99d [Accessed: 22.05.2020].
Taleb, N.N., 2007. The Black Swan: The Impact of the Highly Improbable. Random House
Trade Paperbacks, New York. Nicolas Vandeput is a supply chain data scientist specialized in demand forecasting and
Tayo, B. O. (2019). Theoretical Foundations of Data Science— Should I Care or Simply inventory optimization and a PhD candidate at Université Paris-Saclay, CentraleSupélec,
Focus on Hands-on Skills? Available via: [Link] France.
ical-foundations-of-data-science-should-i-care-or-simply-focus-on-hands-on-skills-
c53fb0caba66 [Accessed: 26.02.2021]. Dr. Dag Øivind Madsen is a Professor at the USN School of Business, University of South-
Eastern Norway.
