Free and (Mostly) Open Source Data Analysis Software for Academic Research

Dr. Chinchu C.
Abstract

Data analysis is a crucial task in knowledge creation in the social sciences. Free resources for data analysis give researchers greater freedom and make the research process more accessible and democratic. This article lists free software that can perform basic and advanced statistical data analysis tasks. Some software that can perform other tasks, such as text mining, is also introduced. Ease of use and functionality are the major criteria for selecting these software packages.

Keywords: Free Software, Data Analysis, Social Science, Research


Introduction

Research is a knowledge creation process and hence crucial to human progress. However, there have been attempts to treat knowledge as a commodity and use it for economic gain. This is generally viewed as an obstruction to the growth of research and to the propagation of knowledge. Research in the social sciences generally requires the analysis of data in various forms, which makes the use of software tools imperative. The software needed ranges from spreadsheets and statistical software to big data analysis and qualitative data analysis tools. Nonetheless, the proprietary software tools available for social science research are priced so high that it is almost impossible for individuals or even small institutions to buy them. This may also lead to instances of piracy, which is a punishable offence and an unethical act, and is detrimental to the quality of research as well. It is in this context that the use of free and open-source software (FOSS) tools becomes a necessity and a political act for keeping knowledge free of restrictions. In a broader sense, the use and promotion of free software is an act of responsibility towards future generations. This paper presents an introduction to a number of free and/or open-source software packages which can be used for various purposes, mainly by social science researchers.

The Concept of Knowledge Freedom

The idea that knowledge, in its various forms, should be freely available to all is not new. Foucault (1980) has maintained that knowledge is inextricably linked to power. Thus knowledge can be viewed as a form of power. In academia too, knowledge acts as a form of soft power which can determine the prospects of one's career growth. From such a position, it becomes imperative that anyone who works for equality and egalitarianism should also work for making knowledge accessible to all.

The free software movement is based on the philosophy that considers software a form of knowledge. The movement for software freedom was initiated in response to the rampant commercialization in the field of software. Proprietary software, also known as licensed software, imposes a number of restrictions on the user, even after she/he has bought it. These typically include restrictions on copying, sharing, and modifying the software. The restraint on modifying the source code of software packages also hinders the learning inherent in working with software. Thus it can be argued that even after one buys a software package, the ownership is not entirely transferred to the buyer.

Free and Open Source Software (FOSS) differs from proprietary software in that it gives the user the freedom to run, modify, share, and copy the software. The term ‘free’ denotes freedom, not free-of-cost.

Software Licenses

Software packages are distributed under a variety of licenses, each with a different view on the rights of the end-user of the product. Software is usually referred to by the type of license under which it is distributed. The license types that are relevant to research tools fall under the categories listed below.

Proprietary: The absolute ownership of the software rests with the provider, not with the customer. Usually the customer is given various rights to use the software under an end-user license agreement (EULA). There are restrictions on copying, sharing, modifying, and similar operations on the product. Some software requires monthly or annual licensing, with payments to the provider.


Open Source: Open-source software makes the underlying programming code of the software available to the user. The user may modify and even redistribute the modified version, under the terms of the license agreement. A number of license formats are used to distinguish between different agreements between the provider and the user.

Free Software: Free software may be provided either free of cost or at a price. The term ‘free’ refers to liberty of usage, not price. Even though all free software is open-source, not all open-source software can be classified as free software, owing to the philosophical differences between the two movements.

Apart from these, there exist other forms of licensing such as freeware, which is provided free of cost but without the source code, and freemium packages, where a basic version of the software is provided free of cost and more advanced versions have to be purchased.

Statistical analysis software

A list of free or open-source software packages, mainly used for quantitative statistical analyses, is given here. Even though most of them provide support for advanced procedures like text mining, their primary use is taken as the criterion for including them under the category of statistical software. The list is not exhaustive, and is curated mostly based on the individual preference and experience of the author.

R: Even though touted as a free alternative to proprietary statistical analysis packages including SPSS, R is not a software package in itself. Rather, R is a programming language and an environment for statistical analyses (Ihaka & Gentleman, 1996). It is a dynamic environment, offering many facets for development (Chambers, 2009). Being a programming language, R has a steep learning curve for those without basic training in programming. A number of Graphical User Interfaces (GUIs) that use R in the background are available, offering reliable and fast data analysis capabilities. These include R Commander, RKWard, Deducer, Rattle, Jamovi, etc.

R Commander: R Commander, developed by John Fox, is a GUI for the R programming language. It offers one of the most powerful and user-friendly interfaces for using R for statistical data analysis (Downie, 2016). R Commander, along with plug-ins such as EZR and FactoMineR, provides the functionality to perform a range of advanced statistical analyses. R Commander has to be installed as a package from within R, and is not available as an executable installation file.

Deducer: Deducer is another GUI built as a potential replacement for proprietary data analysis software (Fellows, 2012). Unlike R Commander, Deducer, along with the JGR (Java GUI for R) console of R, can be downloaded and installed as a separate package. This makes installation easier. Once installed, Deducer offers an analysis experience that is very similar to SPSS, with its intuitive and familiar menu structure. Descriptive and inferential statistics are available as menu items, as in the popular proprietary packages.

PSPP: PSPP is a GNU project for statistical data analysis (GNU Project, 2015). It is sponsored by the Free Software Foundation. For users familiar with SPSS, PSPP offers an interface which is almost a clone of the SPSS menu system. It has a data/variable view window and menus labeled Data, Transform, Analyze, etc., all of which are reminiscent of SPSS. Unlike SPSS, PSPP can be downloaded and used free of cost, and offers a good portion of the capabilities provided by the proprietary package. PSPP supports over one billion cases and variables, and is interoperable with other software such as LibreOffice and OpenOffice.org. The data files and syntax are compatible with SPSS as well. Elaborate documentation is available and there is an active global online community that supports users.

OpenStat: OpenStat is a free package for statistical analysis developed by William Miller (Miller, 2013). Initially designed as an instructional package, it provides a large number of statistical operations to users, with detailed documentation. OpenStat is designed for the Windows platform only, and may need compatibility software to run on Linux or Mac platforms.

RapidMiner: RapidMiner is a software platform, first developed by Ralf Klinkenberg, Ingo Mierswa, and Simon Fischer in 2001. It was initially named YALE (Yet Another Learning Environment) and was renamed RapidMiner in 2007. It is freemium software, which means a limited edition is available free of cost while the full version is proprietary. RapidMiner is available for Windows, Linux and Mac platforms.

SOFA: SOFA (Statistics Open For All) is one of the most user-friendly and intuitive statistical packages currently available. The software, developed by Paton-Simpson & Associates Ltd, can be run on Windows, Linux and Mac platforms (SOFA, 2017). A good collection of descriptive and inferential statistical tests, as well as visualizations, is provided. The software adopts a guided approach in which the user is led through the steps of the analysis, starting with choosing the procedures based on the type of data and the results expected.

JASP: JASP is one of the most sought-after free and open-source alternatives to the proprietary packages. JASP was created by pooling resources from various universities and research funds. In addition to the frequentist procedures offered by traditional software, JASP provides Bayesian inferential tests as well. Available on Windows, Linux, and Mac platforms, JASP produces tables and plots in APA style, which helps in preparing manuscripts. The addition of Bayesian statistics gives JASP an edge in terms of scientific rigour (The JASP Team, 2015).


Jamovi: As of 2021, jamovi is probably the frontrunner in terms of ease of use and functionality among the free and open-source packages which aim to provide a simple and powerful alternative to proprietary data analysis software such as SPSS and Stata (The jamovi project, 2020). Jamovi is an R-based GUI which is easy to install and use, with most functions needed for an academic research project in the social sciences bundled with the default installation. Besides, there is the option of adding more specialized modules using the jamovi library. These modules add a plethora of additional capabilities to jamovi. Apart from the continuous updates being released, what also makes jamovi special is its ability to dynamically update analysis results when the underlying data is modified. Cross-platform compatibility is also ensured, with the ability to import data in formats such as csv, sav (used by SPSS), and dta (used by Stata). Detailed documentation and third-party tutorials are also available for jamovi. In India, jamovi has become part of many workshops and training programmes in research methodology, where SPSS used to be the default choice.

MicrOsiris: MicrOsiris was developed at the University of Michigan. It is based on an earlier package, OSIRIS IV, developed at the same university. It is a data management and statistical analysis tool. Speed and lightness are the distinctive features of MicrOsiris. MicrOsiris offers a good user experience, with a clean interface and features such as provision for dealing with missing data. It can also import data from various platforms, including SPSS and SAS.

PAST: PAST (PAleontological STatistics) (Hammer et al., 2001) is another free software package which has recently been recognized as a viable alternative to proprietary software across the social sciences. Originally developed in 2001 for use in paleontology research, it was later adapted for use in various other fields. PAST offers a rich menu of statistical procedures to researchers, with some features, like simple phylogenetic analysis, specific to the paleontology and ecology fields. Features like Google Maps integration are unique to this package. Only the Windows platform is supported at present.


Gretl: Gretl is an open-source and multi-platform software package designed mainly for econometrics. It is considered a free software alternative to the popular econometrics software EViews, which is a proprietary product. Launched in 2000, gretl is an acronym for Gnu Regression, Econometrics and Time-series Library. A wide range of menu options is available for procedures like time series analysis. Apart from English, gretl supports more than 15 languages, including French, Spanish and Chinese.

Bluesky Statistics: Bluesky became an open-source project in 2018, and has been developed by a community ever since. This is yet another R-based GUI which aims to provide a point-and-click data analysis experience to social science researchers and students. It provides a familiar user interface for those who have used SPSS or similar software, and some elements, such as the ribbon below the menu bar, resemble popular office packages. The desktop version of Bluesky is available for free download and use. Like modules in jamovi, extensions are available for Bluesky. There is a 'variable and data view' functionality similar to SPSS, and a number of basic and advanced data analysis options too. Books featuring Bluesky for advanced statistical applications have been published (Lamprianou, 2019).

SalStat: SalStat is another free data analysis package, written in the Python language. Ease of use is a guiding principle of its development, and the developers have designed it particularly for researchers in the social sciences, agriculture and biological sciences, business analytics, market research, etc. Features such as scraping data from web pages are also available in SalStat, making it more than just a data analysis tool. A number of descriptive analyses are available, and a decent range of inferential statistics is provided, including linear regression. SalStat does not support importing the sav file format used in SPSS. No new updates have been released since 2018.
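
As a rough illustration of what a menu item such as linear regression computes behind the scenes, the following minimal Python sketch fits a simple least-squares line using only the standard library. It is not taken from SalStat or from any other package listed here; the function name `simple_linear_regression` and the data values are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def simple_linear_regression(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: hours studied and test scores (illustration only)
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = simple_linear_regression(hours, scores)
print(f"mean score = {mean(scores):.1f}, sd = {stdev(scores):.2f}")
print(f"score = {intercept:.2f} + {slope:.2f} * hours")
```

A GUI package typically reports the same slope and intercept along with standard errors and significance tests, which this sketch omits for brevity.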


Ministat Application: Ministat is an Android application which tries to provide a complete data analysis experience on a smartphone, and it does this job reasonably well. Ministat provides data visualizations, t-tests, ANOVA, correlation tests, regression analyses, and much more. It also provides some useful e-books which can be accessed only if one agrees to watch an advertisement. A premium version of the application without the advertisements is also available for those willing to pay. Though not very popular, with only 10k+ downloads as of December 2021, Ministat provides a good experience of data analysis on a smartphone and is worth a try.

Software for other research purposes

The software packages in this list are either data mining packages or CAQDAS (Computer Aided Qualitative Data Analysis Software). Most of them are able to perform descriptive and inferential statistical operations as well.

Tanagra: Tanagra is a free data mining platform, intended for use in research and academics. It was developed as an experimental platform for students and researchers (Rakotomalala, 2005). It provides support for machine learning along with exploratory data analysis. The source code is provided for those interested in working on similar projects. Even though written as instructional software, Tanagra is widely used in actual studies as well. It has also been recommended for small-scale industrial usage (Enright & Klippenstein, 2004).

Weka: Named after a flightless bird found only in New Zealand, Weka (Waikato Environment for Knowledge Analysis) is a product of the University of Waikato, New Zealand. It is multi-platform machine learning software written in Java. Weka is used for educational and research purposes, along with industry applications (Frank et al., 2009). The simplicity and user-friendliness of its interfaces is an important feature of Weka, and makes it popular among researchers.


Rattle: Rattle (the R Analytical Tool To Learn Easily) is a GUI package for R, primarily intended for data mining. Rattle is used both as an instructional tool for data mining in universities and by consultants and data scientists worldwide (Williams, 2009). Interactions with the interface are also recorded as scripts in the underlying R console, so that these scripts can be used independently of the GUI. Text mining capability has also been added to Rattle in its later versions.

Orange: Orange is an open-source program for data mining, data visualization, and machine learning. Python scripting is used in the back-end. It provides an interactive visualization and analysis platform (Demšar et al., 2013). Operations are carried out through interactive workflows composed of widgets, which can be either custom-defined or predefined. Widgets for bioinformatics, time series analysis, natural language processing, text mining, etc. are also available as add-ons in Orange, along with the customary data mining capabilities.

KH Coder: KH Coder is a lightweight program developed specifically for quantitative content analysis and text mining. It was developed by Koichi Higuchi for personal research purposes and later offered as open-source software under the GNU GPL. Various text-mining procedures can be performed with KH Coder. Elaborate documentation is also available online.
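
To give a sense of the most basic step in quantitative content analysis, the following minimal Python sketch counts word frequencies in a short text using only the standard library. It does not reproduce KH Coder's own procedures; the function name `word_frequencies`, the small stop-word list, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real tools use fuller lists)
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "for"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each non-stop word occurs in the text (case-insensitive)."""
    # Lower-case the text and keep runs of letters and apostrophes as tokens
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


# Hypothetical sample text (illustration only)
sample = (
    "Free software is a matter of freedom. "
    "Free software respects the freedom of users."
)
freq = word_frequencies(sample)
print(freq.most_common(3))
```

Dedicated text-mining tools build on counts like these with further steps such as stemming, co-occurrence analysis, and visualization.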

Conclusion

Usage of software in the various stages of academic research is on the rise. Hence it is imperative that researchers pay more attention to the various issues and options in the area of research software, irrespective of their area of academic expertise. Early career researchers and students should take a keen interest in developments in this area to stay updated in their academic pursuits. There is no claim that the list of software presented here is exhaustive or complete. It is hoped that this introduction to the various software packages available to researchers in social science will inspire some of them to explore hitherto unexplored areas, since research is also identified as a process of defying the beaten path to discover hidden knowledge.

References

Chambers, J. M. (2009). Facets of R. The R Journal, 1(1), 5–8.

Demšar, J., Curk, T., Erjavec, A., Gorup, Č., Hočevar, T., Milutinovič, M., Možina, M., Polajnar, M., Toplak, M., & Starič, A. (2013). Orange: Data mining toolbox in Python. The Journal of Machine Learning Research, 14(1), 2349–2353.

Downie, T. (2016). Using the R Commander: A Point-and-Click Interface for R. Journal of Statistical Software, 75(Book Review 3). https://doi.org/10.18637/jss.v075.b03

Enright, J., & Klippenstein, J. (2004). Tanagra: An Evaluation. University of Alberta. https://webdocs.cs.ualberta.ca/~zaiane/courses/cmput695-04/work/A2-reports/tanagra.pdf

Fellows, I. (2012). Deducer: A data analysis GUI for R. Journal of Statistical Software, 49(8), 1–15.

Foucault, M. (1980). Power/knowledge: Selected interviews and other writings, 1972-1977. Pantheon.

Frank, E., Hall, M., Holmes, G., Kirkby, R., Pfahringer, B., Witten, I. H., & Trigg, L. (2009). Weka-A machine learning workbench for data mining. In Data mining and knowledge discovery handbook (pp. 1269–1277). Springer.

GNU Project. (2015). GNU PSPP for GNU/Linux (0.8.5) [Computer software]. Free Software Foundation. https://www.gnu.org/software/pspp/

Hammer, Ø., Harper, D. A. T., & Ryan, P. D. (2001). PAST: Paleontological statistics software package for education and data analysis. Palaeontologia Electronica, 4(1), 1–9.

Ihaka, R., & Gentleman, R. (1996). R: A Language for Data Analysis and Graphics. Journal of Computational and Graphical Statistics, 5, 299–314.

Lamprianou, I. (2019). Applying the Rasch Model in Social Sciences Using R (1st edition). Routledge.

Miller, W. (2013). Statistics and Measurement Concepts with OpenStat. Springer-Verlag. //www.springer.com/in/book/9781461457428

Rakotomalala, R. (2005). TANAGRA: a free software for research and academic purposes. Proceedings of EGC’2005, 2, 697–702.

SOFA. (2017). SOFA Statistics (1.4.6) [Computer software]. Paton-Simpson & Associates Ltd. https://www.sofastatistics.com

The jamovi project. (2020). Jamovi (1.2) [Computer software]. https://www.jamovi.org

The JASP Team. (2015). Software to Sharpen Your Stats. APS Observer, 28(3). https://www.psychologicalscience.org/observer/bayes-or-bust-with-new-softwares

Williams, G. J. (2009). Rattle: A Data Mining GUI for R. The R Journal, 1(2), 45–55.

About the author: Dr. Chinchu C. is a Consulting Psychologist, Podcaster, Writer, Educator, and Research Consultant from Kerala, India. His short bio is available at https://www.mhinnovation.net/profile/dr-chinchu-c

Sources of Funding: No funding agencies have provided financial support or funding for writing this article.
