Anton Antonov
MathematicaForPrediction at GitHub
MathematicaVsR project at GitHub
September, 2016
This project has two goals:
- to show how to experiment in Mathematica and R with determining, through algebraic computations, the most important sentences (or paragraphs) in natural language texts, and
- to compare the Mathematica and R code (built-in functions, libraries, programmed functions) used for these experiments.
To run these experiments we have to find, select, and download suitable text data. This project uses transcripts of Freakonomics Radio podcasts.
The project's executable documents and source files give a walk-through, with code and explanations, of the complete sequence of steps, from intent to experimental results.
The following concrete steps are taken.
- Selection of a data source that provides high-quality texts (e.g. with good English grammar and spelling).
- Downloading or scraping of the text data.
- Text data parsing, cleaning, and other pre-processing.
- Mapping of a selected document into a linear vector space using the bag-of-words model.
- Finding sentence/statement salience using matrix algebra.
- Experimenting with the salience algorithm over the data and making a suitable interactive interface.
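The bag-of-words mapping and the matrix-algebra salience step can be illustrated with a minimal, language-neutral sketch. It is written in plain Python rather than Mathematica or R, and it is not the project's code. The stop-word list, the function names, and the choice of power iteration on the sentence-term matrix (sentences that contain important words are important, and words that appear in important sentences are important) are assumptions made for illustration only; the project scripts may use a different weighting or decomposition.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "on"}


def tokenize(sentence):
    """Lower-case a sentence and keep word tokens that are not stop words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def sentence_term_matrix(sentences):
    """Bag-of-words model: rows are sentences, columns are terms, entries are counts."""
    bags = [Counter(tokenize(s)) for s in sentences]
    vocabulary = sorted(set().union(*bags))
    matrix = [[bag[term] for term in vocabulary] for bag in bags]
    return matrix, vocabulary


def salience(matrix, iterations=100):
    """Score sentences by power iteration: s <- M (M^T s), normalized each round.

    This is an assumed scoring scheme, not necessarily the one used in the project.
    """
    if not matrix:
        return []
    n_rows, n_cols = len(matrix), len(matrix[0])
    s = [1.0] * n_rows
    for _ in range(iterations):
        # Word scores: each word collects the scores of the sentences it occurs in.
        w = [sum(matrix[i][j] * s[i] for i in range(n_rows)) for j in range(n_cols)]
        # Sentence scores: each sentence collects the scores of the words it contains.
        s = [sum(matrix[i][j] * w[j] for j in range(n_cols)) for i in range(n_rows)]
        norm = math.sqrt(sum(x * x for x in s)) or 1.0
        s = [x / norm for x in s]
    return s


# Made-up example sentences, not taken from the podcast transcripts.
sentences = [
    "Economics studies incentives.",
    "Incentives shape how people make decisions.",
    "The weather was nice.",
]
matrix, vocabulary = sentence_term_matrix(sentences)
scores = salience(matrix)
ranking = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
for i in ranking:
    print(round(scores[i], 3), sentences[i])
```

With raw counts, longer sentences tend to score higher, so in practice the matrix entries are often reweighted (for example with inverse document frequency) before the scores are computed.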
The following scripts can be executed to go through all the steps listed above.
- Mathematica script : "./Mathematica/StatementsSaliencyInPodcastsScript.m".
- R script : "./R/StatementsSaliencyInPodcastsScript.R".
- See the Markdown document "./Mathematica/StatementsSaliencyInPodcasts.md" for using Mathematica.
- See the HTML document "./R/StatementsSaliencyInPodcasts.html" for using R.
After executing the scripts listed above, executing the following scripts produces interactive interfaces that let you see the outcomes of different parameter selections.
- Mathematica interactive interface : "./Mathematica/StatementsSaliencyInPodcastsInterface.m".
- R / Shiny interactive interface : "./R/StatementsSaliencyInPodcastsInterface.R".
TBD
All code files and executable documents are licensed under GPL 3.0. For details see http://www.gnu.org/licenses/ .
All documents are licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). For details see https://creativecommons.org/licenses/by/4.0/ .