A small script that aims to visualize and understand K-Pop songs.
I had long had the impression that K-Pop songs tend to end lines in English. One day, while thinking about this phenomenon, I figured it could be examined with some rudimentary processing. Thus, Kren was born.
(Below is the output of lyrics_violin)
Kren is not developed as a formal package. To use it, evaluate Kren.R.
Packages required:
Search for your favorite song's Genius page with the original (unromanized and untranslated) lyrics.
Copy only the lyrics into a file; see ./examples/dummy_lyrics.txt for the correct format. Keep the section headers, and make sure there is an empty line between sections.
The data produced by lyrics_tree_data is the main interface for this program.
# Object must be in this NAME._.TITLE format for plots to show proper title
Artist._.Song <- lyrics_tree_data("dummy_lyrics.txt")
lyrics_violin(Artist._.Song)
lyrics_smry(Artist._.Song)
See ./examples/usage.R for a full list of interface examples.
Optionally, run:
podman build -t kren .
podman run -v ./examples:/home/src/examples kren
I prefer podman, but docker should work too.
See METHODOLOGY.pdf for examples.
- Words
  - In both languages, words are defined as space-delimited chunks of Unicode.
  - Ex: “대통령은 of the great (truth)” has 5 words
- Syllables
  - Korean
    - Each block is counted as one syllable
    - Further work can be done on condensing slurred vowels such as “다음”
  - English
    - Syllable counting is done through the syllable library, which uses poetrysoup.com
    - Further work can be done using Knuth-Liang hyphenation
      - Such expansion would allow for analysis of many more languages (mostly Latin script-based)
  - Special characters, such as æ
    - Processed as 0 syllables, as they are not in the Korean or English Unicode sets
- Ad-Libs
  - Ad-libs are considered any lyric encapsulated by parentheses
  - Can be ignored through lyrics_tree_data(fileName, rm.adlibs = TRUE)
  - Example
    - Original: normal (this is an adlib) lyrics
    - With rm.adlibs = TRUE: normal lyrics
- Punctuation is ignored
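The counting rules above can be sketched in a few lines of base R. This is only an illustration of the definitions, not the code in Kren.R: the helper names (strip_adlibs, count_words, count_kr_syllables) are hypothetical, and the Korean rule is assumed to mean one syllable per precomposed Hangul block (U+AC00 to U+D7A3).

```r
# Hypothetical helpers illustrating the rules above; not taken from Kren.R.

# Ad-libs: drop any text enclosed in parentheses, then tidy the whitespace.
strip_adlibs <- function(line) {
  no_adlibs <- gsub("\\([^)]*\\)", "", line)
  trimws(gsub("\\s+", " ", no_adlibs))
}

# Words: space-delimited chunks.
count_words <- function(line) {
  chunks <- strsplit(trimws(line), "\\s+")[[1]]
  sum(nzchar(chunks))
}

# Korean syllables: one per precomposed Hangul block (assumed U+AC00..U+D7A3).
count_kr_syllables <- function(line) {
  code_points <- utf8ToInt(enc2utf8(line))
  sum(code_points >= 0xAC00 & code_points <= 0xD7A3)
}

# The word example from the list above, written with Unicode escapes
# so the script does not depend on the file encoding.
example_line <- "\ub300\ud1b5\ub839\uc740 of the great (truth)"

count_words(example_line)                         # 5
count_kr_syllables(example_line)                  # 4
strip_adlibs("normal (this is an adlib) lyrics")  # "normal lyrics"
```

Characters outside the Hangul range, such as æ, simply contribute 0 to count_kr_syllables, which matches the special-character rule.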
lyrics_violin
- Creates a violin-esque plot
- Plots syllable count and language across lines and sections
- Meant to be read top-down, left to right
lyrics_chist
- Creates a centered histogram plot
- Plots language word proportion across total line count
lyrics_series
- Creates a time series-esque plot
- Plots language word proportion across total line count
- Meant to be more artistic; think of it as a wallpaper
lyrics_smry
- Returns a df with sections and respective smry_f(stat)
  - Default is mean(kr_word_prop)
  - smry_f can be any one-dimensional statistic
  - stat is any of lyrics_STATS$line
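To make the idea of "smry_f(stat) per section" concrete, here is a minimal base R sketch. It does not call Kren: the data frame lines_df, its columns, and the helper section_summary are all hypothetical stand-ins for whatever lyrics_tree_data actually produces.

```r
# Hypothetical per-line data; the column names are assumptions for illustration.
lines_df <- data.frame(
  section = c("Verse 1", "Verse 1", "Chorus", "Chorus"),
  kr_word_prop = c(1.0, 0.5, 0.25, 0.75)
)

# Apply a one-dimensional statistic (smry_f) to one column (stat) per section.
section_summary <- function(df, smry_f = mean, stat = "kr_word_prop") {
  values_by_section <- split(df[[stat]], df$section)
  data.frame(
    section = names(values_by_section),
    value = vapply(values_by_section, smry_f, numeric(1)),
    row.names = NULL
  )
}

mean_by_section <- section_summary(lines_df)
median_by_section <- section_summary(lines_df, smry_f = median)
```

Any function that maps a numeric vector to a single number (mean, median, max, sd) fits the "one-dimensional statistic" requirement in this sketch.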
lyrics_begend
- Analyses occurrence of language at beginning and end of lines
- Computes McNemar’s test, with a null that the two marginal probabilities for kr_beg and kr_end are the same
  - Rejection means that there is a difference in language frequency between the beginning and end of lines
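For readers unfamiliar with McNemar's test, the sketch below runs it with mcnemar.test from R's stats package on made-up data. The vectors reuse the names kr_beg and kr_end from the description above, but their contents are invented, and the assumption that each is a per-line TRUE/FALSE flag ("the line begins/ends in Korean") is mine rather than something confirmed by Kren.R.

```r
# Made-up flags for ten lines: does the line begin / end in Korean?
kr_beg <- c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)
kr_end <- c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)

# Paired 2x2 table; explicit levels keep the table 2x2 even if a value is absent.
paired_table <- table(
  kr_beg = factor(kr_beg, levels = c(FALSE, TRUE)),
  kr_end = factor(kr_end, levels = c(FALSE, TRUE))
)

# McNemar's test only uses the discordant pairs (the off-diagonal cells).
begend_test <- mcnemar.test(paired_table)
begend_test$p.value
```

A small p-value would suggest that lines begin and end in Korean at different rates. With only ten invented lines, the result here is purely illustrative.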
lyrics_comp
- Compare two sets of lyrics on a stat across line_tot
  - Default is kr_word_prop
  - stat is any of lyrics_STATS$line
- Computes a two-sample Kolmogorov-Smirnov test
  - Effectively tells if the maximum distance between the two empirical cumulative distribution functions (ECDFs) is large
  - Rejection means that the two sets do not come (“sample”) from a similar generator
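The "maximum distance between ECDFs" reading of the Kolmogorov-Smirnov test can be checked directly with base R. The two vectors below are invented per-line values standing in for a stat such as kr_word_prop from two songs; the variable names are hypothetical and nothing here calls Kren itself.

```r
# Invented per-line proportions for two songs (no ties, to keep the test exact).
song_a <- c(0.10, 0.25, 0.40, 0.55, 0.70, 0.85)
song_b <- c(0.50, 0.65, 0.80, 0.90, 0.95, 1.00)

# Two-sample Kolmogorov-Smirnov test from the stats package.
comp_test <- ks.test(song_a, song_b)

# The same statistic by hand: the largest vertical gap between the two ECDFs,
# evaluated at every observed value.
grid <- sort(c(song_a, song_b))
d_manual <- max(abs(ecdf(song_a)(grid) - ecdf(song_b)(grid)))

c(ks_statistic = unname(comp_test$statistic), manual = d_manual)
```

The two numbers should agree, which is a quick way to convince yourself of what the test statistic measures before interpreting the p-value.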