Skip to content

A small script that aims to visualize and understand Korean-English Language songs

License

Notifications You must be signed in to change notification settings

jonathanmfung/Kren

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Kren - Analyze Korean-English Language Songs

A small script that aims to visualize and understand K-Pop songs.

I have always noticed that K-Pop songs tend to end lines in English, or so I thought. One day, when thinking about this phenomenon, I thought it could be understood through some rudimentary processing. Thus, Kren was born.

(Below is the output of lyrics_violin) ./examples/violin.jpg

Installation

Kren is not developed as a formal package. Try to evaluate Kren.R.

Packages required:

Getting Started

Search for your favorite songs, specifically its Genius page in original (unromanized & untranslated) lyrics.

Copy only the lyrics into a file, see ./examples/dummy_lyrics.txt for the correct format. Make sure there is an empty line in between each section, and to keep section headers.

The data produced by lyrics_tree_data is the main interface for this program.

# Object must be in this NAME._.TITLE format for plots to show proper title
Artist._.Song <- lyrics_tree_data("dummy_lyrics.txt")
lyrics_violin(Artist._.Song)
lyrics_smry(Artist._.Song)

See ./examples/usage.R for a full list of interface examples.

Using Containers (Docker/Podman)

Optionally, run:

podman build -t kren .
podman run -v ./examples:/home/src/examples kren

I prefer podman, but docker should work too.

Methodology Details

See METHODOLOGY.pdf for examples.

  • Words
    • In both languages, words are defined as space-delimited chunks of Unicode.
    • Ex: “대통령은 of the great (truth)” has 5 words
  • Syllables
    • Korean
      • Each block is counted as one syllable
      • Further work can be done on condensing slurred-vowels such as “다음”
    • English
      • Syllable counting is done through the syllable library, which uses poetrysoup.com
      • Further work can be done using Knuth-Liang Hyphenation
        • Such expansion would allow for analysis of many more languages (mostly latin script-based)
  • Special Characters, such as æ
    • Processed as 0 syllables, as they are not in the Korean or English Unicode sets
  • Ad-Libs
    • Ad-Libs are considered any lyric encapsulated by parentheses
    • Can be ignored through lyrics_tree_data(fileName, rm.adlibs = TRUE)
    • Example
      • Original: normal (this is an adlib) lyrics
      • With rm.adlibs = TRUE: normal lyrics
  • Punctuation is ignored

Available Analyses

  • lyrics_violin
    • Create a violin-esque plot
    • Plots syllable count and language across lines and sections
    • Meant to be read top-down, left to right
  • lyrics_chist
    • Creates a centered histogram plot
    • Plots language word proportion across total line count
  • lyrics_series
    • Creates a time series-esque plot
    • Plots language word proportion across total line count
    • Meant to be more artistic, think as a wallpaper
  • lyrics_smry
    • Returns a df with sections and respective smry_f(stat)
      • Default is mean(kr_word_prop)
    • smry_f can be and one-dimensional statistic
    • stat is any of lyrics_STATS$line
  • lyrics_begend
    • Analyses occurrence of language at beginning and end of lines
    • Computes McNemar’s test, with a null that the two marginal probabilities for kr_beg and kr_end are the same
      • Rejection means that there is a difference in language frequency between the beginning and end of lines
  • lyrics_comp for a stat across line_tot
    • Compare two sets of lyrics on a stat
      • Default is kr_word_prop
      • stat is any of lyrics_STATS$line
    • Computes a two-sample Kolmogorov-Smirnov test
      • Effectively tells if the maximum distance between the two empirical cumulative distribution (ecdf) is large
      • Rejection means that the two sets do not come (“sample”) from a similar generator

Gallery

lyrics_chist

./examples/chist.jpg

lyrics_series

./examples/series.jpg

About

A small script that aims to visualize and understand Korean-English Language songs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published