toyama0919/embulk-filter-icu4j

Icu4j filter plugin for Embulk

Normalizes and transliterates string values using ICU4J (Unicode normalization, script conversion, case conversion).

See http://site.icu-project.org/ for details on ICU.

Overview

  • Plugin type: filter

Configuration

Example: NFKC normalization

filters:
  - type: icu4j
    key_names:
      - title
    settings:
      - { transliterators: 'Any-NFKC', case: upper }
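
To illustrate what NFKC normalization followed by upper-casing does to a value, here is a minimal sketch. It is not the plugin's implementation: it uses the JDK's `java.text.Normalizer` as a stand-in for ICU4J's `Any-NFKC` transliterator, and the class and method names (`NfkcUpperSketch`, `normalizeUpper`) are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

public class NfkcUpperSketch {
    // Assumption: approximates `transliterators: 'Any-NFKC', case: upper`
    // with the JDK normalizer instead of ICU4J.
    public static String normalizeUpper(String value) {
        // NFKC folds compatibility characters (full-width Latin,
        // half-width katakana, ...) into their canonical forms.
        String nfkc = Normalizer.normalize(value, Normalizer.Form.NFKC);
        // Locale.ROOT keeps case conversion independent of the default locale.
        return nfkc.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Full-width letters and digits become ASCII, then upper case.
        System.out.println(normalizeUpper("ｅｍｂｕｌｋ１２３")); // EMBULK123
        // Half-width katakana composes into full-width katakana.
        System.out.println(normalizeUpper("ﾎｹﾞ")); // ホゲ
    }
}
```

Values that differ only in character width compare equal after this step, which is the usual reason to apply NFKC to a column such as `title`.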

Example: multiple settings with suffixed output columns

filters:
  - type: icu4j
    keep_input: false
    key_names:
      - catchcopy
    settings:
      - { suffix: _katakana, transliterators: 'Katakana-Hiragana,Fullwidth-Halfwidth', case: upper }
      - { transliterators: 'Katakana-Hiragana', case: lower }
      - { suffix: _romaji_lower, transliterators: 'Katakana-Hiragana,Hiragana-Latin', case: lower }

Input

{
    "catchcopy" : "ホゲホゲ"
}

Output

{
    "catchcopy" : "ほげほげ",
    "catchcopy_katakana" : "ホゲホゲ",
    "catchcopy_romaji_lower" : "hogehoge"
}

Transliterator rules

See http://hondou.homedns.org/pukiwiki/pukiwiki.php?Java%20ICU4J

Build

$ ./gradlew gem  # -t to watch change of files and rebuild continuously
