KaomojiParser

KaomojiParser is a Swift package for handling text that contains Japanese kaomoji (emoticons). It supports two operations: removing kaomoji from text and extracting kaomoji from text.

Implementation notes are available on Qiita (in Japanese).

Kaomoji structure

This project aims to find kaomoji in text. Kaomoji vary widely, and it is not always obvious whether a given character belongs to a kaomoji.

ホームランキタ━━━━(゚∀゚)━━━━‼︎

This is a popular kaomoji, but where it begins and ends depends on the definition. There are at least three candidates:

  1. (゚∀゚) is the kaomoji, and ホームランキタ━━━━━━━━‼︎ is the main text.
  2. キタ━━━━(゚∀゚)━━━━ is the kaomoji, and ホームラン‼︎ is the main text.
  3. キタ━━━━(゚∀゚)━━━━‼︎ is the kaomoji, and ホームラン is the main text.

For now, this project uses the third definition to parse text. The first and second definitions may be added in the future.
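
The three candidates can be written out as data to make the difference concrete. The following is a minimal sketch that only concatenates the pieces of the example above. The `Segmentation` type and all variable names are hypothetical and are not part of KaomojiParser's API.

```swift
// Hypothetical illustration only: not part of KaomojiParser.
struct Segmentation {
    let kaomoji: String   // the part treated as kaomoji
    let mainText: String  // what remains as the main text
}

// Pieces of the example sentence, in order of appearance.
let head = "ホームラン"
let left = "キタ━━━━"
let face = "(゚∀゚)"
let right = "━━━━"
let tail = "‼︎"

let candidates = [
    // 1. Only the face is kaomoji.
    Segmentation(kaomoji: face, mainText: head + left + right + tail),
    // 2. The face plus its surrounding decoration is kaomoji.
    Segmentation(kaomoji: left + face + right, mainText: head + tail),
    // 3. The decoration and the trailing marks are kaomoji as well.
    Segmentation(kaomoji: left + face + right + tail, mainText: head),
]

for (index, candidate) in candidates.enumerated() {
    print("\(index + 1). kaomoji: \(candidate.kaomoji) / main text: \(candidate.mainText)")
}
```

Under the third definition, which is the one the project uses for now, only ホームラン remains as the main text.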

Accuracy

Currently, KaomojiParser works well with kaomoji that use uncommon characters. However, it does not work well with kaomoji built from common characters, such as (- -;), (..), (*_*), or (TT).
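
One way to see why common characters are harder is to count how many characters in a kaomoji fall outside ASCII and the basic kana and kanji blocks. The sketch below is only an illustration of that intuition, assuming a simple code-point range check. `uncommonScalarCount` is a hypothetical helper, and this is not a description of how KaomojiParser actually detects kaomoji.

```swift
// Hypothetical helper, not part of KaomojiParser: counts Unicode scalars
// that are neither ASCII nor in the basic hiragana, katakana, or kanji blocks.
func uncommonScalarCount(in text: String) -> Int {
    return text.unicodeScalars.filter { scalar in
        let value = scalar.value
        let isASCII = value < 0x80
        let isHiragana = (0x3040...0x309F).contains(value)
        let isKatakana = (0x30A0...0x30FF).contains(value)
        let isKanji = (0x4E00...0x9FFF).contains(value)
        return !(isASCII || isHiragana || isKatakana || isKanji)
    }.count
}

// A kaomoji made of symbols that rarely occur in ordinary sentences.
print(uncommonScalarCount(in: "(≧▽≦)"))

// Kaomoji made only of ASCII characters offer no such signal.
print(uncommonScalarCount(in: "(TT)"))
print(uncommonScalarCount(in: "(*_*)"))
```

A kaomoji such as (TT) contains nothing that distinguishes it from ordinary parenthesized text by character type alone, which is consistent with the limitation described above.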

Usage

Add KaomojiParser to your project with Swift Package Manager.

Once the package is added, KaomojiParser can be used like this:

import KaomojiParser

let parser = KaomojiParser()
let target = "嬉しいです(≧▽≦)"
print(parser.removeKaomoji(from: target))  // "嬉しいです"
print(parser.search(in: target))           // ["(≧▽≦)"]

Reference

This implementation relies on the following paper. Thanks to the authors.

  • 風間一洋, 水木栄, & 榊剛史. (2016). Twitter における顔文字を用いた感情分析の検討. In 人工知能学会全国大会論文集 第 30 回全国大会 (2016) (pp. 3H3OS17a4-3H3OS17a4). 一般社団法人 人工知能学会.