NAME
    Lingua::JA::NormalizeText - text normalizer

SYNOPSIS
      use strict;
      use warnings;
      use utf8;
      use Lingua::JA::NormalizeText;

      binmode STDOUT, ':encoding(UTF-8)';

      my @options = ( qw/nfkc decode_entities/, \&dearinsu_to_desu );
      my $normalizer = Lingua::JA::NormalizeText->new(@options);

      print $normalizer->normalize('鳥が㌧㌦でありんす&hearts;'), "\n";
      # -> 鳥がトンドルです♥

      sub dearinsu_to_desu
      {
          my $text = shift;
          $text =~ s/でありんす/です/g;

          return $text;
      }

      # or

      use Lingua::JA::NormalizeText qw/nfkc decode_entities/;
      use utf8;

      binmode STDOUT, ':encoding(UTF-8)';

      my $text = '鳥が㌧㌦でありんす&hearts;';
      print dearinsu_to_desu( decode_entities( nfkc($text) ) ), "\n";
      # -> 鳥がトンドルです♥

      sub dearinsu_to_desu
      {
          my $text = shift;
          $text =~ s/でありんす/です/g;

          return $text;
      }

DESCRIPTION
    Lingua::JA::NormalizeText normalizes Japanese text. You choose which
    conversions to apply (Unicode normalization, HTML entity decoding and
    tag stripping, full-width/half-width conversion, katakana/hiragana
    conversion, unification of wave dashes, long vowel marks and
    whitespace, and so on) and the order in which they run.

METHODS
  new(@options)
    Creates a new Lingua::JA::NormalizeText instance. The following options
    are available:

      OPTION                SAMPLE INPUT          OUTPUT FOR SAMPLE INPUT
      --------------------- --------------------- -----------------------
      lc                    DdD                   ddd
      uc                    DdD                   DDD
      nfkc                  ㌦                    ドル (length: 2)
      nfkd                  ㌦                    ト\x{3099}ル (length: 3)
      nfc
      nfd
      decode_entities       &hearts;              ♥
      strip_html            <em>あ</em>           あ
      alnum_z2h             ＡＢＣ１２３          ABC123
      alnum_h2z             ABC123                ＡＢＣ１２３
      space_z2h
      space_h2z
      katakana_z2h          ハァハァ              ﾊｧﾊｧ
      katakana_h2z          ｾｰﾊｰｾｰﾊｰ              セーハーセーハー
      katakana2hiragana     パンツ                ぱんつ
      hiragana2katakana     ぱんつ                パンツ
      unify_3dots           �胯��������           �胯����
      wave2tilde            〜                    ～
      tilde2wave            ～                    〜
      wavetilde2long        〜, ～                ー
      wave2long             〜                    ー
      tilde2long            ～                    ー
      fullminus2long        −                     ー
      dashes2long           —                     ー
      drawing_lines2long    ─                     ー
      unify_long_repeats    ヴァーーー            ヴァー
      nl2space              \n                    (space)
      unify_long_spaces     (space)(space)        (space)
      remove_head_space     (space)あ(space)あ    あ(space)あ
      remove_tail_space     ああ(space)(space)    ああ
      modernize_kana_usage  ゐヰゑヱ              いイえエ

    Options are applied in the order in which they appear in @options: the
    first element is applied first and the last element is applied last.

    You can also add your own functions by passing code references in
    @options. Each function receives the text as its only argument and must
    return the converted text (see dearinsu_to_desu in the SYNOPSIS).

  normalize($text)
    Applies the options passed to new() to $text, in order, and returns the
    normalized text.
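    The following is a minimal, self-contained sketch of how an ordered
    option list like the one accepted by new() can be applied. It assumes
    only core Perl (Unicode::Normalize ships with Perl) and is not the
    module's implementation: the %builtin table, make_normalizer() and the
    four re-implemented options are hypothetical stand-ins chosen for
    illustration. It shows that options run left to right, that code
    references can be mixed with option names, and why nfkc and nfkd give
    different lengths for ㌦.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFKC NFKD);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical stand-ins for a few options from the table above.
# They are illustrative re-implementations, not Lingua::JA::NormalizeText code.
my %builtin = (
    nfkc               => sub { NFKC( $_[0] ) },
    nfkd               => sub { NFKD( $_[0] ) },
    # Katakana U+30A1..U+30F6 map one-to-one onto hiragana U+3041..U+3096.
    katakana2hiragana  => sub { ( my $t = $_[0] ) =~ tr/ァ-ヶ/ぁ-ゖ/; $t },
    # Collapse runs of the long vowel mark into a single one.
    unify_long_repeats => sub { ( my $t = $_[0] ) =~ s/ー{2,}/ー/g; $t },
);

# Hypothetical helper: turn a list of option names and/or code references
# into one function that applies them left to right.
sub make_normalizer {
    my @steps = map {
        my $opt = $_;
        ref($opt) eq 'CODE' ? $opt
          : ( $builtin{$opt} // die "unknown option: $opt\n" );
    } @_;

    return sub {
        my $text = shift;
        $text = $_->($text) for @steps;    # first option runs first
        return $text;
    };
}

# Order matters: katakana2hiragana does not touch the single character ㌦,
# so it only has an effect if nfkc has already expanded ㌦ to ドル.
my $nfkc_first = make_normalizer(qw/nfkc katakana2hiragana/);
my $nfkc_last  = make_normalizer(qw/katakana2hiragana nfkc/);
print $nfkc_first->('㌦'), "\n";    # どる
print $nfkc_last->('㌦'),  "\n";    # ドル

# A user-supplied code reference, as with dearinsu_to_desu in the SYNOPSIS.
my $custom = make_normalizer( 'nfkc',
    sub { ( my $t = shift ) =~ s/でありんす/です/g; $t } );
print $custom->('㌦でありんす'), "\n";    # ドルです

my $collapse = make_normalizer('unify_long_repeats');
print $collapse->('ヴァーーー'), "\n";    # ヴァー

# nfkc recomposes ド, while nfkd leaves ト + U+3099, hence lengths 2 and 3.
printf "%d %d\n", length( NFKC('㌦') ), length( NFKD('㌦') );    # 2 3
```

    Resolving every option to a code reference up front means an unknown
    option name fails when the normalizer is built rather than on the first
    call to it.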
AUTHOR
    pawa <pawapawa@cpan.org>

SEE ALSO

LICENSE
    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.