NAME
    Lingua::JA::NormalizeText - text normalizer

SYNOPSIS
      use Lingua::JA::NormalizeText;
      use utf8;
      binmode STDOUT, ':encoding(UTF-8)';   # avoid "Wide character in print" warnings

      my @options = ( qw/nfkc decode_entities/, \&dearinsu_to_desu );
      my $normalizer = Lingua::JA::NormalizeText->new(@options);

      print $normalizer->normalize('���������������������������&hearts;');
      # -> ���������������������������

      # External function: takes a string and returns the converted string
      sub dearinsu_to_desu {
          my $text = shift;
          $text =~ s/でありんす/です/g;

          return $text;
      }

      # or

      use Lingua::JA::NormalizeText qw/nfkc decode_entities/;
      use utf8;
      binmode STDOUT, ':encoding(UTF-8)';

      my $text = '������������&hearts;';
      print decode_entities( nfkc($text) );
      # -> (���)������������

DESCRIPTION
    Lingua::JA::NormalizeText normalizes (mainly Japanese) text by applying a
    user-chosen sequence of conversions: Unicode normalization, HTML entity
    decoding and tag stripping, full-width/half-width conversion,
    katakana/hiragana conversion, unification of waves, tildes, dashes and
    long vowel marks, whitespace cleanup, and old-to-new kana and kanji
    conversion. Conversions can also be imported and called as plain
    functions, as the second SYNOPSIS example shows for nfkc and
    decode_entities.

METHODS
  new(@options)
    Creates a new Lingua::JA::NormalizeText instance.

    The following options are available:

      OPTION                SAMPLE INPUT             OUTPUT FOR SAMPLE INPUT
      --------------------- ------------------------ ------------------------
      lc                    DdD                      ddd
      uc                    DdD                      DDD
      nfkc                  ���                      ������ (length: 2)
      nfkd                  ���                      ��������� (length: 3)
      nfc
      nfd
      decode_entities       &hearts;                 ♥
      strip_html            <em>���</em>             ���
      alnum_z2h             ＡＢＣ１２３             ABC123
      alnum_h2z             ABC123                   ＡＢＣ１２３
      space_z2h
      space_h2z
      katakana_z2h          ������������             ������������
      katakana_h2z          ������������������������ ������������������������
      katakana2hiragana     ���������                ���������
      hiragana2katakana     ���������                ���������
      unify_3dots           ���������������          ���������
      wave2tilde            ���                      ���
      tilde2wave            ���                      ���
      wavetilde2long        ���, ���                 ���
      wave2long             ���                      ���
      tilde2long            ���                      ���
      fullminus2long        ���                      ���
      dashes2long           ���                      ���
      drawing_lines2long    ���                      ���
      unify_long_repeats    ���������������          ���������
      nl2space              (new line)               (space)
      unify_long_spaces     (space)(space)           (space)
      remove_head_space     (space)���(space)���     ���(space)���
      remove_tail_space     ������(space)(space)     ������
      old2new_kana          ������������             ������������
      old2new_kanji         ���������                ���������

    Options are applied in the order in which they appear in @options: the
    first element is applied first, and the last element is applied last.

    Code references to external functions can also be passed as options.
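The length difference between the nfkc and nfkd rows comes from Unicode normalization itself: NFKC recomposes characters after compatibility decomposition, while NFKD leaves them decomposed. The sketch below uses the core Unicode::Normalize module and assumes that the nfkc, nfkd and nfc options behave like its NFKC, NFKD and NFC functions. The sample, half-width katakana KA plus the half-width voiced sound mark (U+FF76 U+FF9E), is chosen for illustration and is not the table's own sample.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFKC NFKD);

# Half-width katakana KA (U+FF76) followed by the half-width voiced sound mark (U+FF9E).
my $halfwidth_ga = "\x{FF76}\x{FF9E}";

my $nfc  = NFC($halfwidth_ga);    # canonical only: these characters are left unchanged
my $nfkc = NFKC($halfwidth_ga);   # compatibility decomposition, then canonical composition
my $nfkd = NFKD($halfwidth_ga);   # compatibility decomposition, no recomposition

# Print code points so the output stays ASCII.
sub codepoints { join ' ', map { sprintf 'U+%04X', ord($_) } split //, shift }

printf "NFC : %s (length %d)\n", codepoints($nfc),  length $nfc;
printf "NFKC: %s (length %d)\n", codepoints($nfkc), length $nfkc;
printf "NFKD: %s (length %d)\n", codepoints($nfkd), length $nfkd;
# NFC : U+FF76 U+FF9E (length 2)
# NFKC: U+30AC (length 1)
# NFKD: U+30AB U+3099 (length 2)
```

NFKC maps the half-width pair to full-width KA plus a combining voiced mark and then composes them into GA (U+30AC), so the length drops. NFKD performs the same mapping but never composes, so the combining mark stays separate.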
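The katakana2hiragana and hiragana2katakana rows can be approximated with a single transliteration, because most katakana (U+30A1 to U+30F6) and hiragana (U+3041 to U+3096) occupy parallel Unicode ranges 0x60 code points apart. The helpers below are hypothetical and are not the module's code. Characters outside those ranges, such as the long vowel mark (U+30FC) and U+30F7 to U+30FA, are left unchanged by this sketch; the module's own handling of them may differ.

```perl
use strict;
use warnings;

# Hypothetical helpers, not Lingua::JA::NormalizeText's implementation.
# Katakana U+30A1..U+30F6 and hiragana U+3041..U+3096 are parallel ranges.
sub katakana_to_hiragana {
    my $text = shift;
    $text =~ tr/\x{30A1}-\x{30F6}/\x{3041}-\x{3096}/;
    return $text;
}

sub hiragana_to_katakana {
    my $text = shift;
    $text =~ tr/\x{3041}-\x{3096}/\x{30A1}-\x{30F6}/;
    return $text;
}

# The word "katakana" written in katakana (KA TA KA NA).
my $katakana = "\x{30AB}\x{30BF}\x{30AB}\x{30CA}";
my $hiragana = katakana_to_hiragana($katakana);

# Print code points so the output stays ASCII.
print join(' ', map { sprintf 'U+%04X', ord($_) } split //, $hiragana), "\n";
# -> U+304B U+305F U+304B U+306A
```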
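The ordering rule and the acceptance of code references can be illustrated with a small, self-contained sketch. MiniNormalizer, hello_to_upper and the simplified ASCII-only filters are hypothetical and are not the module's implementation; they only mimic the documented behavior of new(@options) and normalize($text), where named options and code references are applied from the first element of @options to the last.

```perl
use strict;
use warnings;

package MiniNormalizer;    # hypothetical class, not Lingua::JA::NormalizeText

# Simplified, ASCII-only stand-ins for a few options in the table.
my %FILTERS = (
    lc                => sub { lc $_[0] },
    uc                => sub { uc $_[0] },
    nl2space          => sub { (my $t = $_[0]) =~ s/\r?\n/ /g; $t },
    unify_long_spaces => sub { (my $t = $_[0]) =~ s/ {2,}/ /g; $t },
    remove_head_space => sub { (my $t = $_[0]) =~ s/\A +//;    $t },
    remove_tail_space => sub { (my $t = $_[0]) =~ s/ +\z//;    $t },
);

sub new {
    my ($class, @options) = @_;
    # Resolve each option to a code reference, keeping the caller's order.
    my @steps = map {
        ref($_) eq 'CODE' ? $_ : ($FILTERS{$_} || die "unknown option: $_\n")
    } @options;
    return bless { steps => \@steps }, $class;
}

sub normalize {
    my ($self, $text) = @_;
    # Apply the steps left to right: the first option runs first.
    $text = $_->($text) for @{ $self->{steps} };
    return $text;
}

package main;

# An external function in the sense of the SYNOPSIS: takes and returns a string.
sub hello_to_upper {
    my $text = shift;
    $text =~ s/hello/HELLO/g;
    return $text;
}

my $lc_first     = MiniNormalizer->new('lc', \&hello_to_upper);
my $custom_first = MiniNormalizer->new(\&hello_to_upper, 'lc');

print $lc_first->normalize('Hello'), "\n";       # -> HELLO
print $custom_first->normalize('Hello'), "\n";   # -> hello

my $spaces = MiniNormalizer->new(
    qw/nl2space unify_long_spaces remove_head_space remove_tail_space/
);
print '[', $spaces->normalize("  one\n  two  "), "]\n";   # -> [one two]
```

With lc first, "Hello" becomes "hello" and the external function then matches it; with the external function first, the case-sensitive pattern misses "Hello" and lc runs afterwards, so the two pipelines return different results for the same input.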
    For an example of an external function, see dearinsu_to_desu in the
    SYNOPSIS section.

  normalize($text)
    Normalizes $text by applying the options passed to new, in order, and
    returns the normalized text.

AUTHOR
    pawa <pawapawa@cpan.org>

SEE ALSO
    <http://www.asahi-net.or.jp/~ax2s-kmtn/ref/old_chara.html>

LICENSE
    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.