higuchi.com blog

Diff

This shows the differences between two versions of this page.

--- dokuwiki:localize [2007/07/29 12:14] – osamu
+++ dokuwiki:localize [2007/07/31 15:36] (current) – osamu
@@ Line 72: / Line 72: @@
 ^D</code>
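 If the text you typed is echoed back split into separate words like this, it works.
@@ Line 163: / Line 165: @@
 indexer.php also has a function called wordlen(), which likewise contains logic that treats each Asian character as one word, so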
 <code php>
-    if(preg_match('/'.IDX_ASIAN2.'/u',$w))
+function wordlen($w){
-        $l += ord($w) - 0xE1;  // Lead bytes from 0xE2-0xEF
+    // $l = strlen($w);
+    $l = utf8_strlen($w);
+    //// If left alone, all chinese "words" will get put into w3.idx
+    //// So the "length" of a "word" is faked
+    //if(preg_match('/'.IDX_ASIAN2.'/u',$w))
+    //    $l += ord($w) - 0xE1;  // Lead bytes from 0xE2-0xEF
+    return $l;
+}
 </code>
-comment out these two lines as well.
+change it as shown above.
+Next, also in indexer.php, the idx_getIndexWordsSorted() function contains
+<code php>
+        if ($wlen < 3 && $wild == 0 && !is_numeric($xword)) continue;
+</code>
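+this line, and as it stands, words shorter than 3 characters cannot be searched. That is acceptable for English and similar languages, but in Japanese it is a problem if 1- and 2-character words cannot be searched, so replace it with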
+<code php>
+        if (preg_match('/[^0-9A-Za-z]/u', $string) && $wlen < 3 && $wild == 0 && !is_numeric($xword)) continue;
+</code>
+as shown here.
 //2005-12-8 - Added ''stream_set_blocking()'' to keep the Mecab process from hanging and being left behind//\\
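To make the two indexer.php changes in this diff easier to follow, here is a minimal standalone sketch. It does not use DokuWiki itself: `char_length()` and `should_skip()` are hypothetical helper names, and `char_length()` only stands in for DokuWiki's `utf8_strlen()`. The line in the diff tests `$string` and applies the pattern without negation; the sketch instead tests the word itself and negates the match, which is an assumption about the intent described in the text (keep short Japanese words, still drop short ASCII words), not the line from the diff.

```php
<?php
// Hypothetical helpers for illustration only; not part of DokuWiki.

// Count UTF-8 characters instead of bytes (stand-in for utf8_strlen()).
function char_length(string $w): int {
    $n = preg_match_all('/./us', $w);
    // Fall back to the byte length if the input is not valid UTF-8.
    return $n === false ? strlen($w) : $n;
}

// Assumed intent of the patched filter: skip a query word only when it is
// pure ASCII alphanumeric, shorter than 3 characters, has no wildcard,
// and is not a number.
function should_skip(string $xword, int $wlen, int $wild): bool {
    $asciiOnly = !preg_match('/[^0-9A-Za-z]/u', $xword);
    return $asciiOnly && $wlen < 3 && $wild == 0 && !is_numeric($xword);
}

$word = '検索';                 // a 2-character Japanese word
echo strlen($word), "\n";       // byte length: 6
echo char_length($word), "\n";  // character length: 2
var_dump(should_skip($word, char_length($word), 0)); // bool(false)
var_dump(should_skip('ab', 2, 0));                   // bool(true)
```

With byte lengths, a two-character Japanese word already looks six "characters" long, which is why the length calculation and the short-word filter have to be adjusted together.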

CC Attribution-Noncommercial-Share Alike 4.0 International