can't indexing a lot of sites

Posted: Wed May 15, 2019 12:18 pm
by conf

Re: can't indexing a lot of sites

Posted: Wed May 15, 2019 6:22 pm
by captquirk
Looking at it...
Which version of Sphider are you using?

UPDATE: If you are using Sphider 3.1.0, try this:
In commonfuncs.php, replace lines 404-423:

    $table = array(
        'Š'=>'S', 'š'=>'s', 'Ð'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Ć'=>'C',
        'ć'=>'c', 'Č'=>'C', 'č'=>'c', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A',
        'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Œ'=>'OE', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
        'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N',
        'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O','Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
        'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss','à'=>'a',
        'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
        'è'=>'e', 'é'=>'e','ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
        'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
        'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ÿ'=>'y',
        'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'R'=>'R', 'r'=>'r'
    );

    return strtr($string, $table);
with this:

    return (strtr(
        $string,
        "ÀÁÂÃÄÅÆàáâãäåæÒÓÔÕÕÖØòóôõöøÈÉÊËèéêëðÇçÐÌÍÎÏìíîïÙÚÛÜùúûüÑñÞßÿý",
        "aaaaaaaaaaaaaaoooooooooooooeeeeeeeeecceiiiiiiiiuuuuuuuunntsyy"
    ));
The removeAccents() function (where the above code is located) is used to determine whether a word should be indexed. It is also used in searching. It only affects WHETHER the word is indexed or searched; it does not actually alter the word. For some reason, the longer (existing) code interferes with that determination where Arabic words are concerned. The shorter replacement was in Sphider prior to 3.1.0, but proved to be problematic with some European languages!
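
To illustrate the "decide on a folded copy, keep the original word" idea, here is a minimal sketch. It is not Sphider's actual code: fold_accents(), should_index(), the tiny replacement table and the a-z test are all hypothetical stand-ins for whatever commonfuncs.php really does.

```php
<?php
// Hypothetical helper (not Sphider's): fold a few accented letters to ASCII.
// The two-argument (array) form of strtr() replaces whole substrings,
// so each multibyte UTF-8 key is matched as a complete character.
function fold_accents(string $word): string
{
    $table = ['é' => 'e', 'è' => 'e', 'ï' => 'i', 'ñ' => 'n', 'ü' => 'u'];
    return strtr($word, $table);
}

// Hypothetical check: the decision is made on the folded copy only.
// The a-z test is an assumption made for this sketch.
function should_index(string $word): bool
{
    return preg_match('/^[a-z]+$/i', fold_accents($word)) === 1;
}

$word = 'café';
// The original spelling is what gets kept; the folded copy is discarded.
$indexed = should_index($word) ? $word : null;
```

The point of the sketch is only that the folded string never leaves should_index(); the word that would be stored or searched is the untouched original.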

In your instance, the older version works better. Now I have to find the right balance for the next release.

Re: can't indexing a lot of sites

Posted: Thu May 16, 2019 12:33 am
by captquirk
Sphider 3.1.1-MB eliminates the need for the troublesome removeAccents() function. Rather than trying to force the word to conform to an all-too-English "a to z" alphabet check, the word is now checked against a Unicode alphabetic test.
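
For readers curious what a Unicode-aware letter check can look like in PHP, here is a rough sketch. It is an assumption about the general technique, not the code shipped in 3.1.1-MB; is_alphabetic_word() is a made-up name.

```php
<?php
// Hypothetical check (not Sphider's): accept a word made only of letters,
// in any script. \p{L} matches any Unicode letter, and the /u modifier
// makes PCRE treat both the pattern and the subject as UTF-8.
function is_alphabetic_word(string $word): bool
{
    return preg_match('/^\p{L}+$/u', $word) === 1;
}

$latin  = is_alphabetic_word('café');   // accented Latin letters
$arabic = is_alphabetic_word('كتاب');   // Arabic letters
$mixed  = is_alphabetic_word('abc123'); // digits are not letters
```

With a test like this there is no accent table to maintain, which is why no per-language folding step is needed. Words that carry combining marks (for example, vocalized Arabic) would need \p{M} allowed as well; that is left out of this sketch.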

The function had outlived its original purpose, which was to support sites written in Western European languages.

Re: can't indexing a lot of sites

Posted: Sun May 26, 2019 2:34 am
by conf
Thank you for the help. The problem came from my local LAMP stack, which is AMPPS. I used XAMPP instead and it works fine.