Can't index a lot of sites

conf
Posts: 3
Joined: Tue May 14, 2019 9:44 pm

Can't index a lot of sites

Post by conf »

captquirk
Site Admin
Posts: 299
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA

Re: Can't index a lot of sites

Post by captquirk »

Looking at it...
Which version of Sphider are you using?

UPDATE: If you are using Sphider 3.1.0, try this:
In commonfuncs.php, replace lines 404-423


    $table = array(
        'Š'=>'S', 'š'=>'s', 'Ð'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Ć'=>'C',
        'ć'=>'c', 'Č'=>'C', 'č'=>'c', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A',
        'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Œ'=>'OE', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
        'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N',
        'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O','Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
        'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss','à'=>'a',
        'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
        'è'=>'e', 'é'=>'e','ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
        'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
        'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ÿ'=>'y',
        'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r'
    );

    return strtr($string, $table);
with this:


    return (strtr(
        $string,
        "ÀÁÂÃÄÅÆàáâãäåæÒÓÔÕÕÖØòóôõöøÈÉÊËèéêëðÇçÐÌÍÎÏìíîïÙÚÛÜùúûüÑñÞßÿý",
        "aaaaaaaaaaaaaaoooooooooooooeeeeeeeeecceiiiiiiiiuuuuuuuunntsyy"
    ));
The purpose of the removeAccents() function (where the above code is located) is to determine whether a word should be indexed. It is also used in searching. It only affects whether the word is indexed or searched; it does not actually alter the word. For some reason, the longer (existing) code interferes with that determination where Arabic words are concerned. The shorter replacement was in Sphider prior to 3.1.0, but proved to be problematic with some European languages!

In your instance, the older version works better. Now I have to find the right balance for the next release.
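
To illustrate the idea described above, here is a minimal sketch of a check that folds accents only to decide whether a word qualifies, while leaving the word itself untouched. This is not Sphider's actual code: the function names and the short mapping table are hypothetical, and the plain a-to-z test is an assumption made for the example.

```php
<?php
// Hypothetical sketch: none of these names come from Sphider's source.

// Fold a few accented Latin letters to ASCII. The array form of strtr()
// replaces whole substrings, so multi-byte UTF-8 keys are matched intact.
function fold_accents_sketch(string $word): string
{
    $table = ['é' => 'e', 'è' => 'e', 'ü' => 'u', 'ñ' => 'n', 'ç' => 'c'];
    return strtr($word, $table);
}

// Decide whether a word qualifies for indexing (assumed a-to-z test).
// The folded copy is used only for the test; the caller keeps the
// original spelling of the word.
function is_indexable_sketch(string $word): bool
{
    return preg_match('/^[a-z]+$/i', fold_accents_sketch($word)) === 1;
}

foreach (['café', 'señor', 'كتاب'] as $w) {
    echo $w, ' => ', is_indexable_sketch($w) ? 'index' : 'skip', PHP_EOL;
}
```

In this sketch the two Latin words pass once their accents are folded, while the Arabic word does not, which is the kind of limitation an a-to-z test carries.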

Re: Can't index a lot of sites

Post by captquirk »

Sphider 3.1.1-MB eliminates the need for the troublesome removeAccents() function. Rather than trying to force the word to conform to an all-too-English "a to z" alphabet check, the word is now tested with a Unicode alphabetic check.

The function had outlived its original purpose, which was to support sites written in Western European languages.
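
For readers wondering what a Unicode alphabetic check can look like in PHP, here is a minimal sketch. It is an assumption about the general technique, not the code shipped in 3.1.1-MB, and the function name is hypothetical. It relies on PCRE's `\p{L}` (any letter) and `\p{M}` (combining mark) properties together with the `/u` modifier.

```php
<?php
// Hypothetical sketch; the function name is not from Sphider.

// Accept a word when every character is a Unicode letter (\p{L}) or a
// combining mark (\p{M}). The /u modifier makes PCRE treat the pattern
// and the subject as UTF-8 rather than as raw bytes.
function is_alphabetic_word_sketch(string $word): bool
{
    return preg_match('/^[\p{L}\p{M}]+$/u', $word) === 1;
}

foreach (['hello', 'café', 'كتاب', 'abc123'] as $w) {
    echo $w, ' => ', is_alphabetic_word_sketch($w) ? 'word' : 'not a word', PHP_EOL;
}
```

A check of this kind needs no accent table at all, since accented Latin letters and Arabic letters are both letters as far as `\p{L}` is concerned.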

Re: Can't index a lot of sites

Post by conf »

Thank you for the help. The problem came from my local LAMP stack, which is AMPPS. I switched to XAMPP and it works fine.