Search found 305 matches
- Fri Sep 29, 2023 4:40 pm
- Forum: Sphider Help
- Topic: unicode indexing
- Replies: 18
- Views: 12407
Re: unicode indexing
Your code looks perfect! We have already established your tables are utf8mb4. Is it possible that the fields in the table are NOT utf8mb4? This does not seem likely, but I suppose anything is possible. Go into MySQL, connect to the database you are using, and run a 'SHOW CREATE TABLE' for links and ...
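To illustrate what to look for in the 'show create table' output: MySQL prints a CHARACTER SET clause on any column whose character set differs from the table default. The sketch below is Python for illustration only (Sphider itself is PHP). The sample DDL and its column names are hypothetical, not Sphider's actual schema.

```python
import re

# Hypothetical SHOW CREATE TABLE output; column names are illustrative only.
SAMPLE_DDL = """CREATE TABLE `links` (
  `link_id` int NOT NULL AUTO_INCREMENT,
  `title` varchar(200) CHARACTER SET latin1 DEFAULT NULL,
  `fulltxt` mediumtext,
  PRIMARY KEY (`link_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4"""


def table_default_charset(ddl):
    """Return the table-level default character set, or None if absent."""
    match = re.search(r"DEFAULT CHARSET=(\w+)", ddl)
    return match.group(1) if match else None


def non_utf8mb4_columns(ddl):
    """Return (column, charset) pairs for columns that override the
    table default with something other than utf8mb4."""
    found = []
    for line in ddl.splitlines():
        match = re.match(r"\s*`(\w+)`.*?CHARACTER SET (\w+)", line)
        if match and match.group(2).lower() != "utf8mb4":
            found.append((match.group(1), match.group(2)))
    return found


print(table_default_charset(SAMPLE_DDL))
print(non_utf8mb4_columns(SAMPLE_DDL))
```

If the second list is not empty, those columns are the ones to convert.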
- Thu Sep 28, 2023 3:32 pm
- Forum: Sphider Help
- Topic: unicode indexing
- Replies: 18
- Views: 12407
Re: unicode indexing
I sent you a PM with instructions.
- Wed Sep 27, 2023 5:17 pm
- Forum: Sphider Help
- Topic: unicode indexing
- Replies: 18
- Views: 12407
Re: unicode indexing
I have to think you should be able to override whatever your host has as a default... I would like to see a portion of source code for a page containing Unicode characters... Even better would be to see the site itself. If you are not comfortable sharing that publicly, you may do so by private messa...
- Tue Sep 26, 2023 7:23 pm
- Forum: Sphider Help
- Topic: unicode indexing
- Replies: 18
- Views: 12407
Re: unicode indexing
Your database does not seem to be the issue. With web pages, it is possible for a Unicode character to appear correctly in a browser window even though the source code behind that page contains a replacement rather than the Unicode character itself. For example: Browser displays: Д Source code is: an HTML character reference such as &#1044; Another possible issue is ...
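A quick way to see the difference between what the browser shows and what the page source holds is to scan the raw markup for character references. This is a minimal Python sketch for illustration, not Sphider code. The sample markup is made up, and &#1044; is simply the decimal reference for Д (U+0414).

```python
import html
import re


def find_character_references(source):
    """Return numeric (&#1044; / &#x414;) and named (&amp;) references
    found in raw page source."""
    return re.findall(r"&#[0-9]+;|&#[xX][0-9A-Fa-f]+;|&[A-Za-z][A-Za-z0-9]*;", source)


# Made-up page fragment: the browser renders Д, the source holds a reference.
raw = "<p>&#1044;</p>"

print(find_character_references(raw))
print(html.unescape(raw))
```

A page whose source is full of such references holds plain ASCII, so an indexer has to decode them before it sees the Unicode characters.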
- Wed Sep 13, 2023 1:33 am
- Forum: Sphider Help
- Topic: An issue with PDF indexing.
- Replies: 16
- Views: 12506
Re: An issue with PDF indexing.
This issue has been resolved. It turns out that Sphider uses the PHP exec() function to run the pdftotext converter. Some Linux installations, for security reasons, block some functions such as exec(). A slight change to php.ini allowed exec() to function and PDFs were then indexed. As I said, the d...
- Sun Sep 10, 2023 5:00 am
- Forum: Sphider Help
- Topic: Robots.txt - would allow support be useful for many
- Replies: 6
- Views: 7033
Re: Robots.txt - would allow support be useful for many
Check this: viewtopic.php?p=547#p547 While not a FINAL solution, user-agent: Sphider Allow: / should now allow access, disregarding all the disallows in user-agent: *. Any desired disallows need to be added to user-agent: Sphider, even if they are duplicates of some disallows in user-agent: *. Feedb...
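The behaviour described here (a Sphider group, when present, replaces the * group entirely) can be sketched as follows. This is illustrative Python, not the actual checkRobotsTxt() code, which is PHP. The function names are hypothetical. The "longest matching prefix wins" rule is an assumption borrowed from common robots.txt practice and may differ from what Sphider does.

```python
def parse_groups(robots_txt):
    """Map each lower-cased user-agent to its list of (directive, path) rules."""
    groups = {}
    current_agents = []
    in_rules = False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                current_agents = []
                in_rules = False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            in_rules = True
            if value:  # an empty Disallow restricts nothing
                for agent in current_agents:
                    groups[agent].append((field, value))
    return groups


def is_allowed(robots_txt, path, agent="sphider"):
    """Use the agent's own group if one exists, otherwise fall back to *."""
    groups = parse_groups(robots_txt)
    rules = groups.get(agent.lower(), groups.get("*", []))
    best = ("allow", "")  # no matching rule means the path is allowed
    for directive, prefix in rules:
        # Paths are compared case-sensitively; the longest prefix wins.
        if path.startswith(prefix) and len(prefix) > len(best[1]):
            best = (directive, prefix)
    return best[0] == "allow"


ROBOTS = """User-agent: *
Disallow: /

User-agent: Sphider
Allow: /
Disallow: /private/
"""

print(is_allowed(ROBOTS, "/index.html"))
print(is_allowed(ROBOTS, "/private/page.html"))
print(is_allowed(ROBOTS, "/index.html", agent="OtherBot"))
```

Note that /private/ has to be repeated under the Sphider group, because nothing from the * group is inherited once a Sphider group exists.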
- Sat Sep 09, 2023 10:53 pm
- Forum: Sphider Help
- Topic: Improvements to Sphider handling of robots.txt
- Replies: 1
- Views: 5928
Re: Improvements to Sphider handling of robots.txt
This is a partial solution to fixing checkRobotsTxt() function. It reads the robots.txt file, considering both the * user-agent and the Sphider user-agent, both allows and disallows. It produces a master array of denys and allows which are compiled based on the rules mentioned in a previous post. Th...
- Sat Sep 09, 2023 3:55 pm
- Forum: Sphider Help
- Topic: unicode indexing
- Replies: 18
- Views: 12407
Re: unicode indexing
It is possible for a particular database to have a character set and collation DIFFERENT than defaults. Before giving up completely, let's be sure that character set really is the issue. Both the "install.php" and manual "tables.sql" provided make every effort to be utf8mb4! Go t...
- Fri Sep 08, 2023 5:44 pm
- Forum: Sphider Help
- Topic: Improvements to Sphider handling of robots.txt
- Replies: 1
- Views: 5928
Improvements to Sphider handling of robots.txt
The checkRobotsTxt() function in Sphider is deficient. It is not case sensitive, but that is a minor problem easily corrected. Of more major concern is the lack of support for the Allow directive. I have gathered some thoughts on what needs to be done and would appreciate any comment or suggestion a...
- Fri Sep 08, 2023 5:33 pm
- Forum: Sphider Help
- Topic: How to 'safely' access pdftotext from the http process
- Replies: 1
- Views: 5872
Re: How to 'safely' access pdftotext from the http process
Yes, Sphider can index PDF files. However, the translation of PDF to text is not native to Sphider. That is done by a utility, pdftotext. This is pretty much a standard executable on Linux-based systems, typically residing at /usr/bin/pdftotext. On a Windows system, pdftotext.exe is NOT present by d...
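To check whether the converter is reachable from a given process before blaming the indexer, something like the following can help. It is a Python sketch for illustration only (Sphider calls pdftotext from PHP), and the helper names are hypothetical. It assumes pdftotext accepts "-enc UTF-8" and "-" for standard output, as the common Poppler and Xpdf builds do.

```python
import shutil
import subprocess


def find_pdftotext():
    """Return the full path to pdftotext if it is on PATH, else None."""
    return shutil.which("pdftotext")


def pdf_to_text(pdf_path, converter=None):
    """Convert a PDF to text by running the external pdftotext utility."""
    converter = converter or find_pdftotext()
    if converter is None:
        raise FileNotFoundError("pdftotext was not found on PATH")
    # "-" as the output file sends the extracted text to standard output.
    result = subprocess.run(
        [converter, "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


print(find_pdftotext())
```

If this prints None, the utility is either not installed or not on the PATH seen by that process.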