Search found 306 matches
- Fri Sep 08, 2023 5:33 pm
- Forum: Sphider Help
- Topic: How to 'safely' access pdftotext from the http process
- Replies: 1
- Views: 5910
Re: How to 'safely' access pdftotext from the http process
Yes, Sphider can index PDF files. However, the translation of PDF to text is not native to Sphider. That is done by a utility, pdftotext. This is pretty much a standard executable on Linux-based systems, typically residing at /usr/bin/pdftotext. On a Windows system, pdftotext.exe is NOT present by d...
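As a rough illustration of calling pdftotext from another process, here is a minimal Python sketch. It is not Sphider's own code (Sphider is PHP); the function names are hypothetical, and the default path is simply the typical Linux location mentioned above. It passes the arguments as a list with no shell involved, and converts the file name to an absolute path so that a name beginning with "-" cannot be read as an option.

```python
import os
import shutil
import subprocess


def build_pdftotext_command(pdf_path, executable="/usr/bin/pdftotext"):
    """Return the argument list that converts one PDF to UTF-8 text on stdout."""
    # An absolute path never starts with "-", so it cannot be mistaken for an option.
    safe_path = os.path.abspath(pdf_path)
    # "-enc UTF-8" sets the output encoding; the trailing "-" writes text to stdout.
    return [executable, "-enc", "UTF-8", safe_path, "-"]


def pdf_to_text(pdf_path, executable="/usr/bin/pdftotext", timeout=60):
    """Run pdftotext without a shell and return the extracted text."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"pdftotext not found at {executable}")
    result = subprocess.run(
        build_pdftotext_command(pdf_path, executable),
        capture_output=True,  # collect stdout/stderr instead of printing them
        check=True,           # raise CalledProcessError on a non-zero exit code
        timeout=timeout,      # do not let a damaged PDF hang the caller
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Keeping the command builder separate from the runner makes it possible to inspect the exact argument list before anything is executed.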
- Fri Sep 08, 2023 5:17 pm
- Forum: Sphider Help
- Topic: unicode indexing
- Replies: 18
- Views: 12499
Re: unicode indexing
Yes, utf8mb4 is the correct encoding. As for collation, I tend to use utf8mb4_general_ci, BUT it really doesn't matter that much. Different collations may present different sorting, but the IMPORTANT thing is that it be utf8mb4!!! Why MySQL doesn't have full 4 byte UTF8 encoding as a default...
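To see why the 4-byte point matters, the short Python sketch below counts how many UTF-8 bytes each character of a string needs. MySQL's older utf8 character set stores at most 3 bytes per character, so anything that needs 4 will not fit. The function names are hypothetical and the sample strings are only illustrations.

```python
def max_utf8_bytes(text):
    """Return the largest number of UTF-8 bytes needed by any one character."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_in_3_byte_utf8(text):
    """True if every character fits in a 3-byte-per-character UTF-8 column."""
    return max_utf8_bytes(text) <= 3


samples = {
    "cyrillic": "грузовик",    # Cyrillic letters take 2 bytes each
    "euro": "\u20ac",          # the euro sign takes 3 bytes
    "truck": "\U0001F69A",     # a code point above U+FFFF takes 4 bytes
}

for name, value in samples.items():
    print(name, max_utf8_bytes(value), fits_in_3_byte_utf8(value))
```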
- Fri Sep 08, 2023 3:49 pm
- Forum: Sphider Help
- Topic: unicode indexing
- Replies: 18
- Views: 12499
Re: unicode indexing
Sphider is fully Unicode capable. Example: If you index a Russian web site, then want to search for "грузовик", are you actually searching for "грузовик", or some numeric Unicode equivalent? Also, check your database. Is the default character set utf8mb4, or just utf8? MySQL def...
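The difference between the literal word and a "numeric equivalent" can be shown with a small Python sketch. It assumes the numeric form in question is HTML numeric character references, which the post does not actually state; the helper names are hypothetical.

```python
import html


def to_numeric_refs(text):
    """Encode every character as an HTML numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def from_numeric_refs(text):
    """Decode HTML character references back to the literal characters."""
    return html.unescape(text)


word = "грузовик"
encoded = to_numeric_refs(word)

# The two forms are different strings, so a plain text comparison will not match
# unless one side is converted first.
print(encoded == word)
print(from_numeric_refs(encoded) == word)
```

If a search term and the indexed text are in different forms, decoding both to literal characters before comparing is one way to make them line up.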
- Fri Sep 08, 2023 4:19 am
- Forum: Sphider Help
- Topic: Robots.txt - would allow support be useful for many
- Replies: 6
- Views: 7080
Re: Robots.txt - would allow support be useful for many
I tried your mod out and there are issues. Using the 5.3.0 checkRobotsTxt(): Disallowed files and directories in robots.txt: https://sphider.worldspaceflight.com/contact/ https://sphider.worldspaceflight.com/include/ https://sphider.worldspaceflight.com/download/ https://sphider.worldspaceflight.com/...
- Fri Sep 08, 2023 12:09 am
- Forum: Sphider Help
- Topic: Robots.txt - would allow support be useful for many
- Replies: 6
- Views: 7080
Re: Robots.txt - would allow support be useful for many
You are correct. Looking closer you can see the existing function ONLY looks for "disallows" and not "allows". There is definitely room for improvement here. Also, thanks for your proposed improvements. I will look it over and may incorporate it, or some version of it, in a future (nex...
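For reference, one common way to honor both directives is the precedence rule from RFC 9309: the longest matching path wins, and Allow wins a tie. The Python sketch below illustrates that rule only. It is not Sphider's checkRobotsTxt() and not the proposed mod, the function name and rule format are hypothetical, and wildcard patterns are left out.

```python
def is_allowed(path, rules):
    """Decide whether path may be fetched.

    rules is a list of (directive, prefix) tuples, where directive is
    "allow" or "disallow" and prefix is a plain path prefix (no wildcards).
    """
    best_length = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix:
            continue  # an empty pattern matches nothing
        if path.startswith(prefix):
            is_allow = directive.lower() == "allow"
            longer = len(prefix) > best_length
            tie_for_allow = len(prefix) == best_length and is_allow
            if longer or tie_for_allow:
                best_length = len(prefix)
                allowed = is_allow
    return allowed


rules = [
    ("disallow", "/download/"),
    ("allow", "/download/public/"),
]
print(is_allowed("/download/public/file.zip", rules))
print(is_allowed("/download/private/file.zip", rules))
```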
- Thu Sep 07, 2023 3:45 pm
- Forum: Sphider Help
- Topic: An issue with PDF indexing.
- Replies: 16
- Views: 12591
Re: An issue with PDF indexing.
This is crazy. I tried the url and got that page, and NO LINKS FOUND! By email, let's see the full log (looks like it started but then hung...?), the setting screen, and the site advanced edit screen. Also see if there is anything relevant in the error logs. (As shipped, Sphider has logging turned on...
- Thu Sep 07, 2023 2:43 am
- Forum: Sphider Help
- Topic: Robots.txt - would allow support be useful for many
- Replies: 6
- Views: 7080
Re: Robots.txt - would allow support be useful for many
First off, Sphider will index https sites. It follows robots.txt. To allow Sphider, but disallow all other bots, try something like this:
User-agent: *
Disallow: /
User-agent: Sphider (sphidersearch.com)
Allow: /
You could also go into the Settings tab and change the User Agent string to something y...
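A quick way to sanity-check rules like these is Python's standard urllib.robotparser. This is only a sketch: it shows how Python's parser reads the rules, which may differ from how Sphider's own robots.txt handling reads them. The example.com URL, the second bot name, and the helper name are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post, with a blank line between the two groups.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Sphider (sphidersearch.com)
Allow: /
"""


def can_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(can_fetch("Sphider (sphidersearch.com)", "https://example.com/page.html"))
print(can_fetch("SomeOtherBot", "https://example.com/page.html"))
```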
- Mon Sep 04, 2023 6:56 pm
- Forum: Sphider Help
- Topic: Sphider 5.2.0 and SphiderLite 2.3.0 fail after install
- Replies: 1
- Views: 3407
Re: Sphider 5.2.0 and SphiderLite 2.3.0 fail after install
Permanent, automatic fix in Sphider 5.3.0, SphiderLite 2.4.0.
No more user code tweaking necessary!
- Mon Sep 04, 2023 4:22 pm
- Forum: Announcements
- Topic: Sphider 5.3.0 and SphiderLite 2.4.0 released
- Replies: 0
- Views: 14743
Sphider 5.3.0 and SphiderLite 2.4.0 released
The newest releases now REQUIRE the mbstring PHP extension. The mbstring functions are no longer emulated if missing. jQuery has been updated, code has been cleaned up to PSR-2 standards (except some special modules). On the advanced search screen, references to categories have been removed when cate...
- Sun Sep 03, 2023 3:56 am
- Forum: Sphider Help
- Topic: An issue with PDF indexing.
- Replies: 16
- Views: 12591
Re: An issue with PDF indexing.
I tried various things to isolate the issue, such as stemming off, stemming on, various settings changes... to no avail. Then I upgraded the CentOS Sphider from 5.2.1 to 5.3.0.... Here is the log file: Spidering http://localhost/ 1. Retrieving: http://localhost/ at 03:44:36. Size of page: 10.78kb. S...