
Cannot index my site

Posted: Wed Oct 05, 2022 3:13 pm
by akmn
Hi, I cannot get the latest version of Sphider to index my website. It displays the error below. I've spent hours trying to get this to work with zero luck. This is driving me insane because the older version of Sphider worked just fine with PHP 5, but now I have PHP 8.1.9 and can't get it to work. I have run chmod 777 on the full sphider directory, and after some Google searching I even created a new user named 'www-data' and made it the owner of the directory, but of course that didn't work.

Please help if you can.

Thanks.

Code: Select all

Permission denied
Fatal error: Uncaught TypeError: fclose(): Argument #1 ($stream) must be of type resource, bool given in /var/www/html/search/admin/spiderfuncs.php:229
Stack trace:
#0 /var/www/html/search/admin/spiderfuncs.php(229): fclose()
#1 /var/www/html/search/admin/spiderfuncs.php(248): urlStatus()
#2 /var/www/html/search/admin/spider.php(916): checkRobotTxt()
#3 /var/www/html/search/admin/spider.php(207): indexSite()
#4 {main}
  thrown in /var/www/html/search/admin/spiderfuncs.php on line 229
Also, I don't know much about PHP extensions and whether they are on or off but if I run

Code: Select all

php -m
I don't see mysql, mysqli, mysqlnd, etc. in the output, so I don't know if they are loaded or not. However, when I run the code below

Code: Select all

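<?php
// function_exists() reports whether this mysqli function is defined
// in the PHP runtime that executes this script.
$mysqlnd = function_exists('mysqli_fetch_all');

if ($mysqlnd) {
    echo 'mysqlnd enabled';
}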
I get the message that it IS enabled.
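
One possible explanation for the mismatch: `php -m` reports on the command-line PHP, which can load a different php.ini than the PHP the web server runs. A minimal sketch for checking the extensions directly, assuming the script is saved under the web root and opened in a browser (the function name `extensionReport` is made up for this example):

```php
<?php
// Report whether each named extension is loaded in THIS PHP runtime.
// Run it through the web server to see what Sphider itself will see.
function extensionReport(array $names): array
{
    $report = [];
    foreach ($names as $name) {
        $report[$name] = extension_loaded($name);
    }
    return $report;
}

foreach (extensionReport(['mysqli', 'mysqlnd', 'pdo_mysql']) as $name => $loaded) {
    echo $name, ': ', $loaded ? 'loaded' : 'not loaded', PHP_EOL;
}
```

Running the same file with `php` on the command line and then through the browser should show whether the two environments differ.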

Re: Cannot index my site

Posted: Wed Oct 05, 2022 8:10 pm
by captquirk
Pretty sure your mysqlnd is indeed enabled. Your permissions are fine and the ownership change to 'www-data' is correct.

Your PHP version 8.1.9 should not be the issue. My development machine is currently at 8.1.8, so there should be no problems there.

The problem has to do with opening a file stream on your website.

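The error thrown says that argument #1 of fclose must be a resource, but a bool was given. In other words, the stream never opened, and fclose was handed the false that came back instead. It all traces back to the instruction to begin the index ("indexSite"). The only thing I can imagine is that the URL is malformed?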
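
To illustrate the failure mode: in PHP 8, passing `false` to `fclose()` is a fatal TypeError, so the result of the open call has to be checked first. The following is only a sketch, assuming the stream comes from `fsockopen()`; the function name `probeTarget` is hypothetical and this is not Sphider's actual code:

```php
<?php
// Open a socket to $target and close it again, without ever passing
// false to fclose(). Returns an error message, or null on success.
function probeTarget(string $target, int $port, float $timeout = 5.0): ?string
{
    $errno = 0;
    $errstr = '';
    // @ suppresses the PHP warning; the failure is reported via the return value.
    $fp = @fsockopen($target, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return "connect to $target:$port failed ($errno): $errstr";
    }
    fclose($fp);
    return null;
}
```

Calling something like `echo probeTarget($host, 80) ?? 'ok';` from a script run by the web server would show whether the server itself is allowed to open the connection.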

Please provide the URL of the site you are trying to index and I will try to duplicate the issue.

Re: Cannot index my site

Posted: Wed Oct 05, 2022 10:25 pm
by akmn
Thank you for replying. It is a site that can only be accessed internally for now. I have it as...

Code: Select all

http://pub.co.xxxxx.mn.us
With the x denoting the county that it is for.

I moved it to my local machine today with Xampp and it seems to work just fine. I have another issue: filenames that have spaces in them are not indexed, because Sphider only grabs the filename up to the first space and then drops the rest. I would love a way to fix that, because I use your search software on another internal site as well and was finally able to get it to scan inside of PDF files, which is really great.

Thanks again!!

Re: Cannot index my site

Posted: Thu Oct 06, 2022 2:29 am
by captquirk
Spaces in filenames can be a real pain. Windows based servers typically replace a space with a "%20", but Linux/UNIX servers do not.
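
For reference, this is what the "%20" form looks like when produced in PHP. It is only a sketch of percent-encoding a URL path; the function name `encodePath` is made up, and nothing here is taken from Sphider's own link handling:

```php
<?php
// Percent-encode each segment of a URL path, so that
// "annual report.pdf" becomes "annual%20report.pdf".
// Note: a path that is already encoded would be encoded twice.
function encodePath(string $path): string
{
    $segments = explode('/', $path);
    return implode('/', array_map('rawurlencode', $segments));
}

echo encodePath('/docs/annual report 2022.pdf'), PHP_EOL;
```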

At this point, the initial error was likely indeed a URL name that resulted in a corrupted file pointer during the fsockopen.

My advice is to rename files with spaces by using the "_" (underscore character). If you are using a Linux machine for editing, this can be done fairly easily by using sed in a batch process. The same can be done for the internal references to those files. If you are using Windows, I am probably going to be useless in how to clean things up. (Sorry :oops: ).
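
If sed is not an option, the same renaming could be done from PHP, which runs on Windows as well. This is a rough sketch only: both function names are made up, it handles a single directory without recursing, and it does not touch the links inside your pages.

```php
<?php
// Replace each run of whitespace in a file name with one underscore.
function underscoreName(string $name): string
{
    return preg_replace('/\s+/', '_', $name);
}

// Rename files directly inside $dir whose names contain whitespace.
// Returns a map of old name => new name. With $dryRun = true (the default)
// nothing is renamed, so the result can be reviewed first.
function renameSpacedFiles(string $dir, bool $dryRun = true): array
{
    $changes = [];
    foreach (scandir($dir) ?: [] as $entry) {
        $new = underscoreName($entry);
        if ($new === $entry || !is_file("$dir/$entry")) {
            continue; // nothing to change, or not a regular file
        }
        if (file_exists("$dir/$new")) {
            continue; // never overwrite an existing file
        }
        if (!$dryRun && !rename("$dir/$entry", "$dir/$new")) {
            continue; // rename failed; leave it out of the report
        }
        $changes[$entry] = $new;
    }
    return $changes;
}
```

A cautious way to use it is `print_r(renameSpacedFiles('/path/to/files'));` first, then pass `false` as the second argument once the list looks right. The path shown is a placeholder.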

Re: Cannot index my site

Posted: Thu Oct 06, 2022 5:52 am
by akmn
I was hoping there was a way to edit the regex between the square brackets in spiderfuncs.php so that it magically grabs the full filename up through the file extension. I have tried a few edits there, but of course I only barely know what I'm doing in there (TWSS :lol:) and have had no luck getting it to work.

captquirk wrote: "At this point, the initial error was likely indeed a URL name that resulted in a corrupted file pointer during the fsockopen."

Do you mean that maybe I had http://pub during setup and then was trying to run the index with the full FQDN? If not, could you elaborate so I don't do the same thing again?

One more thing, I paid for the plus version a couple of years ago and now when I try to log in, it says that my username is too old. I don't think I ever used it because I found the free version back then to be much faster. Is version 5.0.0 the best version of the software?

Code: Select all

https://www.sphider.worldspaceflight.com/download.php

Re: Cannot index my site

Posted: Thu Oct 06, 2022 7:17 pm
by captquirk
Let's try something...
In spiderfuncs.php, starting at line 136, see:

Code: Select all

    if (mbsubstr($url, 0, 5) == "https") {
        $target = "ssl://".$host;
    } else {
        $target = $host;
    }

Add a line so it looks like:

Code: Select all

    if (mbsubstr($url, 0, 5) == "https") {
        $target = "ssl://".$host;
    } else {
        $target = $host;
    }
echo $target;

Let me know what this produces. (It should be the host name.)
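
If you want to test a URL outside of Sphider, something along these lines shows what host PHP extracts from it. This is a sketch and not the code in spiderfuncs.php: the function name `targetFor` is made up, and it looks at the parsed scheme rather than the first five characters of the URL.

```php
<?php
// Work out the connection target for a URL: "ssl://host" for https,
// plain "host" otherwise. Returns null when the URL has no usable host.
function targetFor(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])) {
        return null; // e.g. "pub" entered without "http://" in front
    }
    $scheme = strtolower($parts['scheme'] ?? '');
    return $scheme === 'https' ? 'ssl://' . $parts['host'] : $parts['host'];
}

var_dump(targetFor('http://pub.co.xxxxx.mn.us'));
var_dump(targetFor('pub'));
```

A null result would mean PHP found no host in the URL at all, which would fit the malformed URL theory.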

You mentioned before using Xampp. Just curious if this is on Linux or Windows? Regardless, good choice with Xampp. I personally use MySQL but MariaDB is perfectly compatible. (The MariaDB developers were actually the ORIGINAL developers of MySQL and started MariaDB out of fears Oracle would screw up MySQL!)

Anyway, using Sphider to index a LOCAL website can be a little tricky. It CAN be done, just a little tricky as Sphider likes (demands?) there be a "http(s)" in the URL.

You mention SphiderPlus, so a bit of history. The original Sphider was written by Ando Saabas. I am GUESSING it may have been something like a graduate project. At any rate, he abandoned it and moved on to other endeavors. There were a couple of people who took up where Saabas had left off. The most notable is SphiderPlus, and it is not free. (The original Sphider was.)
As time moved on, Sphider became non-functional, as it was not being updated. SphiderPlus was being updated, but again, not for free.
I took it upon myself to update the original Sphider and have been doing so for several years. Like the original, and in the spirit of the original, it remains free.
IN MY OPINION, SphiderPlus has deviated considerably from the original. In my version, while the code has changed a lot, the logic and flow are still very close to what Ando Saabas originally created.
Which version is better? That depends on each individual's opinion, and as you know, opinions are like a certain part of human anatomy, and everyone has one!
MY OPINION is that SphiderPlus is unnecessarily complex and convoluted. That being said, I do believe 5.0.0 is the best version so far. I will fix issues as they are found, but also believe in the KISS principle.