NOHOST Plaguing Me

Come here for help or to post comments on Sphider
gxwalsh
Posts: 3
Joined: Thu Oct 07, 2021 6:03 pm

NOHOST Plaguing Me

Post by gxwalsh »

Hello! I'm a new user and was able to follow the install guide to get it up and running. My server (https://www.gregwalsh.com) is a virtual host using SSL. When I run the indexer from the admin panels, I get NOHOST. I have tried multiple combinations of http and https as well as www. vs gregwalsh.com. None work. The site automatically redirects from http to https so I would imagine the http site would fail to index. I was successful in indexing an http site but not the https version and was successful in indexing an RSS feed.

My certificate was created through Let's Encrypt. My gut is telling me that the crawler engine might be rejecting my certificate. I followed this article (https://vander.host/knowledgebase/softw ... _contents/) to disable the SSL certificate check for file_get_contents inside the indexURL function in spider.php, but that didn't work. I also tried running the spider from the command prompt and had similar results. My server OS is Ubuntu 16.04.2 LTS.
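For readers unfamiliar with that kind of change, it can be sketched roughly as follows. This is only an illustration that assumes a plain file_get_contents() call; the helper name insecureSslContext is made up for this example and is not part of Sphider, and switching off verification is something to do only briefly while diagnosing.

```php
<?php
// Hypothetical helper (not part of Sphider): build a stream context that
// skips SSL certificate verification. For temporary diagnosis only.
function insecureSslContext()
{
    return stream_context_create([
        'ssl' => [
            'verify_peer'      => false, // do not validate the certificate chain
            'verify_peer_name' => false, // do not check the certificate's host name
        ],
    ]);
}

// Usage (performs a network request, so it is left commented out):
// $html = file_get_contents('https://www.gregwalsh.com/', false, insecureSslContext());
```

If a fetch still fails with verification switched off like this, the certificate is probably not the cause.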

Do you have any ideas what causes that NOHOST and how to work around it? I'm trying to use this with my personal site/blog/portfolio.

Thank you!
captquirk
Site Admin
Posts: 299
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA

Re: NOHOST Plaguing Me

Post by captquirk »

This is just an initial response....

By using "https://gregwalsh.com/", I was able to index your site with no problem. I retrieved 19 pages (including a pdf file), 27 images, and got 1614 unique keywords. There was a single redirect, but from experience I know that was normal for that url. So we know that you do have to specify "https" (Sphider does NOT follow redirects, so "http" would find nothing to index) and your SSL certificate is not an issue. In fact, I encourage people to use SSL and Let's Encrypt is an excellent choice and costs nothing.

I will attempt to duplicate NOHOST, or at least find what might be the cause and get back to you. We will figure it out.
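The redirect point above can be checked by hand. Here is a minimal sketch, assuming the header lines come from PHP's get_headers(); the helper name firstRedirectTarget and the sample headers are made up for illustration and are not taken from Sphider.

```php
<?php
// Hypothetical helper: given header lines in the format returned by
// get_headers(), return the first redirect target, or null if the first
// response is not a 3xx redirect (or no headers were retrieved).
function firstRedirectTarget($headers)
{
    // The first element is the status line, e.g. "HTTP/1.1 301 Moved Permanently".
    if (!is_array($headers) || !isset($headers[0])
        || !preg_match('#^HTTP/\S+\s+3\d\d\b#', $headers[0])) {
        return null;
    }
    foreach ($headers as $line) {
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null;
}

// Sample header lines, made up for illustration.
$sample = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: https://example.com/',
    'HTTP/1.1 200 OK',
];
echo firstRedirectTarget($sample), "\n";

// Against a live site (needs network access):
// var_dump(firstRedirectTarget(get_headers('http://example.com/')));
```

If this reports a target for the URL you gave the indexer, index that target URL directly instead.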
captquirk
Site Admin
Posts: 299
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA

Re: NOHOST Plaguing Me

Post by captquirk »

I can't duplicate the NOHOST condition (as you seem to get it), but I know the cause is the inability on your side to contact https://gregwalsh.com.

To try to get to the bottom of WHY, copy and run this script and let me know the results:
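<?php

// Connectivity test: try to open an SSL socket to the host.
$host = "gregwalsh.com";
$target = "ssl://".$host;   // the "ssl://" prefix makes fsockopen use SSL/TLS
$port = 443;                // standard HTTPS port
$errno = 0;
$errstr = "";
$fsocket_timeout = 30;      // seconds

$fp = fsockopen($target, $port, $errno, $errstr, $fsocket_timeout);
if (!$fp) {
    // Show the error details so the reason for the failure is visible.
    print "NOHOST ($errno): $errstr";
} else {
    print "Success";
    fclose($fp);
}

?>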
You can name it something like "test.php".

Except in very unusual circumstances, the SSL port used by servers is 443. For simple "http", port 80 is used.
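To make the scheme-to-port mapping concrete, here is a small sketch that derives the fsockopen() target and port from a URL. The helper name socketTargetFor is hypothetical and this is not how Sphider is necessarily written; it only mirrors the test script above.

```php
<?php
// Hypothetical helper: derive the fsockopen() target and port from a URL.
// Returns [target, port], or null when the URL is not a usable http(s) URL.
function socketTargetFor($url)
{
    $parts = parse_url($url);
    if (!is_array($parts) || empty($parts['scheme']) || empty($parts['host'])) {
        return null;
    }
    $scheme = strtolower($parts['scheme']);
    if ($scheme === 'https') {
        $target = 'ssl://' . $parts['host']; // SSL/TLS transport
        $defaultPort = 443;
    } elseif ($scheme === 'http') {
        $target = $parts['host'];            // plain TCP
        $defaultPort = 80;
    } else {
        return null;
    }
    // An explicit port in the URL overrides the default for the scheme.
    $port = isset($parts['port']) ? $parts['port'] : $defaultPort;
    return [$target, $port];
}

list($target, $port) = socketTargetFor('https://gregwalsh.com/');
echo $target, ':', $port, "\n";
```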
gxwalsh
Posts: 3
Joined: Thu Oct 07, 2021 6:03 pm

Re: NOHOST Plaguing Me

Post by gxwalsh »

Thank you for your help! I ran the code and received a "SUCCESS". It now can successfully spider. I did spider before running that code.

What changed: Nothing. I am completely stumped but it works, so we'll just leave it at that.
captquirk
Site Admin
Posts: 299
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA

Re: NOHOST Plaguing Me

Post by captquirk »

I am stumped as well! Wild guess, your web server "hiccupped" at just the "right" moment?

But I will accept that it is working. If, for some reason, it ceases to work, let me know.
gxwalsh
Posts: 3
Joined: Thu Oct 07, 2021 6:03 pm

Re: NOHOST Plaguing Me

Post by gxwalsh »

I installed a PHP zip library (unrelated to this) after having this problem, and perhaps the package manager refreshed my PHP install, which fixed it?!?

Thank you for all your help!
captquirk
Site Admin
Posts: 299
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA

Re: NOHOST Plaguing Me

Post by captquirk »

Possibly related... hard to know for sure. MAYBE you were lacking a PHP module that Sphider uses and the library installed it? Without comparing before and after results of phpinfo() query, that would be pure conjecture. (Seeing the phpinfo() results WOULD have been one of my next steps.)
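A lighter-weight alternative to reading full phpinfo() output is a short report that can be saved before and after a package change and then compared. This is a sketch only: the helper name sslSupportReport is made up, and the items listed are the ones PHP needs for ssl:// sockets and https:// URLs in general, not a confirmed list of what Sphider requires.

```php
<?php
// Hypothetical helper: collect a few facts about this PHP build that matter
// for HTTPS connections. Not a complete list of what Sphider needs.
function sslSupportReport()
{
    $extensions = get_loaded_extensions();
    sort($extensions); // stable order makes before/after comparison easier
    return [
        'php_version'       => PHP_VERSION,
        'openssl_loaded'    => extension_loaded('openssl'),
        'ssl_transport'     => in_array('ssl', stream_get_transports(), true),
        'https_wrapper'     => in_array('https', stream_get_wrappers(), true),
        'loaded_extensions' => $extensions,
    ];
}

// Save one copy before and one after a package change, then compare them.
print_r(sslSupportReport());
```

Run it both through the web server and from the command line, since the two can load different PHP configurations.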

The important thing is, PROBLEM SOLVED! You are up and running.

Any questions, problems... I am here to help.