.com search at Sphider 1.42

exploro
Posts: 6
Joined: Sun Jan 27, 2019 2:10 pm

.com search at Sphider 1.42

Post by exploro » Sun Jan 27, 2019 2:59 pm

I have been trying to install a version of Sphider later than 1.42, with no luck. Although the installation seemed to work and the database was fine, the spider didn't fetch any results: it would start spidering and then stop immediately (not with every domain, but with a lot of them). So for now we have version 1.42 in full working order. The problem is that you cannot search for a domain with its TLD extension, for example .com or .eu. I have tried to implement a mod found in the original Sphider forum to get around this, but it seems the mod's code is deprecated or incompatible with your version of Sphider. Your script works just fine, and I do appreciate the effort and the time you are spending. Any help will be appreciated.

captquirk
Site Admin
Posts: 119
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA

Re: .com search at Sphider 1.42

Post by captquirk » Mon Jan 28, 2019 3:30 am

Most likely the code is deprecated. The original Sphider used the OLD mysql extension. Sphider 1.42 was converted to mysqli.

If you will post the mod code, I will see if I can update it to mysqli for you.
Sphider 1.42 is no longer maintained, but I DO support requests for help with it.

On a side note, later versions of Sphider use both mysqli AND mysqlnd. A number of hosts do not provide mysqlnd for clients on shared hosting. The PDO version gets around that. https://www.sphider.worldspaceflight.co ... _check.zip is a script that will help determine what should work for you. Possibly the solution is the PDO version. If you have to use Sphider 1.42, I will still help you out.
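The linked checker script is the tool to use. Purely as an illustration of the kind of check involved, the PHP sketch below reports whether the relevant extensions are loaded, using the built-in `extension_loaded()`. The version mapping in the comments is an assumption based on this thread (1.42 needs mysqli; classic 1.5.0+ also needs mysqlnd; the PDO version needs pdo_mysql), and the script does not attempt a real database connection.

```php
<?php
// Illustrative environment check (NOT the linked checker script).
// Assumed mapping, per this thread:
//   Sphider 1.42           -> mysqli
//   Sphider classic 1.5.0+ -> mysqli + mysqlnd
//   Sphider PDO            -> pdo_mysql
$checks = [
    'mysqli'    => extension_loaded('mysqli'),
    'mysqlnd'   => extension_loaded('mysqlnd'),
    'pdo_mysql' => extension_loaded('pdo_mysql'),
];

foreach ($checks as $ext => $ok) {
    printf("%-10s %s\n", $ext, $ok ? 'available' : 'MISSING');
}

$classicOk = $checks['mysqli'] && $checks['mysqlnd'];
$pdoOk = $checks['pdo_mysql'];

if ($classicOk) {
    echo "Sphider classic should be able to run.\n";
} elseif ($pdoOk) {
    echo "No mysqlnd: the PDO version is the one to try.\n";
} else {
    echo "Neither mysqlnd nor pdo_mysql found; ask your host.\n";
}
```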

exploro
Posts: 6
Joined: Sun Jan 27, 2019 2:10 pm

Re: .com search at Sphider 1.42

Post by exploro » Mon Jan 28, 2019 10:22 pm

I will try Sphider PDO in a subdomain and will update you with the results. Is it possible to convert a MySQLi database to PDO?

About the mod, I cannot find it anymore in the original Sphider forum. It was dated June 2006 and for some reason has disappeared.
A workaround is to edit line 619 of spiderfuncs.php to look like this:

Code:

$file = preg_replace("/[\*\^\+\?\[\]\^\$\|\{\)\(\}~!\"\/@#£$%&=`?;><:,]+/", " ", $file);
But afterwards you need to delete the website from the database and re-index it!
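A quick way to see what the modified line does: because `.` is not in the character class, a dotted host name such as www.example.com survives this cleanup step as a single token, while runs of characters like `@`, `,` and `!` are each replaced by a space. The sample text is made up, and how later indexing steps in 1.42 treat the dots is not shown here.

```php
<?php
// The pattern from the modified spiderfuncs.php line. '.' is not in the
// character class, so "www.example.com" is left intact by this step.
$pattern = "/[\*\^\+\?\[\]\^\$\|\{\)\(\}~!\"\/@#£$%&=`?;><:,]+/";

$file = "Visit www.example.com, or write to info@example.eu!";
$file = preg_replace($pattern, " ", $file);
echo $file, "\n";
```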

Maybe the following code from the same forum can do the trick? It is a totally different mod that shows all links for a website in the database if you search for it this way:

Code:

site:www.mydomain.com
1) In search.php find:

Code:

$search_results = get_search_results($query, $start, $category, $type, $results, $domain);
and above this line add:

Code:

$pos = strstr(strtolower($query),"site:");
if ($pos) include ("$include_dir/search_links.php");
2) In the .../include/ folder, create a new file called search_links.php with these contents:

Code:

<?php
$starttime = getmicrotime();
$notitle = "No meta title available for this site";
$nodes = "No meta description available for this site";
$query = strtolower($query);
$pos = strpos($query,":");
$urlquery = strip_tags(trim(substr($query,$pos+1)));

// Search for URLs that were already indexed.
$res=mysql_query("select * from ".$mysql_table_prefix."sites where url like '%$urlquery%' AND indexdate != ''");
echo mysql_error();
$num_rows = mysql_num_rows($res);

if ($num_rows == 0) { // Nothing found
print "<br><div id =\"result_report\">The site search \"$urlquery\" didn't match any indexed URL</div>";
die('');
}
if ($num_rows > '1') { // Multiple choice
print "<br><br><b><font color=\"red\">Multiple choice. Please select one domain: </font></b><br><br>";
for ($i=0; $i<$num_rows; $i++) {
$url2 = mysql_result($res, $i, "url");
$indexdate = mysql_result($res, $i, "indexdate");

?>
<b><?php print $i+1?>.</b>
<a href="./search.php?query=site:<?php print $url2?>&search=1" class="title"><?php print $url2 ?></a><a class="description"><?php print "&nbsp;&nbsp;&nbsp;indexed: $indexdate<br><br>"?></a>
<?php
}
die('');
}

// Get all links of this URL.
$site_id = mysql_result($res, 0, "site_id"); // exactly one row here: row 0
$res=mysql_query("select * from ".$mysql_table_prefix."links where site_id like '$site_id'");
echo mysql_error();
$num_rows = mysql_num_rows($res);

if ($num_rows == 0) print "<br><div id =\"result_report\">The search \"$urlquery\" didn't match any indexed links</div>";
if ($num_rows > 0) { // Display header row and all results
$endtime = getmicrotime() - $starttime;
$time = round($endtime*100)/100;
print "<br><div id =\"result_report\">Displaying $num_rows site results for \"$urlquery\" ($time seconds)</div>";

for ($i=0; $i<$num_rows; $i++) {
$url2 = mysql_result($res, $i, "url");
$title = mysql_result($res, $i, "title");
$description = mysql_result($res, $i, "description");
$page_size = mysql_result($res, $i, "size");

?>
<b><?php print $i+1?>.</b>
<a href="<?php print $url2?>" class="title"> <?php print $title; if (!$title) print $notitle?></a><br/>
<div class="description"><?php print $description; if (!$description) print $nodes?></div>
<div class="url"><?php print $url2?> - <?php print $page_size?> kB<br><br></div>

<?php
}
}
die ('');
?> 
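The `site:` detection in step 1 uses `strstr()` to find "site:" anywhere in the query, and search_links.php then takes everything after the first `:`. A query such as `news: site:example.eu` therefore ends up with `site:example.eu` as the URL fragment. A hypothetical alternative (`parse_site_query()` is not part of Sphider or the mod) anchors on the `site:` token itself. It is written without type declarations, on the assumption that a 1.42 install may still be on PHP 5.

```php
<?php
// Hypothetical helper: return the URL fragment of a "site:" query,
// or null if the query is not a site: query at all.
function parse_site_query($query)
{
    // "site:" must start the query or follow whitespace; \S+ grabs the fragment.
    if (preg_match('/(?:^|\s)site:(\S+)/i', trim($query), $m)) {
        return strtolower(strip_tags($m[1]));
    }
    return null; // fall through to the normal search
}

var_dump(parse_site_query("site:www.mydomain.com"));
var_dump(parse_site_query("news: site:example.eu"));
```

In search.php it could stand in for the two lines from step 1, e.g. `if (parse_site_query($query) !== null) include("$include_dir/search_links.php");`, with search_links.php using the returned value instead of recomputing `$urlquery`. This is untested against 1.42.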

captquirk
Site Admin
Posts: 119
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA

Re: .com search at Sphider 1.42

Post by captquirk » Tue Jan 29, 2019 1:55 am

Is it possible to convert MySQLi database to PDO?
The beauty of it all is that the database doesn't need to be converted! The database is MySQL (or for some people, MariaDB). What IS different between MySQLi and PDO is how the database is accessed. When using MySQLi, you can access a MySQL database. IF your MySQLi code ALSO uses prepared statements (which all versions of Sphider since 1.5.0 do), then MySQLnd must also be available. Since some hosts do not provide MySQLnd to all customers, the MySQLi versions of Sphider (what I call "classic") just aren't going to work on those hosts.

PDO is different. PDO stands for "PHP Data Objects" and is generalized in nature. It is NOT MySQL-specific; the connection code determines exactly what type of database it connects to. Sphider (the PDO version) targets a MySQL database. The underlying SQL differs, in some cases quite substantially, from MySQL/MariaDB SQL. Hence Sphider classic accesses the MySQL database using MySQL queries, while Sphider PDO accesses perhaps the SAME database using PDO-compliant queries.
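To illustrate the point about the connection code: with PDO, the database type is selected by the prefix of the DSN string passed to `new PDO(...)`. The sketch below only builds a MySQL DSN and lists the PDO drivers installed on the server. `build_mysql_dsn()`, the host and database names, and the `utf8mb4` charset are placeholders, not Sphider's actual configuration.

```php
<?php
// Illustration only: the DSN prefix ("mysql:", "pgsql:", "sqlite:", ...)
// picks the PDO driver. Host, database name and charset are placeholders.
function build_mysql_dsn(string $host, string $dbname): string
{
    return "mysql:host={$host};dbname={$dbname};charset=utf8mb4";
}

echo build_mysql_dsn('localhost', 'sphider'), "\n";

// Which PDO drivers this PHP install actually has (look for "mysql").
echo 'PDO drivers: ', implode(', ', PDO::getAvailableDrivers()), "\n";

// A real connection would then be opened like this (needs a live server,
// so it is left commented out):
// $pdo = new PDO(build_mysql_dsn('localhost', 'sphider'), $user, $pass,
//                [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
```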

Sphider has a few instances that require some special, MySQL targeted PDO queries. For this reason, PDO Sphider cannot directly be ported to use PostgreSQL or SQLite (or any other database other than MySQL). Sphider has to be edited to work around those special MySQL requirements. We actually did that for PostgreSQL and SQLite, but demand was insufficient to keep them current.

Anyway, back on topic, if your installation supports PDO, you can use the database created with Sphider 1.42, but you WILL need to run the update_rollup.php to make alterations to the database (adding new tables, adding/changing table columns, updating the settings table, etc.).

I'll look at the rest of the info you gave me and I'll get back to you.

captquirk
Site Admin
Posts: 119
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA

Re: .com search at Sphider 1.42

Post by captquirk » Tue Jan 29, 2019 2:50 am

Looking at the posted code, specifically the last section (everything before is preparation for making the last section work), I notice:
// Search for URLs that were already indexed.
What is actually being searched for is URLs that have been indexed. That is already available. When you look at the Sites tab, you see all sites and whether or not they have been indexed. What if the SPECIFIC URL is not a site, and thus not listed? Well, if you know the URL, you know which site it belongs to. Click on "Options" for that site. On the next screen there are several options; one of them is "Browse pages." Click on that. There you see a list of each and every URL indexed for that site. If you do not want to scroll through the entire list, you can enter a piece of the URL in the text box and click "Filter." You get the URLs containing your filter (if any). So you can already find URLs containing a specific word or string.

What if you know only a word or string of a URL, but not the domain part of the URL? Well, you can do a typical search from search.php, BUT the site must be indexed with "Index words in domain name and url path" checked on the Settings tab.
// Get all links of this URL.
This query is actually simply finding all links discovered in a site (using the site_id, which is EXACTLY what "Browse pages" is doing). Again, if the site is in the database and appears on the Sites tab, you can go into "Options", "Browse pages", and there they are! You can also, from "Options", click "Stats." There you will see the number of pages (links) indexed for that site.

The MOD appears to be trying to reinvent the wheel. Unless I'm really missing something here, whoever wrote the MOD is trying to get in through a side window when the front door is wide open. LOL!

captquirk
Site Admin
Posts: 119
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA

Re: .com search at Sphider 1.42

Post by captquirk » Tue Jan 29, 2019 4:59 am

Here is the code translated from mysql to mysqli (like that found in 1.42).
I have no way of testing this. My PHP version (7+) is such that 1.42 is too antiquated to run! But the code does look correct as far as the sql is concerned.
The other code fragments do not need converting. It would just be a matter of finding the correct placement in the files since line numbers have undoubtedly changed in 1.42.
A final note of caution about the first code snippet (altering the preg_replace): it allows characters into the database that may have detrimental effects, such as making malicious SQL injection easier.

Code:

<?php
$starttime = getmicrotime();
$notitle = "No meta title available for this site";
$nodes = "No meta description available for this site";
$query = strtolower($query);
$pos = strpos($query,":");
$urlquery = strip_tags(trim(substr($query,$pos+1)));

// Search for URLs that were already indexed.
// Escape the user-supplied text before it goes into the SQL string.
$res=$db->query("select * from ".$mysql_table_prefix."sites where url like '%".$db->real_escape_string($urlquery)."%' AND indexdate != ''");
echo $db->error;
$num_rows = $res->num_rows;

if ($num_rows == 0) { // Nothing found
print "<br><div id =\"result_report\">The site search \"$urlquery\" didn't match any indexed URL</div>";
die('');
}
if ($num_rows > '1') { // Multiple choice
print "<br><br><b><font color=\"red\">Multiple choice. Please select one domain: </font></b><br><br>";
while ($row = $res->fetch_assoc()) {
$url2 = $row['url'];
$indexdate = $row['indexdate'];

?>
<b><?php print $i+1?>.</b>
<a href="./search.php?query=site:<?php print $url2?>&search=1" class="title"><?php print $url2 ?></a><a class="description"><?php print "&nbsp;&nbsp;&nbsp;indexed: $indexdate<br><br>"?></a>
<?php
}
die('');
}

// Get all links of this URL.
$row = $res->fetch_assoc();
$site_id = $row['site_id'];
$res=$db->query("select * from ".$mysql_table_prefix."links where site_id = ".(int)$site_id);
echo $db->error;
$num_rows = $res->num_rows;

if ($num_rows == 0) print "<br><div id =\"result_report\">The search \"$urlquery\" didn't match any indexed links</div>";
if ($num_rows > 0) { // Display header row and all results
$endtime = getmicrotime() - $starttime;
$time = round($endtime*100)/100;
print "<br><div id =\"result_report\">Displaying $num_rows site results for \"$urlquery\" ($time seconds)</div>";

while ($row = $res->fetch_assoc()) {
$url2 = $row['url'];
$title = $row['title'];
$description = $row['description'];
$page_size = $row['size'];

?>
<b><?php print $i+1?>.</b>
<a href="<?php print $url2?>" class="title"> <?php print $title; if (!$title) print $notitle?></a><br/>
<div class="description"><?php print $description; if (!$description) print $nodes?></div>
<div class="url"><?php print $url2?> - <?php print $page_size?> kB<br><br></div>

<?php
}
}
die ('');
?>
[Edited once]

exploro
Posts: 6
Joined: Sun Jan 27, 2019 2:10 pm

Re: .com search at Sphider 1.42

Post by exploro » Wed Jan 30, 2019 4:33 pm

The mod as you modified is working fine with Sphider 1.42!
The usage is from the front end. If you search for "site:https://www.tutorialspoint.com/", it prints a multiple-choice list from which you can choose the domain that interests you; if the database contains sub-domains of that domain, it displays them too.
Afterwards, it displays all links for the chosen domain. But the results don't have increasing numbers! All links get the same number, and all of them are printed on the same page (no pagination). As you noticed, if you enable "Index words in domain name and url path" in the settings, the mod isn't really necessary.
I also installed Sphider 2.2.0 PDO in a subdomain with a new database, and it's working just fine! Very impressive compared with 1.42.
The use of server resources is much more optimized, and the admin controls respond much more quickly than in the old version. The PDO database also works like a charm! Is there a way to make the Images tab look like the Search tab, without all the advanced choices? Thanks to your post, I managed to remove the RSS tab.
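The identical numbers have a simple cause: the mysqli version of the mod prints `$i+1` inside `while` loops, but `$i` is never set there (in the original mysql code it was the `for` loop counter), so every entry gets the same number. Below is a minimal sketch of a fix plus basic pagination. `page_window()` and the example rows are hypothetical, and in the real mod the offset would more naturally go into the SQL as `LIMIT ... OFFSET ...` rather than slicing a PHP array. No type declarations are used, in case the install is still on PHP 5.

```php
<?php
// Hypothetical helper: which slice of the link list to show on a page.
// $perPage must be at least 1.
function page_window($total, $perPage, $page)
{
    $pages = max(1, (int) ceil($total / $perPage)); // at least one page
    $page = min(max(1, (int) $page), $pages);       // clamp out-of-range pages
    $offset = ($page - 1) * $perPage;               // rows to skip
    return ['pages' => $pages, 'page' => $page, 'offset' => $offset];
}

// Example rows standing in for the mod's query results.
$rows = ['https://www.example.com/', 'https://www.example.com/a', 'https://www.example.com/b'];
$w = page_window(count($rows), 2, 2);

// Numbering fix: an explicit counter, started at the page offset,
// replaces the $i that the while loops never set.
$n = $w['offset'];
foreach (array_slice($rows, $w['offset'], 2) as $url2) {
    $n++;
    echo $n . ". " . $url2 . "\n";
}
```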

captquirk
Site Admin
Posts: 119
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA

Re: .com search at Sphider 1.42

Post by captquirk » Wed Jan 30, 2019 10:41 pm

Is there a way to make the Images tab look like the Search tab? Without all the advanced choices?
YES! In fact, that is one of the changes coming with Sphider 2.3. You will be able to mix and match the search tabs, so an Image search can be a stand-alone choice. [** I believe I misunderstood what you asked. See my update below.]

I am anticipating 2.3 to be released by the end of February. Coding changes are essentially complete and will only be altered if testing shows a problem. Final testing of the classic version is well underway, and testing of the PDO version has started. Some updating of the User Guide to reflect new or changed features is also underway.

In regard to the MOD, I now understand what you were seeking to do. Sorry if I was a bit slow catching on. There is a chance the mod could be made to work with later versions, including PDO. I suspect it might be a little more complex because of the advanced filtering done on the search side. IF that is something you would want, I could give it a look after I get 2.3 released.
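A hedged sketch of how the mod's two queries might look with PDO prepared statements, so the `site:` text is never pasted into the SQL (which also addresses the injection concern raised about the mysqli version). Table and column names are taken from the mod posted above; the actual Sphider 2.x schema, its table prefix, and the search-side filtering mentioned here are not accounted for. The demo part uses an in-memory SQLite database only so the sketch can run without a MySQL server, so it needs pdo_sqlite.

```php
<?php
// Sketch: the mod's lookups with bound parameters. $prefix is an identifier
// and cannot be bound, so it must come from config, never from user input.
function find_indexed_sites(PDO $pdo, string $prefix, string $urlquery): array
{
    $stmt = $pdo->prepare("SELECT site_id, url, indexdate FROM {$prefix}sites
                           WHERE url LIKE :pattern AND indexdate != '' ORDER BY url");
    $stmt->execute([':pattern' => '%' . $urlquery . '%']);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

function links_for_site(PDO $pdo, string $prefix, int $siteId): array
{
    $stmt = $pdo->prepare("SELECT url, title FROM {$prefix}links WHERE site_id = :id ORDER BY url");
    $stmt->bindValue(':id', $siteId, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Demo only: in-memory SQLite standing in for MySQL, with made-up rows.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec("CREATE TABLE sites (site_id INTEGER, url TEXT, indexdate TEXT)");
$pdo->exec("CREATE TABLE links (site_id INTEGER, url TEXT, title TEXT)");
$pdo->exec("INSERT INTO sites VALUES (1, 'https://www.example.com/', '2019-01-30')");
$pdo->exec("INSERT INTO links VALUES (1, 'https://www.example.com/a', 'A'), (1, 'https://www.example.com/b', 'B')");

$sites = find_indexed_sites($pdo, '', 'example.com');
if (count($sites) === 1) { // same flow as the mod: one match -> list its links
    foreach (links_for_site($pdo, '', (int) $sites[0]['site_id']) as $n => $link) {
        echo ($n + 1) . ". " . $link['url'] . "\n";
    }
}
```

Numbering from the `fetchAll()` array index sidesteps the repeated-number problem reported above.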
--------------------------
[Update] What you ask about the image search is whether it can be just a query box and a "Search" button. I didn't code in an advanced/hide advanced feature because so often websites use rather nondescriptive names for their images. What you ask isn't impossible. From a quick look, img_search_form.php would need to be altered to remove the options. There would probably need to be a query parameter added in search.php, and the form would need to account for the "missing" items in the form of hidden inputs (so that the actual search function doesn't go berserk looking for something that isn't there). Again, I can look at this after the next release is put to bed.

exploro
Posts: 6
Joined: Sun Jan 27, 2019 2:10 pm

Re: .com search at Sphider 1.42

Post by exploro » Wed Jan 30, 2019 11:50 pm

captquirk wrote (Wed Jan 30, 2019 10:41 pm):
What you ask about the image search is if it can be just a query box and "Search" button.
Yes, exactly, that's what I'm looking for! It isn't very urgent; I look forward to the next update of the PDO version. The MOD can also wait for the next update. More important for me now: where can I set the maximum number of links per domain? I mean, to tell Sphider it has to stop once it has fetched 40 or 50 links from a domain. What file do I need to edit to set a limit on links per domain? I need that because I'm using a batch file to run indexing.

captquirk
Site Admin
Posts: 119
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA

Re: .com search at Sphider 1.42

Post by captquirk » Thu Jan 31, 2019 2:55 am

I mean to tell Sphider he has to stop when fetch 40 or 50 links from a domain.
Hmmmm. Interesting. I'll have to dig into that one when I get some time.

As far as eliminating advanced options for an image search, that may be easier than I thought.

In PDO 2.2.0, in search.php, find on line 331:

Code:

global $query, $search, $scope, $type, $results, $start, $img_ok;
Change this to:

Code:

global $query, $search, $scope, $type, $results, $start, $img_ok, $adv;
Now, go to img_search_form.php, starting on line 17, find the block of code:

Code:

    echo $sph_messages['Search'];
    echo "'><br><br>
    <div class='advanced'>
    <div class='left'>
    <input type='radio' name='type' value='name' checked>";
    echo $sph_messages['SearchImgName'];
    echo "<br><br>
Change this to:

Code:

    echo $sph_messages['Search'];
    echo "'><br><br>";
if ($adv==1 || $advanced_search==1) {
    echo "<div class='advanced'>
    <div class='left'>
    <input type='radio' name='type' value='name' checked>";
    echo $sph_messages['SearchImgName'];
    echo "<br><br>
Further down, at what is currently line 80, find this block:

Code:

    echo "</select>
    </div>
    <input type='hidden' name='search' value='1'>
    <input type='hidden' name='s' value='3'>
    </form><br>
Change this to:

Code:

    echo "</select>
    </div>";
 } else {
echo "<input type='hidden' name='type' value='name'>
	<input type='hidden' name='results' value='10'>
	<input type='hidden' name='scope' value='all'>";
}
echo "<input type='hidden' name='search' value='1'>
    <input type='hidden' name='s' value='3'>
    </form><br>
Now the image search should follow the same "Advanced search" toggle found in settings. Also, when settings has "Advanced search" off, you can still get an image advanced search by adding "adv=1" to the URL parameter string.
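The logic of that change, pulled out into a standalone sketch: either the advanced options are printed, or the fields they would have set are sent as hidden inputs with fixed defaults, so the search code always receives `type`, `results` and `scope`. The function names are hypothetical, and reading `adv` from `$_GET` is an assumption about how search.php obtains it.

```php
<?php
// Hypothetical sketch of the toggle added to img_search_form.php.
// Advanced shown: the user picks the values, so nothing is hidden.
// Advanced hidden: fixed defaults keep every expected field present.
function hidden_image_defaults(int $adv, int $advanced_search): array
{
    if ($adv === 1 || $advanced_search === 1) {
        return [];
    }
    return ['type' => 'name', 'results' => '10', 'scope' => 'all'];
}

function render_hidden(array $fields): string
{
    $html = '';
    foreach ($fields as $name => $value) {
        $html .= "<input type='hidden' name='" . htmlspecialchars($name, ENT_QUOTES)
               . "' value='" . htmlspecialchars($value, ENT_QUOTES) . "'>\n";
    }
    return $html;
}

// Assumption: "adv=1" arrives as a GET parameter.
$adv = isset($_GET['adv']) ? (int) $_GET['adv'] : 0;
echo render_hidden(hidden_image_defaults($adv, 0));
```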
