Hack to restart an interrupted re-index run

Post by captquirk

When an indexing run gets interrupted, Sphider has always been able to pick up where it left off. Re-indexing, however, is a different process: if a re-index gets interrupted, the only option has been to start over. Sphider 4.0.0, 4.0.1, and 4.0.2 (SphiderLite 2.0.0, 2.0.1, and 2.0.2) introduced a hack that attempts to pick up where the interrupted run left off. Because this process is cumbersome, is often not possible due to other circumstances, and generally caused more issues than it solved, it was removed in Sphider 4.1.0 (SphiderLite 2.1.0).

Recognizing that SOME users might have actually found the process beneficial, this post explains how to add the hack to the most recent versions.

What needs to be done is 1) Create a new database table, 2) Edit admin.php to administer that new table, and 3) Edit spider.php to implement the process.

Step 1: Create a new database table.

In the admin folder, create a new file named make-temp2.php. The contents will be:
<?php
$settings_dir = "../settings";
require "$settings_dir/database.php";
// Create temp2 table
$db->query(
"CREATE TABLE ".$mysql_table_prefix."temp2 (
link varchar(255),
id varchar (32)
) ENGINE = InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci"
);

// Report any error from the CREATE TABLE query
if ($db->errno > 0) {
print "Error: ";
print $db->error;
print "<br>";
}
After saving, run make-temp2.php from a browser. There will be no output, but you can confirm it worked by going into Sphider admin and looking for the temp2 table on the Database tab. You may now discard make-temp2.php as it is no longer needed.

Step 2: Edit admin.php to administer the new table.

ABOUT line 56 in admin.php, find this line:
$clean_funcs = Array ("clean" => "default", 15=>"default", 16=>"default",
17=>"default", 23=>"default", "clean_doms" => "default", "clean_imgs" => "default",
"clean_feeds" => "default");
In SphiderLite, the line will be a bit different:
$clean_funcs = Array ("clean" => "default", 15=>"default", 16=>"default",
17=>"default", 23=>"default", "clean_doms" => "default");
ADD one more entry so that it appears as such:
$clean_funcs = Array ("clean" => "default", 15=>"default", 16=>"default",
17=>"default", 23=>"default", "clean_doms" => "default", "clean_imgs" => "default",
"clean_feeds" => "default", 50=>"default");
(Same idea in SphiderLite).

Around line 900 to 950 (depending on Sphider or SphiderLite), find this:
/**
* Function to truncate the temp table
*
* @return void
*/
function cleanTemp()
{
global $mysql_table_prefix, $db, $Submit, $key;

if (!isset($Submit) || $Submit != $key) {
return;
}
$query = "DELETE FROM ".$mysql_table_prefix."temp WHERE level >= 0";
$stmt = $db->prepare($query);
if ($stmt) {
$stmt->execute() or die("Execution failed: ".$stmt->error);
$del = $stmt->affected_rows;
$stmt->close();
} else {
trigger_error('Statement failed : '.$db->error, E_USER_ERROR);
}
echo "<div id='submenu'>
</div>
<br><div style='text-align:center;'><b>Temp table cleared, "
.$del." items deleted.</b></div>";
}


/**
* Function to truncate the query_log table
*
* @return void
*/
function clearLog()
You want to add a NEW function (cleanTemp2) after cleanTemp and before clearLog:
/**
* Function to truncate the temp table
*
* @return void
*/
function cleanTemp()
{
global $mysql_table_prefix, $db, $Submit, $key;

if (!isset($Submit) || $Submit != $key) {
return;
}
$query = "DELETE FROM ".$mysql_table_prefix."temp WHERE level >= 0";
$stmt = $db->prepare($query);
if ($stmt) {
$stmt->execute() or die("Execution failed: ".$stmt->error);
$del = $stmt->affected_rows;
$stmt->close();
} else {
trigger_error('Statement failed : '.$db->error, E_USER_ERROR);
}
echo "<div id='submenu'>
</div>
<br><div style='text-align:center;'><b>Temp table cleared, "
.$del." items deleted.</b></div>";
}


/**
* Function to truncate the temp2 table
*
* @return void
*/
function cleanTemp2()
{
global $mysql_table_prefix, $db, $Submit, $key;

if (!isset($Submit) || $Submit != $key) {
return;
}
$query = "DELETE FROM ".$mysql_table_prefix."temp2";
$stmt = $db->prepare($query);
if ($stmt) {
$stmt->execute() or die("Execution failed: ".$stmt->error);
$del = $stmt->affected_rows;
$stmt->close();
} else {
trigger_error('Statement failed : '.$db->error, E_USER_ERROR);
}
echo "<div id='submenu'>
</div>
<br><div style='text-align:center;'><b>Temp2 table cleared, "
.$del." items deleted.</b></div>";
}


/**
* Function to truncate the query_log table
*
* @return void
*/
function clearLog()
ABOUT line 1915 to 2165 (depending on Sphider or SphiderLite), within the function cleanForm, find:
$stmt = $db->prepare("SELECT COUNT(*) FROM ".$mysql_table_prefix."temp");
if ($stmt) {
$stmt->execute() or die("Execution failed: ".$stmt->error);
$result = $stmt->get_result();
$stmt->close();
} else {
trigger_error('Statement failed : '.$db->error, E_USER_ERROR);
}
if ($row = $result->fetch_array(MYSQLI_NUM)) {
$temp = $row[0];
}
echo "<div id='submenu'>
Add code to appear as:
$stmt = $db->prepare("SELECT COUNT(*) FROM ".$mysql_table_prefix."temp");
if ($stmt) {
$stmt->execute() or die("Execution failed: ".$stmt->error);
$result = $stmt->get_result();
$stmt->close();
} else {
trigger_error('Statement failed : '.$db->error, E_USER_ERROR);
}
if ($row = $result->fetch_array(MYSQLI_NUM)) {
$temp = $row[0];
}
$stmt = $db->prepare("SELECT COUNT(*) FROM ".$mysql_table_prefix."temp2");
if ($stmt) {
$stmt->execute() or die("Execution failed: ".$stmt->error);
$result = $stmt->get_result();
$stmt->close();
} else {
trigger_error('Statement failed : '.$db->error, E_USER_ERROR);
}
if ($row = $result->fetch_array(MYSQLI_NUM)) {
$temp2 = $row[0];
}
echo "<div id='submenu'>
Near the end of the same function, find:
<a href='admin.php?f=17&amp;Submit=".$key."'
class='small_button'>Clear temp table </a></div>
<div class='stat_col' style='width:70%;'>".$temp." items in
temporary table.</div>
</div>
<div class='row' style='text-align:left;'>
<div class='stat_col' style='width:30%;'>
<a href='admin.php?f=23&amp;Submit=".$key."'
class='small_button'>Clear search log </a></div>
Add code to appear as:
<a href='admin.php?f=17&amp;Submit=".$key."'
class='small_button'>Clear temp table </a></div>
<div class='stat_col' style='width:70%;'>".$temp." items in
temporary table.</div>
</div>
<div class='row' style='text-align:left;'>
<div class='stat_col' style='width:30%;'>
<a href='admin.php?f=50&amp;Submit=".$key."'
class='small_button'>Clear temp2 table </a></div>
<div class='stat_col' style='width:70%;'>".$temp2." items in
temporary table.</div>
</div>
<div class='row' style='text-align:left;'>
<div class='stat_col' style='width:30%;'>
<a href='admin.php?f=23&amp;Submit=".$key."'
class='small_button'>Clear search log </a></div>
Near the end of admin.php, find:
case 'delete_map':
unlink("sitemaps/".$file);
statisticsForm('sitemaps');
break;
case '':
showSites('');
break;
Add code as such:
case 'delete_map':
unlink("sitemaps/".$file);
statisticsForm('sitemaps');
break;
case 50:
cleanTemp2();
break;
case '':
showSites('');
break;
Step 3: Edit spider.php to implement the hack.

About line 690 (620 in SphiderLite), find:
$t = microtime();
$a = getenv("REMOTE_ADDR");
$sessid = md5($t.$a);

$urlparts = parse_url($url);
Add code to appear as:
$t = microtime();
$a = getenv("REMOTE_ADDR");
$sessid = md5($t.$a);

$prevsessid = "";
$interrupted = 0;
if ($reindex == 1) {
$usesitemap = 0;
if (isset($_SESSION['prevsessid'])) {
$prevsessid = $_SESSION['prevsessid'];
}
$_SESSION['prevsessid'] = $sessid;
if ($prevsessid != "") {
$stmt = $db->prepare(
"SELECT COUNT(*) FROM ".$mysql_table_prefix
."temp where id = ? "
);
if ($stmt) {
// The session id is an md5 string, so bind it as a string ("s")
$stmt->bind_param("s", $prevsessid);
$stmt->execute() or die("Execution failed: ".$stmt->error);
$result = $stmt->get_result();
$stmt->close();
} else {
trigger_error('Statement failed : '.$db->error, E_USER_ERROR);
}
if ($row = $result->fetch_array(MYSQLI_NUM)) {
$tmprowcnt = $row[0];
}
if ($tmprowcnt > 0) {
$interrupted = 1;
}
}
}

$urlparts = parse_url($url);
About line 885 (845 Lite), find:
while (($level <= $maxlevel && $soption == 'level') || ($soption == 'full')) {
if ($pending == 1) {
$count = $pend_count;
$pending = 0;
} else {
$count = 0;
}

$links = array();

$stmt = $db->prepare(
"SELECT DISTINCT link FROM ".$mysql_table_prefix
."temp WHERE level = ? AND id = ? ORDER BY link"
);
Add code as such:
while (($level <= $maxlevel && $soption == 'level') || ($soption == 'full')) {
if ($pending == 1) {
$count = $pend_count;
$pending = 0;
} else {
$count = 0;
}

$links = array();

if ($interrupted == 1) {
pareTemp($sessid, $prevsessid);
}

$stmt = $db->prepare(
"SELECT DISTINCT link FROM ".$mysql_table_prefix
."temp WHERE level = ? AND id = ? ORDER BY link"
);
About line 920 (885 Lite), find:
while ($count < count($links)) {
$num++;
$thislink = $links[$count];
$urlparts = parse_url($thislink);
if (is_array($omit)) {
reset($omit);
}
$forbidden = 0;
if (is_array($omit)) {
foreach ($omit as $omiturl) {
Add code to appear as:
while ($count < count($links)) {
$num++;
$thislink = $links[$count];
$stmt = $db->prepare(
"INSERT INTO ".$mysql_table_prefix
."temp2 (link, id) Values ( ? , ? )"
);
if ($count > 0) {
if ($stmt) {
$stmt->bind_param("ss", $thislink, $sessid);
$stmt->execute() or die("Execution failed: ".$stmt->error);
$stmt->close();
} else {
trigger_error("Execution failed: ".$db->error, E_USER_ERROR);
}
}
$urlparts = parse_url($thislink);
if (is_array($omit)) {
reset($omit);
}
$forbidden = 0;
if (is_array($omit)) {
foreach ($omit as $omiturl) {
About line 1035 (1000 Lite), find:
$stmt = $db->prepare(
"DELETE FROM ".$mysql_table_prefix
."temp WHERE id = ? "
);
if ($stmt) {
$stmt->bind_param("s", $sessid);
$stmt->execute() or die("Execution failed: ".$stmt->error);
$stmt->close();
} else {
trigger_error('Statement failed : '.$db->error, E_USER_ERROR);
}
$stmt = $db->prepare(
"DELETE FROM "
.$mysql_table_prefix."pending WHERE site_id = ? "
);
Add code to appear as:
$stmt = $db->prepare(
"DELETE FROM ".$mysql_table_prefix
."temp WHERE id = ? "
);
if ($stmt) {
$stmt->bind_param("s", $sessid);
$stmt->execute() or die("Execution failed: ".$stmt->error);
$stmt->close();
} else {
trigger_error('Statement failed : '.$db->error, E_USER_ERROR);
}
$stmt = $db->prepare(
"DELETE FROM ".$mysql_table_prefix
."temp2 WHERE id = ? "
);
if ($stmt) {
$stmt->bind_param("s", $sessid);
$stmt->execute() or die("Execution failed: ".$stmt->error);
$stmt->close();
} else {
trigger_error('Statement failed : '.$db->error, E_USER_ERROR);
}
$stmt = $db->prepare(
"DELETE FROM "
.$mysql_table_prefix."pending WHERE site_id = ? "
);
About line 1070 (1030 Lite), find the function indexAll():
/**
* Function to initiate indexing of ALL sites
*
* @return void
*/
function indexAll()
{
global $mysql_table_prefix, $db, $Submit, $key;

if (!isset($Submit) || $Submit != $key) {
return;
}
$stmt = $db->prepare(
"SELECT url, spider_depth, required, disallowed, "
."usesitemap, can_leave_domain FROM "
.$mysql_table_prefix."sites"
);
if ($stmt) {
$stmt->execute() or die("Execution failed: ".$stmt->error);
$result = $stmt->get_result();
$stmt->close();
} else {
trigger_error('Statement failed : '.$db->error, E_USER_ERROR);
}
while ($row=$result->fetch_array(MYSQLI_NUM)) {
$url = $row[0];
$depth = $row[1];
$include = $row[2];
$not_include = $row[3];
$usesitemap = $row[4];
if ($sitemap=='') {
$usesitemap=0;
}
$can_leave_domain = $row[5];
if ($can_leave_domain=='') {
$can_leave_domain=0;
}
if ($depth == -1) {
$soption = 'full';
} else {
$soption = 'level';
}
indexSite(
$url, 1, $depth, $soption, $include, $not_include, $usesitemap,
$can_leave_domain
);
}
}
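AFTER this function, add the new function pareTemp(). (indexAll() itself is not modified and is shown only for context; your copy may differ slightly depending on Sphider or SphiderLite.)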
/**
* Function to initiate indexing of ALL sites
*
* @return void
*/
function indexAll()
{
global $mysql_table_prefix, $db, $Submit, $key;

if (!isset($Submit) || $Submit != $key) {
return;
}
$stmt = $db->prepare(
"SELECT url, spider_depth, required, disallowed, "
."usesitemap, ignore_robots, can_leave_domain, foreign_images FROM "
.$mysql_table_prefix."sites"
);
if ($stmt) {
$stmt->execute() or die("Execution failed: ".$stmt->error);
$result = $stmt->get_result();
$stmt->close();
} else {
trigger_error('Statement failed : '.$db->error, E_USER_ERROR);
}
while ($row=$result->fetch_array(MYSQLI_NUM)) {
$url = $row[0];
$depth = $row[1];
$include = $row[2];
$not_include = $row[3];
$usesitemap = $row[4];
if ($sitemap=='') {
$usesitemap=0;
}
$ignore_robots = $row[5];
if ($ignore_robots=='') {
$ignore_robots = 0;
}
$can_leave_domain = $row[6];
if ($can_leave_domain=='') {
$can_leave_domain=0;
}
if ($depth == -1) {
$soption = 'full';
} else {
$soption = 'level';
}
$foreignimgs = $row[7];
if ($foreignimgs=='') {
$foreignimgs = 0;
}
indexSite(
$url, 1, $depth, $soption, $include, $not_include, $usesitemap,
$ignore_robots, $can_leave_domain, $foreignimgs
);
}
}


/**
* Function to pare the temp table of url's already re-indexed
*
* @param string $sessid The id of the current session
* @param string $prevsessid The id of the interrupted session
*
* @return void
*/
function pareTemp($sessid, $prevsessid)
{
global $mysql_table_prefix, $db;

// Drop queued links that do not belong to the current session
$stmt = $db->prepare("DELETE FROM ".$mysql_table_prefix."temp WHERE id <> ? ");
if ($stmt) {
$stmt->bind_param("s", $sessid);
$stmt->execute() or die("Execution failed: ".$stmt->error);
$stmt->close();
} else {
trigger_error('Statement failed : '.$db->error, E_USER_ERROR);
}
// Following block is troublesome
/* $stmt= $db->prepare("DELETE FROM ".$mysql_table_prefix
."temp2 WHERE id <> ? ");
if ($stmt) {
$stmt->bind_param("s", $prevsessid);
$stmt->execute() or die("Execution failed: ".$stmt->error);
$stmt->close();
} else {
trigger_error('Statement failed : '.$db->error, E_USER_ERROR);
}*/
// Re-tag the already processed links with the current session id
$stmt = $db->prepare("UPDATE ".$mysql_table_prefix."temp2 SET id = ? ");
if ($stmt) {
$stmt->bind_param("s", $sessid);
$stmt->execute() or die("Execution failed: ".$stmt->error);
$stmt->close();
} else {
trigger_error('Statement failed : '.$db->error, E_USER_ERROR);
}
// Fetch every link that was processed before the interrupt
$stmt = $db->prepare("SELECT link, id FROM ".$mysql_table_prefix."temp2");
if ($stmt) {
$stmt->execute() or die("Execution failed: ".$stmt->error);
$result = $stmt->get_result();
$stmt->close();
} else {
trigger_error('Statement failed : '.$db->error, E_USER_ERROR);
}
// Remove each of those links from temp so it is not re-indexed again
while ($row = $result->fetch_array(MYSQLI_ASSOC)) {
$stmt = $db->prepare(
"DELETE FROM ".$mysql_table_prefix
."temp WHERE link = ? "
); //AND id = ? ");
if ($stmt) {
// Also troublesome
// $stmt->bind_param("ss", $row['link'], $row['id']);
$stmt->bind_param("s", $row['link']);
$stmt->execute() or die("Execution failed: ".$stmt->error);
$del = $stmt->affected_rows;
$stmt->close();
} else {
trigger_error('Statement failed : '.$db->error, E_USER_ERROR);
}
}
}
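For anyone trying to follow what pareTemp() is meant to do, here is a rough sketch using plain PHP arrays instead of the database. The function name pareTempSketch, the array layout, and the example.com links are made up for illustration. It is not Sphider code, only the same idea: keep the current session's queue, then drop every link already recorded in temp2.

```php
<?php
// Hypothetical in-memory stand-ins for the temp and temp2 tables.
// Each row holds a link and the session id that queued or processed it.
function pareTempSketch(array $temp, array $temp2, string $sessid): array
{
    // Keep only queue rows that belong to the current session.
    $temp = array_filter($temp, fn($row) => $row['id'] === $sessid);

    // Links recorded in temp2 were already handled before the interrupt.
    $done = array_column($temp2, 'link');

    // Drop those links from the queue so they are not re-indexed.
    $temp = array_filter($temp, fn($row) => !in_array($row['link'], $done, true));

    return array_values($temp);
}

$temp = [
    ['link' => 'http://example.com/a', 'id' => 'new'],
    ['link' => 'http://example.com/b', 'id' => 'new'],
    ['link' => 'http://example.com/c', 'id' => 'old'],
];
$temp2 = [
    ['link' => 'http://example.com/a', 'id' => 'old'],
];

$remaining = pareTempSketch($temp, $temp2, 'new');
print_r(array_column($remaining, 'link'));
```

Here only http://example.com/b is left in the queue: /c belongs to another session and /a was already processed.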
This completes the modifications needed to restart an interrupted re-index run. Be aware that the restart MUST be done during the SAME session as the initial run, with no intervening actions, because the id of the interrupted run is only remembered in $_SESSION['prevsessid']. This may not always be possible, but it does give you a chance.
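
To show why the same session matters, here is a small sketch of the check that the spider.php change performs, again with plain arrays. The function detectInterrupted, the $session array (standing in for $_SESSION), and the run1/run2/run3 ids are assumptions for illustration only.

```php
<?php
// $session stands in for $_SESSION; $tempIds lists the id values still in temp.
function detectInterrupted(array &$session, string $sessid, array $tempIds): array
{
    // Remember the id stored by the previous run, if any.
    $prevsessid = $session['prevsessid'] ?? '';
    // Store the current id so the next run can find it.
    $session['prevsessid'] = $sessid;
    // Interrupted only if the previous run left rows behind in temp.
    $interrupted = ($prevsessid !== '' && in_array($prevsessid, $tempIds, true)) ? 1 : 0;
    return [$prevsessid, $interrupted];
}

$session = [];
// First re-index in this session: nothing to resume.
[$prev1, $flag1] = detectInterrupted($session, 'run1', []);
// run1 stalled and left rows in temp; a restart in the same session sees them.
[$prev2, $flag2] = detectInterrupted($session, 'run2', ['run1']);
// A brand-new session has no prevsessid, so the leftovers are not recognized.
$newSession = [];
[$prev3, $flag3] = detectInterrupted($newSession, 'run3', ['run1']);

var_dump($flag1, $flag2, $flag3); // int(0), int(1), int(0)
```

The third call is the case to watch for: the rows from run1 are still in temp, but without the stored session value nothing ties them to the new run.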

After any indexing is complete, it is a good idea to go to the Clean Table tab and clean both temp and temp2. Every time a re-index stalls, data is left behind. A restart should clear it, but only if the session fully completes a run. A session interrupt orphans those items.