Cause spidering to pause after indexing a specified number of pages
Posted: Sat Mar 16, 2019 6:00 pm
If you wish to cause spidering to pause for a short time after indexing a certain number of pages, an easy edit can make this possible.
Let's say that after indexing 30 pages, you want Sphider to take a five-minute break, then resume for another 30 pages.
Open spider.php.
Find "function indexUrl". Right after the globals at the top of the function, add this:
Code:
    static $indexctr = 0; // pages indexed since the last pause
Further down in the same function, find:
Code:
    if ($url_status['state'] == 'ok') {
        $OKtoIndex = 1;
        $file_read_error = 0;
        if (time() - $delay_time < $min_delay) {
            sleep($min_delay - (time() - $delay_time));
        }
        $delay_time = time();
        $file = file_get_contents($url);
Add the counter check just ahead of the delay logic, so the block reads:
Code:
    if ($url_status['state'] == 'ok') {
        $OKtoIndex = 1;
        $file_read_error = 0;
        if ($indexctr == 30) { // 30 pages indexed: reset the counter and pause
            $indexctr = 0;
            sleep(300); // five minutes
        }
        if (time() - $delay_time < $min_delay) {
            sleep($min_delay - (time() - $delay_time));
        }
        $delay_time = time();
        $file = file_get_contents($url);
Now, still in the same function, find:
Code:
    printStandardReport('indexed', $command_line);
    if ($index_images == 1) {
Add one line to count the newly indexed page:
Code:
    printStandardReport('indexed', $command_line);
    ++$indexctr;
    if ($index_images == 1) {
And finally, a bit further down, find:
Code:
    printStandardReport('re-indexed', $command_line);
    if ($index_images == 1) {
Add the same line so that re-indexed pages are counted too:
Code:
    printStandardReport('re-indexed', $command_line);
    ++$indexctr;
    if ($index_images == 1) {
Adjust the number of pages (30) and the sleep period (300 seconds) to suit your needs. Note that only pages actually indexed or re-indexed are counted.
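If you want to see the counting logic on its own, outside of spider.php, here is a minimal standalone sketch of the same pattern. It is not Sphider code: processPage(), its parameters, and the optional $sleeper callback are hypothetical names made up for illustration. The callback is only there so the pause can be observed without actually waiting. The sketch relies on the fact that a static variable inside a PHP function keeps its value between calls, which is what lets the counter survive from one indexed page to the next.

```php
<?php
// Illustrative sketch only. processPage() and all of its parameters are
// hypothetical; none of this is part of Sphider.
function processPage(
    string $url,
    int $pagesPerBatch = 30,
    int $pauseSeconds = 300,
    ?callable $sleeper = null
): bool {
    // A static variable keeps its value between calls to this function.
    static $counter = 0;

    // Use the built-in sleep() unless a replacement was passed in.
    $sleeper = $sleeper ?? 'sleep';

    $paused = false;
    if ($counter >= $pagesPerBatch) {
        // A full batch has been handled: reset the counter, then pause.
        $counter = 0;
        $sleeper($pauseSeconds);
        $paused = true;
    }

    // ... the real work on $url would happen here ...

    // Count this page only after it has been handled.
    ++$counter;

    // Tell the caller whether a pause happened before this page.
    return $paused;
}
```

Called with a batch size of 3, the function would pause before the 4th and 7th pages, just as the edit above pauses before the 31st and 61st. Using >= instead of == is a small design choice of this sketch, so that a pause is not skipped if the counter ever steps past the limit.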