Cause spidering to pause after indexing a specified number of pages

Post by captquirk » Sat Mar 16, 2019 6:00 pm

If you want spidering to pause for a short time after a certain number of pages have been indexed, a small edit to spider.php makes this possible.

Let's say that after indexing 30 pages you want Sphider to take a five-minute break, then resume for another 30 pages.

Open spider.php and find "function indexUrl". Right after the global declarations at the top of the function, add this:

Code:

    static $indexctr = 0;   // pages indexed since the last pause
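
Why static? A static variable inside a PHP function is initialised only once and keeps its value between calls, so the count carries over from one call of indexUrl to the next instead of starting again at zero. Here is a minimal standalone sketch, not taken from Sphider (the function name countCall is made up for illustration):

```php
<?php
// Hypothetical function, not part of Sphider: it only shows that a
// static local variable keeps its value between calls.
function countCall(): int
{
    static $calls = 0;   // initialised once, on the first call only
    ++$calls;
    return $calls;
}

countCall();
countCall();
echo countCall(), PHP_EOL;   // prints 3
```

Without the static keyword, every call would start again from 0 and the script would print 1.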

Further down in the same function, find:

Code:

    if ($url_status['state'] == 'ok') {
        $OKtoIndex = 1;
        $file_read_error = 0;
        if (time() - $delay_time < $min_delay) {
            sleep($min_delay - (time() - $delay_time));
        }
        $delay_time = time();
        $file = file_get_contents($url);

Add the counter check just before the existing min_delay check, so it reads:

Code:

    if ($url_status['state'] == 'ok') {
        $OKtoIndex = 1;
        $file_read_error = 0;
        // After 30 indexed pages, reset the counter and sleep 300 s (5 minutes).
        if ($indexctr >= 30) {
            $indexctr = 0;
            sleep(300);
        }
        if (time() - $delay_time < $min_delay) {
            sleep($min_delay - (time() - $delay_time));
        }
        $delay_time = time();
        $file = file_get_contents($url);

Now, still in the same function, find:

Code:

                        printStandardReport('indexed', $command_line);
                        if ($index_images == 1) {

Add one line right after the printStandardReport call:

Code:

                        printStandardReport('indexed', $command_line);
                        ++$indexctr;   // count this newly indexed page
                        if ($index_images == 1) {

And finally, a bit further down, find:

Code:

                        printStandardReport('re-indexed', $command_line);
                        if ($index_images == 1) {

Again, add one line right after the printStandardReport call:

Code:

                        printStandardReport('re-indexed', $command_line);
                        ++$indexctr;   // count this re-indexed page
                        if ($index_images == 1) {

Adjust the page count (30) and the sleep period (300, in seconds) to suit your needs. Note that only pages that are actually indexed or re-indexed are counted.
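
If you want to see how the counter, the check and the pause interact before touching spider.php, here is a small standalone simulation. It is only a sketch, not Sphider code: maybePause, $pagesBeforePause, $pauseSeconds and the list of page outcomes are all made up for the demo, and the sleep function is passed in so the pauses are recorded instead of actually waiting. As in the edit above, only 'indexed' and 're-indexed' pages add to the count.

```php
<?php
// Standalone simulation of the pause logic; NOT Sphider code.
// $pagesBeforePause and $pauseSeconds play the roles of 30 and 300.

// Called before each page is fetched: once enough pages have been
// counted, reset the counter and pause via $sleeper.
function maybePause(int &$counter, int $pagesBeforePause, int $pauseSeconds, callable $sleeper): void
{
    if ($counter >= $pagesBeforePause) {
        $counter = 0;
        $sleeper($pauseSeconds);
    }
}

$pagesBeforePause = 3;   // small value so the demo is easy to follow
$pauseSeconds = 300;
$counter = 0;
$pauses = [];            // page numbers that had to wait for a pause

// Hypothetical outcome of each page; 'skipped' is not counted.
$outcomes = ['indexed', 'skipped', 'indexed', 're-indexed',
             'indexed', 'indexed', 'indexed', 'indexed'];

foreach ($outcomes as $i => $outcome) {
    // Record the pause instead of sleeping; the seconds value is ignored here.
    maybePause($counter, $pagesBeforePause, $pauseSeconds,
        function (int $seconds) use (&$pauses, $i): void {
            $pauses[] = $i + 1;   // 1-based page number
        });
    if ($outcome === 'indexed' || $outcome === 're-indexed') {
        ++$counter;
    }
}

echo 'Paused before pages: ', implode(', ', $pauses), PHP_EOL;
// prints: Paused before pages: 5, 8
```

Page 2 is skipped, so the third counted page is page 4 and the first pause comes before page 5. In a real script you could pass PHP's built-in sleep as the callable, e.g. maybePause($counter, 30, 300, 'sleep');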
