Cause spidering to pause after indexing a specified number of pages

Posted: Sat Mar 16, 2019 6:00 pm
by captquirk
If you wish to cause spidering to pause for a short time after indexing a certain number of pages, an easy edit can make this possible.

Let's say that after indexing 30 pages, you want Sphider to take a five-minute break, then resume for another 30 pages.

Open spider.php.
Find "function indexUrl". Right after the globals at the top of the function, add this:

Code:

// Pages indexed since the last pause; a static keeps its value between calls.
static $indexctr = 0;
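For anyone unfamiliar with static variables in PHP, here is a minimal standalone sketch of why a counter declared this way survives from one call of a function to the next. The function name countCall is made up for illustration and is not part of Sphider.

```php
<?php
// Hypothetical helper, not part of Sphider: demonstrates that a static
// local variable keeps its value between calls to the same function.
function countCall(): int
{
    static $calls = 0;   // initialized once, on the first call only
    ++$calls;            // later calls continue from the stored value
    return $calls;
}

countCall();
countCall();
echo countCall(), "\n";  // third call in this script
```

A regular local variable would start over on every call, so the counter would never reach the threshold.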
Further down in the same function, find:

Code:

    if ($url_status['state'] == 'ok') {
        $OKtoIndex = 1;
        $file_read_error = 0;
        if (time() - $delay_time < $min_delay) {
            sleep($min_delay - (time() - $delay_time));
        }
        $delay_time = time();
        $file = file_get_contents($url);
Add a few lines:

Code:

    if ($url_status['state'] == 'ok') {
        $OKtoIndex = 1;
        $file_read_error = 0;
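        // After 30 indexed pages, reset the counter and rest for
        // 300 seconds (five minutes) before fetching the next page.
        if ($indexctr == 30) {
            $indexctr = 0;
            sleep(300);
        }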
        if (time() - $delay_time < $min_delay) {
            sleep($min_delay - (time() - $delay_time));
        }
        $delay_time = time();
        $file = file_get_contents($url);
Now, still in the same function, find:

Code:

                        printStandardReport('indexed', $command_line);
                        if ($index_images == 1) {
Add one line:

Code:

                        printStandardReport('indexed', $command_line);
                        ++$indexctr;
                        if ($index_images == 1) {
And finally, a bit further down, find:

Code:

                        printStandardReport('re-indexed', $command_line);
                        if ($index_images == 1) {
Add one line:

Code:

                        printStandardReport('re-indexed', $command_line);
                        ++$indexctr;
                        if ($index_images == 1) {
Adjust the number of pages (30) and the sleep period (300, in seconds) to suit your needs. Note that only pages actually indexed or re-indexed are counted.
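To try out different values without running a real crawl, the counting logic can be sketched on its own. This is only an illustration under stated assumptions: pauseIfBatchDone, $pauseAfter, $pauseSeconds and the fake sleeper are hypothetical names that do not exist in Sphider, and the sketch uses >= rather than == for the comparison.

```php
<?php
// Hypothetical sketch, not Sphider code. $pauseAfter and $pauseSeconds
// stand in for the hard-coded 30 and 300 used in the edit to spider.php.
function pauseIfBatchDone(int &$counter, int $pauseAfter, int $pauseSeconds, callable $sleeper): bool
{
    if ($counter >= $pauseAfter) {
        $counter = 0;              // start a new batch
        $sleeper($pauseSeconds);   // pass 'sleep' here for a real pause
        return true;
    }
    return false;
}

$counter = 0;   // pages indexed since the last pause
$pauses  = 0;   // how many pauses were taken
$slept   = 0;   // total seconds the fake sleeper was asked to wait

// Fake sleeper: records the requested time instead of actually waiting.
$fakeSleep = function (int $seconds) use (&$slept): void {
    $slept += $seconds;
};

// Simulate 65 pages that are all indexed successfully.
for ($page = 1; $page <= 65; $page++) {
    if (pauseIfBatchDone($counter, 30, 300, $fakeSleep)) {
        $pauses++;
    }
    ++$counter;   // mirrors ++$indexctr after a page is indexed
}

echo "pauses: $pauses, seconds slept: $slept\n";
```

With 65 simulated pages, the check fires before page 31 and before page 61, so the script reports two pauses and 600 seconds. As in the edit above, the pause happens just before the next page is fetched, not right after the 30th page is indexed.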

Re: Cause spidering to pause after indexing a specified number of pages

Posted: Sat Oct 07, 2023 3:39 pm
by usabilitest
Perhaps this could be one of the settings.

Re: Cause spidering to pause after indexing a specified number of pages

Posted: Sat Oct 07, 2023 7:42 pm
by captquirk
That is a suggestion I will keep in mind. Thanks!