admin.php error 500

Come here for help or to post comments on Sphider
Crash
Posts: 8
Joined: Sat Jul 09, 2022 9:24 pm

Re: admin.php error 500

Post by Crash »

That's great info, thanks for that.

It does create a limitation: if a page responds to a query string that no longer pulls up a valid record, but the page itself is still there to handle it (i.e. you could remove the link, but the page handling the query string remains, so there's no 404 error), then that page never gets cleaned up.

The "clear site" option under "indexing" is very useful to get around this.
It seems to allow Sphider to start from scratch each time, which is exactly what I like.
I don't suppose there's a way of automating that, or copying that function out to a separate file to run a cron against it?

If you run spider.php with "-all", does it do the same as the "Reindex all sites" option in the admin GUI?
captquirk
Site Admin
Posts: 299
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA

Re: admin.php error 500

Post by captquirk »

Yes, -all should reindex EVERYTHING!

As to a separate process to clear all INDEX data from the database: that wouldn't really be that hard to create. Essentially, it would just be the "Clear site" code expanded to cover all sites. You don't want to clear the entire database, because then you would have to redo ALL the settings and re-enter all the site data.

Do note that clearing a site clears all links and all link-keyword relationships. It does NOT clear keywords, since any given word can occur on more than one site. But the Clean Keywords function clears up any orphaned words.
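To make the idea concrete, here is a rough sketch of what such a "clear all sites" job could look like. The table names (links, link_keyword, keywords) are assumptions based on the classic Sphider schema and may differ in your version — check your own database before using anything like this.

```php
<?php
// Hypothetical "clear all sites" job: the "Clear site" idea expanded to
// every site, i.e. the same deletes but without a WHERE site_id clause.
// Table names here are assumptions -- verify against your own schema.
function clearAllSites(PDO $db): void
{
    // Drop all link-to-keyword relationships, then all indexed pages.
    $db->exec('DELETE FROM link_keyword');
    $db->exec('DELETE FROM links');

    // "Clear site" leaves keywords alone, since a word can occur on more
    // than one site. With everything cleared, the equivalent of the
    // "Clean keywords" step is to drop the now-orphaned words:
    $db->exec('DELETE FROM keywords
               WHERE id NOT IN (SELECT keyword_id FROM link_keyword)');
}
```

Settings and site data are untouched, so a reindex afterwards starts from scratch without re-entering anything.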

As to a previous question about links found on each page...
In spider.php, around line 408(?), you see:
$links = distinctArray($links);
If, immediately after that, you added the line:
print_r($links);
then in the output (on screen, not in the log file) you would get a crude but accurate list of all links found on that page. A little bit of PHP coding could make an even more readable list. A bit more work and you could get that into the log file, and a bit more beyond that, you could have a separate links report! As I said, I am not inclined to create any new features for Sphider, but something like this IS tempting!
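As one sketch of that "more readable list" idea — assuming $links is a flat array of URL strings at that point in spider.php, which you should verify in your copy:

```php
<?php
// A slightly friendlier dump than print_r(): a count plus one numbered
// URL per line. Assumes $links is a flat array of URL strings.
function printLinksReport(array $links): string
{
    $out = count($links) . " links found on this page:\n";
    foreach (array_values($links) as $i => $url) {
        $out .= sprintf("%3d. %s\n", $i + 1, $url);
    }
    return $out;
}
```

You could then echo the result on screen, or append it to a file to get a crude links report, e.g. file_put_contents('links_report.txt', printLinksReport($links), FILE_APPEND);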
Crash
Posts: 8
Joined: Sat Jul 09, 2022 9:24 pm

Re: admin.php error 500

Post by Crash »

That's very helpful - perfect.
Ultimately, it doesn't need to flush the site contents every time.
If it finds a page that's removed, then the site will present its own error page of some sort, even if it's not triggering an actual 404.
That's not likely to show up in the search results then, anyway.

I'm doing something stupid with my cron job. I wondered if you could spot anything wrong.
I have the following set to run at the top of every hour (0 * * * *)

/usr/local/bin/php /home/jessicaa/public_html/xbomber.co.uk/sphider/admin/spider.php -all

I'm certain that the paths are right.
I had another search spider that was triggering fine off a cron job, but that spider ran without taking any arguments.
Do the flags need to be encapsulated in quotes or similar?

Checking the cron logs needs root access to the server, not just cPanel, but I can always ask the hosts to take a look.
captquirk
Site Admin
Posts: 299
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA

Re: admin.php error 500

Post by captquirk »

What I do for cron jobs is create a shell script, then run that script from the cron. Example for "spider.sh":
#!/bin/bash
cd full-path-to-sphider/admin
php spider.php -all
Then the cron entry would be: 0 * * * * full-path-to/spider.sh

Getting back to your deleted pages: if the page still exists but is just an empty placeholder, no 404 will be generated. But if the page is empty, it won't have any relevant content either, and as you surmise, it should never show up in a search result. There ARE actions you can take that will remove the page without clearing any tables.

1. Add the page as a no-go in robots.txt. Sphider will see that and drop the page.
2. Add the page to the site's "Must not include" section. Sphider will drop the page.
3. Put a "do not index" meta tag on the shell page. Sphider will drop the page.
4. In the site settings, go to "Browse pages". You can use the filter to find that specific page, then just hit "Delete". If there is no content (or not enough content) it will not be re-indexed. (Page contains fewer than X words).
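As an illustration of option 3, a placeholder page could look something like this. The markup and header are just an example of the general noindex technique, not Sphider code; sending the X-Robots-Tag header as well as the meta tag covers crawlers that honour one but not the other.

```php
<?php
// Hypothetical placeholder/error page marked so spiders skip it.
// The header must be sent before any output.
header('X-Robots-Tag: noindex, nofollow');
?>
<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="noindex, nofollow">
  <title>Page not available</title>
</head>
<body>
  <p>This record no longer exists.</p>
</body>
</html>
```

For option 1, the equivalent robots.txt line would be a Disallow rule for that path (e.g. Disallow: /old-page.php, path illustrative).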
Crash
Posts: 8
Joined: Sat Jul 09, 2022 9:24 pm

Re: admin.php error 500

Post by Crash »

This is great. I'm really sorted now.

Thanks very much for all your time on this.

This has also been a great opportunity to optimise the site that's being indexed. The spiders don't like spaces in URLs, since the server passes a soft 301, which only a browser can bypass, onto the "%20" version of the URL.
The site uses a neat script called SLIR to resize images on the fly, and it now runs incredibly fast on pages with 100+ images, since I tightened up the site's code to only ever send the user to a path with spaces replaced by "%20", presumably because that saves SLIR from going through the redirect process.
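For anyone doing the same substitution, PHP can produce the %20 form for you. A small sketch — note that rawurlencode is the right tool here, since urlencode turns spaces into "+" instead:

```php
<?php
// Encode each path segment so spaces become %20 (and other unsafe
// characters are escaped too), without mangling the "/" separators.
function encodePath(string $path): string
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

echo encodePath('/images/my photo 1.jpg'); // /images/my%20photo%201.jpg
```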

It's also been an opportunity to tidy up error handling and slap "noindex" meta tags on all the error pages, including ones for 'errors' handled at the PHP level, rather than the server level.

The indexing reports also help you pin down bogus/broken links all over the site.

The other opportunity was to get rid of as many query strings as possible, to reduce the number of duplicate URLs, and to send more variables between pages using sessions instead.

This is a great project and it's perfect for anyone with a site that's not an out-of-the-box script.

I wish I'd found this sooner.
captquirk
Site Admin
Posts: 299
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA

Re: admin.php error 500

Post by captquirk »

Check out the NEW Sphider 5.0.0! It can generate a links report!
Crash
Posts: 8
Joined: Sat Jul 09, 2022 9:24 pm

Re: admin.php error 500

Post by Crash »

Will do, thanks!