index only the domain

Come here for help or to post comments on Sphider
chef-olaf
Posts: 13
Joined: Wed Dec 06, 2023 7:38 am

index only the domain

Post by chef-olaf »

Hello and good afternoon,

Once again I have the feeling that I have done something wrong.

1. My database contains 585 domains but 718 sites.

2. I only index via the CLI and a cron job, but for some domains only the domain itself is indexed, with no subpages.
It doesn't matter whether I use the -s switch or not.
Even with php spider.php -all, only the domain itself is indexed for a number of domains.

What have I done wrong?

Best regards, Olaf
captquirk
Site Admin
Posts: 299
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA

Re: index only the domain

Post by captquirk »

Is it possible that, on the site's "Edit" page, either "Sphider can leave domain" or "Index foreign images" is checked?

Also, on the "Clean tables" tab, run "Clean domains". This will remove any domains which are not in active use. This will not clean any domains indexed because of the "Sphider can leave domain" setting, but will remove any that WERE but no longer are referenced in any of the links.

Now, it is also possible for sites to share a domain! For example mysite.com/pots and mysite.com/plants can be two separate sites (for indexing purposes) but share the domain mysite.com.
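
To make the site/domain distinction concrete, here is a minimal Python sketch that groups site URLs by host name. It is only an illustration and not how Sphider itself counts domains; the URL list and the function name `group_sites_by_domain` are made up for the example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_sites_by_domain(site_urls):
    """Map each host name to the list of site URLs that live on it."""
    by_domain = defaultdict(list)
    for url in site_urls:
        host = urlparse(url).netloc.lower()
        by_domain[host].append(url)
    return dict(by_domain)


# Hypothetical site list: two separate sites share the domain mysite.com.
sites = [
    "https://mysite.com/pots",
    "https://mysite.com/plants",
    "https://example.org/",
]

grouped = group_sites_by_domain(sites)
print(len(sites), "sites on", len(grouped), "domains")
```

Going by this example alone, having more sites (718) than domains (585) is not necessarily a sign that anything is wrong.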

Periodically running the various utilities in "Clean tables" is never a bad idea. You may want to refrain from "Clean search log" unless you REALLY do want to. And with the number of sites you have, be aware that running "Clean keywords" is going to take a while! I mean A WHILE! LOL! Very useful, but not something you can do quickly! The others run pretty fast.
chef-olaf
Posts: 13
Joined: Wed Dec 06, 2023 7:38 am

Re: index only the domain

Post by chef-olaf »

Hello,

I think I have come a little closer to my problem!

If I run clean site on the domain, then delete it, and then add the same domain again with add domain, it is indexed correctly.

Can I change or delete anything in the sites database table so that all domains are indexed properly?

Sphider can leave domain is switched off
Index depth is set to Full
Index images is switched off
Clean tables has been run

Best regards, Olaf
chef-olaf
Posts: 13
Joined: Wed Dec 06, 2023 7:38 am

Re: index only the domain

Post by chef-olaf »

Hello,

I think I have come a little closer to my problem!

If I run clean site on the domain, then delete it, and then add the same domain again with add domain, it is indexed correctly.

Can I change or delete anything in the sites database table so that all domains are indexed properly?

Sphider can leave domain is switched off
Index depth is set to Full
Index images is switched off
Clean tables has been run

Best regards, Olaf
captquirk
Site Admin
Posts: 299
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA

Re: index only the domain

Post by captquirk »

Trying to do anything to an individual table (mainly deletes) is difficult to impossible. Sphider tables are linked to each other using keys. For example, the sites table is linked to the links table by way of a key. The links table is attached to the link-keyword tables in the same way.
Only through the Sphider clean tables utilities can deletions be made. Changes other than deletions can be made to the sites table through the "Edit site" screen.

From an SQL prompt, one could do individual table deletions by first running a command to disable "foreign keys", but I recommend AGAINST this as it will just leave lost data in other tables and create a mess.
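
To show why I recommend against it, here is a small self-contained Python sketch using an in-memory SQLite database. This is an assumption-laden illustration, not Sphider's real schema: Sphider runs on MySQL, and the two tables and their columns (`site_id`, `link_id`, `url`) are simplified stand-ins. The general behavior is the same, though: with foreign keys enforced the delete is refused, and with them switched off the delete goes through and leaves orphaned rows behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce key relationships

# Simplified, hypothetical stand-ins for two linked tables.
conn.executescript("""
    CREATE TABLE sites (site_id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE links (
        link_id INTEGER PRIMARY KEY,
        site_id INTEGER REFERENCES sites(site_id),
        url TEXT
    );
    INSERT INTO sites VALUES (1, 'https://mysite.com/');
    INSERT INTO links VALUES (10, 1, 'https://mysite.com/pots');
""")


def try_delete_site(connection, site_id):
    """Attempt to delete one site row; return True if the delete succeeded."""
    try:
        connection.execute("DELETE FROM sites WHERE site_id = ?", (site_id,))
        connection.commit()
        return True
    except sqlite3.IntegrityError:
        connection.rollback()
        return False


# With keys enforced, the delete is refused because a link still points at the site.
blocked = not try_delete_site(conn, 1)

# Switching enforcement off lets the delete through, but the link row is stranded.
conn.execute("PRAGMA foreign_keys = OFF")
deleted = try_delete_site(conn, 1)
orphans = conn.execute(
    "SELECT COUNT(*) FROM links WHERE site_id NOT IN (SELECT site_id FROM sites)"
).fetchone()[0]

print("blocked with keys on:", blocked)
print("deleted with keys off:", deleted)
print("orphaned link rows:", orphans)
```

Those orphaned rows are the "lost data" I mean. The clean tables utilities exist so that related rows get removed together.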

While time consuming, the method you have described is probably your best option. With so many sites indexed, that is going to be a lot of work!

Sphider was originally intended for a user to index his own personal site or sites. It was designed well enough by the original author to do that quite well and Sphider can be used to exceed the original purpose. I thought I was stretching it, but you have managed to index a LOT of sites!

Congratulations! This is amazing.
chef-olaf
Posts: 13
Joined: Wed Dec 06, 2023 7:38 am

Re: index only the domain

Post by chef-olaf »

Hello and good morning (in Germany),

I have changed a few things and now the indexing runs (at the moment) without any problems.

Here are my steps:
1. updated PHP from 8.2 to 8.3
2. ran clean tables again
3. tested indexing, but it was still not working properly
4. emptied the Domains table in the database
5. tested indexing again, with the same result as in step 3
6. copied the complete admin directory of version 5.4 into my installation (5.5)
7. ran a database repair and optimisation
8. restarted the server

I don't know why, but now the indexing works perfectly again.

Thanks for your work on the script and your help here in the forum