Page contains less than 10 words. error in terminal

Come here for help or to post comments on Sphider
kas
Posts: 10
Joined: Fri Dec 17, 2021 3:36 pm

Post by kas »

I have searched through all the forums on the web and tried changing the spider settings to 0, but the same error still occurs; in fact, it then says 'Page contains less than 0 words'. Even after changing the settings in admin, almost all websites show the same error. I have installed Sphider completely fresh 5 times and still get the error:

localhost@website.com# php spider.php -u http://knowstartup.com/ -d 0

1. Retrieving: http://knowstartup.com/ at 08:45:26.

Size of page: 0.60kb. Starting indexing at 08:45:26.

Page contains less than 10 words.

Legit links found: 0. New links found: 0

Completed at 08:45:26.
captquirk
Site Admin
Posts: 188
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA

Re: Page contains less than 10 words. error in terminal

Post by captquirk »

I will investigate and get back to you.

Post by captquirk »

I was unable to duplicate this problem. I indexed both from the admin form and from the command prompt.
With a setting of 10 for "Required number of words in a page in order to be indexed" and 5 for "Minimum word length in order to be indexed", I obtain 17 keywords.
The page actually contains 18 words with a length of 5 or more, but the word "thank" is a common word and is not indexed.

My only thought is that maybe you have the minimum word length set too high???
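
To make the interaction between the two settings concrete, here is a minimal Python sketch of the kind of filtering described above. It is not Sphider's actual code (Sphider is written in PHP); the function names, the tiny common-word list, and the choice to count every surviving word rather than unique keywords are all assumptions made for illustration.

```python
import re

# Hypothetical stand-ins for the two admin settings discussed above.
MIN_WORDS_PER_PAGE = 10  # "Required number of words in a page in order to be indexed"
MIN_WORD_LENGTH = 5      # "Minimum word length in order to be indexed"

# Tiny illustrative common-word list (assumption); the real list is not reproduced here.
COMMON_WORDS = {"thank", "about", "there", "their", "which"}


def indexable_words(text, min_length=MIN_WORD_LENGTH, common=COMMON_WORDS):
    """Return the words that survive the length and common-word filters."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if len(w) >= min_length and w not in common]


def has_enough_words(text, min_words=MIN_WORDS_PER_PAGE, min_length=MIN_WORD_LENGTH):
    """True when enough words survive filtering for the page to be indexed."""
    return len(indexable_words(text, min_length)) >= min_words


sample = "Thank you for visiting our startup directory today"
print(indexable_words(sample))                 # ['visiting', 'startup', 'directory', 'today']
print(has_enough_words(sample))                # False: only 4 words survive, 10 required
print(has_enough_words(sample, min_words=4))   # True
print(indexable_words(sample, min_length=12))  # []: a minimum length set too high removes everything
```

In this sketch, raising the minimum word length shrinks the surviving word count, so a page with plenty of visible text can still fall below the required number of words.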

Post by kas »

Hey, my spider settings have 10 for "Required number of words in a page in order to be indexed" and 5 for "Minimum word length in order to be indexed"; these are the untouched defaults. I don't know where to look for the cause of such errors. Out of 10 web links, only about 3 get indexed; the rest show the same error.

Post by captquirk »

It seems your settings are correct. That you CAN index SOME sites also tells me your installation is valid.

List a couple more URLs that are giving you the too few words message. I'll keep playing around on my end and maybe I'll finally see a clue.

Post by kas »

Here is another simple link. I tried to index it via the admin interface with max depth 2, and it throws the error again:

Spidering https://www.startmeup.hk/
1. Retrieving: https://www.startmeup.hk/ at 14:04:24.
Size of page: 161.00kb. Starting indexing at 14:04:26. No-follow flag set. Page contains less than 10 words
Links found: 0. New links: 0
Completed at 14:04:26.

Post by captquirk »

This page's content is made up primarily of JavaScript and references to other content.
Sphider does not index JavaScript and the no-follow flag prevents following the references.
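
As a rough way to see how little indexable text such a page carries, the following Python sketch drops everything inside script and style tags and reports whether a robots meta tag asks for nofollow. It uses only Python's standard html.parser; the class and function names are hypothetical, and it assumes the no-follow flag comes from a robots meta tag. It is not how Sphider itself parses pages.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text outside <script>/<style> and notes a robots nofollow meta tag."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.nofollow = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            content = (attr.get("content") or "").lower()
            if name == "robots" and "nofollow" in content:
                self.nofollow = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a script or style block.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def summarize(html):
    """Return (visible_text, nofollow_flag) for an HTML string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.nofollow


page = (
    '<html><head><meta name="robots" content="noindex, nofollow">'
    '<script>var msg = "many words that live only in JavaScript";</script></head>'
    '<body><div id="app"></div><p>Loading</p></body></html>'
)
text, nofollow = summarize(page)
print(text)      # Loading
print(nofollow)  # True
```

A page like this can be large in bytes yet leave only a word or two of visible text, which would fall well short of a 10-word minimum.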

I will check further as to whether there is any content Sphider could (or should) index.

EDIT/UPDATE: I do get the No-follow message, which is legit. Thus, only one page gets indexed. Settings are for 10 words minimum, word size is 5.
I get 12 page (expected) and a total of 336 words.

Don't know why you get the "less than 10 words" error and I do not.

In spider.php, turn on error reporting by un-commenting line 34 and commenting out line 35 (lines 33 and 34 in the lite version). Clear the index and run again. Are there any messages in the php error log?