Page contains less than 10 words. error in terminal

Posted: Fri Dec 17, 2021 7:53 pm
by kas
I searched through all the forums on the web and tried changing the spider setting to 0, but the same error still occurs. In fact, it then says 'Page contains less than 0 words'. Even after changing the settings in admin, almost all websites show the same error. I have installed Sphider fresh 5 times and still get the error:

localhost@website.com# php spider.php -u http://knowstartup.com/ -d 0

1. Retrieving: http://knowstartup.com/ at 08:45:26.

Size of page: 0.60kb. Starting indexing at 08:45:26.

Page contains less than 10 words.

Legit links found: 0. New links found: 0

Completed at 08:45:26.

Re: Page contains less than 10 words. error in terminal

Posted: Sun Dec 19, 2021 5:08 pm
by captquirk
I will investigate and get back to you.

Re: Page contains less than 10 words. error in terminal

Posted: Sun Dec 19, 2021 7:32 pm
by captquirk
I was unable to duplicate this problem. I indexed both from the admin form and from the command prompt.
With setting of 10 for "Required number of words in a page in order to be indexed" and 5 for "Minimum word length in order to be indexed", I obtain 17 keywords.
The page actually contains 18 words with a length of 5 or more, but the word "thank" is a common word and is not indexed.

My only thought is that maybe you have the minimum word length set too high???
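
To make the interaction of the two settings concrete, here is a minimal sketch in Python (not Sphider's actual PHP code). It assumes a page is rejected when fewer than the required number of words survive the length and common-word filters. The names MIN_WORDS_PER_PAGE, MIN_WORD_LENGTH, COMMON_WORDS, indexable_words and is_indexable are hypothetical, and the common-word list is only an illustrative subset.

```python
import re

# Hypothetical stand-ins for the two admin settings discussed above.
MIN_WORDS_PER_PAGE = 10  # "Required number of words in a page in order to be indexed"
MIN_WORD_LENGTH = 5      # "Minimum word length in order to be indexed"

# Illustrative subset only; the real common-word list is much longer.
COMMON_WORDS = {"thank", "about", "there", "which"}


def indexable_words(text, min_length=MIN_WORD_LENGTH, common=COMMON_WORDS):
    """Return the words that are long enough and not in the common-word list."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if len(w) >= min_length and w not in common]


def is_indexable(text, min_words=MIN_WORDS_PER_PAGE):
    """Assumption: a page is kept only if enough words survive the filters."""
    return len(indexable_words(text)) >= min_words
```

Under these assumptions, a short page such as "Thank you for visiting our startup directory today" keeps only four words, so it would be reported as having too few words even though it contains eight.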

Re: Page contains less than 10 words. error in terminal

Posted: Sun Dec 19, 2021 8:53 pm
by kas
Hey, my spider settings have 10 for the number of words to index and 5 for the minimum word length, which are the untouched defaults. I don't know where to check for such errors. Out of 10 web links, only about 3 get indexed; the rest show the same error.

Re: Page contains less than 10 words. error in terminal

Posted: Mon Dec 20, 2021 12:07 am
by captquirk
Seems your settings are correct. That you CAN index SOME sites also tells me your installation is valid.

List a couple more URLs that are giving you the too few words message. I'll keep playing around on my end and maybe I'll finally see a clue.

Re: Page contains less than 10 words. error in terminal

Posted: Thu Jan 06, 2022 8:10 pm
by kas
Here is another simple link I tried to index via the admin interface with max depth 2. It throws the error again:

Spidering https://www.startmeup.hk/
1. Retrieving: https://www.startmeup.hk/ at 14:04:24.
Size of page: 161.00kb. Starting indexing at 14:04:26. No-follow flag set. Page contains less than 10 words
Links found: 0. New links: 0
Completed at 14:04:26.

Re: Page contains less than 10 words. error in terminal

Posted: Sat Jan 15, 2022 4:11 pm
by captquirk
This page has content which is made up of primarily JavaScript and references to other content.
Sphider does not index JavaScript and the no-follow flag prevents following the references.

I will check further as to whether there is any content Sphider could (or should) index.
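
As a rough illustration of why a large page can still come up short on words, the sketch below (Python, standard library only, not Sphider's own code) collects only the text outside script and style elements. The class VisibleTextParser and the function visible_text are hypothetical names used for this example.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text that is not inside <script> or <style> elements."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the page text with script and style content left out."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

A page whose body is mostly scripts can be many kilobytes in size and still yield only a handful of words once the script content is set aside, which would match a large "Size of page" followed by the too-few-words message.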

EDIT/UPDATE: I do get the No-follow message, which is legit. Thus, only one page gets indexed. Settings are for 10 words minimum, word size is 5.
I get 1 page (expected) and a total of 336 words.

Don't know why you get the "less than 10 words" error and I do not.

In spider.php, turn on error reporting by un-commenting line 34 and commenting out line 35 (lines 33 and 34 in the lite version). Clear the index and run again. Are there any messages in the PHP error log?