Thanks for the follow-up, but I'm not quite with you yet.
For the robots.txt, I had exactly what you recommended, but it isn't working for me. That is what led me to look at the checkRobotTxt($url) function in spiderfuncs.php. I noticed that it only checks for "Disallow:" lines; there is no check for "Allow:" at all.
I downloaded the latest version to make sure, then sprinkled a few diagnostic print statements into that function. Watching it read the robots.txt, the first block (the Disallow: / one) causes it to insert "/" into the $omit array.
Then it reads these lines:
User-agent: Sphider (sphidersearch.com)
Allow: /
This block does not add anything to the $omit array, but the array is already 'tainted' with the "/" from the first block.
Then it returns the $omit array with "/" still in it. What I think it needs to do is remove the "/" from the $omit array when it sees the Allow rule.
I think I understand it better now: the function has to return null to permit indexing, but it can only get there when the preg_match succeeds, and the preg_match never matches the "Allow:" record.
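
For reference, the disallow-only parsing has roughly this shape (my paraphrase of the loop, not the verbatim stock code):

    foreach ($robot_lines as $line) {
        if (preg_match('/^\s*disallow:\s*(\S+)/i', $line, $regs)) {
            $omit[] = $regs[1];   // Disallow paths get collected...
        }
        // ...but there is no corresponding branch for "Allow:", so an
        // "Allow: /" line can never take the earlier "/" back out of $omit.
    }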
There may be a better way, but I've attached my modified version of that function.
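
In outline, the change looks like this. This is a simplified sketch rather than the exact code I attached: the user-agent handling is condensed, and real robots.txt group handling (the most specific User-agent group should win, rather than merging blocks) is more subtle than the simple prefix test used here.

    <?php
    // Sketch of a checkRobotTxt() that honors "Allow:" lines.
    function checkRobotTxt($url)
    {
        $parts = parse_url($url);
        $robots_url = $parts['scheme'] . '://' . $parts['host'] . '/robots.txt';

        $lines = @file($robots_url, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        if ($lines === false) {
            return null;                  // no robots.txt: index everything
        }

        $omit = array();
        $applies = false;                 // inside a block that applies to us?

        foreach ($lines as $line) {
            if (preg_match('/^\s*user-agent:\s*(.*)/i', $line, $regs)) {
                $agent = trim($regs[1]);
                $applies = ($agent == '*' || stripos($agent, 'sphider') !== false);
            } elseif ($applies && preg_match('/^\s*disallow:\s*(\S+)/i', $line, $regs)) {
                $omit[] = $regs[1];       // collect disallowed paths
            } elseif ($applies && preg_match('/^\s*allow:\s*(\S+)/i', $line, $regs)) {
                // The missing piece: an Allow rule removes the omit
                // entries it overrides, e.g. "Allow: /" takes "/" back out.
                foreach ($omit as $i => $path) {
                    if (strpos($path, $regs[1]) === 0) {
                        unset($omit[$i]);
                    }
                }
            }
        }
        // Returning null permits indexing, matching how the caller treats it.
        return count($omit) ? array_values($omit) : null;
    }

With my robots.txt, the first block puts "/" into $omit, and the Sphider block's Allow: / then removes it, so the function returns null and the site gets indexed.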
For the https question, I'll come back to that as a separate item; I should not have combined two things into one thread.