Search found 4 matches

by wiringmaze
Thu Sep 07, 2023 8:35 pm
Forum: Sphider Help
Topic: How to 'safely' access pdftotext from the http process
Replies: 1
Views: 5785

How to 'safely' access pdftotext from the http process

pdftotext is essential to Sphider. And there is a CLI using admin/spider.php, but I'm not yet familiar with all the commands I would need to replicate the web interface (e.g., to initiate reindexing). From the standard web interface to the Sphider/admin/ portal, I can initiate indexing, but it seems...
by wiringmaze
Thu Sep 07, 2023 7:36 pm
Forum: Sphider Help
Topic: Robots.txt - would allow support be useful for many
Replies: 6
Views: 6945

Re: Robots.txt - would allow support be useful for many

CheckRobot_Function.7z
Proposed improvement to checkRobotTxt function.
(1.89 KiB) Downloaded 768 times
Attachment for the prior message.
by wiringmaze
Thu Sep 07, 2023 7:33 pm
Forum: Sphider Help
Topic: Robots.txt - would allow support be useful for many
Replies: 6
Views: 6945

Re: Robots.txt - would allow support be useful for many

Thanks for the follow-up. But I'm not quite with you yet. For the robots.txt, I had exactly what you recommended, but it isn't working for me. This is what led me to look at the "checkRobotTxt($url)" function in spiderfuncs.php. I observed that it does not have a check for "allow:"...
by wiringmaze
Wed Sep 06, 2023 5:45 pm
Forum: Sphider Help
Topic: Robots.txt - would allow support be useful for many
Replies: 6
Views: 6945

Robots.txt - would allow support be useful for many

I'm not sure of the best way to handle this - I want to spider parts of my local server, while generally disallowing external robots. There is a second challenge in that I would prefer to the secure path like https://local.server.com/path/ Allow: With a bit of digging, I see that the code supports "...