Search found 4 matches
- Thu Sep 07, 2023 8:35 pm
- Forum: Sphider Help
- Topic: How to 'safely' access pdftotext from the http process
- Replies: 1
- Views: 6106
How to 'safely' access pdftotext from the http process
pdftotext is essential to Sphider. And there is a CLI using admin/spider.php, but I'm not yet familiar with all the commands I would need to replicate the web interface (e.g., to initiate reindexing). From the standard web interface to the Sphider/admin/ portal, I can initiate indexing, but it seems...
- Thu Sep 07, 2023 7:36 pm
- Forum: Sphider Help
- Topic: Robots.txt - would allow support be useful for many
- Replies: 6
- Views: 7313
Re: Robots.txt - would allow support be useful for many
Attachment for the prior message.
- Thu Sep 07, 2023 7:33 pm
- Forum: Sphider Help
- Topic: Robots.txt - would allow support be useful for many
- Replies: 6
- Views: 7313
Re: Robots.txt - would allow support be useful for many
Thanks for the follow up. But I'm not quite with you yet. For the robots.txt, I had exactly what you recommended, but it isn't working for me. This is what led me to look at the "checkRobotTxt($url)" function in spiderfuncs.php. I observed that it does not have a check for "allow:"...
- Wed Sep 06, 2023 5:45 pm
- Forum: Sphider Help
- Topic: Robots.txt - would allow support be useful for many
- Replies: 6
- Views: 7313
Robots.txt - would allow support be useful for many
I'm not sure of the best way to handle this: I want to spider parts of my local server, while generally disallowing external robots. There is a second challenge in that I would prefer the secure path, like https://local.server.com/path/ Allow: With a bit of digging, I see that the code supports "...
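
For readers following the robots.txt thread, here is a rough sketch of how "Allow:" can be weighed against "Disallow:" using the longest-match rule from RFC 9309 (the most specific matching path wins, and Allow wins a tie). This is not Sphider's checkRobotTxt() code, which is PHP; it is a Python illustration only. The names parse_rules and is_allowed are made up for this example, and wildcards (* and $) are deliberately left out. It also simply merges every group that applies to the agent, which is a simplification.

```python
from urllib.parse import urlsplit


def parse_rules(robots_txt, agent="*"):
    """Collect (directive, path) pairs from groups that apply to `agent`.

    Hypothetical helper: merges all applicable groups and ignores
    wildcard patterns, so it is only a simplified sketch.
    """
    rules = []
    applies = False
    in_agent_block = False  # True while reading consecutive User-agent lines
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not in_agent_block:
                # A new group starts here, so forget the previous group's match.
                applies = False
                in_agent_block = True
            if value == "*" or (value and value.lower() in agent.lower()):
                applies = True
        elif field in ("allow", "disallow"):
            in_agent_block = False
            if applies and value:  # an empty Disallow means "no restriction"
                rules.append((field, value))
    return rules


def is_allowed(url, rules):
    """Return True if `url` may be fetched under `rules` (longest match wins)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    best_len, verdict = -1, True  # no matching rule means allowed
    for directive, prefix in rules:
        if path.startswith(prefix):
            allow = directive == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and allow):
                best_len, verdict = len(prefix), allow
    return verdict


robots = """User-agent: *
Disallow: /
Allow: /path/
"""
rules = parse_rules(robots)
print(is_allowed("https://local.server.com/path/page.html", rules))
print(is_allowed("https://local.server.com/other/", rules))
```

With the sample file above, everything under /path/ stays reachable while the rest of the server is blocked, which is the kind of setup the original post describes.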