New user, new installation: remarks and suggestions

hapx
Posts: 11
Joined: Sun Jan 19, 2025 11:11 am

Post by hapx »

Hello you all,

I just discovered Sphider and started using it. Its features and demo are impressive, so I installed it to try it out. I have some remarks and suggestions:

- I get error 500 in admin.php.
The problem is that fsockopen() is not available: on my hardened server, this function is disabled. So I suggest improving the check tool
https://www.sphider.worldspaceflight.co ... _check.zip
by adding a check for the availability of fsockopen().

  PHP Fatal error: Uncaught Error: Call to undefined function fsockopen() in /home/www/xxx/admin/spiderfuncs.php:147
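
A check of this kind could look roughly like the sketch below. It is only an illustration: `check_required_functions()` is a hypothetical helper, not code from requirements_check.php, and the output wording is an assumption.

```php
<?php
// Hypothetical helper, not part of Sphider's requirements_check.php.
// Reports, for each function name, whether it can be called in this PHP setup.
function check_required_functions(array $names): array
{
    // Functions listed in disable_functions (php.ini) cannot be called.
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    $report = [];
    foreach ($names as $name) {
        $report[$name] = function_exists($name) && !in_array($name, $disabled, true);
    }
    return $report;
}

foreach (check_required_functions(['fsockopen']) as $name => $available) {
    echo ucfirst($name), ' - ', $available ? 'CHECK!' : 'NOT AVAILABLE', "<br>\n";
}
```

The same helper accepts any other function name that a hardened server might disable.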
  
- I get no error, but the spider (indexing process) does nothing.
The problem is that I am testing on a virtual machine over https://, using a certificate whose Common Name does not match the domain:

 PHP Warning: fsockopen(): Peer certificate CN=`www.example.com' did not match expected CN=`vmdebian.example.com' in /home/www/xxx/admin/spiderfuncs.php on line 147 
 PHP Warning: fsockopen(): Failed to enable crypto in /home/www/xxx/admin/spiderfuncs.php on line 147
 PHP Warning: fsockopen(): Unable to connect to ssl://vmdebian.example.com:443 (Unknown error) in /home/www/xxx/admin/spiderfuncs.php on line 147
 PHP Fatal error: Uncaught TypeError: fclose(): Argument #1 ($stream) must be of type resource, bool given in /home/www/xxx/admin/spiderfuncs.php:247
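
The last line of that log shows fclose() being called with the `false` that fsockopen() returned. A minimal sketch of a guard follows, assuming a hypothetical wrapper named `open_socket()`; this is not Sphider's code, and the timeout is an arbitrary choice.

```php
<?php
// Hypothetical wrapper, not Sphider's code: returns a stream on success and
// null on failure, so the caller never passes a boolean to fclose().
function open_socket(string $host, int $port, float $timeout = 5.0)
{
    $errno = 0;
    $errstr = '';
    // "@" silences the PHP warnings; the failure goes to the error log instead.
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        error_log("Cannot connect to $host:$port ($errno) $errstr");
        return null;
    }
    return $fp;
}
```

A caller would then test the result, for example `$fp = open_socket('ssl://vmdebian.example.com', 443);`, and only call `fclose($fp)` when `$fp !== null`.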
 
All of the above problems are difficult to investigate: I had to manually change the error_reporting() level from 0 to E_ALL to track them down.


- I found a PHP fatal error message:

 PHP Fatal error:  Uncaught mysqli_sql_exception: Data too long for column 'title' at row 1 in /home/www/xxx/admin/spider.php:576
Stack trace:
#0 /home/www/xxx/admin/spider.php(576): mysqli_stmt->execute()
#1 /home/www/xxx/admin/spider.php(1077): indexUrl()
#2 /home/www/xxx/admin/spider.php(217): indexSite()
#3 {main}
  thrown in /home/www/xxx/admin/spider.php on line 576
  
I assume an additional check on the maximum data length must be done before inserting into the 'title' column?
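
Such a length check might be sketched as follows. `fit_column()` is a hypothetical helper, and the limit of 200 is taken from the title column size discussed in this thread. MySQL measures VARCHAR(n) in characters, so the sketch uses mb_substr() (Mbstring) rather than the byte-based substr().

```php
<?php
// Hypothetical helper, not Sphider's code: trims a value to at most
// $maxChars characters so it fits a VARCHAR($maxChars) column.
function fit_column(string $value, int $maxChars): string
{
    // mb_substr() counts characters, not bytes, so "é" counts as one.
    return mb_substr($value, 0, $maxChars, 'UTF-8');
}

$title = str_repeat('é', 250);           // an over-long, accented title
$title = fit_column($title, 200);        // 200 = assumed size of links.title
echo mb_strlen($title, 'UTF-8'), PHP_EOL;
```

Truncating silently loses the end of the title; logging the affected URL at the same time would be a reasonable addition.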

- Other suggestion: in every PHP file, at least during the development/test phase, use error_reporting(E_ALL) and eliminate all PHP messages (warnings, notices...) that are issued (use the PHP error log). This would help strengthen the code in the future. error_reporting(E_ALL) could then even remain in production, so that an error 500 or other "should not occur" errors could be investigated easily by the user.
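
One way to report everything without showing messages to visitors is to send errors to the log only. This is a minimal sketch using standard PHP settings; where the log ends up depends on the server's `error_log` setting, which is not changed here.

```php
<?php
// Report every error category...
error_reporting(E_ALL);
// ...but never print the messages into the page that visitors see.
ini_set('display_errors', '0');
// Write them to the PHP error log instead.
ini_set('log_errors', '1');
```

With these three lines at the top of a script, a fatal error still produces a blank page or error 500, but the cause can be read from the error log afterwards.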
hapx
Posts: 11
Joined: Sun Jan 19, 2025 11:11 am

Post by hapx »

Two other remarks:

1) The link https://www.sphider.worldspaceflight.com/docs.php points to the user guide
https://www.sphider.worldspaceflight.co ... rGuide.pdf
This PDF has an error in its instructions for creating the database
(cf. viewtopic.php?t=207 ).
When creating the database, the correct collation must be utf8mb4_general_ci.

The instructions in install.txt are correct, though.


2) Other problem with install.txt:
It says:
H) index.php is the default search page.

=> probably search.php has to be renamed to index.php?
captquirk
Site Admin
Posts: 321
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA
Post by captquirk »

First off, thanks for the input!

Requirements_check.php has been updated to check for fsockopen(). Thanks for that suggestion.

Regarding the PHP fatal error (Data too long), this involves the links table. As it stands, the likely affected fields are:
url, which is 255 characters,
title, which is 200 characters,
description, which is 64 KB, and
fulltxt, which is 4 GB.
My guess would be either url or title. I have seen this error before where the cause was actually some data corruption from earlier in the process. But then, it IS possible for some site to have a very long url or an outrageous page description. Most users will likely never encounter such a situation, but if one does, the solution would be to change the links table to accommodate such data.

Concerning the discrepancy between the User Guide PDF and the text in install.txt, the PDF is actually correct! The install.txt has been updated. Now, just to be clear, as now noted in the install.txt, utf8mb4_general_ci is perfectly acceptable. Utf8mb4_0900_ai_ci is a better collation than utf8mb4_general_ci, but the vast majority of users will never know the difference. The differences wouldn't show unless you are indexing some exotic UTF8 characters! Nevertheless, thank you for pointing out the discrepancy.

As to the reference to "index.php"... HUH??? That's not right! It should be "search.php"!
I was wondering just when I put in "index.php"...
I found that I had copied it over and over from Sphider 1.3.6, which was the last version from Ando Saabas, and I just never caught it! Thanks! At least somebody was paying attention.
hapx
Posts: 11
Joined: Sun Jan 19, 2025 11:11 am

Post by hapx »

I came back to the forum after a few days because of a server migration (I changed hosting options to get more disk space): my first unfinished
indexing (maybe 10% complete) with Sphider already took 5GB of data tables, causing 100% disk usage on my previous server :-( .

Now I continue with my remarks/suggestions:

- Suggestion: requirements_check.php: need to add a check for "allow_url_fopen = On".

- Suggestion: no need for requirements_check.php; use its code at the beginning of install.php. If any requirement is not satisfied, exit without continuing the installation. This keeps the user from hitting installation problems, without requiring an additional zip download (and running the check first).

- Remark: Edit sites: if the address (url) has spaces before https:// (user typo!), there is a parsing problem!

- Suggestion: need to sanitize input, e.g. limit the size of data (e.g. title, description) to be inserted in tables according to the declared size of the columns.

- Suggestion: for /admin/admin.php and /admin/spider.php, the default should be:
error_reporting(E_ERROR | E_DEPRECATED); //Development only
There is no need for error_reporting(0), which hides all errors in PHP/MySQL.
This allows quick investigation of problems like server error 500, or like the "data too long" problem below.

- Suggestion: In the "links" table, use the same varchar(255) for the title column as in all other tables, instead of varchar(200).

In my case, I got a fatal error due to a URL with 196 characters in its title. With UTF-8 multibyte encoding, this exceeds title varchar(200), causing a PHP fatal error:

[02-Feb-2025 17:00:03 Europe/Paris] PHP Fatal error: Uncaught mysqli_sql_exception: Data too long for column 'title' at row 1 in /home/XXX/admin/spider.php:576
Stack trace:
#0 /home/XXX/admin/spider.php(576): mysqli_stmt->execute()
#1 /home/XXX/admin/spider.php(1077): indexUrl()
#2 /home/XXX/admin/spider.php(217): indexSite()
#3 {main}
thrown in /home/XXX/admin/spider.php on line 576

This fatal error is not noticed initially due to error_reporting(0) set by default in spider.php.

- Suggestion: avoid multiple, redundant sources of information to ease maintenance.
For MySQL tables, the most reliable definition is in the install.php code, so there is no need for /sql/tables.sql (even if it could be used from the mysql command line).
Users should use only phpMyAdmin to create/maintain the database and tables.

- Suggestion: without a correction to the "title varchar(200)" problem, indexing cannot progress no matter how often it is resumed (because the spider will always try to index the same "faulty" URL), and with the default "error_reporting(0)" in spider.php, there is no indication to help an ordinary user understand the problem.
The temporary solution for me was to use phpMyAdmin to change the title column to varchar(255) once the problem was understood.

- After upload to the server from the zip content (using FTP, SSH etc...), just do (assuming /home/XXX/ is your base directory):
chown -R www-data:www-data /home/XXX/admin/backup/
chown -R www-data:www-data /home/XXX/admin/log/
chown -R www-data:www-data /home/XXX/admin/reports/
chown -R www-data:www-data /home/XXX/admin/sitemaps/
chown -R www-data:www-data /home/XXX/admin/tmp/
chown -R www-data:www-data /home/XXX/tmp/
This seems better for security than doing chmod 777 for these directories.
You can still upgrade/update Sphider normally in the future (the above directories are for dynamic or temporary data, and would not be part of the zip package).
Note: www-data/www-data is the user/group of Apache2 under Debian Linux. Other Linux OSes have similar user/group.

- Remark: To follow the indexing progress of spider.php, I found it easier to use phpMyAdmin and check the number of rows in the links table,
which phpMyAdmin updates in real time when you click on the "links" table name in its left column.
Note: the output window of spider.php during indexing is buffered, so the display is not in real time (it is delayed).

- Suggestion: in addition to deleting /admin/install.php after installation, also delete the following files:
/changelog
/README_FIRST
/install.txt
/sql/
These files might leak information about the Sphider version, table names etc. to attackers (version vulnerabilities, SQL injection...).

That's all for now; certainly more questions (about an accented characters problem I encountered) and maybe other remarks/suggestions in a few days :-).

Thank you a lot for maintaining Sphider and its forum.
captquirk
Site Admin
Posts: 321
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA
Post by captquirk »

Once again, thanks for the input.

My observation is that you are not a user that needs help, but a user who can provide help to others! That's why I value your suggestions, even if I may disagree. It at least makes me think!

Add a check for allow_url_fopen to requirements_check.php: Will be considered.

Eliminate requirements_check.php and integrate it into install.php: Disagree. This script can help a user determine WHICH Sphider is best for them, and may even eliminate the need to create a database in the event Sphider just can't be run. A database has to exist to run install.php.

Issue with "Edit sites" if the url inadvertently has leading white space: Valid concern, easy fix. Just before line 615 in admin.php (570 in SphiderLite admin.php), add:
$url=trim($url);

This will be in the next release.

Delete error_reporting(0); : No. Skilled users such as you have no need for such a line. But people like you (and me) push the limits of Sphider. The majority of websites using Sphider do not encounter issues and won't get errors. Many users I have encountered simply ask for help when things go wrong and have no idea what the messages would mean or where to find the log. If they were to get a non-fatal error like a deprecation warning, they think the world has ended!

Increase the title column in the links table to 255: Reasonable. The size of 200 is a rather odd choice and is a holdover from the ORIGINAL Sphider. Of course, you know that somewhere out there you will find a web page with a 300 character title! :lol:

Eliminate sql/tables.sql: Not everyone has access to phpMyAdmin. Plus, I have a few users who are very hands on. Personally, tables.sql is just another pain in the arse to keep up to date and useless if the user wants to add a table prefix... But I also like happy users.

Changing ownership of the various sub-directories in the admin directory (and XXX/tmp): This is good guidance in that proper ownership is needed to gain read/write access. However, just what/who the owner should be varies! On my own Ubuntu/apache setup, the owner is "www-data" as you noted. However, the Debian/apache server I use for my personal sites requires my username to be the owner. I found a DIFFERENT owner needed in a CENTOS system.

Deletion of certain files after installation: Totally agree! I might also add that for security purposes, the "admin" folder should be password protected! Common_template and include as well as settings should be as well. DO NOT password protect templates or js_suggest.
(If you are skilled enough, move database.php from settings to one level above your DOCUMENT_ROOT! You need to modify the path in auth.php and spider.php. This really protects your database location/name/password!)

Glad other people have an interest in keeping Sphider going. At some point I'm going to have to find someone to continue it. Not ready to throw in the towel just yet, though!!!
hapx
Posts: 11
Joined: Sun Jan 19, 2025 11:11 am

Post by hapx »

Thank you for your answer and support.

- I made some progress with Sphider. At first I was afraid of the accented characters problem mentioned by several users,
since in the links table I see strange UTF-8 characters in the 'title' column (e.g. "Ã©" representing the French accented character "é"),
but in the search page there is no problem. This is fantastic! Here is my configuration:

character_set_client : utf8mb4
character_set_connection : utf8mb4
character_set_database : utf8mb4
character_set_filesystem : binary
character_set_results : utf8mb4
character_set_server : latin1
character_set_system : utf8mb3
collation_connection : utf8mb4_general_ci
collation_database : utf8mb4_unicode_ci
collation_server : latin1_swedish_ci 
- I am running a long indexing job that is still unfinished (more than 24 hours already).
Suggestion: at the start of indexing, give the link to the corresponding newly created log (html), e.g.
https://www.example.com/XXX/admin/log/2502021712.html
This way, the user can click on that link and follow the indexing progress more quickly and easily (just refresh the browser window by pressing F5), with more recent data.
Of course, if the user has access to the server, he/she can also use the 'tail' command to browse this html file.
This matters because, after starting the indexing, the admin no longer seems to be available: if you click on or start an admin.php session, it seems to hang.
The other way I used: open phpMyAdmin and click on the "links" table; the number of rows is refreshed and reflects the indexing progress.

- I found out that even if the indexing is not yet finished, I can start using search.php to search, but not inside a new browser tab (cookies?).
To make it work I have to use a new private browser window.

- Question: links table: can the number of rows decrease during indexing (as seen in phpMyAdmin), while the number displayed by spider.php is still increasing?
I ask because at the beginning of indexing: links_nbrows > number_by_spider. Refer to the screenshots.

- Remark: spider.php got error 404 due to truncation of URLs having spaces, accented characters, single quotes... in their names.
Browsers have no problem handling these URLs though: they seem to automatically replace e.g. a space with %20, or accented characters with %XX (XX = some hex code).
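
The replacement that browsers perform can be reproduced in PHP with rawurlencode(). The sketch below is an illustration only: `encode_url_path()` is a hypothetical helper, the sample path is made up, and the source file is assumed to be saved as UTF-8. Nothing here describes how spider.php actually parses URLs.

```php
<?php
// Hypothetical helper, not Sphider's code: percent-encodes each segment of a
// URL path while leaving the "/" separators untouched.
function encode_url_path(string $path): string
{
    $encoded = array_map(function (string $segment): string {
        // Decode first so that an existing %20 is not turned into %2520.
        return rawurlencode(rawurldecode($segment));
    }, explode('/', $path));
    return implode('/', $encoded);
}

echo encode_url_path('/my page/café.html'), PHP_EOL;
// prints /my%20page/caf%C3%A9.html
```

Only the path is handled; a query string or fragment would need separate treatment.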

- Problem: Search for "andre": no results for "andré" or "andréa" etc.! Note: stemming works well though. Is there a way to solve this?

- Question/Suggestion: Is there a possibility to ignore indexing content for specific HTML block tag like <section> ... </section>, <nav> ... </nav> blocks?

- Question: Is there a Donation button somewhere?
Attachments: sphider_2025-02-03_190635.jpg (30.97 KiB), sphider_2025-02-03_190517.jpg (12.96 KiB)
captquirk
Site Admin
Posts: 321
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA
Post by captquirk »

Concerning accented characters, this has been a real struggle bus. You've heard of "Listen to what I mean, not what I say?"
When it comes to web content, modern browsers have advanced enough that they do a very good job of rendering a page creator's intent. Sphider isn't that smart and just does what it's told.
So when a creator declares a page to be utf8 encoded, and then proceeds to use non-utf8 rendering of utf8 characters --- all bets are off!
The same is true in the reverse, when a page is declared to be, for example, latin-1, but then has a true utf8 character! Converting utf8 to utf8 produces some really weird stuff!
I have fiddled with trying to be sure that what Sphider ultimately tries to index is utf8 in the end, with both successes and failures. Not gonna claim it's perfect now, but the results are the most encouraging ever. But compensating for human error is --- well, difficult if not impossible without rewriting Sphider into the next Google.

As to indexing and searching at the same time...
While cookies are used to keep track of data, the real issue is --- the session! Using a private window, or even a different browser, creates a new session. Another workaround is to index from the command prompt, and search in a browser.

You mention indexing large sites. I don't know what Sphider's limit is. I do know it is larger than I could have anticipated for such a simple tool! I have also found that interrupted indexing can be an issue. Some websites, never. Other websites, count on it :(
The most common issue is the 500 error. Buffering may be the problem, sometimes. But often it is a mystery. What I have found, and not all users have this ability, is to index from a command prompt. 500 errors, whatever the cause, are browser interruptions of some kind/cause/reason. The command prompt bypasses the browser.

When searching, the search is indeed literal! Yes, stemming works. Maybe try a wildcard (*) in place of an accented character?

Anyway, appreciate the feedback.
captquirk
Site Admin
Posts: 321
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA
Post by captquirk »

Question/Suggestion: Is there a possibility to ignore indexing content for specific HTML block tag like <section> ... </section>, <nav> ... </nav> blocks?
IF you have control of the page(s) in question, you can surround a block with <!--sphider_noindex--> and <!--/sphider_noindex-->.

Otherwise, I can't think of a way offhand. Sphider strips tags before indexing and the result is just text. I suppose Sphider could be altered to consider a specific set of tags (such as <section> and </section>) to be the same as <!--sphider_noindex--> and <!--/sphider_noindex-->, thus ignoring the content between, but then that is going to ALWAYS apply. Maybe not what one would want.
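
For illustration only, such an alteration could be sketched as a pre-processing step that removes the chosen blocks before the text is indexed. `strip_unindexed_blocks()` is a hypothetical function and the regular expressions are an assumption, not Sphider's own implementation. Nested blocks of the same tag are not handled, and links inside a removed block would no longer be seen.

```php
<?php
// Hypothetical pre-processing step, not Sphider's own code: drops the content
// of <!--sphider_noindex--> regions and of the listed block tags.
function strip_unindexed_blocks(string $html, array $tags = ['nav']): string
{
    $pattern = '~<!--sphider_noindex-->.*?<!--/sphider_noindex-->~s';
    $html = preg_replace($pattern, ' ', $html) ?? $html;
    foreach ($tags as $tag) {
        $t = preg_quote($tag, '~');
        // Matches <tag ...> ... </tag>, case-insensitively, across lines.
        $pattern = '~<' . $t . '\b[^>]*>.*?</' . $t . '\s*>~is';
        $html = preg_replace($pattern, ' ', $html) ?? $html;
    }
    return $html;
}

echo strip_unindexed_blocks('<p>Keep</p><nav><a href="/">Home</a></nav><p>Also keep</p>'), PHP_EOL;
// prints <p>Keep</p> <p>Also keep</p>
```

Passing the tag list as a parameter keeps the behaviour optional, which addresses the concern that a hard-coded rule would always apply.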


Regarding donations - There is currently no such thing. I have thought about it, but for me Sphider is more of a hobby and not a business. I am retired and costs for Sphider are minimal --- mostly just my time.
hapx
Posts: 11
Joined: Sun Jan 19, 2025 11:11 am

Post by hapx »

Thank you for your answers, which are helping me make big progress in using and understanding an excellent tool with great potential such as Sphider.

More Questions/Remarks/Suggestions:

- Remark: Information about the spider error 404 for URIs with special characters (spaces, accented characters...), mentioned in my previous post but possibly overlooked.
Example of a URL/URI causing the problem:

https://www.example.com/facts-about-André-and-others.html
Example of entry in Web server log file (a.b.c.d is the IP):

www.example.com:443 a.b.c.d - - [02/Feb/2025:05:30:13 +0100] "GET /facts-about-Andr HTTP/1.1" 404 35991 "-" "Sphider (sphidersearch.com)"

So Sphider stops analyzing the URI at the first unorthodox character found, and of course it then gets error 404 from the server.
It appears that Sphider follows the old-school way and accepts only alphanumeric characters (upper case, lower case, digits) and the hyphen: [A-Za-z0-9-].
Although modern browsers, bots, search bots and link checkers all accept and tolerate such URLs, I am an old-school person too, so I agree with the way Sphider does it.
So I corrected all the URLs with errors revealed by Sphider: thank you Sphider! If your URIs satisfy Sphider's criteria, they will satisfy any other application.

- Suggestion: add a check for the availability of the exec() function to requirements_check.php.
On my hardened server, the exec() function is disabled, so the tables backup shows a blank page without any information or message.
I had to look at the Web server access.log to discover that the admin.php request got error 500, so I edited admin.php to switch to error_reporting(E_ERROR | E_DEPRECATED); //Development only
But the problem still remained without any message. So this time I had to edit all *.php files and set them to the development error level.
This time I got the right message in the PHP error log:
[05-Feb-2025 13:32:36 Europe/Paris] PHP Fatal error: Uncaught Error: Call to undefined function exec() in /home/XXX/admin/db_backup.php:166
The fix is then easy: enable the exec() function in php.ini.

This problem leads back to my suggestion (not approved) to use the development level error_reporting(E_ERROR | E_DEPRECATED) as the default, since:
- E_ERROR and E_DEPRECATED should never be hit by users, thanks to your non-regression tests before releasing a new version. So in practice it is equivalent to error_reporting(0).
- In case of "should not occur" or "unpredictable" errors like the above, the ordinary user will be helpless because error_reporting(0) produces no message at all.
With the development level error_reporting(E_ERROR | E_DEPRECATED), at least he/she can copy/paste the error message and ask for help. The ordinary user would not even be aware that there is a server error 500, since he/she would not know about or look at the Web server access.log. Without the message (hidden by error_reporting(0)), it would be very difficult to investigate and help the user (and he/she would also have to reproduce the problem to get the messages, if guided: a waste of time).

The annoying manual editing of all *.php files to the development error_reporting level (and resetting them later) leads me to the next suggestion :-) :

- Suggestion: use a single error_reporting.php, included (require_once) by all other *.php files, to ease changing the error_reporting level.
Note: this would break the granularity of the error_reporting() setting if the intention was to have a different level for each PHP file.
Content for error_reporting.php:

//error_reporting(E_ERROR | E_DEPRECATED); //Development only
error_reporting(0);
- Tip: if the site uses Content Security Policy (CSP), to allow javascript (script-src) and styles (style-src) used by Sphider, verify that your CSP directives have:

  script-src 'self' https://ajax.googleapis.com 
and
  style-src 'self' 'unsafe-inline'
Note: I have a site with no javascript, and only native stylesheet (.css): so I have a very strict CSP.
But in order to run Sphider, now I have to relax my CSP. So it raises my next question:


- Question: From Sphider main page:
"Sphider is a lightweight web spider and search engine written in PHP. It can be implemented using a MySQL (or MariaDB) database using the MySQLi and MySQLnd PHP components. PHP 8.x is supported.".
There is no mention of JavaScript, and it seems indeed that JavaScript is not mandatory, at the cost of losing the spelling suggestion/autocompletion capability ("Did you mean...").
Am I correct? Is it easy to get rid of JavaScript? How? Just by not loading it in the <script> tag? Or is there some interaction between JavaScript and PHP to adapt (in /js_suggest/)?

I got some answers by myself by practicing, to be corrected:
Without javascript:
- admin.php: no more /admin/dbmain.js, used in Database tab of admin panel. So no more tables backup/restore, manual check case for table selection.
This is not a problem if you can use either phpMyAdmin or command line to invoke mysqldump.
- search.php: no more jquery.min.js from https://ajax.googleapis.com, /calendar/calendar.js, /js_suggest/autocomplete.js. The search still works but without suggestion/autocompletion.

- Remark: about the command line for indexing:
I just have to be careful to also run requirements_check.php from the command line, since there are different php.ini files for command-line mode (cli) and Web mode (fpm):
/etc/php/8.2/cli/php.ini
/etc/php/8.2/fpm/php.ini

php requirements_check.php
Fsockopen - CHECK!<br>Mysqlnd - CHECK!<br>PHP 7 or greater - CHECK!<br>Curl - CHECK!<br>Iconv - CHECK!<br>Mbstring - CHECK!<br>Imagick - CHECK!<br>(Imagick is not needed for Sphiderlite.)<br><br><strong>Congratulations! You can use either Sphider or Sphiderlite.</strong><br>
Big advantage for me: during lengthy indexing, I can play with search.php and with admin.php (to change some settings),
or even follow the progress of indexing (the "Sites" tab shows the number of links and keywords), statistics on tables etc...
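
To see which php.ini is in effect for a given mode, a small script like the sketch below can be run once from the command line and once through the Web server. `runtime_summary()` is a hypothetical helper; the settings it reports are the ones mentioned in this thread.

```php
<?php
// Hypothetical helper: summarizes the PHP environment the script runs under.
function runtime_summary(): array
{
    return [
        'sapi'            => PHP_SAPI,                              // e.g. "cli" or "fpm-fcgi"
        'php_ini'         => php_ini_loaded_file() ?: '(none)',     // path of the loaded php.ini
        'allow_url_fopen' => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
        'fsockopen'       => function_exists('fsockopen'),
        'exec'            => function_exists('exec'),
    ];
}

foreach (runtime_summary() as $key => $value) {
    echo $key, ': ', is_bool($value) ? ($value ? 'yes' : 'no') : $value, PHP_EOL;
}
```

Comparing the two outputs shows quickly whether a function is disabled in one mode only.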

- Remark: always read the documentation (RTFM)
The SphiderUserGuide.pdf has this interesting information that solves one of the problems raised in my earlier post:

"Ignoring parts of a page
Sphider includes an option to exclude parts of pages from being indexed. This can, for example,
be used to prevent search result flooding when certain keywords appear on certain part in most
pages (like a header, footer or a menu). Any part of a page between
<!--sphider_noindex--> and <!--/sphider_noindex--> tags is not indexed, however links in it are followed."

I discovered this, and in the meantime I saw that you had already answered my question in post viewtopic.php?p=682#p682 . Thank you!

Other interesting information about search capability mentioned in the documentation:
Wildcard search (*)
"-" for negate search
AND/OR/Phrase search

- Remark: SphiderUserGuide.pdf, page 5/52: minor formatting error. Extra bullet introduced.

  (bullet) Supports excluding words (by putting a '-' in front of a word, any page including that word
  (bullet) will be omitted from the results).
Should be:

  (bullet) Supports excluding words (by putting a '-' in front of a word, any page including that word will be omitted from the results). 

- Remark: the mobile CSS might not be fully responsive when the width is shrunk dynamically on desktop (cf. screenshots).
Attachments: Sphider_2025-02-04_215123.jpg (73.47 KiB), Sphider_2025-02-04_215248.jpg (118.5 KiB)
- Remark about security:
I would like to share some security tools and practices I use.
I recommend using the excellent and free "ConfigServer Security and Firewall" (csf/lfd), which is much better than fail2ban etc...
https://configserver.com/configserver-s ... -firewall/

/admin and /settings are .htaccess protected, but with csf/lfd, after more than 3 failed login attempts (number customizable) the IP will be blocked,
and a notification sent to you by e-mail.

To complement this, I always insert a small PHP code snippet (as a front-end to admin.php, for example) that sends an e-mail to notify me of the first successful login (one e-mail per day per IP).
The same code snippet is added e.g. to the phpMyAdmin entry point. So although my phpMyAdmin is also protected by .htaccess and csf/lfd notifies me of failed attempts, I also receive an e-mail on the first successful login per day to phpMyAdmin.

The code snippet is really simple: it takes the IP of the visitor and searches a text file to see whether this IP is already logged for the day. If yes, there is nothing to do.
If no, it appends the IP and the current date (YYYY-MM-DD) to the text file, and sends an e-mail to the webmaster/technician.
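
The snippet itself was not posted; the following is a minimal sketch of the steps described above, not the poster's actual code. The function name, the log file location and the e-mail address are all hypothetical.

```php
<?php
// Hypothetical helper: returns true the first time $ip is seen on $date and
// records "IP YYYY-MM-DD" in $logFile; returns false on later calls that day.
function is_first_login_today(string $ip, string $logFile, ?string $date = null): bool
{
    $date  = $date ?? date('Y-m-d');
    $entry = $ip . ' ' . $date;
    $lines = is_file($logFile)
        ? file($logFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
        : [];
    if (in_array($entry, $lines, true)) {
        return false;                       // already logged today: nothing to do
    }
    file_put_contents($logFile, $entry . PHP_EOL, FILE_APPEND | LOCK_EX);
    return true;
}

// Runs only under a Web server, where REMOTE_ADDR is set.
// The log path and the address below are placeholders.
if (isset($_SERVER['REMOTE_ADDR'])
    && is_first_login_today($_SERVER['REMOTE_ADDR'], sys_get_temp_dir() . '/admin_logins.txt')) {
    mail('webmaster@example.com', 'Admin login', 'First login today from ' . $_SERVER['REMOTE_ADDR']);
}
```

The log file should live outside the document root, and it grows by one line per IP per day, so it may need occasional pruning.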

- Remark: Sphider front-end/back-end
Thanks to your answer, I think I understand now that admin.php and spider.php do not need to run on the server being indexed.
This is great, since I can then run admin.php/spider.php on, for example, a virtual machine, without the restrictions of the target production server,
which could be hardened to the point of not satisfying all of requirements_check.php (fsockopen, allow_url_fopen, exec...).
Afterwards, it is just a matter of backing up all tables and importing them on the target server, and uploading all files and directories (except /admin) to the target server.

Note: we still need to protect /settings (with .htaccess) since they contain sensitive files: my.cnf and database.php.
This also leads to my 2 next suggestions :-) :

- Suggestion: get rid of my.cnf (its content is redundant with database.php) so the user no longer needs to fill it in.
Sphider should be able to build the parameters on the fly from database.php and pass them to the "mysql" or "mysqldump" command line.
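
One possible shape for this idea is to generate the option file text from values the PHP side already has. In the sketch below, `build_option_file()` is a hypothetical function, and the credentials are passed in as plain arguments because the variable names used in database.php are not shown in this thread.

```php
<?php
// Hypothetical sketch, not Sphider's code: builds the text of a MySQL option
// file ([client] group) from credentials supplied by the caller.
function build_option_file(string $user, string $password, string $host): string
{
    // Quote each value and escape backslashes and double quotes inside it.
    $quote = function (string $value): string {
        return '"' . addcslashes($value, "\\\"") . '"';
    };
    return "[client]\n"
        . 'user=' . $quote($user) . "\n"
        . 'password=' . $quote($password) . "\n"
        . 'host=' . $quote($host) . "\n";
}

echo build_option_file('sphider', 'p#ss"word', 'localhost');
```

The text could be written to a temporary file readable only by the Web server user and handed to mysqldump through its `--defaults-extra-file` option, then deleted. The backup would still depend on exec() or a similar call.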

- Suggestion: to access the database, search.php could use another file, for example /settings/search_database.php, with another MySQL user and password. This user would have only the strictly needed permissions (for example read only, no create database or table, no drop table...), in any case far fewer permissions than the one declared in /settings/database.php, and only /settings/search_database.php would be uploaded. This way the security would be a little better.

- Question: I would like to embed Sphider's search.php inside an existing HTML page. The provided templates (css and code) would be a good start for this.
It seems easier to start from the source code as seen by a browser (instead of starting from the search.php source code), then
take just the <form> code and adapt it to the existing look and feel (css), without using the Sphider css. Then, after a click on the Search button,
progressively adapt the output to the existing look and feel. Is this the right way to do it?
captquirk
Site Admin
Posts: 321
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA
Post by captquirk »

If it's okay, I will answer in a couple of posts, with no particular order!
- Suggestion: get rid of my.cnf (content redundant with database.php)
The my.cnf file is specific to database backup and recovery. It has not always been present, and I agree it is duplicative. A bit of history...
When I first started with Sphider, I found the backup/restore feature to be, at best, inoperative. I rewrote the process to model the process used by MySQLWorkbench. It worked great! At least, it worked great until it didn't!
For a smallish (certainly not tiny) database, the process was reliable. But eventually I kept adding sites and data, and eventually the size got to a point that things started to go sideways. The backups still LOOKED like they were good, but restore became a disaster! Good thing is that I can be a bit paranoid and made backups outside of Sphider. The issue was certainly size related. If Sphider was used to index and search a single website for a business, the process was fine. But I found there definitely was a limitation. At some point, a restore could destroy the entire database.
After trial and error and research, I concluded that the best, fastest, and most reliable backup process lies within MySQL itself. And so that is what Sphider now uses. Since the backup/restore is strictly MySQL driven, and not MySQL within PHP, it is a separate process. And unfortunately, that separate process is VERY SEPARATE. MySQL doesn't know Sphider exists and can't use Sphider settings. Thus the need for a separate file, my.cnf, solely for the benefit of that process. If I could eliminate that file and keep a reliable backup/restore feature, I would do so in an instant.

Which leads me to the subject of exec(). I despise the use of exec()! I consider it to be a target for people with malicious intent. But, some things either need to be done outside of Sphider or not done at all. Indexing pdf files, doc, docx, xls and ppt are such things. And, thanks to me ( :oops: ), the backup/restore process.
- Suggestion: to access database, search.php uses another file for example /settings/search_database.php with another mySQL user and password.
Really not a bad idea at all! I may have considered this before and not acted for some reason? Certainly something to think about.... and maybe it will happen. (We'll see how ambitious this old guy is. :D )
- Question: I would like to use Sphider search.php as embedded inside existing HTML page.
I think you are probably on the right track. Building your own page and incorporating Sphider code as you go, or staying with Sphider and modifying to your own use are both valid approaches. Which one is best? I think that can best be answered only by the person/team doing the task. Whichever way best fits your skill level and comfort level is the best approach. Personally, I'd probably lean your way.
Then it might also depend on just what you want to achieve. Do you want a truly unique and customized page for a specialty website? Or are the example templates close but not quite right? Then just a new css layout may be all that's needed.

404 errors. You are correct. Sphider is definitely old school. And so am I. Still, I may look into just how Sphider views a URL. MAYBE it should be modernized, at least to some extent. I am still dead set against the use of a space in a URL or file name! (Thanks, Microsoft! :evil: ) Let me look at it, but no promises on this one...

Sphider search appearance on mobile devices, phones in particular. Yeah. Definitely room for improvement here. And programming for mobile devices is not my strong suit. (The fact I can do anything on the web is a wonder! I'm an old school mainframe/mini programmer/developer. :D )

Maybe more tomorrow....