Sphider-5.0.0 - Indexing Keyword issues

captquirk
Site Admin
Posts: 299
Joined: Sun Apr 09, 2017 8:49 pm
Location: Arizona, USA

Re: Sphider-5.0.0 - Indexing Keyword issues

Post by captquirk »

Post a snapshot of your settings. I'll see if anything jumps out at me.

Try spidering with a sitemap.xml, also. In a lab environment, creating a usable sitemap may be a challenge, but give it a try. If you do that, it is best done with an index and not a re-index.
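
If you want to try the sitemap route in a lab, here is a minimal sketch in Python (standard library only) that produces a file in the sitemaps.org 0.9 format. The `build_sitemap` name and the `lab.example` URLs are placeholders for illustration. This thread does not say where Sphider looks for the file, so check your own setup; the site root is the usual convention.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap.xml text listing each URL in a <url><loc> entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = u  # ElementTree escapes & and <
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


if __name__ == "__main__":
    # Placeholder lab URLs: list the pages you want the spider to visit.
    pages = [
        "http://lab.example/index.html",
        "http://lab.example/acronyms.html",
    ]
    print(build_sitemap(pages), end="")
```

Redirect the output to sitemap.xml, then do a fresh index rather than a re-index, as suggested above.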
scorney
Posts: 8
Joined: Wed Jul 19, 2023 3:44 pm

Post by scorney »

Here you go! The settings.
[Screenshots: Screenshot 2023-08-10 at 6.57.31 AM.png and Screenshot 2023-08-10 at 6.58.23 AM.png]
Last edited by scorney on Thu Aug 10, 2023 11:05 am, edited 1 time in total.

Post by captquirk »

I am going to try a special scan from my end, focusing on a particular problematic page. If we can get those short words to index... fingers crossed!

Your settings look good. My only suggestion, which has nothing to do with indexing, is to uncheck "Display the RSS search form" in the "Search settings" section. This will unclutter your search page.

Post by captquirk »

It appears the FULL pages are not being indexed, only the top portion.
Now, I have to figure out WHY!!!

Post by scorney »

Were you able to find out why only the top portion of the pages is being indexed?
captquirk wrote: Thu Aug 10, 2023 6:29 pm It appears the FULL pages are not being indexed, only the top portion.
Now, I have to figure out WHY!!!

Post by captquirk »

At least for the acronyms page, yes! It's an HTML error, which really surprised me because I didn't think it was that serious an error! I can send you the details by private email if you wish, and we can discuss it in the message area.

Post by captquirk »

HEY!!! BIG update!
I believe the error is common to ALL the pages!!!

There is one line of code containing:
value='Click here to Search Page'
Modify this slightly to:
value='Click here to Search Page'><button>
Just a simple little oversight...
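
With many pages sharing the same search button, the fix can be applied in one pass. Below is a hypothetical Python sketch (the function names are made up for illustration), assuming the pages are UTF-8 .html files and the markup reads exactly value='Click here to Search Page' followed, possibly after a line break, by <button>. It only adds the missing ">" where it is absent, so running it twice is harmless, and it preserves the files' line endings. Back up the site before letting any script rewrite files.

```python
import re
from pathlib import Path

# The search-button <input> whose value attribute is followed directly by
# "<button>" (optionally after whitespace/newlines), i.e. with no closing ">".
UNCLOSED_INPUT = re.compile(r"(value='Click here to Search Page')(\s*<button>)")


def close_search_input(html: str) -> str:
    """Insert the missing '>' after the value attribute; fixed markup is left alone."""
    return UNCLOSED_INPUT.sub(r"\1>\2", html)


def patch_site(root: str) -> list[str]:
    """Rewrite every .html file under root that needs the fix; return the changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*.html")):
        # newline="" disables newline translation so CRLF files stay CRLF.
        with open(path, encoding="utf-8", newline="") as f:
            text = f.read()
        fixed = close_search_input(text)
        if fixed != text:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(fixed)
            changed.append(str(path))
    return changed
```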

Post by scorney »

I made the addition on each page, but I still get the same keyword count.

Same procedure (remove index.html, Clear Site, Index)... no improvement, sorry...

I'm curious to know how you realized that only the top of the page was being indexed... Is there a way I could monitor and debug it too? A tool?
Thanks!
[Attachment: Screenshot 2023-08-18 at 1.28.28 PM.png]

Post by captquirk »

In the Sphider database, there is a table called "links". One of the columns in "links" is "fulltxt". This should contain all of the text on the page (link) with all tags and other extraneous data removed. I realized this column was extremely short on content!
Most likely, when tags were being stripped, the missing "><button>" caused far more than just the tags to be stripped.
This morning I looked at index.html and a couple of other pages and found the same issue.
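
Why would one missing ">" lose almost the whole page? One plausible mechanism is sketched below as a hypothetical, simplified Python tag stripper; it is not Sphider's actual code, and this thread does not confirm how Sphider strips tags. If a stripper treats a "<" that appears inside a still-open tag as nesting, the unclosed <input never ends: every later tag just raises and lowers that count, while the visible text between them is thrown away.

```python
def strip_tags_depth(html: str) -> str:
    """Hypothetical tag stripper that counts '<' nesting inside an open tag.

    Simplified model for illustration only: it ignores quotes, comments and
    <script>/<style> blocks, and it is NOT Sphider's actual code.
    """
    out = []
    in_tag = False
    depth = 0  # number of extra '<' seen while a tag was still open
    for ch in html:
        if ch == "<":
            if in_tag:
                depth += 1  # treated as nesting, not as the start of a new tag
            else:
                in_tag = True
        elif ch == ">" and in_tag:
            if depth:
                depth -= 1  # closes the nested '<'; the outer tag stays open
            else:
                in_tag = False
                out.append(" ")  # keep words on either side of a tag apart
        elif not in_tag:
            out.append(ch)
    return " ".join("".join(out).split())  # collapse runs of whitespace


broken = "<p>Top</p><input value='Search'\n<button>Go</button><p>Acronyms MMR TBD</p>"
fixed = "<p>Top</p><input value='Search'>\n<button>Go</button><p>Acronyms MMR TBD</p>"
print(strip_tags_depth(broken))  # Top
print(strip_tags_depth(fixed))   # Top Go Acronyms MMR TBD
```

With the ">" restored, this stripper keeps all of the text, which is consistent with the difference between the acronyms and acronyms2 fulltxt values below.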

For my local test, I copied the code for your acronyms page, then saved a modified copy as acronyms2. I then created a local index.html which ONLY referenced the two variations of acronyms. I got 506 keywords, primarily from the acronyms2 page. The original had a very short fulltxt, the modified page a much longer one. Searches yielded results for "mmr", "tbd", "sbc", "tpm", and "olp", just as random examples.
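
Comparing fulltxt lengths like this can be scripted, which also answers the "is there a tool?" question. A minimal Python sketch, assuming a DB-API connection to the Sphider database (for MySQL, something like mysql-connector or PyMySQL) and a "links" table with "url" and "fulltxt" columns and no table prefix; the "url" column name and the 500-character threshold are assumptions. The demo uses an in-memory SQLite table as a stand-in for the real MySQL one.

```python
import sqlite3  # only for the demo; any DB-API connection should work


def short_fulltxt_pages(conn, min_chars=500):
    """List (url, length) for indexed pages whose fulltxt looks suspiciously short.

    Assumes a "links" table with "url" and "fulltxt" columns (no table prefix);
    adjust the query if your installation differs.
    """
    cur = conn.cursor()
    cur.execute("SELECT url, fulltxt FROM links")
    short = [(url, len(txt or "")) for url, txt in cur.fetchall()
             if len(txt or "") < min_chars]
    return sorted(short, key=lambda row: row[1])  # shortest first


if __name__ == "__main__":
    # Demo with an in-memory stand-in for the real table (placeholder URLs).
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE links (url TEXT, fulltxt TEXT)")
    demo.executemany("INSERT INTO links VALUES (?, ?)", [
        ("http://lab.example/acronyms.html",
         "Nokia NAM NI PM Guide - Acronyms & Definitions Photos Videos About Contact"),
        ("http://lab.example/acronyms2.html", "Common acronyms in PM's life " * 50),
    ])
    for url, n in short_fulltxt_pages(demo):
        print(f"{n:6d}  {url}")
```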

Acronyms fulltxt:
Nokia NAM NI PM Guide - Acronyms & Definitions Photos Videos About Contact

Acronyms2 fulltxt:
Nokia NAM NI PM Guide - Acronyms & Definitions Photos Videos About Contact Common acronyms in PM's life Acronym DEFINITION A&A APPLICATIONS AND ANALYTICS ACP ANNUAL COMPENSATION PLANNING ALU ALCATEL-LUCENT AMS ADVANCED MOBILE NETWORK SOLUTIONS AP ANNUAL PLAN APJ ASIA PACIFIC AND JAPAN ASB ALCATEL SHANGHAI BELL ASBL AS SOLD BASE LINE -(FORMER IPIS) ATG AIR-TO-GROUND ATP ACCEPTANCE TEST PROCEDURE B/(W) BETTER/ WORSE B2B BOOKED TO BILL BAR BALANCED ACCOUNTS RECEIVABLES BB BACKBONE BBU BASEBAND UNIT Bell Labs BELL LABS (NO ABBREVIATION) BG BUSINESS GROUP BGDM BUSINESS GROUP DELIVERY MANAGER BGPM BUSINESS GROUP PROJECT MANAGER BoM BILL OF MATERIAL BoQ BILL OF QUANTITY BPM BUSINESS PRODUCT MANAGEMENT BPP BLUE PLANET PRODUCTION BTB BUSINESS TRANSFORMATION BOARD BW BUSINESS WAREHOUSE CALA CENTRAL AND LATIN AMERICA Cat M MACHINE-TO-MACHINE CAPABILITY FOR LTE CBT CUSTOMER BUSINESS TEAM CC COMMERCIAL COMMITTEE CCI CUSTOMER CRITICAL ISSUE PROCESS CCy COUNTRY CURRENCY CDE CUSTOMER DESIGN ENGINEERING (=TECHNICAL COST OF A CUSTOMER SPECIFIC PRODUCT DEVELOPMENT) CIP CONTRACT IMPLEMENTATION PROCESS cIPIS PRE-SALES P&L TOOL INSTANCE SHOWING THE CONSOLIDATED IPIS CLAC CONTRACT LOSS AT COMPLETION CMD CUSTOMER MASTER DATA CO CUSTOMER OPERATIONS CPLS CLOSING PROFIT AND LOSS STATEMENT CPRI COMMON PUBLIC RADIO INTERFACE CQ CURRENT QUARTER CSO CUSTOMER SALES ORDER CTO CHIEF TECHNICAL OFFICER CVC CONTRACT VARIABLE COST DAA DISTRIBUTED ACCESS ARCHITECTURE DD DEPLOYMENT DELIVERY DEE DEPLOYEMENT EXECUTION DoA DEAD ON ARRIVAL DQT DEPLOYMENT QUOTATION TOOL DSO DAYS SALES OUTSTANDING DtC DESIGN TO COST E2E END TO END ECPLS ESTIMATE AT COMPLETION PROFIT AND LOSS STATEMENT EMEA EUROPE MIDDLE EAST AFRICA EMS ELEMENT MANAGEMENT SYSTEM EPT / NTP EQUIPMENT / NETWORK PLANNING TOOL ETL EXTERNAL TEMPORARY LABOUR FAS FINANCIAL ACCOUNTING STANDARD (=ALU EXCHANGE RATE USED FOR ACCOUNTING PURPOSES) FBB FINANCE BACKBONE FI FEATURE INTEGRATION FN FIXED NETWORKS FO FIBER OPTICS FOC FIBER OPTICAL CHARACTERIZATION 
FOC FIXED OPERATING COSTS FP&A FINANCIAL PLANNING AND ANALYSIS FPA FINANCIAL PLANNING AND ANALYSIS FPO FIXED PRODUCTION OVERHEAD FY FISCAL YEAR GA GENERAL AVAILABILITY GAAP GENERAL ACCOUNTING ACCEPTANCES PRINCIPALS GM GROSS MARGIN GNE GLOBAL NETWORK ENGINEERING GNEIC GLOBAL NETWORK ENGINEERING AND INTEGRATION CENTER GNPI GLOBAL NEW PRODUCT INTRODUCTION GSC GLOBAL STANDARD COST GSLT GLOBAL SALES LEADERSHIP TEAM Gx/Ex SALES EXECUTE GATES HFM … FINANCIAL MANAGEMENT (=ALU FINANCIAL REPORTING TOOL) HLD HIGH LEVEL DESIGN HoS HEAD OF SALES HSE HEALTH SAFETY & ENVIRONMENT HW HARDWARE HWSC HARDWARE SUPPLY CHAIN I&C INSTALLATION & COMMISSIONING IC INVESTMENT COMMITTEE ICT INFORMATION COMMUNICATION TECHNOLOGY IEH INTEGRATION ENGINEERING HANDBOOK IFRS INTERNATIONAL FINANCIAL REPORTING STANDARDS IMS INTERNET PROTOCOL MULTIMEDIA SYSTEMS ION INTERNET PROTOCOL AND OPTICAL NETWORKS IP INTELLECTUAL PROPERTY IPIS INITIAL PROJECT INCOME STATEMENT (STRATEGY) IPP INTERNET PROTOCOL PRODUCTS IPR IP ROUTING IPR/T INTERNET PROTOCOL ROUTING/ TRANSPORT IPT IP TRANSPORT IRD INVENTORY ROTATION DAYS IT IB INFORMATION TECHNOLOGY INVESTMENT BOARD ITO INTEGRATION AND TRANSFORMATION OFFICE KAM KEY ACCOUNT MANAGER LCE "LOCAL CURRENCY EXCHANGE (=ALU BUDGET EXCHANGE RATE, ALSO CALLED “BUDGET RATE”)" Acronym Definition LE LATEST ESTIMATE LECA LAW ENFORCEMENT COMMUNICATIONS ASSISTANT LEP LATEST ESTIMATE PREVIEW LII LOGISTICAL INTEGRATED ITEM LLD LOW LEVEL DESIGN LoA LIMIT OF AUTHORITY LP LOCATION PLAN LRP LONG RANGE PLAN MBR MANAGEMENT BUSINESS REVIEW META "MIDDLE EAST, TURKEY AND AFRICA" MGCF MEDIA GATEWAY …? 
MMR MONTHLY MANAGEMENT REVIEW MN MOBILE NETWORKS MoM MODE OF OPERATION MoO MODE OF OPERATIONS MSA MASTER SERVICE AGREEMENT MSR MULTI-STANDARD RADIO MWT MICROWAVE TRANSPORT NBR NOKIA BUSINESS REVIEW NE NETWORK ELEMENT NET NETWORKS NLT NOKIA LEADERSHIP TEAM NLV NETWORK LEVEL VERIFICATION NMS NETWORK MANAGEMENT SYSTEM NOC NETWORK OPERATION CENTER NPI NEW PRODUCT INTRODUCTION NRE NON-RECURRING ENGINEERING ODM OFFICE DATA MANUAL OLCS ONLINE CUSTOMER SUPPORT OLP OPPORTUNITY LIFE CYCLE PROCESS OVC OTHER VARIABLE COST OWC OTHER WORKING CAPITAL Oxygen N/A P/N PART NUMBER PARD PROJECT ASSET ROTATION DAYS PC PERSONNEL COMMITTEE PCFS PROJECT CASH FLOW STATEMENT PDCA PLAN-DO-CHECK-ACT PLC PRODUCT LIFE CYCLE PLE PREVIOUS LATEST ESTIMATE PLP PRODUCT LIFECYCLE PLAN PMB PORTFOLIO MANAGEMENT BOARD PNF PHYSICAL NETWORK FUNCTION POE PROJECT EXECUTION OWNER PP PLAN PASS PPV PURCHASE PRICE VARIATION PRS PROFITABILITY REPORTING SYSTEM PS PROFESSIONAL SERVICES PSP PROJECT SELLING PRICE pts POINTS PY PERSON YEAR QBI QUICK BUSINESS INTELLIGENCE QIPP QUALITY IMPROVEMENT PROJECT PORTFOLIO QOP QUALITY OPERATION PRINCIPLE QTD QUARTER TO DATE R&O RISKS AND OPPORTUNITIES RA RESULT ANALYSIS RACI RESPONSIBILITY MATRIX RBC REGIONAL BUSINESS CONTROLLER / CENTER RES REPAIR & EXCHANGE SERVICES RF ROLLING FORECAST RM Tool RECEIVABLE MANAGEMENT TOOL ROM ROLL OUT MANAGER RPIS REFERENCE PROJECT INCOME STATEMENT (STRATEGY) REVISED RSMT RESTATEMENT RTU RIGHT TO USE (=FEE TO USE A SOFTWARE PACKET) SAE SYSTEM ARCHITECTUERE AND ENGINEERING SBC SESSION BORDER CONTROLLER SCN SUPPLY CHAIN SDM SUBSCRIBER DATA MANAGEMENT SDN SOFTWARE DEFINED NETWORKING SDP SERVICE DELIVERY PROCESS SFDC SALESFORCE.COM SHR STANDARD HOURLY RATE SI SALES ITEM SIOP SERVICE INFRASTRUCTURE OUTSIDE PLANT SIP SALES INCENTIVE PLAN SLA SERVICE LEVEL AGREEMENT SLI SINGLE LOGISTICAL ITEM SM SALES MARGIN SMI SALARY MERIT INCREASE SON SELF-ORGANIZING NETWORKS SOP STANDARD OPERATING PROCEDURE SOW SCOPE OF WORK SSO SINGLE SIGN ON ST SYSTEM TEST SVM 
STANDARD VARIABLE MARGIN TAS TELCO APPLICATION SERVICES TECH TECHNOLOGY TM TECHNICAL MANAGER TOM TARGET OPERATING MODEL TPM TECHNICAL PRODUCT MANAGEMENT UTAS UNIFIED TELCO APPLICATION SERVICES VC VALUE CAPTURE VOC VOICE OF CUSTOMER vRAN VIRTUAL RADIO ACCESS NODE WBS WORK BREAKDOWN STRUCTURE WLS WIRELESS XCOM EXECUTIVE COMMITTEE YE YEAR END YoY YEAR OVER YEAR Links: New Customer Setup Sharepoint Support Contact: PMO – NAM & LAT Daily Operations: Mr. Miss Happiness TBD Related Topics Supply Chain Supply Chain Quality Quality CARES CARES JPC JPC RDP JPC Delivery Delivery SAP-QTC-BPP SAP-QTC-BPP Engineering Engineering About Nokia NAM NI PM Guide Nokia NAM NI PM Guide. Our Links Advertise Support Our Company Contact Terms of Use Privacy Policy Copyright 2023 Nokia. All rights reserved. Designed by Nokia NI

Post by captquirk »

I just did another scan. Poor results, just as you got. Focusing just on Acronyms.html, I found this:
<input type=button onClick="parent.open('/sphider-5.1.0/search.php')" value='Click here to Search Page'
<button>
Still a very easy-to-miss error! It should be:
<input type=button onClick="parent.open('/sphider-5.1.0/search.php')" value='Click here to Search Page'>
<button>
A very easy-to-miss ">" to close out the "input" tag!!!
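
To catch this kind of slip before indexing, a rough Python heuristic can flag any tag whose "<" is followed by another "<" before its closing ">". This is a hypothetical helper, not part of Sphider; it ignores quoting, comments and script blocks, so it can report false positives (for example a < b inside JavaScript).

```python
import sys
from pathlib import Path


def find_unclosed_tags(html: str) -> list[int]:
    """Return 1-based line numbers of tags whose '<' is followed by another
    '<' (or the end of the text) before any closing '>'.

    Rough heuristic: quoting, comments and <script> blocks are not handled.
    """
    problems = []
    open_pos = None  # index of the '<' that opened the current tag
    for i, ch in enumerate(html):
        if ch == "<":
            if open_pos is not None:
                # A new '<' arrived before the previous tag was closed.
                problems.append(html.count("\n", 0, open_pos) + 1)
            open_pos = i
        elif ch == ">":
            open_pos = None
    if open_pos is not None:  # tag still open at the end of the text
        problems.append(html.count("\n", 0, open_pos) + 1)
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python find_unclosed.py /path/to/site
    for page in sorted(Path(sys.argv[1]).rglob("*.html")):
        text = page.read_text(encoding="utf-8", errors="replace")
        for line in find_unclosed_tags(text):
            print(f"{page}:{line}: tag not closed before the next '<'")
```

On the broken snippet above, it reports the line holding the <input tag.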

Do NOT be embarrassed!!! I can't tell you how many times I have missed such a tiny little detail!!!