Attachment: erd.png
This is nowhere near a proper ER diagram, but it should give an idea of what's what.
The settings table stands on its own and controls the workings of the admin, spidering, and search functionality.
The sites table has the details of each site in the catalog. The site_id connects it to the images table and the links table. It also connects to the site_category table, which in turn is connected to the categories table. The images, links, and site_category tables can all have multiple records for a given site_id. Site_category can also link a site to multiple categories by way of the category_id.
The domains table derives its content from the url column of the sites table, although there is no other direct relationship.
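To make the sites / site_category / categories relationship concrete, here is a minimal sketch. It uses an in-memory SQLite database rather than MySQL, purely so it runs on its own. Only site_id, category_id, link_id, and the url column of sites come from the description above; the other column names (category, links.url) and the sample rows are made up for illustration and may not match the real schema.

```python
import sqlite3

# In-memory stand-in for the real MySQL database (assumption: simplified columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (site_id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE site_category (site_id INTEGER, category_id INTEGER);
    CREATE TABLE links (link_id INTEGER PRIMARY KEY, site_id INTEGER, url TEXT);
""")

# Hypothetical sample data: one site, two categories, two links.
conn.execute("INSERT INTO sites VALUES (1, 'http://example.com/')")
conn.executemany("INSERT INTO categories VALUES (?, ?)",
                 [(1, "News"), (2, "Tech")])
conn.executemany("INSERT INTO site_category VALUES (?, ?)",
                 [(1, 1), (1, 2)])
conn.executemany("INSERT INTO links VALUES (?, ?, ?)",
                 [(1, 1, "http://example.com/a"),
                  (2, 1, "http://example.com/b")])


def categories_for_site(conn, site_id):
    """Follow site_id -> site_category -> categories for one site."""
    rows = conn.execute(
        """
        SELECT c.category
        FROM site_category sc
        JOIN categories c ON c.category_id = sc.category_id
        WHERE sc.site_id = ?
        ORDER BY c.category_id
        """,
        (site_id,),
    ).fetchall()
    return [row[0] for row in rows]


def link_count(conn, site_id):
    """Count the links rows that belong to one site."""
    return conn.execute(
        "SELECT COUNT(*) FROM links WHERE site_id = ?", (site_id,)
    ).fetchone()[0]
```

The point is only that site_category is a plain junction table: one row per site/category pair, joined on site_id on one side and category_id on the other.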
The keywords table is a unique list of keywords derived from all the indexed links. It has no direct relationship to either links or sites. It is just that: a list of words. Each word in the table is hashed, and the final character of the hash (0-f) determines which of the link_keywordX tables references it. A single keyword can occur multiple times in one of the 16 link_keywordX tables. Each occurrence relates the keyword to an individual link containing that word. The word is also associated with a domain (a subset of the site url) and assigned a weight. Notice that link_id is unique in the links table, but has many occurrences in the link_keywordX tables. Keyword_id is unique in the keywords table, but may occur multiple times in one SPECIFIC link_keywordX table (determined by the hash). HOWEVER, within the link_keywordX tables, each keyword_id/link_id pair IS unique.
The query_log table is another standalone table, used to record queries made by users during searches.
The rss_sites and rss_links tables are also independent, with rss_sites having a one-to-many relationship to rss_links.
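The table selection described above can be sketched roughly as follows. The hash algorithm is not named here, so MD5 is an assumption, as is the exact table naming ("link_keyword" followed by the hex character); keyword_table is a hypothetical helper, not a function from the actual code.

```python
import hashlib


def keyword_table(word: str) -> str:
    """Pick one of the 16 link_keywordX tables for a keyword.

    Assumptions: MD5 as the hash, and tables named
    link_keyword0 ... link_keywordf.
    """
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    # The final hex character (0-f) of the hash selects the table.
    return "link_keyword" + digest[-1]


# The same word always maps to the same table, which is why a given
# keyword_id only ever shows up in one specific link_keywordX table.
table = keyword_table("example")
```

Because the choice depends only on the word itself, a search for that word only has to look in one of the 16 tables.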
The pending table is used during indexing to build a list of links found, and the temp table has information on the link currently being processed.
MySQL Workbench has a reverse engineering tool which is SUPPOSED to build a more proper ER diagram, but it fails to draw the actual relationships.
This may not be exactly what you were looking for, but hopefully will aid in understanding the relationships a little better.