Gossamer Forum
More spider questions

As you KNOW, I've been salivating for this spider for a long time. Here are my opening questions (hopefully this will be all of them prior to taking the plunge):

1) When I initially migrated from my previous engine, I imported the URL, Title and Description from a flatfile, even though that flatfile also contains Categories and Keyword meta data that I would have liked to import at the same time. When I imported and reindexed, NG auto-assigned ID numbers to each link. I now have 13,964 fully searchable links in the MAIN database even though categories have been assigned to only 509 of them. Searches show the remaining links as being in the Home category even though Home is not browsable (nor do I care to make it so). But it does raise the question: why can't I add to the Home category? Suppose someone wants to add a link to the general database that doesn't fit into a specific category. Importing from MySQLMan does it automatically. How can one do this manually? Would the spider be able to?

2) I have since added a Keyword column to the Links table, updated the def files and assigned it a weight for searching together with URL, Title and Description, even though Keywords will not be displayed on the search results page.

a) Can the spider gather Keyword meta data from pages into a separate column in the spider database?
b) Can the spider build from a pipe-delimited text flatfile? Or would it just use the URLs from that file to go out and spider those pages?
c) If it can build directly from a flatfile and it contains URL|Title|Category|Description|Keywords, can the spider make it fairly simple to add records to the main database complete with Categories and Keywords? (Remember, Category will often be blank so those records will need to associate with the Home category when transferring to the main database.) I tried this initially from MySQLMan but couldn't figure it out because Links and Catlinks are separate tables. Importing simultaneously to both or synchronizing them afterward is where I get lost. If there's a way to do it here I'd still like to know how but it seems to me the spider might handle it better.
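
To make the flatfile layout in (c) concrete, here is a minimal Python sketch of reading URL|Title|Category|Description|Keywords lines and defaulting a blank Category to Home. It is only an illustration of the record format, not anything the spider or Links SQL provides; the function name `parse_flatfile` and the sample URLs are hypothetical.

```python
FIELDS = ["url", "title", "category", "description", "keywords"]
DEFAULT_CATEGORY = "Home"  # assumption: uncategorized records fall back to Home


def parse_flatfile(text):
    """Parse URL|Title|Category|Description|Keywords lines into dicts."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("|")
        # Pad short rows so a missing trailing field becomes an empty string,
        # and ignore any extra fields beyond the five expected ones.
        parts = (parts + [""] * len(FIELDS))[: len(FIELDS)]
        record = {name: value.strip() for name, value in zip(FIELDS, parts)}
        if not record["category"]:
            record["category"] = DEFAULT_CATEGORY
        records.append(record)
    return records


sample = (
    "http://example.com/|Example Site|Reference|A sample page|sample, example\n"
    "http://example.org/|Another Site||No category given|misc\n"
)

for rec in parse_flatfile(sample):
    print(rec["url"], "->", rec["category"])
```

The second sample line has an empty Category field, so it is the one that ends up under Home.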

3) Can the spider be used as the manual submission engine so that it prevalidates manual entries, leaving them in the spider database until they are validated and transferred to the main database? I'm still using my old spider to prevalidate, dump to a flatfile and import later in MySQLMan. It's cumbersome at best, and it does not assign categories. I then have to go back in, search for the URLs and assign categories manually. Ugh.

Bottom line, do you think this spider is my ticket?

Mark Brasche
Re: More spider questions

1. I don't think the spider would help you here. Links SQL is designed to be categorized. Currently there is no way for a user to submit a link to the spider, except through Links SQL (which would require placing it in a category). I don't think it would be too hard to add an interface so links could get submitted directly to the spider.

2a. Yes, the spider parses out the Title, Meta Description and Meta Keywords into separate columns.
2b. You can cut and paste a list of URLs (as long as you like) into a textarea box to seed the spider with links to search.
2c. I understand what you mean about Links SQL and importing, but I'm not sure what you mean about the spider?
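
For readers wondering what "parses out the Title, Meta Description and Meta Keywords" in 2a amounts to, here is a rough Python sketch using the standard library's `html.parser`. It is not the spider's actual code (the spider is a separate product and its internals are not shown in this thread); the class name `MetaExtractor` and the helper `extract_fields` are hypothetical.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and the description/keywords meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.keywords = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            content = attr.get("content") or ""
            if name == "description":
                self.description = content
            elif name == "keywords":
                self.keywords = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_fields(html):
    """Return the three fields as a dict, one key per database column."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.description.strip(),
        "keywords": parser.keywords.strip(),
    }


page = """<html><head>
<title>Example Site</title>
<META NAME="Description" CONTENT="A sample page">
<meta name="keywords" content="sample, example">
</head><body>Hello</body></html>"""

print(extract_fields(page))
```

Each returned key would map to its own column, which is what makes a weighted search across Title, Description and Keywords possible later.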

3. Not really, as there is no interface for users to add links directly to the spider; however, this could be changed.

I'm not sure if it's the best fit, as I still don't understand your process exactly, and what you are trying to achieve in the end. =)



Gossamer Threads Inc.
Re: More spider questions
In Reply To:
2c. I understand what you mean about Links SQL and importing, but not sure what you mean about the spider?
When gathering information from a page, the spider has no way of knowing what category to assign to that page. Therefore, by default the spider database probably has no "Category" column. Categories are assigned to records at the time they are transferred to the Links database, either individually or through a bulk operation.

I will want to add a category column to the spider database even though most of the time it will be blank. It will be filled in only when someone manually submits a link to my directory through a form that sends it to the spider and specifies the category in a form field at the time of submission. When transferring to the Links database, I want records that have already been assigned a category to be placed automatically into that Links category during a bulk transfer operation. Spider database records that have not been assigned a category should automatically go into the Home category, exactly as happens when importing records through MySQLMan.
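
Here is a rough Python sketch of the bulk transfer I have in mind, using an in-memory SQLite database as a stand-in for MySQL. The table and column names (`Spider`, `Links`, `CatLinks`, `Category`, `LinkID`, `CategoryID`) are my assumptions for illustration and may not match the real Links SQL or spider schemas; the function `transfer` is hypothetical. The point is only that each record is inserted into Links first, and its new ID is then paired with a category ID in CatLinks, with Home as the fallback.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (ID INTEGER PRIMARY KEY, Name TEXT UNIQUE);
CREATE TABLE Links (ID INTEGER PRIMARY KEY AUTOINCREMENT,
                    URL TEXT, Title TEXT, Description TEXT, Keywords TEXT);
CREATE TABLE CatLinks (LinkID INTEGER, CategoryID INTEGER);
CREATE TABLE Spider (URL TEXT, Title TEXT, Description TEXT,
                     Keywords TEXT, Category TEXT);
INSERT INTO Category (ID, Name) VALUES (1, 'Home'), (2, 'Reference');
INSERT INTO Spider VALUES
  ('http://example.com/', 'Example Site', 'A sample page', 'sample', 'Reference'),
  ('http://example.org/', 'Another Site', 'No category given', 'misc', '');
""")


def transfer(conn, default_category="Home"):
    """Move every Spider row into Links and link it to a category."""
    cats = dict(conn.execute("SELECT Name, ID FROM Category"))
    rows = conn.execute(
        "SELECT URL, Title, Description, Keywords, Category "
        "FROM Spider ORDER BY rowid"
    ).fetchall()
    with conn:  # one transaction: all rows transfer or none do
        for url, title, desc, keywords, cat in rows:
            cur = conn.execute(
                "INSERT INTO Links (URL, Title, Description, Keywords) "
                "VALUES (?, ?, ?, ?)",
                (url, title, desc, keywords),
            )
            # Blank or unknown category names fall back to the default (assumption).
            cat_id = cats.get(cat or default_category, cats[default_category])
            conn.execute(
                "INSERT INTO CatLinks (LinkID, CategoryID) VALUES (?, ?)",
                (cur.lastrowid, cat_id),
            )
        conn.execute("DELETE FROM Spider")
    return len(rows)


moved = transfer(conn)
print(moved, "records transferred")
```

Doing both inserts inside one transaction is what keeps Links and CatLinks in sync, which is the part I could not work out when importing through MySQLMan.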

I think the best way to understand what I want out of the spider is to try submitting some pages you've authored to my index as it stands right now. Sites that pass the spider's rules are dumped to one flatfile; sites that fail are dumped to another. The URL for my submission page is http://surfsafely.com/urladd.html

I'll gladly do the grunt work as long as I know it's not an impossibility, and who better to ask than the author? What do you think? Will this be doable with your spider?

Mark Brasche