Gossamer Forum
Home : Products : Gossamer Links : Version 1.x :

Splitting off a tree from a database, etc

Quote Reply
Splitting off a tree from a database, etc
Right now, some people may want to prune their database, but it's not easy if you did something like import a DMOZ tree, then decide it was not in the best interests of your site.

For example:

http://girlfriends.com was set up with two trees imported /Society/Relationships and /Adult (more or less).

In operating the site (as purely an advertising vehicle) the obvious result is the "dating" links are the most popular, so how to kill the "adult" tree, or, specifically kill _JUST_ the links to the really offensive and 90% scam "image galleries" areas.

Here's the process:

It was easy to see that all the files I wanted to delete were in the categories that began with "General Adult Sites/Image Galleries".

Not a problem to find all the categories:

Select ID
from Category
WHERE Name RLIKE "General Adult Sites/Image Galleries"
That yielded a list of all the ID's that a link would have if it belonged to one of those categories.

The problem was how to get all the links, without entering 1300 different category numbers, or something equally absurd. I also complicated the problem by wanting to keep the categories, just empty them out and manually add back sites that had "quality" (Maybe?) later on.

I searched the MySQL site, and help was scattered. I played with the "Join" command, (Using the commands already in Links to attach the categories to the links) and came up with:

FROM Links as L, Category as C
WHERE C.Name RLIKE "General Adult Sites/Image Galleries" and L.CategoryID=C.ID
So as not to crash the browser, I limited this test to just the link ID number from the file to make sure it worked.

To see a dump of whether or not you got it working on your database, try this:

SELECT C.ID, C.Name, L.ID, L.Title
FROM Links as L, Category as C
WHERE C.Name RLIKE "General Adult Sites/Image Galleries" and L.CategoryID=C.ID
Limit 100
Or, you could send the output to a file, by adding the following line after the SELCT line above:

INTO OUTFILE '/some/file/path/with/write/access/files.txt'
Here's where I hit the problem. It was all well and good to DISPLY the links, now I wanted to DELTE them, but DELETe in MySQL does not support multiple tables, sub queries, etc. Everything I wanted to do to my table! <G> I looked for help, but found some commands that used Unix pipes, and such, and I was very, very leery of them. My database was more important than the risk.

--- not debuged or tested ---------
FROM Links, Category
WHERE tab1.col1 = tab2.col2;
WHERE Category.Name RLIKE "General Adult Sites/Image Galleries" and LInks.CategoryID=Category.ID

You can place this query in a script file and redirect input from it to the
mysql command-line interpreter, piping its output back to a second instance of
the interpreter:

prompt> mysql --skip-column-names mydb < myscript.sql | mysql mydb
--- not debuged or tested ---------
I thought about it for a while.

I was able to export the links above to a new TABLE, by copying the table structure to a new table (say Links_Images_Galleries) and then using the following command:

INSERT INTO Links_Images_Galleries (colum1, colum2, ...)
SELECT L.colum1, L.column2, etc....
FROM Links as L, Category as C
WHERE C.Name NOT RLIKE "General Adult Sites/Image Galleries" and L.CategoryID=C.ID
This worked, and would insert the _LINKS_ I wanted to delete into a new table. I also played with adding the C.Name into the file so that I had the category name, and ID if I needed to rebuild it. (Extraneous, but someone was going to ask!<G>)

So, I figured, if I could get the links I WANTED into a new table, what about putting the links I _DIDN'T_ want into a new table? Well, that was as simple as adding a "NOT" in front of the RLIKE and INSERTing them into a table called Links_2 (or whatever).

I counted records, and between the RLIKE and NOT RLIKE tables, I had the same number of records as the "Links" Table.

Cool! I was half way there!

I then renamed my Links table to Links_Old and renamed the Links_2 table to Links.

MAGIC! I had just removed 19,000 links from my database! (And kept them too!)
Nothing was touched, except I had a new Links table that had all the records I wanted, but none of the ones I didn't.

Remember -- I wanted to keep the categories -- I just wanted them empty. If you wanted to delete the categories as well, you would just enter a:

DELETE from Category
WHERE Name RLIKE "General Adult Sites/Image Galleries"
and they will disappear!

Ok. Now what?

1) Delete all your "Detailed" pages
2) Delete everything in the "General Adult Sites/Images Gallery" tree
3) RE-INDEX the site
4) REBUILD the site.

Everything is working, and should continue to work!

No ID's were changed.

You might end up with "Phantoms" in the CategoryRelations table (Alt-Links)

For my needs I just deleted everything in that table before re-indexing and rebuilding, but you might want to keep the rest of your alt-links, so you'll want to come up with an automated way of doing it.

I can't post that here (since I didn't do it) but two things:

1) you can use the join to pick out the records that don't have matching entries
2) The new version of Links SQL will have a new way of handling alt-links

But, with a little study of the way the implicit "join" works, and how you can pick out records on two tables, you can do some heavy duty maintennence on your site.

So far I haven't seen any problems, but it's only a test site, so I could be missing something.

Good luck!!! :)

FAQ: http://www.postcards.com/FAQ/LinkSQL/