Gossamer Forum
Home : Products : Gossamer Links : Development, Plugins and Globals :

Using INNER JOIN to duplicate CatLinks

Quote Reply
Using INNER JOIN to duplicate CatLinks
I need to adjust the INNER JOIN below so I can select links within a certain category.
I really want "at category and below", but will settle for in category for starts.

Here's my perfectly working Distance SELECT.
Code:
my $code = "
SELECT ".$DB->prefix."Links.*, o.*, ".$DB->prefix."ZCS_Countries.ZCS_Country,
(".$earth_radius." * (2 * ASIN(SQRT(
POWER(SIN(((z.ZCS_Latitude-o.ZCS_Latitude)*0.017453293)/2),2) +
COS(z.ZCS_Latitude*0.017453293) *
COS(o.ZCS_Latitude*0.017453293) *
POWER(SIN(((z.ZCS_Longitude-o.ZCS_Longitude)*0.017453293)/2),2)
)))) AS Distance

FROM ".$DB->prefix."ZipCodes z,
".$DB->prefix."ZipCodes o

INNER JOIN ".$DB->prefix."Links ON o.ZCS_ZipCode = ".$DB->prefix."Links.".$opts->{'links_zipcode_col'}."
INNER JOIN ".$DB->prefix."ZCS_Countries ON o.ZCS_CC = ".$DB->prefix."ZCS_Countries.ZCS_CC

WHERE z.ZCS_ZipCode = ".$query."
GROUP BY Distance SORT_ORDER, Distance having Distance <= ".$dist
;

$code =~ s/SORT_ORDER/$sort_order/;
my $zipdb = $DB->table('ZipCodes') || print $GT::SQL::error;
my $zip_sth = $zipdb->prepare($code) || print "Unable to PREPARE query - ".$GT::SQL::error;
$zip_sth->execute () || print "Unable to EXECUTE query - ".$GT::SQL::error;

I need help with the code colored red. I have simplified the JOIN's by removing the db prefix for easier reading.
Code:
my $code = "
SELECT Links.*, o.*, ZCS_Countries.ZCS_Country, CatLinks.CategoryID,
(".$earth_radius." * (2 * ASIN(SQRT(
POWER(SIN(((z.ZCS_Latitude-o.ZCS_Latitude)*0.017453293)/2),2) +
COS(z.ZCS_Latitude*0.017453293) *
COS(o.ZCS_Latitude*0.017453293) *
POWER(SIN(((z.ZCS_Longitude-o.ZCS_Longitude)*0.017453293)/2),2)
)))) AS Distance

FROM ZipCodes z,
ZipCodes o

INNER JOIN Links ON o.ZCS_ZipCode = Links.".$opts->{'links_zipcode_col'}."
INNER JOIN ZCS_Countries ON o.ZCS_CC = ZCS_Countries.ZCS_CC

INNER JOIN CatLinks ON Links.ID = CatLinks.LinkID

WHERE z.ZCS_ZipCode = ".$query." AND CatLinks.CategoryID = ".$catid."
GROUP BY Distance SORT_ORDER, Distance having Distance <= ".$dist
;

I am not sure if I add the AND CatLinks.CategoryID = ".$catid." to the WHERE clause, or if I just do the JOIN so that each link returned in the results has a catid joined to it. Then while looping through results just qualify for catid.

[edit]
I guess I should ask a question...
How do I imitate CatLinks via an INNER JOIN?

Hope it makes sense, also hope someone can help.

Thanks,
Chris
RGB World, Inc. - Software &amp; Web Development.
rgbworld.com

Last edited by:

rgbworld: Jul 6, 2006, 11:22 AM
Quote Reply
Re: [rgbworld] Using INNER JOIN to duplicate CatLinks In reply to
You would pick up significant speed by using the pre-generated tables like most of the programs/systems out there offer.

I've migrated my version to using one table, but adding a "distance" field to the record, so selects are on both zip code and distance.

If you try to recalculate each and every record each and every time, you have a far, far greater system load than just doing a simple select:

select * where ZIP = zip and DISTANCE = distance

You get ONE record returned from an indexed database, with a field "INCLUDED_ZIPS" or whatever.

In that record is a data-base normalization defying list of zipcodes (your list separator can be anything you want, a space is fine, but I think the ', ' turns it into an IN list automatically.

Then, you can just search on all records whose ZIP is IN (INCLUDED_ZIPS)

According to the MySQL manual, IN searches are blindingly fast.

I think you can even make this a single query, via SQL, but maybe not in the GT:: SQL engine using the most current MySQL, but separate searches are not horribly penalizing.

You calculate once, then search many times. You sacrifice a bit of disk/database space for the major per-search overhead. Unless you have a majorly overloaded system, you wouldn't need to cache results, though if you keep tabs on what searches are being done most often, and it's something you can cache..... though I'm not sure if in most situations that would actually end up saving anything - especially running GLinks under mod_perl


PUGDOG� Enterprises, Inc.

The best way to contact me is to NOT use Email.
Please leave a PM here.
Quote Reply
Re: [pugdog] Using INNER JOIN to duplicate CatLinks In reply to
Thanks pugdog,

I did get the "At category and below" working.
I ended up using IN just like you have suggested, but for categories only.
When I tried to using a list of zips as "IN" while calculating distances, it was too slow.

So what ended up working for me was eliminating some of the INNER JOINs,
and having the distance calculation do ONLY the distance calc, and return just
the ID's of the links (I was returning the entire link record).

On a business listing directory with 150,000 links, I get the following results.
Searching within 20m of 60611 (chicago) returns 8000 matches, sorted by distance
in 8-10 seconds (no mod-perl). I am fairly happy with the results.

Here's my current code that seems to work very well. I do think I am getting better at this :-)
Code:
# Start with: Distance
my $sort_order = $args->{so}; # Add ASC or DESC: Distance ASC
$sort_order .= ','; # Add comma : Distance ASC,
$sort_order .= $args->{sb}; # Add other Columns: Distance ASC, Title DESC, Add_Date

my $earth_radius = "3957"; # in Miles
if ($unitcode eq "K") {
$earth_radius = $earth_radius * 1.609344; # in Kilometers (6378)
} elsif ($unitcode eq "N") {
$earth_radius = $earth_radius * 0.86897; # in Nautical Miles (3439)
}

my $code;

if ($catid) {

# This is just the short category name for display on template results
$catname = $categories_db->select( ['Name'], { ID => $catid } )->fetchrow;

# We want all links in catid and below. Found at GT forums under <%load_lower_links($ID)%>
my $catname_full = $categories_db->select( ['Full_Name'], { ID => $catid } )->fetchrow;
my $cond = GT::SQL::Condition->new(
Full_Name => '=' => "$catname_full",
Full_Name => 'LIKE' => "$catname_full/%"
);
$cond->bool('OR');
my $sth = $categories_db->select('ID', $cond ) || die $GT::SQL::error;

# Create a list of CategoryID (s) to use as IN below
my $results = $sth->fetchall_hashref('ID');
my @cats = map { $_->{ID} } @$results;
my $cat_list = join " ", @cats;
$cat_list =~ s/ /,/g;

$code = "
SELECT ".$DB->prefix."Links.ID,
(".$earth_radius." * (2 * ASIN(SQRT(
POWER(SIN(((z.ZCS_Latitude-o.ZCS_Latitude)*0.017453293)/2),2) +
COS(z.ZCS_Latitude*0.017453293) *
COS(o.ZCS_Latitude*0.017453293) *
POWER(SIN(((z.ZCS_Longitude-o.ZCS_Longitude)*0.017453293)/2),2)
)))) AS ZCS_Distance

FROM ".$DB->prefix."ZCS_Zips z,
".$DB->prefix."ZCS_Zips o

INNER JOIN ".$DB->prefix."Links ON o.ZCS_ZipCode = ".$DB->prefix."Links.".$opts->{'links_zipcode_col'}."
INNER JOIN ".$DB->prefix."CatLinks ON ".$DB->prefix."CatLinks.LinkID = ".$DB->prefix."Links.ID

WHERE z.ZCS_ZipCode = ".$query." AND ".$DB->prefix."CatLinks.CategoryID IN (".$cat_list.")".$and."
GROUP BY ZCS_Distance SORT_ORDER, ZCS_Distance having ZCS_Distance <= ".$dist
;
}
# If we are not limiting to categories, we can drop one of the INNER JOINs
else {

$code = "
SELECT ".$DB->prefix."Links.ID,
(".$earth_radius." * (2 * ASIN(SQRT(
POWER(SIN(((z.ZCS_Latitude-o.ZCS_Latitude)*0.017453293)/2),2) +
COS(z.ZCS_Latitude*0.017453293) *
COS(o.ZCS_Latitude*0.017453293) *
POWER(SIN(((z.ZCS_Longitude-o.ZCS_Longitude)*0.017453293)/2),2)
)))) AS ZCS_Distance

FROM ".$DB->prefix."ZCS_Zips z,
".$DB->prefix."ZCS_Zips o

INNER JOIN ".$DB->prefix."Links ON o.ZCS_ZipCode = ".$DB->prefix."Links.".$opts->{'links_zipcode_col'}."

WHERE z.ZCS_ZipCode = ".$query. $and."
GROUP BY ZCS_Distance SORT_ORDER, ZCS_Distance having ZCS_Distance <= ".$dist
;
}

$code =~ s/SORT_ORDER/$sort_order/;

my $offset = ($args->{nh} - 1) * $args->{mh};
my $sth = $links_db->prepare($code) || return { error => "Unable to PREPARE query - ".$GT::SQL::error };
$sth->execute () || return { error => "Unable to EXECUTE query - ".$GT::SQL::error };

while (my $row = $sth->fetchrow_hashref)
{
if($link_count >= $offset and ( $link_count < ($offset + $args->{mh})) ) {

# Grab the entire link and zipcode records. All I got from SELECT above is link id.
my $link = $links_db->select ('*', { ID => $row->{ID} })->fetchrow_hashref;
my $zip = $zipcode_db->select ('*', { ZCS_ZipCode => $link->{$opts->{'links_zipcode_col'}} })->fetchrow_hashref;

$link = Links::SiteHTML::tags('link', $link);

# ZCS_Location tag - Just a combination of City, State, CC and Zip
$link->{ZCS_Location} = $zip->{ZCS_City} . ", " . $zip->{ZCS_State} . " (" . $zip->{ZCS_CC} . ") " . $zip->{ZCS_ZipCode};

# Grab the full country name from it's code to use as ZCS_Country tag.
# Do this INSTEAD of the 2nd or 3rd INNER JOIN in the SELECT statement above.
# This is waaay faster than a 3 table join. This method is also used by GetLocationTags.
$link->{ZCS_Country} = $DB->table('ZCS_Countries')->select('ZCS_Country', { ZCS_CC => $zip->{ZCS_CC} })->fetchrow;

if ($row->{ZCS_Distance} == 0) {
$link->{ZCS_Distance} = 0;
}
else {
$link->{ZCS_Distance} = sprintf("%.2f", $row->{ZCS_Distance}); # Round distance to 2 decimal places
}
push (@link_results_loop, $link);
}
$link_count++;
}

I was able to keep my zipcodes/lat/long and all other city/state/country info in separate tables.
When cycling through the results, I then grab the entire link record, and city/state/country country
info, but only on the results that are being displayed.

If I knew how to cache results, it could get even faster.

What do you think of my solution/code?
It even provides for additional "and" conditions by simply adding
a Links field to the ZipCode search form.

Thanks,
Chris
RGB World, Inc. - Software &amp; Web Development.
rgbworld.com

Last edited by:

rgbworld: Jul 21, 2006, 11:20 AM
Quote Reply
Re: [pugdog] Using INNER JOIN to duplicate CatLinks In reply to
To reply to some of your suggestions
In Reply To:

You would pick up significant speed by using the pre-generated tables like most of the programs/systems out there offer.[/quote]
How would you know what to "pregenerate"? With 70,000 zipcodes, at distances of say 10,20,50,100 would amount to 240,000 pregenerated tables. Obviously I don't understand the idea Unsure

In Reply To:
I've migrated my version to using one table, but adding a "distance" field to the record, so selects are on both zip code and distance.[/quote]I've gone back and forth on the issue a dozen times, and IMHO, if you are duplicating information from one table to another (which I was doing), then it should be re-thought. I now use just a ZipCodes table and the Links table, and results are joined on the only common field, which is the ZipCode.

In Reply To:
If you try to recalculate each and every record each and every time, you have a far, far greater system load than just doing a simple select:

select * where ZIP = zip and DISTANCE = distance

You get ONE record returned from an indexed database, with a field "INCLUDED_ZIPS" or whatever.

In that record is a data-base normalization defying list of zipcodes (your list separator can be anything you want, a space is fine, but I think the ', ' turns it into an IN list automatically.

Then, you can just search on all records whose ZIP is IN (INCLUDED_ZIPS)[/quote]Well, of course if every Zipcode knew how far it was from every other zipcode a single select
like your example would be easy. How do you plan to update the list of
INCLUDED_ZIPS associated with
each zipcode

In Reply To:
According to the MySQL manual, IN searches are blindingly fast.[/quote]IN is fast, but not when comparing say 1000 zipcodes against 150,000 records each individually


In Reply To:
I think you can even make this a single query, via SQL, but maybe not in the GT:: SQL engine using the most current MySQL, but separate searches are not horribly penalizing.[/quote]I get my results from a single query by JOINing 2 tables. I expect that is the best way to
do it as there is no redundent data in either table.


In Reply To:
You calculate once, then search many times. You sacrifice a bit of disk/database space for the major per-search overhead. Unless you have a majorly overloaded system, you wouldn't need to cache results, though if you keep tabs on what searches are being done most often, and it's something you can cache..... though I'm not sure if in most situations that would actually end up saving anything - especially running GLinks under mod_perl[/quote]
I think that is what I am doing now, I will have to benchmark it under mod-perl to see if it is faster.

Thanks again for your discussion of the matter.

Chris

RGB World, Inc. - Software &amp; Web Development.
rgbworld.com