Gossamer Forum
Duplicates
Hi

I want to delete duplicates - 100 at a time.

I have changed admin.cgi so that 100 are displayed.

I would like the checkbox with the highest value (say, the highest ID among the duplicated links) to be checked automatically, so that I then just click the delete button.

Below is the code from Admin_HTML.pm. Does anyone have any ideas on what to change?

Code:
sub html_check_duplicates {
# --------------------------------------------------------
# This routine checks to see if there are any duplicate links.
#
my ($in, $db, $duplicates) = @_;
my ($footer, $set, $link, $id, $title, $cat, $key);

$footer = &html_footer ($in, $db);
print qq|
<html>
<head>
<title>$TITLE: Check Duplicate Links.</title>
</head>

<body bgcolor="#DDDDDD">
<form action="$SCRIPT_URL" METHOD="POST">
<input type=hidden name="db" value="$DATABASE">
<input type=hidden name="do" value="delete_records">

<table border=1 bgcolor="#FFFFFF" cellpadding=5 cellspacing=3 width=500 valign=top>
<tr><td colspan=2 bgcolor="navy">
<FONT FACE="MS Sans Serif, arial,helvetica" size=1 COLOR="#FFFFFF">
<b>$TITLE: Check Duplicate Links</b>
</td></tr>
</table>

<blockquote><p><$FONT>
|;
if (keys %$duplicates == 0) {
print qq~ No duplicates found.
<p>
<table border=0 bgcolor="#DDDDDD" cellpadding=5 cellspacing=3 width=500 valign=top>
<tr><td> $footer</td></tr>
</table>
</blockquote>
~;
}
else {
print qq~
<p>To delete the offending links, click on the checkboxes and click delete:</p>
<TABLE BORDER=1>
~;
$key = $db->{'db_key'};
my $count = 0;
foreach $set (keys %$duplicates) {
print qq~<tr><td colspan=2><$FONT><b>$set</b></font></td></tr>
<tr><td> </td>
<td>
~;
foreach $link (@{${$duplicates}{$set}}) {
$link = $db->array_to_hash ($link);
$id = $link->{$db->{'db_key'}};
$title = $link->{'Title'};
$cat = &get_category_name ($link->{'CategoryID'});
print qq~<input type=checkbox name="delete" value="$id"> <$FONT>(<a href="$SCRIPT_URL?db=Links&do=view_records&$key=$id&ww=1" target="_blank">$id</a> ) $title in <em>$cat</em><br>~;
}
print qq~</td></tr>~;
$count++;
}
my $offset = $in->param('offset') || 0;
print "<tr><td colspan=2 align=center>";
($count >= 9) and print " <$FONT><b><a href='$SCRIPT_URL?db=Links&do=check_duplicates&offset=", ($offset+10), "'>Next 10</a></b></FONT> ";
($count >= 9) and print " <$FONT><b><a href='$SCRIPT_URL?db=Links&do=check_duplicates&offset=", ($offset-10 > 0) ? ($offset-10) : 0, "'>Prev 10</a></b></FONT> ";
print qq~
</td></tr>
</table>
<p>
</blockquote>
<table border=0 bgcolor="#DDDDDD" cellpadding=5 cellspacing=3 width=500 valign=top>
<tr><td>
<p><center><INPUT TYPE="SUBMIT" VALUE="Delete Checked $OBJECT(s)"> <INPUT TYPE="RESET" VALUE="Reset Form"></center></p>
$footer
</td></tr>
</table>
~;
}
print qq~
</body>
</html>
~;
}

Cheers

Tony
PS: Or is there a quicker way to do this from SQL Monitor?

[This message has been edited by chilli (edited January 15, 2000).]
Re: Duplicates
The query is actually in the routine that calls this one,

sub check_duplicates

inside admin.cgi:

Code:
$query = qq!
SELECT URL, COUNT(*) as hits
FROM Links
GROUP BY URL
HAVING hits > 1
ORDER BY hits DESC
LIMIT $offset, 10
!;

Change LIMIT $offset, 10 to LIMIT $offset, 100, but 100 is a lot and the page will load slowly.
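Raising LIMIT alone leaves the Next/Prev links in html_check_duplicates stepping by 10 (they add and subtract 10 from the offset and are labelled "Next 10"/"Prev 10"), so after the change those links would land inside the 100 results you just saw. A minimal sketch of keeping the two in step, assuming a hypothetical $per_page setting and hypothetical helpers duplicates_query and page_offsets (none of these are part of Links SQL). The int/clamp step is a precaution in case $offset comes straight from the request before being interpolated into the SQL.

```perl
use strict;
use warnings;

# Hypothetical setting: one page size shared by the LIMIT clause and the
# Next/Prev links, so going from 10 to 100 is a one-line change.
my $per_page = 100;

# Build the duplicate-finding query shown above for a given offset.
sub duplicates_query {
    my ($offset) = @_;
    $offset = int($offset || 0);    # force a number before interpolating into SQL
    $offset = 0 if $offset < 0;     # LIMIT does not accept a negative offset
    return qq!
SELECT URL, COUNT(*) as hits
FROM Links
GROUP BY URL
HAVING hits > 1
ORDER BY hits DESC
LIMIT $offset, $per_page
!;
}

# Offsets for the Next and Prev links, stepping by the same page size.
sub page_offsets {
    my ($offset) = @_;
    $offset ||= 0;
    my $next = $offset + $per_page;
    my $prev = $offset - $per_page > 0 ? $offset - $per_page : 0;
    return ($next, $prev);
}

my ($next, $prev) = page_offsets(100);
print "Next: $next, Prev: $prev\n";    # Next: 200, Prev: 0
```

With this, the "Next" link from the first page asks for offset 100 rather than 10, so no duplicate set is shown twice.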


------------------
Jerry Su
Links SQL Licensed
Re: Duplicates
Hi widgetz

I had already changed admin.cgi to display 100 duplicates. What I am unable to do is the hard bit:

i.e. when it displays the 100, show the tick box already ticked for whichever duplicate has the higher ID.

So, for example, if two of the 100 URLs displayed are the same and their IDs are as follows:

5264 http://www.hotchilli.co.uk
5966 http://www.hotchilli.co.uk

then the box for 5966 is already ticked, and I just click the delete button to delete the duplicate.

The benefit is that when you have a large number of duplicates, it displays 100 records, say 50 duplicate pairs, and I do not have to do the time-consuming bit of ticking 50 boxes.

The piece of code is, I think, the part of Admin_HTML.pm above which displays the duplicates and then lets you delete them by ticking the boxes:

Snippet:

print qq~<input type=checkbox name="delete" value="$id"> <$FONT>(<a href="$SCRIPT_URL?

The problem here is that it just loops over the URLs, and the output from the SQL groups the duplicates, so I need to change this so that the duplicated URL with the highest ID in each pair has its box already ticked.

The 100 URLs are displayed and 50 boxes are already ticked.

Any ideas?
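The rule being asked for is simple once each set of duplicates is in hand: within each group of records sharing a URL, pre-tick the one with the largest ID. A minimal sketch of just that rule, using a simplified, hypothetical %dupes hash of URL to ID list rather than the record arrays Links SQL actually passes in; ids_to_tick is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical input: each duplicated URL maps to the IDs of its records.
# (The real $duplicates hash in Links SQL holds whole record arrays instead.)
my %dupes = (
    'http://www.hotchilli.co.uk' => [5264, 5966],
);

# Return the IDs whose checkbox should start out ticked:
# the highest ID in each set of duplicates.
sub ids_to_tick {
    my ($sets) = @_;
    my @tick;
    for my $url (sort keys %$sets) {
        my ($highest) = sort { $b <=> $a } @{ $sets->{$url} };    # descending, take the first
        push @tick, $highest;
    }
    return @tick;
}

print join(', ', ids_to_tick(\%dupes)), "\n";    # 5966
```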

Cheers

Tony

[This message has been edited by chilli (edited January 15, 2000).]
Re: Duplicates
Oh... I get you :)

Change

Code:
foreach $link (@{${$duplicates}{$set}}) {
$link = $db->array_to_hash ($link);
$id = $link->{$db->{'db_key'}};
$title = $link->{'Title'};
$cat = &get_category_name ($link->{'CategoryID'});
print qq~<input type=checkbox name="delete" value="$id"> <$FONT>(<a href="$SCRIPT_URL?db=Links&do=view_records&$key=$id&ww=1" target="_blank">$id</a> ) $title in <em>$cat</em><br>~;
}

to

Code:
for (my $i = 0; $i < $#{${$duplicates}{$set}}; $i++) {
$link = $db->array_to_hash (${${$duplicates}{$set}}[$i]);
$id = $link->{$db->{'db_key'}};
$title = $link->{'Title'};
$cat = &get_category_name ($link->{'CategoryID'});
($i == 0) ?
(print qq~<input type=checkbox name="delete" value="$id">~) :
(print qq~<input type=checkbox name="delete" value="$id" checked>~);
print qq~ <$FONT>(<a href="$SCRIPT_URL?db=Links&do=view_records&$key=$id&ww=1" target="_blank">$id</a> ) $title in <em>$cat</em><br>~;
}

jerry
Re: Duplicates
Hi Jerry

I just tried that. Instead of 100 links, only 50 are displayed, and none of the checkboxes are ticked. I guess it's displaying the unticked 50 and not the ticked 50.
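That symptom fits an off-by-one in the suggested loop: it runs while $i < $#{${$duplicates}{$set}}, and $# gives the last index of an array rather than its length. For a pair, only element 0 is printed, and because $i == 0 gets the unticked checkbox, nothing ends up ticked. A small standalone illustration, with plain strings standing in for the link records:

```perl
use strict;
use warnings;

my @set = ('first', 'second');    # one duplicate pair

# $#set is the LAST INDEX (1 here), not the number of elements (2),
# so a "$i < $#set" loop stops one element early.
my @shown_buggy;
for (my $i = 0; $i < $#set; $i++) { push @shown_buggy, $set[$i] }

# "<=" (or simply iterating over @set) visits every element.
my @shown_fixed;
for (my $i = 0; $i <= $#set; $i++) { push @shown_fixed, $set[$i] }

print scalar(@shown_buggy), " shown vs ", scalar(@shown_fixed), "\n";    # 1 shown vs 2
```

Even with <= the loop would tick every link after the first in whatever order the set arrives, which is not necessarily the highest ID; sorting the set by ID first avoids that.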

Here is the code I used, as suggested (I just commented out the code being replaced):


print qq~
<p>To delete the offending links, click on the checkboxes and click delete:</p>
<TABLE BORDER=1>
~;
$key = $db->{'db_key'};
my $count = 0;
foreach $set (keys %$duplicates) {
print qq~<tr><td colspan=2><$FONT><b>$set</b></font></td></tr>
<tr><td>   </td>
<td>
~;
for (my $i = 0; $i < $#{${$duplicates}{$set}}; $i++) {
$link = $db->array_to_hash (${${$duplicates}{$set}}[$i]);
$id = $link->{$db->{'db_key'}};
$title = $link->{'Title'};
$cat = &get_category_name ($link->{'CategoryID'});
($i == 0) ?
(print qq~<input type=checkbox name="delete" value="$id">~) :
(print qq~<input type=checkbox name="delete" value="$id" checked>~);
print qq~ <$FONT>(<a href="$SCRIPT_URL?db=Links&do=view_records&$key=$id&ww=1" target="_blank">$id</a> ) $title in <em>$cat</em><br>~;
}

#foreach $link (@{${$duplicates}{$set}}) {
# $link = $db->array_to_hash ($link);
# $id = $link->{$db->{'db_key'}};
# $title = $link->{'Title'};
# $cat = &get_category_name ($link->{'CategoryID'});
# print qq~<input type=checkbox name="delete" value="$id"> <$FONT>(<a href="$SCRIPT_URL?db=Links&do=view_records&$key=$id&ww=1" target="_blank">$id</a> ) $title in <em>$cat</em><br>~;
# }
print qq~</td></tr>~;
$count++;
}
my $offset = $in->param('offset') || 0;
print "<tr><td colspan=2 align=center>";

Many Thanks

Tony


[This message has been edited by chilli (edited January 16, 2000).]
Re: Duplicates
Hi,

This worked for me. Edit Admin_HTML.pm and change:

Code:
foreach $link (@{${$duplicates}{$set}}) {
$link = $db->array_to_hash ($link);
$id = $link->{$db->{'db_key'}};
$title = $link->{'Title'};
$cat = &get_category_name ($link->{'CategoryID'});
print qq~<input type=checkbox name="delete" value="$id"> <$FONT>(<a href="$SCRIPT_URL?db=Links&do=view_records&$key=$id&ww=1" target="_blank">$id</a> ) $title in <em>$cat</em><br>~;
}


to:

Code:
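my $first = 1;
my @link_set = @{${$duplicates}{$set}};
my $key_pos = $db->position ($db->{'db_key'}) - 1;
# Sort the set by ID, highest first, so the first link is the one to delete.
foreach $link (sort { $b->[$key_pos] <=> $a->[$key_pos] } @link_set) {
# Tick only the first (highest-ID) link; a plain flag avoids comparing ' checked' to 1.
my $checked = $first ? ' checked' : '';
$first = 0;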
$link = $db->array_to_hash ($link);
$id = $link->{$db->{'db_key'}};
$title = $link->{'Title'};
$cat = &get_category_name ($link->{'CategoryID'});
print qq~<input type=checkbox name="delete" value="$id"$checked> <$FONT>(<a href="$SCRIPT_URL?db=Links&do=view_records&$key=$id&ww=1" target="_blank">$id</a> ) $title in <em>$cat</em><br>~;
}
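This version sorts each set by ID, highest first, and ticks only the first link, which handles pairs exactly as asked. If a URL appears three or more times, only one copy gets ticked per pass. A minimal sketch of a hypothetical alternative that keeps the lowest ID and pre-ticks all the rest, shown as a standalone function working on bare IDs (checkbox_html is an illustrative name, not part of Links SQL):

```perl
use strict;
use warnings;

# Hypothetical variant: keep the lowest (oldest) ID in each set and pre-tick
# every other one. For a pair this is the same as ticking the highest ID.
# Assumes @ids holds at least one numeric ID.
sub checkbox_html {
    my (@ids) = @_;
    my @sorted = sort { $a <=> $b } @ids;    # ascending: lowest ID first
    my $keep   = shift @sorted;
    my @html   = (qq~<input type=checkbox name="delete" value="$keep">~);
    push @html, qq~<input type=checkbox name="delete" value="$_" checked>~ for @sorted;
    return @html;
}

print "$_\n" for checkbox_html(5966, 5264, 6001);
```

Keeping the lowest ID assumes the oldest record is the one worth keeping; if that is not true for your data, the sort order is the only thing to change.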

Cheers,

Alex
Re: Duplicates

Thanks Alex

It worked a treat

Tony