Gossamer Forum

multiple sort on hash of arrays

Hi all, I kind of solved some of that last problem I had, but have run into more. Basically I'm trying to do a multiple sort on my multidimensional data structure (a hash of arrays of arrays!). Here's the deal:


file example
##########################################################
score start end identifier
9e-63 285 1001 ENSP00000218091
3e-58 293 996 ENSP00000218091
1e-57 280 1003 ENSP00000218091
1e-42 280 1003 ENSP00000218091
1e-42 280 1003 ENSP00000218092
1e-42 284 1004 ENSP00000218092
##########################################################

code:
##########################################################
while (<IN>){
    if(/^\d/){                              # skip the header line
        chomp $_;
        @fields = split(/\s+/,$_);
        if ($fields[3] =~ /ENSP/ && $fields[0] < 0.0005){
            @array = ($fields[0],           # score
                      $fields[1],           # start
                      $fields[2],           # end
                      $fields[3]);          # identifier
            $match = $fields[3];            # the identifier is the hash key
            push (@{$hash{$match}}, [@array]);
        }#end if
    }#end if
}#end while
close (IN);

############################################################
This gives me a hash with each identifier as a new key. The value for each key is an array of arrays. In this case, there are 2 arrays, one holding 4 arrays (each holding individual values corresponding to the values in the file fields) for all rows that match ENSP00000218091, the other holding 2 arrays for matches to ENSP00000218092.
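
To make that shape concrete, here is a minimal sketch that builds the same kind of structure from the sample rows above and reads two values back out. The inline @rows list stands in for the input file and is only an illustration; it is not the real data source.

```perl
use strict;
use warnings;

# Sample rows copied from the file example above (illustration only).
my @rows = (
    "9e-63 285 1001 ENSP00000218091",
    "3e-58 293 996 ENSP00000218091",
    "1e-57 280 1003 ENSP00000218091",
    "1e-42 280 1003 ENSP00000218091",
    "1e-42 280 1003 ENSP00000218092",
    "1e-42 284 1004 ENSP00000218092",
);

my %hash;
for my $row (@rows) {
    my @fields = split /\s+/, $row;
    # key = identifier, value = array of [score, start, end, identifier] rows
    push @{ $hash{ $fields[3] } }, [@fields];
}

# Number of hits stored for one identifier
my $count = scalar @{ $hash{'ENSP00000218091'} };

# Start position (index 1) of the second hit (index 1) for that identifier
my $start = $hash{'ENSP00000218091'}[1][1];

print "$count hits, second start = $start\n";
```

So $hash{$match} is a reference to the list of hits, $hash{$match}[$i] is one hit, and $hash{$match}[$i][1] is that hit's start position.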

I'm trying to sort firstly by identifier. This works with:

############
code:

foreach $match (sort keys %hash){
print "\n\n$match: ", scalar(@{$hash{$match}}), "hits.";
}#end foreach.

######################

But what I want is to sort first by identifier and then by start position. I used the following code, but it doesn't seem to work. Can anyone help me?

######################

foreach (sort {@{$hash{$match}}->[$b]->[1] <=> @{$hash{$match}}->[$a]->[1]} keys %hash){
print "\n$match: ", scalar(@{$hash{$match}}), "hits.";
}#end foreach

###########################

Is it possible to use the 'or' operator to fit both of these into one command?

Thanks
Re: [tintin1978] multiple sort on hash of arrays
Maybe try this (sorry about reformatting your code; I ended up doing it while I was trying to understand how the script worked):

Code:
open(IN, "<", "data.txt") or die "Cannot open data.txt: $!";

# get headers: score, start, end, etc
my $headers = <IN>;

my %unsorted_matches;

while ( <IN> ) {

    # split into easily accessible fields
    chomp;
    my @hit = split /\s+/;

    # basic testing
    next unless $hit[3] =~ /ENSP/;
    next unless $hit[0] < 0.0005;

    # add our match to the lookup structure
    # (@hit is a fresh array on every pass, so storing \@hit is safe)
    my $match = $hit[3];
    push @{$unsorted_matches{$match}}, \@hit;

}

close (IN);

# print the sorted results

# first by match key : cmp compares strings
foreach my $match ( sort { $a cmp $b } keys %unsorted_matches ) {

    my $match_list = $unsorted_matches{$match};

    # then group counts by the start position
    my %start_pos_lookup;
    foreach my $p ( @$match_list ) {
        $start_pos_lookup{ $p->[1] }++;
    }

    # now print the counts for each start position
    foreach my $stp ( sort { int $a <=> int $b } keys %start_pos_lookup ) {
        print "$match : $stp : $start_pos_lookup{$stp}\n";
    }

}
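
On the earlier question about using 'or' to fit both sorts into one command: inside a single sort block, comparisons can be chained with `or` (or `||`), because `cmp` and `<=>` return 0 on a tie and the next comparison then decides. The sketch below is one way this might look. It assumes each hit is stored as [score, start, end, identifier], and the flattened list @all_hits is introduced here purely for illustration; it is not part of the code above.

```perl
use strict;
use warnings;

# Small sample in the same shape as %unsorted_matches (illustration only).
my %unsorted_matches = (
    ENSP00000218092 => [
        [ '1e-42', 284, 1004, 'ENSP00000218092' ],
        [ '1e-42', 280, 1003, 'ENSP00000218092' ],
    ],
    ENSP00000218091 => [
        [ '9e-63', 285, 1001, 'ENSP00000218091' ],
        [ '3e-58', 293, 996,  'ENSP00000218091' ],
        [ '1e-57', 280, 1003, 'ENSP00000218091' ],
    ],
);

# Flatten every hit into one list so a single sort can see both keys.
my @all_hits = map { @$_ } values %unsorted_matches;

my @sorted = sort {
       $a->[3] cmp $b->[3]      # identifier, compared as a string
    or $a->[1] <=> $b->[1]      # start position, numeric, only used on a tie
} @all_hits;

print join("\t", @$_), "\n" for @sorted;
```

Sorting the keys of the hash cannot do this on its own, since each key maps to many start positions; the two-key comparison only makes sense once the items being sorted are the individual hits.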
Re: [Aki] multiple sort on hash of arrays
Hi, yeah OK, I'll give that a try...

I was actually working on something involving a Schwartzian transform, but I've run into a little bug. Here's the code:

#########################################

#foreach identifier...

foreach $match (sort keys %hash){

print "\n$match: ", scalar(@{$hash{$match}}), "hits."; # print

for($i=0; $i<=$#{$hash{$match}}; $i++){ #for each different identifier...

#Schwartzian transformation
#create an anonymous list of each array, plus the value of the corresponding start point... this is where I have trouble... the array reference is accessed ok, but I keep getting the same start point... i'll paste in the output below so you can see...basically, i need a way to access @{$hash{$match}}->[$i]->[1] without specifying $i each time...

@precomputed = map {[$_, @{$hash{$match}}->[$i]->[1]]} @{$hash{$match}};

for ($z=0; $z<@precomputed; $z++){
print "\n$z, $precomputed[$z]->[0], $precomputed[$z]->[1]";
}#end for

@ordered_precomputed = sort {$a->[1] <=> $b->[1]} @precomputed;
@ordered = map { $_->[0] } @ordered_precomputed;

}#end for

for ($j=0; $j<@ordered; $j++){
print "\n@ordered->[$j]->[0]";
print "\n@ordered->[$j]->[1]";
}#end for

}#end foreach

##############################################
output....


ENSP00000218091: 4hits.

Array ref: ARRAY(0x810802c)
e_value: 9e-63
q_start: 285
q_end: 1001
m_name: ENSP00000218091

Schwartz:
precomp len: 4
0, ARRAY(0x810802c), 285 ######################
1, ARRAY(0x8110098), 285 #here, the arrays are accessed,
2, ARRAY(0x81100e0), 285 #but not the correct start point
3, ARRAY(0x8110128), 285 #########################

Array ref: ARRAY(0x8110098)
e_value: 3e-58
q_start: 293
q_end: 996
m_name: ENSP00000218091

Schwartz:
precomp len: 4
0, ARRAY(0x810802c), 293
1, ARRAY(0x8110098), 293
2, ARRAY(0x81100e0), 293
3, ARRAY(0x8110128), 293

Array ref: ARRAY(0x81100e0)
e_value: 1e-57
q_start: 280
q_end: 1003
m_name: ENSP00000218091

Schwartz:
precomp len: 4
0, ARRAY(0x810802c), 280
1, ARRAY(0x8110098), 280
2, ARRAY(0x81100e0), 280
3, ARRAY(0x8110128), 280

Array ref: ARRAY(0x8110128)
e_value: 1e-42
q_start: 280
q_end: 1003
m_name: ENSP00000218091

Schwartz:
precomp len: 4
0, ARRAY(0x810802c), 280
1, ARRAY(0x8110098), 280
2, ARRAY(0x81100e0), 280
3, ARRAY(0x8110128), 280
9e-63
285
3e-58
293
1e-57
280
1e-42
280
ENSP00000218092: 2hits.

Array ref: ARRAY(0x8110194)
e_value: 1e-43
q_start: 280
q_end: 1003
m_name: ENSP00000218092

Schwartz:
precomp len: 2
0, ARRAY(0x8110194), 280
1, ARRAY(0x81101d0), 280

Array ref: ARRAY(0x81101d0)
e_value: 1e-40
q_start: 284
q_end: 1004
m_name: ENSP00000218092

Schwartz:
precomp len: 2
0, ARRAY(0x8110194), 284
1, ARRAY(0x81101d0), 284
1e-43
280
1e-40
284
Re: [tintin1978] multiple sort on hash of arrays
Thanks Aki... it works great!

Re: [tintin1978] multiple sort on hash of arrays
I'm still a little curious about that last problem though, the one in the Schwartzian transform: how to access the start point of each hit without using a 'for' loop. With the loop, @{$hash{$match}}->[$i]->[1] is accessed for every element on each pass, instead of ...->[0]->[1], ...->[1]->[1], ...->[2]->[1], etc. Anyone have any ideas?

cheers
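
One likely explanation for the repeated start point: inside the `map`, `$_` is already the reference to the current hit, so the start position can be read as `$_->[1]` with no index and no outer `for` loop. In the posted code `$i` stays fixed while the `map` runs, so every pair receives the same start value. A minimal sketch of the transform written that way, using sample rows from this thread and assuming each hit is [score, start, end, identifier] (the %sorted_hits name is only for illustration):

```perl
use strict;
use warnings;

my %hash = (
    ENSP00000218091 => [
        [ '9e-63', 285, 1001, 'ENSP00000218091' ],
        [ '3e-58', 293, 996,  'ENSP00000218091' ],
        [ '1e-57', 280, 1003, 'ENSP00000218091' ],
    ],
);

my %sorted_hits;
foreach my $match ( sort keys %hash ) {
    # Read the chain bottom-up: decorate, sort, undecorate.
    my @ordered =
        map  { $_->[0] }                  # 3. unwrap the original hit
        sort { $a->[1] <=> $b->[1] }      # 2. sort on the precomputed key
        map  { [ $_, $_->[1] ] }          # 1. pair each hit with its own start
        @{ $hash{$match} };

    $sorted_hits{$match} = \@ordered;
    print "$match: start $_->[1]\n" for @ordered;
}
```

For a key as cheap as an array lookup, a plain `sort { $a->[1] <=> $b->[1] } @{ $hash{$match} }` gives the same order; the transform mainly pays off when computing the key is expensive.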
Re: [tintin1978] multiple sort on hash of arrays
Mmmm, I'm getting a little confused. What do you want to achieve with this code?
Re: [Aki] multiple sort on hash of arrays
Basically, I wanted to achieve exactly what I got to work from that previous post you sent, i.e. to order the hits by name and then by start position. I just stumbled upon a little problem when I was looking at that other way (the transform), and was curious whether I could fix it at all.

It's as I've always been told: there are loads of different ways to solve most Perl problems, and I have it solved now that you gave me that help, thanks. I was just curious about the other thing, but no hassles...

cheers again..