how do I extract complexes from PDB site?

Hi...
Can anyone help me? I have to extract protein complexes from the PDB site www.rcsb.org
using Perl. After extracting them I have to store them in a file.
But I don't know how to make a loop that gets all the complexes one by one and writes them to a file.
All of this has to be done in Perl, and for that I wrote a program, but it is not working.
My program:

Code:
use strict;
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new( autocheck => 1);
$mech->get('http://www.rcsb.org/pdb/Welcome.do');
$mech->submit(
headerQueryForm => 'search',
fields => {'radioset' =>'Structures',},
fields => { 'inputQuickSearch' =>'protein complexes',},
button => 'search now'
);
#$mech->search;
my $results = $mech->content;
my @pdb;
while ($results == 'Sperm whale myoglobin mutant T67R S92D')
{
push @pdb,$1;
}


# open(FH, ">print.pdb");
# print FH $mech;
# close(FH);

Please help me out.

-tripti
Re: [Tripti Vijay] how do I extract complexes from PDB site?
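Some immediate problems...
These two lines will probably cause problems, because the 'fields' key is given twice and the second one replaces the first:

Code:
fields => {'radioset' =>'Structures',},
fields => { 'inputQuickSearch' =>'protein complexes',},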
Code:
my $mech = WWW::Mechanize->new( autocheck => 1);
$mech->submit(
headerQueryForm => 'search',
fields => {'radioset' =>'Structures',},
fields => { 'inputQuickSearch' =>'protein complexes',},
button => 'search now'
);
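You are 'use'ing strict, but that is not what breaks here: "=>" automatically quotes a
bare word on its left, so the unquoted parts before the "=>"s are fine. The real problems
are that 'fields' appears twice, so only the second one is used, and that submit() does
not take arguments. The WWW::Mechanize method that accepts 'fields' and 'button' is
submit_form(); it does not know a 'headerQueryForm' option, so if that is the name of
the form, pass it as form_name. You quote some keys ('radioset') and not others, which
makes me wonder: did you copy it?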
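
To see why the repeated 'fields' key matters, here is a minimal sketch. It assumes the method collects its key/value arguments into a hash, which is the usual Perl convention; show_args is a made-up stand-in for illustration, not part of WWW::Mechanize.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that takes named arguments.
# Key/value pairs are copied into a hash, so a repeated key keeps
# only the value that was given last.
sub show_args {
    my %args = @_;
    return \%args;
}

# 'fields' given twice: the first hash reference is silently dropped.
my $dup = show_args(
    fields => { radioset         => 'Structures' },
    fields => { inputQuickSearch => 'protein complexes' },
);

# 'fields' given once, with both form fields in the same hash reference.
my $merged = show_args(
    fields => {
        radioset         => 'Structures',
        inputQuickSearch => 'protein complexes',
    },
);

print join(',', sort keys %{ $dup->{fields} }),    "\n";
print join(',', sort keys %{ $merged->{fields} }), "\n";
```

The first line printed shows only inputQuickSearch; the second shows both field names. Putting every form field into a single 'fields' hash reference avoids the problem.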

Code:
while ($results == 'Sperm whale myoglobin mutant T67R S92D')
{
push @pdb,$1;
}
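Perl does not use the same operators for comparing strings and numbers. '==' compares
its operands as numbers, so two non-numeric strings both count as 0 and compare equal
(with a warning, since you 'use warnings'). To compare strings, you need to use the 'eq' operator.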
Code:
while ($results eq 'Sperm whale myoglobin mutant T67R S92D')
{
push @pdb,$1;
}

...but there is no $1, because $1 is only set by a successful regular expression
match with capturing parentheses. Additionally, you are not parsing the content,
but merely checking if it is equal. I bet that you will never get to
'push @pdb, $1' because $results never equals that.
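
A rough sketch of what parsing with a capture could look like. The sample HTML and the 'structureId=' link pattern are made up for illustration; the markup on the real results page will differ, so the regular expression has to be adapted to it.

```perl
use strict;
use warnings;

# Made-up sample standing in for $mech->content.
my $results = <<'HTML';
<a href="/pdb/explore.do?structureId=1ABC">first entry</a>
<a href="/pdb/explore.do?structureId=2XYZ">second entry</a>
HTML

my @pdb;

# Each successful match sets $1 to the text captured by the parentheses.
# With /g, the next iteration resumes where the previous match ended,
# so the loop stops once there are no more matches.
while ($results =~ /structureId=(\w{4})/g) {
    push @pdb, $1;
}

print "@pdb\n";
```

With this sample input, @pdb ends up holding the two four-character IDs. For anything beyond simple patterns, an HTML parser is more robust than a regular expression.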




It does not look like you know Perl. I will not write your programs for you, only, perhaps, help.
  • There is a reason there is perldoc. You can readily verify that you are using the modules properly (as well as just about any other part of Perl).
  • It helps to give some information:
    • intended results
    • errors, warnings, general failures
    • etc.

Re: [mkp] how do I extract complexes from PDB site?
Hi,
Thanks for replying and solving my problem. Well, I didn't copy it... In fact it was specified in a module called WWW::Mechanize. I just changed the field names according to my need.
By the way, you guessed correctly: I'm very new to Perl. But I'm trying hard, and I think people like you who are masters in Perl will always help me.

Now I'm facing another problem: how do I extract the links from all the result pages? At the PDB site I got 29 pages of results, but my program is extracting only the links of the first page. I do not know how to make a loop so that it fetches all the pages, i.e. from page number 2 to 29.
Re: [Tripti Vijay] how do I extract complexes from PDB site?
Since you need to recursively crawl a site, it probably makes sense to use modules like WWW::Spyder or WWW::Robot, together with HTML::Parser.

Retrieving a URL and parsing the document is fairly simple. But recursing through a website is a bit trickier. It would be a lot more useful if that website provided simplified (machine-readable) results. They might.

If you want to write the spidering code yourself, keep in mind that a 'for' loop is just a 'while' loop in disguise:

Code:
for (my $i = 0; $i < 10; $i++) {
# do something
}





is the same as

Code:
my $i = 0;
while ($i < 10) {
# do something
$i++;   # increment at the end of the body, so $i runs from 0 to 9 as in the 'for' loop
}
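
Applied to the 29 result pages, a loop could look roughly like the sketch below. fetch_page is a hypothetical stand-in that returns fake HTML so the example runs without network access; in a real program it would request the URL of the given results page (for example with $mech->get) and return the content. The link format is also made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real fetch of one results page.
sub fetch_page {
    my ($page) = @_;
    return qq{<a href="/structure/${page}ABC">entry on page $page</a>};
}

# Visit every page from $first to $last and gather all href values.
sub collect_links {
    my ($first, $last) = @_;
    my @links;
    for my $page ($first .. $last) {
        my $html = fetch_page($page);
        # In list context, a /g match returns every captured string.
        push @links, $html =~ /href="([^"]+)"/g;
    }
    return @links;
}

my @links = collect_links(1, 29);
print scalar(@links), " links collected\n";
```

The range form 'for my $page (1 .. 29)' does the same job as the C-style loop, with less room for off-by-one mistakes.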



Recursion, on the other hand, usually relies on subroutines that call each other or themselves until some condition is met (for example, you have crawled the part of the site you wanted).
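
As an illustration of that idea, here is a small sketch of a subroutine that calls itself. The %site hash is a made-up in-memory "website" (page name => pages it links to), used so the example runs on its own; a real crawler would fetch each page and extract its links instead.

```perl
use strict;
use warnings;

# Hypothetical site map: each page lists the pages it links to.
my %site = (
    index => [ 'page2', 'about' ],
    page2 => [ 'page3', 'index' ],
    page3 => [],
    about => [ 'index' ],
);

my %seen;

# Visit a page, then recurse into every page it links to.
# The %seen check is the stopping condition: a page that was already
# visited is skipped, so links that point back do not loop forever.
sub crawl {
    my ($page) = @_;
    return if $seen{$page}++;
    for my $link (@{ $site{$page} || [] }) {
        crawl($link);
    }
    return;
}

crawl('index');
print join(', ', sort keys %seen), "\n";
```

Starting from 'index', every page in the sample map is visited exactly once, even though some pages link back to it.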

Or would working with the same data locally be more useful (if the site offers the same data for download, I'm betting yes)?

Last edited by:

mkp: Feb 27, 2006, 4:38 AM