A question most likely for Paul.
I have read an HTML file into a string and need to extract a title for the record from it. The title should be the text between the first <h*>...</h*> tags or, failing that, the text inside a <p><b>...</b></p> or <p>...</p> element.
The format of the documents varies: some begin with an <h1> tag, whilst others begin with a <p>.
Any help would be very much appreciated.
I tried using:
$title = $string;
$title =~ s/<h1>*{1,200}</h1>//i;
$title = $';
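For reference, here is a minimal sketch of the kind of match described above. Two things stand in the way of the attempt as written: the "/" in </h1> ends the pattern when "/" is also the delimiter, and $' holds the text after the match rather than the matched text. The helper name extract_title is hypothetical, and the sketch assumes the markup is simple and well formed (no nested paragraphs, no ">" inside attribute values).

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text of the first non-empty heading
# (<h1>..<h6>) or paragraph (<p>), whichever appears first in the string.
# Returns nothing if no such element is found.
sub extract_title {
    my ($html) = @_;

    # m{...} avoids clashing with the "/" in closing tags.
    # \b stops "<p" from matching <pre> or <param>.
    # \1 requires the closing tag to match the opening one.
    # /s lets "." span newlines, /i ignores case, /g steps through matches.
    while ($html =~ m{<(h[1-6]|p)\b[^>]*>(.*?)</\1\s*>}gis) {
        my $title = $2;
        $title =~ s/<[^>]+>//g;       # drop inner tags such as <b>
        $title =~ s/\s+/ /g;          # collapse runs of whitespace
        $title =~ s/^\s+|\s+$//g;     # trim both ends
        return $title if length $title;   # skip empty elements
    }
    return;
}

my $string = "<html><body><p><b>Annual\n Report</b></p><h2>Later</h2></body></html>";
my $title  = extract_title($string);
print defined $title ? "$title\n" : "no title found\n";
```

A regex like this is only reasonable when the files are small and consistently formatted. For arbitrary HTML, a real parser such as the CPAN module HTML::TreeBuilder is likely to be more robust.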
------------------
Dorg Hurgler Van Schongleur,
NordHein Van Resetelem, Belgium.
Eck SchekeBuugler Technologies.