Hi,
I am having a big problem with a "foreign" character in a text database.
Because of this strange character the file will not even split into separate lines: the Perl program stops reading as soon as it reaches the first appearance of the character.
So I cannot clean each line with a regex, because the file never gets split into lines in the first place.
What I need is a regex that cleans the whole file (@dbbase) in one go first, and then splits the file into lines once it is cleaned.
As an attachment I am adding a small example database.txt consisting of 3 lines, each of which contains this strange character.
I am getting these strange characters because the text in the database is parsed from pages that are 50% English and 50% Thai.
So I think this is a Thai character.
Here is the script I want to use:
open (TEMPO, "database.txt") or die "Cannot open database.txt: $!";
@dbbase = <TEMPO>;
close(TEMPO);
foreach $line(@dbbase) {
# remove the trailing newline first, then replace any remaining line breaks and tabs with spaces:
chomp($line);
$line =~ tr!\r\n\t! !;
($url,$title,$keywords)=split(/\|/, $line);
......etc
......etc
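To make the request concrete, here is a rough sketch of the "clean the whole file first, split afterwards" idea. It is only a guess at the cause: it assumes the character is a control byte such as 0x1A (Ctrl-Z), which Perl on Windows treats as end-of-file when a file is read in text mode, and not a real Thai letter. The name read_clean_lines is made up for this example, and I have not confirmed that this is what is in database.txt.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): read the whole
# file as raw bytes, delete control characters, then split into lines.
sub read_clean_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);      # raw bytes, so 0x1A is not taken as end-of-file
    local $/;          # slurp mode: read the entire file in one go
    my $content = <$fh>;
    close($fh);
    $content = '' unless defined $content;

    # Assumption: the offending bytes are ASCII control characters.
    # Tab (0x09), LF (0x0A) and CR (0x0D) are kept; the rest are deleted.
    $content =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F\x7F//d;

    # Split on Unix or Windows line endings.
    return split /\r?\n/, $content;
}

my $file = 'database.txt';
if (-e $file) {
    foreach my $line (read_clean_lines($file)) {
        $line =~ tr/\t/ /;    # tabs to spaces, as in the script above
        my ($url, $title, $keywords) = split /\|/, $line;
        print "$url\n" if defined $url;
    }
}
```

If the real problem turns out to be genuine Thai text in some multi-byte encoding, the tr/// line would need to change, but reading with binmode and splitting afterwards should still apply.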