Unable to clean (Regex) 1 foreign character

Hi,

I am having a big problem with a "foreign" character in a text database.

Because of this strange character, the file does not even split into separate $lines: the Perl program stops reading as soon as it reaches the first appearance of this character.

So I cannot even clean each line with a regex, because the file never gets split into $lines.

What I need is a regex that first cleans the whole file (@dbbase) in one go, and then splits the file into $lines once it is clean.

As an attachment, I am adding a small example database.txt consisting of 3 lines, each containing this strange character.

I am getting those strange characters because the text from the database is parsed from pages that are 50% in English and 50% in Thai.

So I think that this character is a foreign Thai character.

Here is the script I want to use:

open (TEMPO, "database.txt") or die "Cannot open database.txt: $!";
@dbbase = <TEMPO>;
close(TEMPO);

foreach $line (@dbbase) {
    # replace line breaks and tabs with spaces:
    $line =~ tr!\r\n\t! !;
    chomp($line);
    ($url,$title,$keywords) = split(/\|/, $line);
    ......etc
    ......etc
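
One way to check what the strange character actually is, before writing a regex for it, is to read the file as raw bytes and list every byte that is not printable ASCII. The following is only a sketch: the names count_odd_bytes and slurp_raw are made up for illustration, and it assumes the file is small enough to read in one go. Thai text would normally show up as several different byte values above 127, whereas a single low value (such as 26) would point to a control character instead. binmode is used because text mode on Windows builds of Perl may treat such a control byte as the end of the file.

```perl
use strict;
use warnings;

# Hypothetical helper: count every byte that is not printable ASCII,
# tab, CR or LF, keyed by its numeric code.
sub count_odd_bytes {
    my ($data) = @_;
    my %count;
    $count{ ord $1 }++ while $data =~ /([^\x20-\x7E\t\r\n])/g;
    return \%count;
}

# Hypothetical helper: read the whole file as raw bytes, so that
# text-mode translation cannot cut the read short.
sub slurp_raw {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    local $/;               # slurp mode: no record separator
    my $data = <$fh>;
    close($fh);
    return $data;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    my $odd = count_odd_bytes(slurp_raw($ARGV[0]));
    printf "byte %3d (0x%02X): %d times\n", $_, $_, $odd->{$_}
        for sort { $a <=> $b } keys %$odd;
}
```

Saved under any name (report.pl is just an example), it could be run as: perl report.pl database.txt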

Re: [sanuk] Unable to clean (Regex) 1 foreign character
Hi,

See the attachment!

Someone has been able to identify those "foreign characters" in my text database as EOF (end of file) characters, code 26.

Now I still need a small Perl snippet to open this file, strip out those characters, and save the whole file again.

Regards,

Sanuk
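
A minimal sketch of the clean-up asked for above, assuming the unwanted byte really is code 26 (Ctrl-Z, \x1A) and that the file fits in memory. The sub names strip_ctrl_z and clean_file are hypothetical. Both file handles use binmode, so that on Windows the read is not cut off at the first Ctrl-Z and the bytes are written back unchanged apart from the removal.

```perl
use strict;
use warnings;

# Remove every Ctrl-Z byte (code 26, \x1A) from a string.
sub strip_ctrl_z {
    my ($text) = @_;
    $text =~ tr/\x1A//d;
    return $text;
}

# Read $path as raw bytes, strip Ctrl-Z, and write the result back
# to the same file (the original content is overwritten).
sub clean_file {
    my ($path) = @_;

    open(my $in, '<', $path) or die "Cannot read $path: $!";
    binmode($in);
    my $data = do { local $/; <$in> };   # slurp the whole file
    close($in);

    open(my $out, '>', $path) or die "Cannot write $path: $!";
    binmode($out);
    print {$out} strip_ctrl_z($data);
    close($out) or die "Cannot close $path: $!";
}

# Only runs when a file name is given on the command line.
clean_file($ARGV[0]) if @ARGV;
```

It could be run as: perl clean.pl database.txt (the script name is an example). Since the file is overwritten, it is safer to keep a backup copy first. Once the file is clean, the foreach loop from the first post should be able to read every line.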