Gossamer Forum

A simple question in Perl

I want to write an if statement that prevents users from entering characters like "|" and HTML.
The statement is something like:
if (characters contain HTML and |){error}

What is the syntax in Perl?
Re: A simple question in Perl
This is somewhat difficult to do properly with a regular expression alone. For now, however, let's assume that a tag is any text enclosed between < and >. This check does not care whether the tag is valid HTML, only that something is enclosed in a pair of angle brackets.
Code:
if ($input =~ m/<(.+)>|\|/g) {
print "found a match $1";
exit;
}
At each position in the string, this first pattern tries to match a tag and, if that fails there, the pipe symbol "|". Whichever of the two appears earliest in the input is what matches. You could change this a bit:
Code:
if ($input =~ m/\||<(.+)>/g) {
print "found a match $1";
exit;
}
This one tries the pipe first, and then a tag. Both patterns are either/or tests. If you wanted to look for items that looked like "|<BLAH>", that is, a pipe immediately followed by a tag, then you would use:
Code:
if ($input =~ m/\|<(.+)>/g) {
print "found a match $1";
exit;
}
Note that in all of these examples I print out $1 after the match. This value is whatever was matched by the first set of parentheses. As an example, if we had a tag <look out!> and performed a match using example 1 or 2, the value of $1 would be "look out!". If it was the pipe that matched instead, $1 is undefined, because the parentheses took no part in the match. You can also use $& to get the entire match; in our example, it would simply be "<look out!>".
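To make the difference between the patterns concrete, here is a small sketch that runs them against a few sample strings and reports $& and $1. The helper name show and the sample inputs are made up for illustration, and /g is left off because a single yes/no test does not need it. Note the fourth case: because .+ is greedy, with two tags on one line $1 runs from the first < to the last >.

```perl
use strict;
use warnings;

# Hypothetical helper: describe what a pattern matched in $input.
sub show {
    my ($input, $re) = @_;
    return 'no match' unless $input =~ $re;
    my $whole = $&;                             # the entire matched text
    my $group = defined $1 ? $1 : '(undef)';    # first set of parentheses
    return "whole=[$whole] group=[$group]";
}

my $either   = qr/<(.+)>|\|/;   # a tag OR a pipe
my $together = qr/\|<(.+)>/;    # a pipe immediately followed by a tag

print show('<look out!>',     $either),   "\n";  # tag matched, $1 is its contents
print show('a|b',             $either),   "\n";  # pipe matched, $1 is undefined
print show('a|b <i>x</i>',    $either),   "\n";  # the pipe comes first, so it wins
print show('x <b>bold</b> y', $either),   "\n";  # greedy: spans both tags
print show('|<BLAH>',         $together), "\n";  # pipe plus tag
print show('| <BLAH>',        $together), "\n";  # space in between: no match
```

For the original question, the first pattern is the one to use: if it matches at all, reject the input.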

Hope this helps,


------------------
Fred Hirsch
Web Consultant & Programmer
Re: A simple question in Perl
Also, watch out for this little trick:

$input = qq~
<font size=7 alt="
">Wont be removed!
~;

You need to add the /s modifier so that . also matches newlines, which lets the pattern treat the string as a single line:

m/<.+?>/s

However, this still isn't perfect and will fail in the case of:

<img src="blah.gif" alt="<cool> site!">

which is valid HTML. This is why you need to use the HTML::Parser module to really remove the HTML.
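Both cases can be checked with a short script. This is only a sketch: strip_tags is a made-up helper that deletes whatever <.+?> matches, with or without /s. It does not use HTML::Parser; it just shows where the regex approach goes wrong.

```perl
use strict;
use warnings;

# Hypothetical helper: delete anything that looks like <...>.
# When $dotall is true the /s modifier is applied, so "." also matches "\n".
sub strip_tags {
    my ($text, $dotall) = @_;
    if ($dotall) {
        $text =~ s/<.+?>//gs;
    }
    else {
        $text =~ s/<.+?>//g;
    }
    return $text;
}

# A tag whose attribute value spans two lines.
my $multiline = qq~<font size=7 alt="\n">Wont be removed!~;
print "without /s: [", strip_tags($multiline, 0), "]\n";   # tag survives
print "with /s:    [", strip_tags($multiline, 1), "]\n";   # tag is removed

# An attribute value that itself contains angle brackets.
my $nested = '<img src="blah.gif" alt="<cool> site!">';
print "nested:     [", strip_tags($nested, 1), "]\n";      # leaves: [ site!">]
```

In the last case the match stops at the first >, which is inside the alt attribute, so the tail of the tag is left behind in the output.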

Cheers,

Alex