Gossamer Forum

regex to match/strip/convert markup tags

Has anyone come up with a regex that matches markup tags without also matching the text between them? I tried something as simple as \[.+\], but of course that also matches any non-markup text between an opening and a closing markup tag.

What I'm trying to do is include some "most recent posts" blurbs on my home page, which is in PHP. I'm able to use PHP to pull the information from the gforum_Post table, but the post_message field contains markup content. I need a way either to strip out the markup altogether, or (ideally) to convert it to HTML. I'm sure that's easy to do with the GT libs, but since my home page is in PHP, that's not really an option.

Thanks in advance for any ideas or suggestions.

Fractured Atlas :: Liberate the Artist
Services: Healthcare, Fiscal Sponsorship, Marketing, Education, The Emerging Artists Fund
Re: [hennagaijin] regex to match/strip/convert markup tags
Well, I set out to figure this out, and learned quite a bit about regular expressions in the process... In any event, I did come up with a function in PHP that converts almost all markup tags to HTML and strips out any remaining markup tags. But since I'm only including "blurbs" (i.e. the first x characters of the most recent post), this turned out to be problematic: sometimes it would cut off a post in the middle of an HTML tag, which wreaked havoc on the HTML formatting of the page as a whole. So I ultimately decided to use a simplified version that just strips all markup tags altogether:

Code:


function format_blurb($text) {
    // Strip all markup tags
    $text = ereg_replace("\[([^]]*)]", "", $text);
    // Trim whitespace, collapse extra blank lines, and add line breaks
    $text = trim($text);
    $text = ereg_replace("\n *\n *\n+", "\n\n", $text);
    $text = nl2br($text);
    // Break up long words (e.g. URLs) for table formatting purposes
    $text = ereg_replace("([^ ]{30})", chunk_split("\\1", 30, "<br />"), $text);
    // Shorten to 250 characters and add an ellipsis if necessary
    if (strlen($text) > 250) $text = substr($text, 0, 250) . "...";
    return $text;
}
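
For anyone reading this on a newer PHP: the ereg_* functions were deprecated in PHP 5.3 and removed in PHP 7.0, so here is a rough sketch of the same steps using preg_replace. The name format_blurb_pcre and the $limit parameter are made up for illustration. Unlike the function above, it truncates before any <br /> tags are added, so the cut cannot land inside one, and it uses \S rather than [^ ] for the long-word check. Treat it as a starting point under those assumptions, not a drop-in replacement.

```php
<?php
// Hypothetical PCRE-based variant of format_blurb(); name and $limit are illustrative.
function format_blurb_pcre(string $text, int $limit = 250): string
{
    // Strip anything that looks like a [markup] tag.
    $text = preg_replace('/\[[^\]]*\]/', '', $text);

    // Trim, then collapse runs of three or more line breaks into one blank line.
    $text = trim($text);
    $text = preg_replace('/\n *\n *\n+/', "\n\n", $text);

    // Truncate while the text is still plain, so no HTML tag can be cut in half.
    // Note: strlen/substr count bytes, so multi-byte text may need mb_* functions.
    if (strlen($text) > $limit) {
        $text = substr($text, 0, $limit) . '...';
    }

    // Insert a break after every run of 30 non-whitespace characters (e.g. long URLs).
    $text = preg_replace('/(\S{30})/', '$1<br />', $text);

    // Finally turn the remaining newlines into <br /> tags.
    return nl2br($text);
}
```

Calling format_blurb_pcre("[b]Hello[/b] world") should give back just "Hello world", with the tags gone and nothing else changed.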


The stripped-down function is working pretty well for me so far. If anyone's interested, the part that was really tripping me up was that regex quantifiers are greedy: they find the largest possible match. So if there was a markup tag at the beginning of the post and one at the end of the post, it was matching everything between that first "[" and the last "]" - in other words, the whole post! But I was able to fix that by changing the regex to:

\[([^]]*)]
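
To make the greedy-match problem concrete, here is a small sketch using preg_replace (PCRE) rather than ereg_replace. The sample post text and the variable names are invented for the example. It compares a greedy pattern, the negated-class pattern above, and a lazy quantifier, which PCRE also offers as an alternative.

```php
<?php
// Sample post text, made up for illustration.
$post = '[b]Hello[/b] world [i]again[/i]';

// Greedy: .+ runs from the first "[" to the last "]", swallowing the whole post.
$greedy = preg_replace('/\[.+\]/', '', $post);

// Negated class: [^]]* cannot cross a "]", so each tag is matched on its own.
$perTag = preg_replace('/\[[^]]*\]/', '', $post);

// Lazy quantifier: .+? stops at the first "]" it can, giving the same result here.
$lazy = preg_replace('/\[.+?\]/', '', $post);

var_dump($greedy, $perTag, $lazy);
```

With this sample, the greedy version leaves an empty string, while the other two leave "Hello world again".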

Re: [hennagaijin] regex to match/strip/convert markup tags
Not sure how the PHP regex engine works, but with Perl you'd need to escape three occurrences of [ and ] so they are taken as literals, i.e.:

s/\[[^\]]+\]//g;

...so it looks like your PHP code may have a few errors, as I notice you escape one [ but not the other.

Last edited by: Paul, Dec 8, 2002, 7:03 AM
Re: [Paul] regex to match/strip/convert markup tags
Actually, the first time I tried it, I assumed I would have to escape that middle ], but for some reason that didn't work. When I removed the \, it worked. So I guess there may be some difference between the PHP and Perl regex engines? Maybe the ^ functions as a sort of escape character as well? Not sure - but it definitely works.


EDIT: Sorry - there was a typo as well. You DO have to escape the third ], just not the middle one. So it should be: \[([^]]*)\]
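
A likely explanation for the middle ], offered as background rather than anything confirmed in this thread: ereg_replace uses POSIX regular expressions, and in a POSIX bracket expression a ] placed first (right after [ or [^) is taken literally, while a backslash inside the brackets is just an ordinary character. So [^\]] would read as "any character except a backslash" followed by a stray ], which is why the escaped version failed. Perl-style (PCRE) patterns accept either form. The sketch below uses preg_replace to show the two spellings giving the same result; the sample text and variable names are invented.

```php
<?php
// Sample post text, made up for illustration.
$post = 'See [url]http://example.com[/url] for [b]details[/b].';

// "]" directly after "[^" is a literal member of the class, so no backslash is needed.
$unescaped = preg_replace('/\[([^]]*)\]/', '', $post);

// PCRE, unlike POSIX bracket expressions, also accepts a backslash escape inside the class.
$escaped = preg_replace('/\[([^\]]*)\]/', '', $post);

var_dump($unescaped, $escaped);
```

Both calls should leave "See http://example.com for details." with all four tags removed.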


Last edited by: hennagaijin, Dec 8, 2002, 7:44 AM