Gossamer Forum: General: Perl Programming: Parse Email

Jan 7, 2002, 6:01 AM

Paul

Veteran (19537 posts)

Post #1 of 11

Parse Email

I'm using a POP3 module for my helpdesk script, and emails are fetched in the following format:

This is a multi-part message in MIME format.
------=_NextPart_000_000B_01C19779.AD7C3540
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable

This is just a test.
------=_NextPart_000_000B_01C19779.AD7C3540
Content-Type: text/html; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; =charset=3Dwindows-1252">
<META content=3D"MSHTML 6.00.2712.300" name=3DGENERATOR>
<STYLE>
</STYLE>
</HEAD>
<BODY bgColor=3D#ffffff>
<DIV>
<FONT face=3DArial size=3D2>
This is just a =test.
</FONT>
</DIV>
</BODY>
</HTML>
------=_NextPart_000_000B_01C19779.AD7C3540--

Is this the standard format, or is it just because I'm viewing it through my browser?

I wasn't expecting HTML code.

I'm parsing it with the following:

$body =~ s,^.*<FONT[^>]+>(.*?)</FONT>.*$,$1,si;

Is there something more reliable I can use?

Thanks.

Jan 7, 2002, 7:06 AM

Paul

Veteran (19537 posts)

Post #2 of 11

Re: [RedRum] Parse Email

Also, is X-Return-Path more reliable than From for getting a return address?
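
For comparison, here is a rough sketch of picking a reply address from the headers. It assumes the usual convention that replies go to Reply-To when it is present and to From otherwise, and it treats Return-Path only as a last resort, since that header records the envelope sender and is often a bounce address. X-Return-Path is not a standard header, so it is not consulted. The helper names parse_headers and reply_address and the example.com addresses are made up for illustration, and the address matching is deliberately simplistic.

```perl
use strict;
use warnings;

# Parse a raw header block into a hash keyed by lower-cased field name.
sub parse_headers {
    my ($raw) = @_;
    $raw =~ s/\r?\n[ \t]+/ /g;                 # unfold continuation lines
    my %h;
    for my $line (split /\r?\n/, $raw) {
        last if $line eq '';                   # a blank line ends the headers
        next unless $line =~ /^([^:\s]+):\s*(.*)$/;
        my ($name, $value) = (lc $1, $2);
        $h{$name} = $value unless exists $h{$name};   # keep the first occurrence
    }
    return \%h;
}

# Hypothetical helper: choose the address a reply should be sent to.
sub reply_address {
    my ($h) = @_;
    for my $field (qw(reply-to from return-path)) {
        my $value = $h->{$field};
        next unless defined $value && length $value;
        return $1 if $value =~ /<([^<>\s]+)>/;   # address inside <...>
        return $1 if $value =~ /(\S+@\S+)/;      # bare address
    }
    return;                                      # nothing usable found
}

my $headers = parse_headers(<<'END');
Return-Path: <bounce@example.com>
From: "A. Customer" <customer@example.com>
Subject: test
END

print reply_address($headers), "\n";   # customer@example.com
```

A proper address parser would be safer for real mail, since display names and comments can contain characters this regex does not expect.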

Jan 7, 2002, 7:22 AM

Paul

Veteran (19537 posts)

Post #3 of 11

Re: [RedRum] Parse Email

OK, Return-Path seems to work well. I just need to figure out how to parse the body.

This seems to work:

$body =~ s,^.*quoted-printable(.+?)------=_NextPart.*$,$1,os;
$body =~ s,<[^>]+>,,osg;

Again, I'm not sure how reliable it is. Will the message always contain quoted-printable just before the body?
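
On that last question: a part's Content-Transfer-Encoding can also be 7bit, 8bit, base64 or binary, or the header can be missing entirely (in which case 7bit is assumed), so "quoted-printable" is not a safe marker to search for. A sturdier approach is to split on the boundary string and read each part's own headers. The following is only a sketch under some assumptions: the boundary has already been taken from the message's top-level Content-Type header, every part has at least one header line followed by a blank line, and parts are not nested. The name split_parts and the shortened sample message are made up for illustration.

```perl
use strict;
use warnings;

# Split a multipart body into its parts. Returns a list of hashrefs with
# the part's media type, its declared transfer encoding, and its raw content.
sub split_parts {
    my ($body, $boundary) = @_;
    # Delimiter lines are "--" plus the boundary; the final one adds "--".
    my @chunks = split /^--\Q$boundary\E(?:--)?[ \t]*\r?\n?/m, $body;
    shift @chunks;                          # drop the preamble
    my @parts;
    for my $chunk (@chunks) {
        next unless $chunk =~ /\S/;
        # A blank line separates the part's headers from its content.
        my ($head, $content) = split /\r?\n\r?\n/, $chunk, 2;
        $content = '' unless defined $content;
        $content =~ s/\r?\n\z//;            # the last line break belongs to the delimiter
        my ($type)     = $head =~ /^Content-Type:\s*([^;\s]+)/mi;
        my ($encoding) = $head =~ /^Content-Transfer-Encoding:\s*(\S+)/mi;
        push @parts, {
            type     => lc($type     // 'text/plain'),   # default media type
            encoding => lc($encoding // '7bit'),         # default encoding
            content  => $content,
        };
    }
    return @parts;
}

# The delimiter lines in the sample start with "--" followed by this boundary.
my $boundary = '----=_NextPart_000_000B_01C19779.AD7C3540';
my $body = <<"END";
This is a multi-part message in MIME format.
--$boundary
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable

This is just a test.
--$boundary
Content-Type: text/html; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable

<DIV>This is just a =
test.</DIV>
--$boundary--
END

my @parts = split_parts($body, $boundary);
my ($plain) = grep { $_->{type} eq 'text/plain' } @parts;
print "$plain->{encoding}: $plain->{content}\n";   # quoted-printable: This is just a test.
```

Taking the text/plain part directly avoids stripping HTML tags at all; the content still has to be decoded according to the encoding reported for that part.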

Last edited by RedRum: Jan 7, 2002, 7:24 AM

Jan 7, 2002, 8:19 AM

Michael_Bray

User (378 posts)

Post #4 of 11

Re: [RedRum] Parse Email

I wouldn't rely on grabbing everything between the <font> and </font> tags, if that is what that regex does (I am useless with regexes). I'd take a look at some free email programs and see how they do it. I can see that approach producing a lot of errors: people including bad HTML in a message, for example, or closing a </font> tag without ever opening one.

I don't know if the regex checks for this, but if I had another <font> </font> inside the message, would it realise a new <font></font> had been opened, or stop after it sees the closing tag?

Cheers,
Michael Bray
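
To illustrate the nesting question, here is a small test of the substitution from the first post. The wrapper extract_font_text and the sample strings are made up for the demonstration. The regex does not track nesting: the greedy leading .* makes it pick the last opening tag that has a closing tag after it, and the non-greedy group stops at the next closing tag.

```perl
use strict;
use warnings;

# The substitution from the first post, wrapped in a hypothetical helper.
sub extract_font_text {
    my ($body) = @_;
    $body =~ s,^.*<FONT[^>]+>(.*?)</FONT>.*$,$1,si;
    return $body;
}

# Nested tags: only the innermost text survives.
print extract_font_text('<FONT face=Arial>outer <FONT size=2>inner</FONT> tail</FONT>'), "\n";   # inner

# Two separate blocks: only the last one survives.
print extract_font_text('<FONT size=2>first</FONT> and <FONT size=2>second</FONT>'), "\n";      # second

# No font tags: the substitution does not match, so the text is returned unchanged.
print extract_font_text('plain text, no font tags'), "\n";

# A tag with no attributes: [^>]+ needs at least one character, so again no match.
print extract_font_text('<FONT>bare tag</FONT>'), "\n";
```

So a message from a mail client that writes the body without a FONT tag, or with several of them, would come through either untouched or truncated.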

Jan 7, 2002, 8:33 AM

Paul

Veteran (19537 posts)

Post #5 of 11

Re: [Michael_Bray] Parse Email

>>I wouldn't rely on grabbing the text from everything in between the <font> and </font> tags if that is what that regex does <<

Yeah, I agree; that's why I changed it. I think I'll do as you say and look for ideas.

Jan 7, 2002, 9:14 AM

Paul

Veteran (19537 posts)

Post #6 of 11

Re: [Michael_Bray] Parse Email

Ah, I think some of the MIME modules may help!
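
As one example of what those modules cover, MIME::QuotedPrint (bundled with current Perl releases) undoes the quoted-printable encoding seen in the sample: =3D becomes =, and a trailing = at the end of a line is a soft line break that gets removed. This is a minimal sketch; the sample string is a shortened version of the HTML part above, and it assumes the soft break originally sat between "a" and "test".

```perl
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);

# A quoted-printable fragment with escaped "=" signs and one soft line break.
my $encoded = "<FONT face=3DArial size=3D2>This is just a =\ntest.</FONT>\n";

my $decoded = decode_qp($encoded);
print $decoded;   # <FONT face=Arial size=2>This is just a test.</FONT>
```

MIME::Base64 offers decode_base64 for parts sent as base64, and a full parser such as MIME::Parser from the MIME-tools distribution on CPAN handles the boundary splitting as well.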

Jan 7, 2002, 9:17 AM

Wil

Veteran / Moderator (4108 posts)

Post #7 of 11

Re: [RedRum] Parse Email

Why don't you use a Module? Why re-invent the wheel?

A good source would also be the guts of Sendmail (http://www.sendmail.org) for *nix machines. The code is open source, so you could take a peek.

- wil

Jan 7, 2002, 9:26 AM

Paul

Veteran (19537 posts)

Post #8 of 11

Re: [Wil] Parse Email

>>Why don't you use a Module? Why re-invent the wheel? <<

Well, umm, that's what I'm asking.