Gossamer Forum

Parse Email

I'm using a POP3 module for my helpdesk script, and the emails are fetched in the following format:

This is a multi-part message in MIME format.
------=_NextPart_000_000B_01C19779.AD7C3540
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable

This is just a test.
------=_NextPart_000_000B_01C19779.AD7C3540
Content-Type: text/html; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Dwindows-1252">
<META content=3D"MSHTML 6.00.2712.300" name=3DGENERATOR>
<STYLE>
</STYLE>
</HEAD>
<BODY bgColor=3D#ffffff>
<DIV>
<FONT face=3DArial size=3D2>
This is just a =
test.
</FONT>
</DIV>
</BODY>
</HTML>
------=_NextPart_000_000B_01C19779.AD7C3540--

Is this the standard format, or does it only look like this because I'm viewing it through my browser?

I wasn't expecting HTML code.

I'm parsing it with the following:

$body =~ s,^.*<FONT[^>]+>(.*?)</FONT>.*$,$1,si;

Is there something more reliable I can use?

Thanks.


Re: [RedRum] Parse Email
Also, is X-Return-Path more reliable than From for getting a return address?

Re: [RedRum] Parse Email
OK, Return-Path seems to work well. I just need to figure out how to parse the body.

This seems to work:

$body =~ s,^.*quoted-printable(.+?)------=_NextPart.*$,$1,os;
$body =~ s,<[^>]+>,,osg;

Again, I'm not sure how reliable it is. Will the message always contain quoted-printable just before the body?

Last edited by: RedRum: Jan 7, 2002, 7:24 AM
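
On the question of whether quoted-printable always sits just before the body: it does not have to. Content-Transfer-Encoding is set per part and can also be 7bit, 8bit or base64, so matching on that string is fragile. A minimal sketch of a sturdier approach follows, assuming the boundary string has already been read from the message's top-level Content-Type header (that header is not shown in the post). It splits the body on the boundary, picks the text/plain part by its Content-Type, and decodes it with the core MIME::QuotedPrint module. The helper name plain_text_part and the sample message are made up for illustration, and a dedicated MIME module will handle many cases this does not (nested multiparts, base64, character sets).

```perl
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);   # core module

# Hypothetical helper: return the decoded text/plain part of a
# multipart body, or nothing if no such part exists.
sub plain_text_part {
    my ($body, $boundary) = @_;

    # Each part starts after a line made of "--" plus the boundary;
    # the final delimiter has an extra trailing "--".
    for my $part (split /^--\Q$boundary\E(?:--)?[ \t]*\r?\n?/m, $body) {
        # Part headers end at the first blank line.
        my ($head, $text) = split /\r?\n\r?\n/, $part, 2;
        next unless defined $text;
        next unless $head =~ m{^Content-Type:\s*text/plain\b}mi;

        # Decode only when the part says it is quoted-printable.
        $text = decode_qp($text)
            if $head =~ /^Content-Transfer-Encoding:\s*quoted-printable\b/mi;
        $text =~ s/\s+\z//;    # drop trailing blank lines
        return $text;
    }
    return;
}

# Sample message modelled on the one in the post (assumed layout).
my $boundary = '----=_NextPart_000_000B_01C19779.AD7C3540';
my $body = join "\n",
    'This is a multi-part message in MIME format.',
    '',
    "--$boundary",
    'Content-Type: text/plain; charset="Windows-1252"',
    'Content-Transfer-Encoding: quoted-printable',
    '',
    'This is just a =',
    'test.',
    '',
    "--$boundary",
    'Content-Type: text/html; charset="Windows-1252"',
    'Content-Transfer-Encoding: quoted-printable',
    '',
    '<HTML><BODY bgColor=3D#ffffff>This is just a test.</BODY></HTML>',
    '',
    "--$boundary--",
    '';

print plain_text_part($body, $boundary), "\n";
```

Because the plain-text part is selected by its Content-Type, there is no need to strip HTML tags afterwards; the text/html part is simply skipped.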

Re: [RedRum] Parse Email
I wouldn't rely on grabbing the text from everything in between the <font> and </font> tags if that is what that regex does (I am useless with regexes). I'd take a look at some free email programs and see how they do it. I can see that approach producing a lot of errors: people including bad HTML in a message, for example, or closing a </font> tag without ever opening one.

I don't know if the regex checks for this, but if I had another <font></font> pair inside the message, would it realise a new <font> had been opened, or stop as soon as it sees the closing tag?

Cheers,
Michael Bray
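
The nested-tag question can be checked directly. The sketch below runs the regex from the first post, unchanged, against a made-up string with one <FONT> inside another. The greedy ^.* skips ahead to the last opening <FONT ...> tag and the non-greedy (.*?) stops at the first </FONT> after it, so the regex does not track nesting at all.

```perl
use strict;
use warnings;

# Made-up input: an inner <FONT> nested inside an outer one.
my $body = '<DIV><FONT face=Arial>outer <FONT size=2>inner</FONT> tail</FONT></DIV>';

# The substitution from the original post, applied to a copy of $body.
(my $result = $body) =~ s,^.*<FONT[^>]+>(.*?)</FONT>.*$,$1,si;

print "$result\n";   # prints "inner"
```

Only the innermost text survives, and the words "outer" and "tail" are lost, which is the kind of error to expect from real-world HTML mail.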

Re: [Michael_Bray] Parse Email
>>I wouldn't rely on grabbing the text from everything in between the <font> and </font> tags if that is what that regex does <<

Yeah, I agree, that's why I changed it. I think I'll do as you say and look for ideas.

Re: [Michael_Bray] Parse Email
Ah, I think some of the MIME modules may help!

Re: [RedRum] Parse Email
Why don't you use a Module? Why re-invent the wheel?

The guts of Sendmail (http://www.sendmail.org) would also be a good source for *nix machines. The code is open source, so you could take a peek.

- wil

Re: [Wil] Parse Email
>>Why don't you use a Module? Why re-invent the wheel? <<

Well, umm, that's what I'm asking for: a way to parse emails!

If you can tell me a module to use then I'm all ears.

Re: [RedRum] Parse Email
Mail::Internet

http://search.cpan.org/....42/Mail/Internet.pm

- wil
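
For reference, a minimal sketch of what using Mail::Internet might look like for the header and body questions earlier in the thread. It assumes the MailTools distribution (which provides Mail::Internet) is installed from CPAN, and the message text is made up for illustration. Note that Mail::Internet splits headers from the body but does not decode MIME parts, so a multipart message like the one in the first post would still need a MIME-aware module such as MIME::Parser for the body.

```perl
use strict;
use warnings;
use Mail::Internet;   # from the MailTools distribution (not a core module)

# A made-up single-part message used only for illustration.
my $raw = join "\n",
    'Return-Path: <user@example.com>',
    'From: "Example User" <user@example.com>',
    'Subject: Help request',
    '',
    'This is just a test.',
    '';

# new() accepts a reference to an array of message lines.
my $mail = Mail::Internet->new([ split /^/m, $raw ]);

# head() returns a Mail::Header object; get() returns the header
# value with its trailing newline still attached.
my $return_path = $mail->head->get('Return-Path');
my $from        = $mail->head->get('From');
chomp($return_path, $from);

# body() returns a reference to an array of body lines.
my $text = join '', @{ $mail->body };

print "Return-Path: $return_path\nFrom: $from\nBody: $text";
```

Return-Path is normally added by the receiving mail server from the SMTP envelope sender, so it may be absent from some messages; falling back to From when get() returns undef is a reasonable precaution.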

Re: [Wil] Parse Email
Ah, it's OK, thanks. I've come up with an easy way.

Re: [RedRum] Parse Email
Which is? What could be easier than using the module?

- wil