Gossamer Forum
POP3
I've written a script to parse emails because AFAIK there are no CPAN modules that just grab the body of an email...you get all the headers and HTML too.

Anyway, as part of the script I need to grab the subject...that's pretty easy, however I got Carsten to send me one in Japanese and the email subject turns into something like:

=?ISO-2022-JP?B?GyRCJEYkOSRIGyhCLXRlc3Q=?=

...however the body shows pretty much correctly. I thought this was a bug in my code, but I opened the email in Outlook and "Properties" > "Details" shows exactly the same code for the subject line, yet it displays properly as Japanese text in my inbox.

Does anyone know what is causing this and can it be fixed?

I'm assuming that if Outlook has the same issue then it is either a bug or unfixable?
Re: [Paul] POP3
You've looked at CPAN? Check out the Mail::Box::* class of modules. There's everything you need to get the body of an email in there.

- wil
Re: [Wil] POP3
Mail::Box doesn't look like it does what I need. It is for handling Maildir and mbox mailboxes...I'm connecting to a remote server.

But in any case, the code is already done and working; it's the encoding problem I'm concerned with.
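For the "connecting to a remote server" part, Perl's bundled Net::POP3 module hands back each message as an arrayref of raw lines, which is the shape a line-based parser wants. This is a minimal sketch under that assumption: the host, user and password are placeholders, and `fetch_messages`/`fetch_mailbox` are hypothetical names, not part of any module.

```perl
use strict;
use warnings;
use Net::POP3;

# Hypothetical helper: fetch every message as an arrayref of lines.
# $pop only needs Net::POP3's list() and get() methods.
sub fetch_messages {
    my ($pop) = @_;
    my $sizes = $pop->list or return [];        # hashref: message number => size
    my @emails;
    for my $num (sort { $a <=> $b } keys %$sizes) {
        my $lines = $pop->get($num) or next;    # arrayref of raw header + body lines
        push @emails, $lines;
    }
    return \@emails;
}

# Hypothetical wrapper: connect, log in, fetch, disconnect.
sub fetch_mailbox {
    my ($host, $user, $pass) = @_;
    my $pop = Net::POP3->new($host, Timeout => 60)
        or die "Can't connect to $host\n";
    defined($pop->login($user, $pass))          # undef means the login failed
        or die "Login failed for $user\n";
    my $emails = fetch_messages($pop);
    $pop->quit;                                 # messages stay on the server
    return $emails;
}
```

Messages are left on the server; calling `$pop->delete($num)` before `quit` would mark them for removal.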

Last edited by: Paul, May 30, 2002, 3:21 AM
Re: [Paul] POP3
The headers are MIME-encoded. If you get Gossamer Forum 1.1.5, you can look at the decode_mimewords() sub in GT/Mail/Parts.pm, which takes a MIME-encoded header and returns it decoded (I'm pretty sure this was taken from, or at least modeled after, MIME::Lite).
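The subject above is an RFC 2047 "encoded-word": `=?charset?B?base64 data?=`. As a minimal sketch of the same idea without GT's code, the Encode module that ships with Perl has a `MIME-Header` encoding that decodes these words, including the ISO-2022-JP charset (CPAN's MIME::Words also provides a `decode_mimewords()`). The sub name `decode_subject` is just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Decode RFC 2047 encoded-words (=?charset?B?...?= or =?charset?Q?...?=)
# in a raw header value into a Perl character string.
sub decode_subject {
    my ($raw) = @_;
    return decode('MIME-Header', $raw);
}

binmode STDOUT, ':encoding(UTF-8)';    # so a UTF-8 terminal shows the Japanese
print decode_subject('=?ISO-2022-JP?B?GyRCJEYkOSRIGyhCLXRlc3Q=?='), "\n";
```

With the subject from the first post this should come out as てすと-test. Headers without encoded-words pass through unchanged.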

Cheers,

Alex
--
Gossamer Threads Inc.
Re: [Alex] POP3
Thanks for the tip. So does that mean I should be decoding the subjects/bodies of all emails to make sure everything is decoded properly?
Re: [Paul] POP3
Well, you should decode all headers. Decoding the body is much more complex: a message can have multiple parts, each encoded differently. 7bit, 8bit, quoted-printable and base64 are the popular encodings. You may want to look at MIME::Lite to handle the decoding of the message once you have got it from a POP box.
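As a minimal sketch of undoing the transfer encoding for one part, using the MIME::Base64 and MIME::QuotedPrint modules bundled with Perl: it assumes you have already split out the part and read its Content-Transfer-Encoding header, and `decode_transfer` is a hypothetical name. 7bit and 8bit are not really encodings; they only describe what kind of bytes the body contains, so there is nothing to undo. The result is still raw bytes in the part's charset, which is a separate decoding step.

```perl
use strict;
use warnings;
use MIME::Base64      qw(decode_base64);
use MIME::QuotedPrint qw(decode_qp);

# Undo a part's Content-Transfer-Encoding and return the raw bytes.
sub decode_transfer {
    my ($cte, $data) = @_;
    $cte = lc($cte // '7bit');                  # no header means 7bit
    $cte =~ s/^\s+|\s+$//g;                     # header values may carry whitespace
    return decode_qp($data)     if $cte eq 'quoted-printable';
    return decode_base64($data) if $cte eq 'base64';
    return $data;                               # 7bit / 8bit / binary: sent as-is
}
```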

Even after it's decoded it's still not fun to deal with. If it's an HTML message from Outlook, your message has three parts:

- a multipart/alternative "envelope", which contains:
  - a text/html part
  - a text/plain part

Now that is significantly different from someone sending a text/plain message and attaching an HTML page, which would still have three parts, but the top part would be multipart/mixed. =)
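To make the difference concrete, here is a toy sketch that models the two layouts as nested Perl hashes (a hypothetical structure, not what any MIME module returns) and walks them depth-first for the first text/plain part. Real mail usually lists the text/plain alternative first, since RFC 2046 orders alternatives from least to most preferred.

```perl
use strict;
use warnings;

# Hypothetical tree node: { type => ..., parts => [...] } for multipart/*,
# { type => ..., body => ... } for a leaf part.
sub first_text_plain {
    my ($node) = @_;
    return $node->{body} if $node->{type} eq 'text/plain';
    for my $child (@{ $node->{parts} || [] }) {
        my $found = first_text_plain($child);    # depth-first
        return $found if defined $found;
    }
    return;                                      # no text/plain anywhere below
}

# HTML mail: one envelope holding two renderings of the same text.
my $alternative = {
    type  => 'multipart/alternative',
    parts => [
        { type => 'text/plain', body => "Hello\n" },
        { type => 'text/html',  body => "<p>Hello</p>\n" },
    ],
};

# Plain-text mail with an HTML page attached.
my $mixed = {
    type  => 'multipart/mixed',
    parts => [
        { type => 'text/plain', body => "See attached.\n" },
        { type => 'text/html',  body => "<html>...</html>\n" },
    ],
};

print first_text_plain($alternative);
print first_text_plain($mixed);
```

Both calls find a text/plain part, but only in the first case is the HTML just another rendering of the same text; in the second it is an attachment with different content. A real parser such as MIME::Parser builds this tree from the raw message for you.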

It was a lot of work writing GT::Mail::Parse, so you may want to use MIME::Lite, or GT::Mail::Parse (depending on what it is for).

Cheers,

Alex
--
Gossamer Threads Inc.
Re: [Alex] POP3
All the emails being sent to the POP box the script connects to are just normal emails...no attachments or anything, but I need it to handle foreign languages like Japanese, which is 7bit I think.

>>
Even after it's decoded it's still not fun to deal with. If it's an html message from outlook, well your message has three parts:

- a multipart/alternative "envelope" which contains:
  - a text/html part
  - a text/plain part
<<

Yeah, I found that out...it sucks. I managed to parse it successfully, though in a cheaty kind of way.

Code:
my $i    = 0;     # set to 1 once we are past the first boundary line
my $body = [];    # the lines we keep
for (@$email) {
    (/^------=_NextPart/i && $i == 1) and last;   # If $i is one and we hit NextPart then stop saving the body.
    (/^------=_NextPart/i) and $i = 1, next;      # Start saving the body.
    push(@$body, $_) if $i == 1;
}

That just leaves me with the nice text part. I'm sure it is flawed, but I've tested it with about 50 emails from several different people and mail clients and it seems to parse them properly each time.
Re: [Alex] POP3
I've been having a play with Mail::Internet, MIME::Parser and MIME::Decoder.

MIME::Parser seems closest to what I need except I can't find out how to grab the body.

If I use:

$parser->output_to_core(0)

....it writes an HTML file and a text file to disk, each holding a copy of the message in that format. I just want the text message as an array/arrayref, so I changed it to:

$parser->output_to_core(1)

....so now where does the body go?....I can't find it...hehe

Last edited by: Paul, May 30, 2002, 9:52 AM
Re: [Alex] POP3
Nevermind, I figured it out :)
Re: [Paul] POP3
If anyone is interested, here is the code to parse an email and get just the first text/plain part:

Code:
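use MIME::Parser;

sub parse_email {
#------------------------------------------------------------------------
# Parse an email. Accepts one arg...an arrayref containing the email content
# (one line per element). Returns the subject, the sender and an arrayref
# holding the decoded lines of the first text/plain part.

    my ($email) = @_;
    my $body    = [];

    my $parser = MIME::Parser->new;
    $parser->ignore_errors(1);
    $parser->decode_headers(1);    # decode =?charset?B?...?= words in the headers
    $parser->output_to_core(1);    # keep the bodies in memory; don't write files

    # Parse the email.
    my $entity  = $parser->parse_data($email);
    my $subject = $entity->head->get('Subject');
    my $from    = $entity->head->get('From');

    # Only keep the first text/plain part we find. parts_DFS() returns the
    # entity itself plus every nested part, so single-part messages and
    # multipart/alternative inside multipart/mixed are handled too.
    for my $part ($entity->parts_DFS) {
        next unless $part->mime_type eq 'text/plain' && $part->bodyhandle;
        $body = [ $part->bodyhandle->as_lines ];    # transfer-decoded lines
        last;
    }

    return ($subject, $from, $body);
}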
Re: [Paul] POP3
Quote:
(/^------=_NextPart/i && $i == 1) and last;


Ouch, the MIME::Parser is a much better way to go. =)

The string '------=_NextPart' is the MIME boundary, and it can be anything. It is specified in the boundary parameter of the Content-Type header. You can't hard-code it to the string NextPart.
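A minimal sketch of reading the boundary from the Content-Type header instead of hard-coding it. Outlook's boundaries happen to look like `----=_NextPart_000_...`, and per RFC 2046 every delimiter line is `--` followed by the boundary (the closing one also ends in `--`), which is why the lines began with `------=_NextPart`. The sub names are hypothetical; each returned part still starts with its own headers, and nested multiparts are not handled, which is what MIME::Parser takes care of.

```perl
use strict;
use warnings;

# Pull the boundary parameter out of a Content-Type header value, e.g.
#   multipart/alternative; boundary="----=_NextPart_000_0001"
# Both quoted and unquoted values are allowed.
sub mime_boundary {
    my ($content_type) = @_;
    return $1 if $content_type =~ /\bboundary\s*=\s*"([^"]+)"/i;
    return $1 if $content_type =~ /\bboundary\s*=\s*([^\s;]+)/i;
    return;
}

# Split a multipart body (arrayref of lines) into parts on that boundary.
sub split_parts {
    my ($boundary, $lines) = @_;
    my (@parts, $current);
    for my $line (@$lines) {
        if ($line =~ /^--\Q$boundary\E(--)?\s*$/) {    # \Q: boundary is literal text
            push @parts, $current if $current;
            last if $1;                                  # closing delimiter: done
            $current = [];                               # start a new part
            next;
        }
        push @$current, $line if $current;               # skip the preamble
    }
    return \@parts;
}
```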

Cheers,

Alex
--
Gossamer Threads Inc.
Re: [Alex] POP3
Yep, that's why I said it was flawed and am now using MIME::Parser :)
Re: [Paul] POP3
I think Japanese is 8-bit, not 7.

lol ... I could be wrong ... gotta check that ...

openoffice + gimp + sketch ...
Re: [QooQ] POP3
>>
I think that japanese is 8-bit not 7.

lol ... I could be wrong ... gotta check that ...
<<

When you send me emails it says 7bit.
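On the 7bit question: Japanese needs multi-byte characters, but the ISO-2022-JP charset used in these mails writes them using only bytes below 0x80, switching between ASCII and JIS X 0208 with escape sequences, so the text really can travel as 7bit (Shift_JIS and EUC-JP, by contrast, are 8-bit). A small check with Perl's Encode module, using the てすと-test subject from the start of the thread:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text  = "\x{3066}\x{3059}\x{3068}-test";    # "てすと-test"
my $bytes = encode('iso-2022-jp', $text);        # ESC $ B ... ESC ( B framing

# 7-bit clean means no byte has the high bit set.
my $seven_bit = $bytes !~ /[\x80-\xFF]/;
printf "%d bytes, 7-bit clean: %s\n", length($bytes), $seven_bit ? 'yes' : 'no';
```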

Woo France vs Senegal in one hour

Last edited by: Paul, May 31, 2002, 3:25 AM
Re: [Paul] POP3
Hi. That's great, but I'm getting an error when I try to pipe email into the script. I hope you can suggest something.

I would like to log the emails coming in to my info mailbox.

... /qmail/mailnames/domain.com/.qmail-info
-------------------------------------------
|/root/parsemail.pl

-------------------------------------------
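For context, a minimal sketch of what the script on the receiving end of a `|command` line might look like, assuming the goal is just to append each message to a log file. qmail passes the message on standard input and reads the exit status: 0 means delivered, 100 a permanent failure (bounce), 111 a temporary failure that is retried later (a "deferral" in the log). `log_message` and the log path are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Append one message to $log_path and return a qmail exit status:
# 0 = delivered, 111 = temporary failure (qmail will retry).
sub log_message {
    my ($in, $log_path) = @_;
    my @email = <$in>;                              # the raw message, one line per element
    open(my $log, '>>', $log_path) or return 111;
    flock($log, LOCK_EX)           or return 111;   # several deliveries may run at once
    print {$log} @email, "\n"      or return 111;   # blank line between messages
    close($log)                    or return 111;
    return 0;
}
```

The piped script would then end with something like `exit log_message(\*STDIN, '/var/log/info-mail.log');`, where the log file must be writable by the user qmail runs the command as.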

.qmail-info is owned by popuser:popuser, chmodded 600.

/root/parsemail.pl is owned by popuser:popuser and chmodded 755.

My maillog error is

qmail: 1079698609.398747 delivery 84631: deferral: /bin/sh:_/root/parsemail.pl:_Permission_denied/

I get the same error even if I change the ownership to root:qmail

By the way, before you ask: I can execute /root/parsemail.pl from the command line.

Do you have any experience with this problem?

Thanks!!
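One common cause of exactly this symptom (an assumption; the thread does not confirm it) is that /root itself is normally mode 0700, so popuser cannot reach anything inside it even though the script is 0755, while running it from the command line as root hides the problem. A minimal sketch that lists the directories on a path the current user cannot search; `unsearchable_dirs` is a hypothetical name.

```perl
use strict;
use warnings;
use File::Spec;

# List every directory leading to $path that the current (effective) user
# cannot search, i.e. has no execute permission on. A file under such a
# directory is unreachable whatever its own mode is.
sub unsearchable_dirs {
    my ($path) = @_;
    my @dirs = File::Spec->splitdir(File::Spec->rel2abs($path));
    pop @dirs;                                   # drop the file name itself
    my ($current, @blocked);
    for my $part (@dirs) {
        $current = defined $current
            ? File::Spec->catdir($current, $part)
            : File::Spec->rootdir;               # first piece of an absolute path is ''
        push @blocked, $current unless -d $current && -x _;
    }
    return @blocked;
}

print "$_ is not searchable by this user\n"
    for unsearchable_dirs($ARGV[0] || '/root/parsemail.pl');
```

Run it as the delivery user (for example with `sudo -u popuser`, if sudo is available); if /root shows up, moving the script to a directory popuser can reach is the usual fix.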