Bug in converting date in MailArc

Hi all,

I import mbox archive files, and some of them contain unusual date formats
which are not converted correctly. MailArc has regexes for formats like:

# Sat, 28 Jul 2001 08:44:00 -0700
# Sat, 28 Jul 2001 08:44:00 EST
# Sat, 28 Jul 2001 08:44:00 "EST"
# Sat, 21 Jul 01 19:07:20
# 28 Jul 2001 14:57:07 -0000
# 28 Jul 2001 14:57:07 GMT
# 28 Jul 2001 14:57:07 "GMT"
# 20 May 01 6:33:30 PM

Other formats are also in use, for example:

Date: Sun Aug 10 10:52:46 2003

Can anybody tell me what the regex for this is?
Alex, is it possible to add this regex to
the MailArc parse_date method?
Otherwise the post date is the date of the import
via archive.pl.

I'm not sure, but I think that's also the case
in the lists forum, e.g. see
http://gossamer-threads.com/..._string=coffee;#1205

The header of this message is:

From sidnei at plone.org Sat Aug 9 20:17:24 2003
From: sidnei at plone.org (Sidnei da Silva)
Date: Sun Aug 10 10:52:46 2003
Subject: [Zope-Annce] Archetypes 1.0 Released
Message-ID: <1060467444.3f3572f4d18ca@webmail.redesul.com.br>

and the posted date in the forum is:

Aug 9, 2003, 3:17 PM

That's the date on which the mail was imported by an import script.
I think a list archive should always show the real posting date,
and not different dates in mail, news channels and list archives.

Perhaps I have missed something and somebody can point me
in the right direction.

Thanks Roger
Re: [projekt01gmbh] Bug in converting date in MailArc
I recommend adding another Date: format like:

/\w+ \w+ \d\d? \d?\d:\d?\d:\d\d \d{4}/ and do { $format = '%ddd% %mmm% %dd% %HH%:%MM%:%ss% %yyyy%'; last CASE };

and change the parse_date method in MailArc to:

sub parse_date {
# -------------------------------------------------------------------
# Internal use, not useful from a template.
# Parse an RFC 822 5.1 compliant date into one understood by mysql.
# Formats expected:
# Sat, 28 Jul 2001 08:44:00 -0700
# Sat, 28 Jul 2001 08:44:00 EST
# Sat, 28 Jul 2001 08:44:00 "EST"
# Sat, 21 Jul 01 19:07:20
# 28 Jul 2001 14:57:07 -0000
# 28 Jul 2001 14:57:07 GMT
# 28 Jul 2001 14:57:07 "GMT"
# 20 May 01 6:33:30 PM
#
# ...another Date: format
# Tue May 11 09:45:35 2004
#
#
# Only the first date is an RFC date, but it appears lots of clients don't
# use the RFC.
#
    my $date = shift;
    $date || return;
    my $format;
    $date =~ s/\s\s/ /g;
    CASE: for ($date) {
        /\w+, \d\d? \w+ \d{4} \d?\d:\d?\d:\d\d (?:[+-]\d+|\w+)/ and do { $format = '%ddd%, %dd% %mmm% %yyyy% %HH%:%MM%:%ss% %o%'; last CASE };
        /\w+, \d\d? \w+ \d{4} \d?\d:\d?\d:\d\d "(?:[+-]\d+|\w+)"/ and do { $format = '%ddd%, %dd% %mmm% %yyyy% %HH%:%MM%:%ss% "%o%"'; last CASE };
        /\w+, \d\d? \w+ \d{4} \d?\d:\d?\d:\d\d/ and do { $format = '%ddd%, %dd% %mmm% %yyyy% %HH%:%MM%:%ss%'; last CASE };
        /\w+, \d\d? \w+ \d{2} \d?\d:\d?\d:\d\d/ and do { $format = '%ddd%, %dd% %mmm% %yy% %HH%:%MM%:%ss%'; last CASE };
        /\d\d? \w+ \d{4} \d?\d:\d?\d:\d\d (?:[+-]\d+|\w+)/ and do { $format = '%dd% %mmm% %yyyy% %HH%:%MM%:%ss% %o%'; last CASE };
        /\d\d? \w+ \d{4} \d?\d:\d?\d:\d\d "(?:[+-]\d+|\w+)"/ and do { $format = '%dd% %mmm% %yyyy% %HH%:%MM%:%ss% "%o%"'; last CASE };
        /\d\d? \w+ \d{2} \d?\d:\d?\d:\d\d [AaPpMm]{2}/ and do { $format = '%dd% %mmm% %yy% %hh%:%MM%:%ss% %tt%'; last CASE };

        # another format (ri)
        /\w+ \w+ \d\d? \d?\d:\d?\d:\d\d \d{4}/ and do { $format = '%ddd% %mmm% %dd% %HH%:%MM%:%ss% %yyyy%'; last CASE };
    }
    $format or return;

    my $parts = [split /\s/, $date];
    my $str_date = fill_zero($parts);
    # The time field is not always the fifth one (it is the fourth in
    # 'Tue May 11 09:45:35 2004'), so locate it by its colons.
    my ($time_field) = grep { /:/ } @$parts;
    my $time_parts = [split /:/, $time_field];
    my $str_time = fill_zero($time_parts, ":");
    $str_date =~ s/\Q$time_field\E/$str_time/;
    my $time = timelocal(parse_format($str_date, $format));
    if ($time > time) {
        Plugins::GForum::MailArc->error('WARNING', 'WARN', "Time '$str_date' => '$time' is in the future. Current time: " . time . "\n");
        $time = time;
    }
    return $time;
}
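
For anyone who wants to test the new pattern outside of GForum, here is a minimal standalone sketch. It does not use MailArc's fill_zero or parse_format helpers; parse_ctime_date and %MONTH are hypothetical names made up for this example. Since this Date: format carries no time zone, the sketch assumes UTC and uses timegm from the core Time::Local module, which may not be what you want for your archive.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Month abbreviation => zero-based month index, as Time::Local expects.
my %MONTH = (
    Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
    Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11,
);

# Hypothetical helper, not part of MailArc.
# Parses dates such as 'Sun Aug 10 10:52:46 2003' into epoch seconds.
sub parse_ctime_date {
    my $date = shift;
    return unless defined $date;

    # Collapse runs of whitespace, e.g. the padded day in 'Sat Aug  9'.
    $date =~ s/\s+/ /g;

    my ($mon, $day, $hour, $min, $sec, $year) =
        $date =~ /^\s?\w{3} (\w{3}) (\d\d?) (\d\d?):(\d\d):(\d\d) (\d{4})\s?$/
        or return;
    return unless exists $MONTH{$mon};

    # Assumption: no zone in the header, so treat the time as UTC.
    # timegm croaks on impossible dates, so trap that and return undef.
    my $epoch = eval { timegm($sec, $min, $hour, $day, $MONTH{$mon}, $year) };
    return $epoch;
}

my $epoch = parse_ctime_date('Sun Aug 10 10:52:46 2003');
print scalar(gmtime $epoch), "\n";    # prints "Sun Aug 10 10:52:46 2003"
```

If the archive's dates are in local time rather than UTC, timelocal from the same module can be swapped in for timegm.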


There is also another bug if someone uses ' at ' instead of '@' in the mail From: address.
I recommend adding:

$value =~ s/ at /\@/;

to the method parse_email like:

sub parse_email {
# -------------------------------------------------------------------
# Return the email address from an email header, if applicable
    my $value = shift;
    my $result;

    # replace ' at ' with '@' in From: address
    $value =~ s/ at /\@/;

    if ($value =~ /"?([^<"]+)"?\s*<([^>]+)>/) {
        $result = $2;
    }
    elsif ($value =~ /<([^>]+)>/) {
        $result = $1;
    }
    else {
        $result = $value || '';
        $result =~ s/\([^)]+\)//g;
    }
    $result =~ s/\s//g;
    chomp $result;
    return $result;
}
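
One caveat with a plain ' at ' replacement is that it also rewrites display names that happen to contain the word. As a rough sketch of a stricter variant (deobfuscate_at is a hypothetical helper, not something in MailArc), the substitution can be limited to cases where ' at ' is followed by something that looks like a dotted host name:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MailArc: undo the 'user at host'
# obfuscation that some list archives apply to addresses.
sub deobfuscate_at {
    my $value = shift;
    # Only rewrite ' at ' when it sits between an address-like local part
    # and a dotted host name, so a name like 'Bob at Work' stays intact.
    $value =~ s/([\w.+-]+) at ((?:[\w-]+\.)+[A-Za-z]{2,})/$1\@$2/;
    return $value;
}

print deobfuscate_at('sidnei at plone.org (Sidnei da Silva)'), "\n";
# prints "sidnei@plone.org (Sidnei da Silva)"
```

The host name pattern here is deliberately simple and is only an assumption about what the archived addresses look like; it is not a full address parser.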

Thanks Roger Ineichen