Mailing List Archive: GnuPG: devel

Differentiating GPG data from random data

 

 



ted at 16systems

Nov 24, 2008, 2:19 PM

Post #1 of 4 (820 views)
Differentiating GPG data from random data

Hi,

Hope this is not off-topic here.

I'm writing a program that searches for files that are made up of
random data. GPG data (that is not ascii armored) is consistently
identified by the program. That's expected as GPG data is very random.
However, even though GPG data passes the random tests, I'm not
interested in finding GPG encrypted files, so I thought I would write
a routine to exclude these files based on the first few bytes of the
file, but I'm not comfortable with doing that. It's not ideal, but
seems to work OK. Basically I'm skipping random data files that have
certain bytes in the beginning like so:

Symmetric:
Hex(8c 0d 04 03)
Dec(140 13 4 3)

Asymmetric:
Hex(85 02 0e 03)
Dec(133 2 14 3)

This works well in informal testing on multiple systems running
various versions of GPG, but after reading the RFCs I bet it will fail
a lot in the real world. That's why I thought I might pose the
question to this list. Is there a simple way to skip most GPG
encrypted files without implementing RFC 4880? It does not have to be
perfect, but perhaps there is something better than what I have
described above.

Thanks for any suggestions,

Ted

_______________________________________________
Gnupg-devel mailing list
Gnupg-devel [at] gnupg
http://lists.gnupg.org/mailman/listinfo/gnupg-devel


dshaw at jabberwocky

Nov 24, 2008, 8:21 PM

Post #2 of 4 (774 views)
Re: Differentiating GPG data from random data

On Nov 24, 2008, at 5:19 PM, Ted wrote:

> Hi,
>
> Hope this is not off-topic here.
>
> I'm writing a program that searches for files that are made up of
> random data. GPG data (that is not ascii armored) is consistently
> identified by the program. That's expected as GPG data is very random.
> However, even though GPG data passes the random tests, I'm not
> interested in finding GPG encrypted files, so I thought I would write
> a routine to exclude these files based on the first few bytes of the
> file, but I'm not comfortable with doing that. It's not ideal, but
> seems to work OK. Basically I'm skipping random data files that have
> certain bytes in the beginning like so:
>
> Symmetric:
> Hex(8c 0d 04 03)
> Dec(140 13 4 3)
>
> Asymmetric:
> Hex(85 02 0e 03)
> Dec(133 2 14 3)
>
> This works well in informal testing on multiple systems running
> various versions of GPG, but after reading the RFCs I bet it will fail
> a lot in the real world. That's why I thought I might pose the
> question to this list. Is there a simple way to skip most GPG
> encrypted files without implementing RFC 4880? It does not have to be
> perfect, but perhaps there is something better than what I have
> described above.

Those bytes will more-or-less work, but as you say won't catch
everything. In OpenPGP, the first few octets cover the length and
type of the packet, so those bytes hardcode a particular length, which
is probably not what you want. For example, the "85 02 0e 03" from
your example is an old-style encrypted session key that is 526 bytes
long, which will only match a particular key size.
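
As a rough illustration of how those leading octets break down, here is a minimal Python sketch of the packet header layout described in RFC 4880, section 4.2. The function name `parse_packet_header` and the tuple it returns are made up for this example. It only decodes the header and does not validate the packet body.

```python
def parse_packet_header(data: bytes):
    """Decode an OpenPGP packet header (RFC 4880, section 4.2).

    Returns (fmt, tag, body_length, header_length). body_length is None
    for indeterminate (old format) or partial (new format) lengths.
    This is an illustrative helper, not part of GnuPG.
    """
    if not data or not data[0] & 0x80:
        raise ValueError("bit 7 is clear: not an OpenPGP packet header")
    first = data[0]

    if first & 0x40:
        # New format: bits 5-0 hold the tag, length octets follow.
        tag = first & 0x3F
        if len(data) < 2:
            raise ValueError("truncated header")
        o1 = data[1]
        if o1 < 192:
            return ("new", tag, o1, 2)
        if o1 < 224:
            if len(data) < 3:
                raise ValueError("truncated header")
            return ("new", tag, ((o1 - 192) << 8) + data[2] + 192, 3)
        if o1 == 255:
            if len(data) < 6:
                raise ValueError("truncated header")
            return ("new", tag, int.from_bytes(data[2:6], "big"), 6)
        return ("new", tag, None, 2)  # partial body length

    # Old format: bits 5-2 hold the tag, bits 1-0 the length type.
    tag = (first >> 2) & 0x0F
    length_type = first & 0x03
    if length_type == 3:
        return ("old", tag, None, 1)  # indeterminate length
    n = 1 << length_type  # 1, 2, or 4 length octets
    if len(data) < 1 + n:
        raise ValueError("truncated header")
    return ("old", tag, int.from_bytes(data[1:1 + n], "big"), 1 + n)
```

Applied to the two prefixes from the original post, this gives tag 1 (public-key encrypted session key) with a 526-byte body for `85 02 0e`, and tag 3 (symmetric-key encrypted session key) with a 13-byte body for `8c 0d`. The trailing `03` in the asymmetric example is the first octet of the packet body, not part of the header.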

The problem is that OpenPGP has so many different ways to encode a
particular packet, that writing a rule loose enough to match them all
will inevitably have a huge number of false positives. For example,
hex 84, 85, 86, and C1 can all indicate an asymmetrically encrypted
message. 85 is the most common (and 84 would be extremely uncommon),
but they are all possible. Some OpenPGP programs start with A8,
A9, AA, or CA (though it is virtually always A8). GPG will read such
a message, but doesn't generate it.
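
To make the trade-off concrete, here is a minimal Python sketch of a first-octet test that uses only the values listed above. The names are hypothetical, and the sets cover just the asymmetric case and the A8-style prefix; symmetric messages such as the `8c` example are deliberately left out.

```python
# First octets that can open an asymmetrically encrypted message
# (tag 1 in old format with 1-, 2-, or 4-octet length, or new format).
PKESK_FIRST_OCTETS = frozenset({0x84, 0x85, 0x86, 0xC1})

# First octets some OpenPGP programs emit before the message. These
# decode to tag 10, the marker packet in RFC 4880.
MARKER_FIRST_OCTETS = frozenset({0xA8, 0xA9, 0xAA, 0xCA})

CANDIDATE_FIRST_OCTETS = PKESK_FIRST_OCTETS | MARKER_FIRST_OCTETS


def may_be_asymmetric_openpgp(data: bytes) -> bool:
    """Loose test: does the first octet look like one of the known starts?"""
    return bool(data) and data[0] in CANDIDATE_FIRST_OCTETS


def random_match_rate() -> float:
    """Fraction of uniformly random files whose first octet would match."""
    return len(CANDIDATE_FIRST_OCTETS) / 256
```

With eight candidate octets, roughly 3% of genuinely random files would match on the first byte alone, which is the false-positive cost of a rule this loose. Matching on more bytes lowers that rate but ties the rule to specific lengths and versions.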

For your purpose, is it better to have false positives or false
negatives? That is, is it better to accidentally include some GPG
files, or better to accidentally exclude some files? That would help
in figuring out how many bytes you want to match on.

David




ted at 16systems

Nov 25, 2008, 6:09 AM

Post #3 of 4 (774 views)
Re: Differentiating GPG data from random data

On Mon, Nov 24, 2008 at 11:21 PM, David Shaw <dshaw [at] jabberwocky> wrote:

> Those bytes will more-or-less work, but as you say won't catch everything.
> In OpenPGP, the first few octets cover the length and type of the packet,
> so those bytes hardcode a particular length, which is probably not what you
> want. For example, the "85 02 0e 03" from your example is an old-style
> encrypted session key that is 526 bytes long, which will only match a
> particular key size.
>
> The problem is that OpenPGP has so many different ways to encode a
> particular packet, that writing a rule loose enough to match them all will
> inevitably have a huge number of false positives. For example, hex 84, 85,
> 86, and C1 can all indicate an asymmetrically encrypted message. 85 is the
> most common (and 84 would be extremely uncommon), but they are all possible.
> Some OpenPGP programs start with A8, A9, AA, or CA (though it is
> virtually always A8). GPG will read such a message, but doesn't generate
> it.
>
> For your purpose, is it better to have false positives or false negatives?
> That is, is it better to accidentally include some GPG files, or better to
> accidentally exclude some files? That would help in figuring out how many
> bytes you want to match on.
>
> David

Thank you for the information. It confirms what I thought after
reading the RFCs. It would be better for me to accidentally include
some GPG files rather than accidentally exclude files I'm searching
for. I can manually look at the files and use GnuPG to easily tell the
GPG ones from the non-GPG ones.

Thanks again,
Ted



lists-gnupgdev at lina

Nov 27, 2008, 7:42 PM

Post #4 of 4 (757 views)
Re: Differentiating GPG data from random data

On Tue, Nov 25, 2008 at 09:09:09AM -0500, Ted wrote:
> Thank you for the information. It confirms what I thought after
> reading the RFCs. It would be better for me to accidentally include
> some GPG files rather than accidentally exclude files I'm searching
> for. I can manually look at the files and use GnuPG to easily tell the
> GPG ones from the non-GPG ones.

You could run

gpg --dry-run --list-packets --batch --home empty/ --status-fd 1 <file>

and check for "NODATA" on stdout. Unfortunately, a malformed PGP packet
produces the same return code (2) as an encrypted message (at least in my
1.4.1, where NODATA and NO_SECKEY both cause exit code 2).

This will detect lots of (valid) OpenPGP files. Not sure if there are saner
options to actually make gpg not do anything.
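
A minimal Python sketch of that check, assuming `gpg` is on the PATH and that status lines carry the usual `[GNUPG:]` prefix. The helper names are hypothetical, `--homedir` is the full spelling of the `--home` option used above, and the exact status keywords may differ between GnuPG versions. Because the exit code does not separate the two cases, the sketch looks only at the status lines.

```python
import subprocess


def run_gpg_list_packets(path: str, homedir: str = "empty/") -> str:
    """Run the command from this post and return everything written to stdout."""
    cmd = [
        "gpg", "--dry-run", "--list-packets", "--batch",
        "--homedir", homedir, "--status-fd", "1", path,
    ]
    # Packet listings and status lines share stdout here, so decode
    # leniently in case the listing contains non-UTF-8 bytes.
    proc = subprocess.run(cmd, capture_output=True, text=True, errors="replace")
    return proc.stdout


def has_nodata(status_output: str) -> bool:
    """True if gpg reported NODATA, i.e. it found no OpenPGP data."""
    return any(
        line.startswith("[GNUPG:] NODATA")
        for line in status_output.splitlines()
    )


def looks_like_openpgp(path: str) -> bool:
    """Treat a file as OpenPGP unless gpg explicitly says NODATA."""
    return not has_nodata(run_gpg_list_packets(path))
```

Treating everything without a NODATA line as OpenPGP errs toward classifying a file as GPG data, so the result is worth confirming by hand for anything that matters.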

Gruss
Bernd
--
(OO) -- Bernd_Eckenfels@Mörscher_Strasse_8.76185Karlsruhe.de --
( .. ) ecki@{inka.de,linux.de,debian.org} http://www.eckes.org/
o--o 1024D/E383CD7E eckes [at] IRCNe v:+497211603874 f:+49721151516129
(O____O) When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl!
