
Mailing List Archive: ClamAV: users

Generating signatures for malware

 

 



mbroekman at maileig

Aug 28, 2012, 7:15 AM

Post #1 of 10 (1448 views)
Generating signatures for malware

Does anyone know of a tool that would take strings in a hex signature
and turn them into appropriate wildcards? For instance, I want to strip
out all the "http://" and "https://" and replace them with {7-8} to
reduce the size of the signature and get more 'useful' strings in the
signature? There are other strings as well so it's not just a I've been
using sed but it's a little unwieldy and more than occasionally requires
manual treatment afterwards.



--Maarten

_______________________________________________
Help us build a comprehensive ClamAV guide: visit http://wiki.clamav.net
http://www.clamav.net/support/ml


ged at jubileegroup

Aug 29, 2012, 6:26 AM

Post #2 of 10 (1429 views)
Re: Generating signatures for malware

Hi there,

On Wed, 29 Aug 2012, Maarten Broekman wrote:

> Does anyone know of a tool that would take strings in a hex signature
> and turn them into appropriate wildcards? For instance, I want to strip
> out all the "http://" and "https://" and replace them with {7-8}

Your suggested replacement does not make sense to me.

> to reduce the size of the signature and get more 'useful' strings in the
> signature? There are other strings as well so it's not just a I've been
> using sed but it's a little unwieldy and more than occasionally requires
> manual treatment afterwards.

There seems to be at least one piece missing from that last sentence.
Of course there's always Perl... :)

Despite the statement of your objective it isn't clear to me what you
think you're going to achieve. My expectation would be a very large
increase in the false positive rates if you attempt to use signatures
modified in the way you describe. Can you be more specific? Define
'appropriate' and 'useful' in this context for example. If you are
just looking for the 'names' of the viruses then forget it, there is
no common naming scheme which is globally accepted. Individuals and
organizations pick names as they find new threats, and within a very
short time of their first appearance it is common for threats to be
given a few different names by several anti-virus product suppliers.

Generating signatures for scanning for malicious software is not a
simple task, and there is considerable literature available on it.

http://www.google.com/#hl=en&output=search&q=generating+virus+signatures

--

73,
Ged.


mbroekman at maileig

Aug 29, 2012, 6:46 AM

Post #3 of 10 (1432 views)
Re: Generating signatures for malware

> -----Original Message-----
> Despite the statement of your objective it isn't clear to me what you
> think you're going to achieve. My expectation would be a very large
> increase in the false positive rates if you attempt to use signatures
> modified in the way you describe. Can you be more specific? Define
> 'appropriate' and 'useful' in this context for example.

The rate of false positives is wholly dependent on the strings that you
are replacing with wildcards.

As an example, when generating signatures to identify phishing content
(say, content targeting bank customers), I wanted to be able to strip
out 'http://' (687474703a2f2f) and 'https://' (68747470733a2f2f) from
the hex dump (generated by sigtool) and replace them with {7-8} (aka
WILDCARD LENGTH 7 - 8) because I don't care if the protocol in the
phishing content is http or https. This would remove 9 - 11 characters
with each replacement, allowing me to fit more of the hex dump into the
resulting signature, which is limited to ~8k characters (including name,
file type, and offset).
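A minimal Python sketch of the replacement step described above, assuming the input is the plain lowercase hex string that sigtool --hex-dump prints. The REPLACEMENTS table and the wildcard_hex() helper are hypothetical names, not part of ClamAV. The scan advances two hex digits (one byte) at a time, so a pattern is only replaced when it starts on a byte boundary, which a plain sed substitution does not guarantee.

```python
from typing import Dict

# Hypothetical replacement table: hex of a known string -> ndb wildcard.
# {7-8} skips 7 or 8 bytes, covering "http://" (7) and "https://" (8).
# Keys must be lowercase hex with an even number of digits.
REPLACEMENTS: Dict[str, str] = {
    b"https://".hex(): "{7-8}",  # 68747470733a2f2f
    b"http://".hex(): "{7-8}",   # 687474703a2f2f
}


def wildcard_hex(hexsig: str, table: Dict[str, str]) -> str:
    """Replace byte-aligned occurrences of known hex strings with wildcards.

    Assumes `hexsig` is plain hex with no wildcards in it yet.
    """
    hexsig = hexsig.strip().lower()
    keys = sorted(table, key=len, reverse=True)  # try longer patterns first
    out = []
    i = 0
    while i < len(hexsig):
        for key in keys:
            if hexsig.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            out.append(hexsig[i:i + 2])  # copy one byte (two hex digits)
            i += 2
    return "".join(out)
```

For example, passing the hex of `<a href="https://bank.example` keeps the `<a href="` bytes, turns the scheme into {7-8} and keeps the hex of bank.example. Each replacement saves 9 characters for http:// (14 hex digits down to 5) and 11 for https:// (16 down to 5), which matches the 9 - 11 figure above.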

Being able to replace these sorts of known strings automatically would
help speed up the process of creating the signatures (which, as you
mentioned, is a tough task as it is).

> If you are just looking for the 'names' of the viruses then forget it,
> there is no common naming scheme which is globally accepted. Individuals and
> organizations pick names as they find new threats, and within a very
> short time of their first appearance it is common for threats to be
> given a few different names by several anti-virus product suppliers.

This has nothing to do with looking at the names of viruses. I'm only
concerned with looking at the output from sigtool --hex-dump and turning
it into a useful signature in a faster, more efficient manner.

--Maarten


michael at orlitzky

Aug 29, 2012, 7:29 AM

Post #4 of 10 (1430 views)
Re: Generating signatures for malware

On 08/29/2012 09:46 AM, Maarten Broekman wrote:
>> -----Original Message-----
>> Despite the statement of your objective it isn't clear to me what you
>> think you're going to achieve. My expectation would be a very large
>> increase in the false positive rates if you attempt to use signatures
>> modified in the way you describe. Can you be more specific? Define
>> 'appropriate' and 'useful' in this context for example.
>
> The rate of false positives is wholly dependent on the strings that you
> are replacing with wildcards.
>
> As an example, when generating signatures to identify phishing content
> (say, content targeting bank customers), I wanted to be able to strip
> out 'http://' (687474703a2f2f) and 'https://' (68747470733a2f2f) from
> the hex dump (generated by sigtool) and replace them with {7-8} (aka
> WILDCARD LENGTH 7 - 8) because I don't care if the protocol in the
> phishing content is http or https. This would remove 9 - 11 characters
> with each replacement, allowing me to fit more of the hex dump into the
> resulting signature, which is limited to ~8k characters (including name,
> file type, and offset).

I think he meant that {7-8}facebook.com matches,

* http://facebook.com
* https://facebook.com
* i go to facebook.com
* visit facebook.com
* ...

Whether or not that's a problem depends on context. I guess <a href="i
go to facebook.com"> is not so bad, but false positives are almost by
definition unintended consequences so I'd be careful.
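Michael's point can be checked mechanically. The sketch below uses a hypothetical helper, ndb_to_regex(), that translates only two pieces of ndb body syntax into a Python bytes regex: literal hex pairs and {n-m} gaps (any n to m bytes). It is not how ClamAV matches internally, and it rejects everything else (??, *, alternations) rather than guess at it. Running a candidate signature through something like this against known-clean mail is one cheap way to spot a wildcard that is looser than intended.

```python
import re

# Only two ndb body constructs are handled: hex byte pairs and {n-m} gaps.
_TOKEN = r"\{\d+-\d+\}|[0-9a-fA-F]{2}"


def ndb_to_regex(sig: str):
    """Translate a tiny subset of ndb body syntax into a compiled bytes regex.

    {n-m} becomes .{n,m} (any n..m bytes); hex pairs become literal bytes.
    """
    if not re.fullmatch(f"(?:{_TOKEN})+", sig):
        raise ValueError(f"unsupported signature syntax: {sig!r}")
    parts = []
    for token in re.findall(_TOKEN, sig):
        if token.startswith("{"):
            lo, hi = token[1:-1].split("-")
            parts.append(b".{%d,%d}" % (int(lo), int(hi)))
        else:
            parts.append(re.escape(bytes.fromhex(token)))
    # DOTALL so that a gap can also swallow newline bytes.
    return re.compile(b"".join(parts), re.DOTALL)


fb = ndb_to_regex("{7-8}" + b"facebook.com".hex())
for text in (b"http://facebook.com", b"https://facebook.com", b"i go to facebook.com"):
    print(text, bool(fb.search(text)))
```

All three lines print True. The gap needs 7 or 8 bytes in front of facebook.com, so the bare string facebook.com on its own does not match, but almost any surrounding text will satisfy it.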


draynor at sourcefire

Aug 29, 2012, 8:24 AM

Post #5 of 10 (1428 views)
Re: Generating signatures for malware

On Wed, Aug 29, 2012 at 10:29 AM, Michael Orlitzky <michael [at] orlitzky> wrote:

> On 08/29/2012 09:46 AM, Maarten Broekman wrote:
> >> -----Original Message-----
> >> Despite the statement of your objective it isn't clear to me what you
> >> think you're going to achieve. My expectation would be a very large
> >> increase in the false positive rates if you attempt to use signatures
> >> modified in the way you describe. Can you be more specific? Define
> >> 'appropriate' and 'useful' in this context for example.
> >
> > The rate of false positives is wholly dependent on the strings that you
> > are replacing with wildcards.
> >
> > As an example, when generating signatures to identify phishing content
> > (say, content targeting bank customers), I wanted to be able to strip
> > out 'http://' (687474703a2f2f) and 'https://' (68747470733a2f2f) from
> > the hex dump (generated by sigtool) and replace them with {7-8} (aka
> > WILDCARD LENGTH 7 - 8) because I don't care if the protocol in the
> > phishing content is http or https. This would remove 9 - 11 characters
> > with each replacement, allowing me to fit more of the hex dump into the
> > resulting signature, which is limited to ~8k characters (including name,
> > file type, and offset).
>
> I think he meant that {7-8}facebook.com matches,
>
> * http://facebook.com
> * https://facebook.com
> * i go to facebook.com
> * visit facebook.com
> * ...
>
> Whether or not that's a problem depends on context. I guess <a href="i
> go to facebook.com"> is not so bad, but false positives are almost by
> definition unintended consequences so I'd be careful.

Are you hitting the maximum signature length of 8192? I suppose in that
case, if you are trying to make room, you intend to offset (pun
intended) the loss of precision in one part of the expression by using the
freed bytes to be more precise elsewhere in the sig.
It sounds like a reasonable tradeoff to consider if your signature has
reached the limit, but I know of no tool or script to do it for you.
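If that limit is the problem, it helps to know how much room the hex body actually has once the other ndb fields are counted. A small sketch, assuming the 8192 figure applies to the whole Name:TargetType:Offset:HexSignature line; the signature name is a made-up example, TargetType 3 is normalized HTML and the offset * means "anywhere". The helper names are hypothetical.

```python
# Assumption: the 8192 limit from this thread covers the whole ndb line.
MAX_LINE = 8192


def ndb_line(name: str, target: int, offset: str, body: str) -> str:
    """Assemble one ndb record; the optional min/max flevel fields are omitted."""
    return f"{name}:{target}:{offset}:{body}"


def body_budget(name: str, target: int, offset: str, limit: int = MAX_LINE) -> int:
    """Characters left for the hex body once name, type and offset are counted."""
    return max(limit - len(ndb_line(name, target, offset, "")), 0)


# Hypothetical signature name, normalized HTML target, match anywhere.
print(body_budget("Local.Phish.Example", 3, "*"))
```

With this name the fixed part "Local.Phish.Example:3:*:" takes 24 characters, leaving 8168 for the body, so every {7-8} substitution buys back a little of a fairly tight budget.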

Dave R.

--
---
Dave Raynor
Sourcefire Vulnerability Research Team
draynor [at] sourcefire


mbroekman at maileig

Aug 29, 2012, 9:15 AM

Post #6 of 10 (1428 views)
Re: Generating signatures for malware

> -----Original Message-----
> > > The rate of false positives is wholly dependent on the strings that
> > > you are replacing with wildcards.
> > >
> > > As an example, when generating signatures to identify phishing
> > > content (say, content targeting bank customers), I wanted to be able
> > > to strip out 'http://' (687474703a2f2f) and 'https://'
> > > (68747470733a2f2f) from the hex dump (generated by sigtool) and
> > > replace them with {7-8} (aka WILDCARD LENGTH 7 - 8) because I
> > > don't care if the protocol in the phishing content is http or https.
> > > This would remove 9 - 11 characters with each replacement, allowing
> > > me to fit more of the hex dump into the resulting signature, which is
> > > limited to ~8k characters (including name, file type, and offset).
> >
> > I think he meant that {7-8}facebook.com matches,
> >
> > * http://facebook.com
> > * https://facebook.com
> > * i go to facebook.com
> > * visit facebook.com
> > * ...
> >
> > Whether or not that's a problem depends on context. I guess <a href="i
> > go to facebook.com"> is not so bad, but false positives are almost by
> > definition unintended consequences so I'd be careful.
>
> Are you hitting the maximum signature length of 8192? I suppose in that
> case, if you are trying to make room, you intend to offset (pun
> intended) the loss of precision in one part of the expression by using
> the freed bytes to be more precise elsewhere in the sig.
> It sounds like a reasonable tradeoff to consider if your signature has
> reached the limit, but I know of no tool or script to do it for you.

Exactly. Some of the phishing content that I'm finding is resulting in
hex dumps in the 10k+ character range and I think it's more dangerous to
replace sections with '*' than to replace certain substrings with
specific length wildcards.
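When a section really has to go, a bounded gap keeps more precision than '*'. A sketch of that, assuming a plain hex body and byte offsets chosen by hand: the hypothetical elide_bytes() swaps bytes [start, end) for an exact-length ndb gap {n}, so the bytes after the gap must still sit exactly n bytes after the bytes before it. The sketch only insists on one literal byte on each side; a real signature would want a much longer literal run around the gap.

```python
def elide_bytes(hexsig: str, start: int, end: int) -> str:
    """Replace bytes [start, end) of a plain hex body with the gap {n}.

    Unlike '*', which allows any distance, {n} pins the distance between
    the literal bytes on either side of the gap to exactly n bytes.
    """
    if len(hexsig) % 2:
        raise ValueError("hex body must have an even number of digits")
    if not 0 < start < end < len(hexsig) // 2:
        raise ValueError("keep at least one literal byte on each side of the gap")
    return f"{hexsig[:2 * start]}{{{end - start}}}{hexsig[2 * end:]}"


print(elide_bytes("0011223344", 1, 3))
```

This prints 00{2}3344: bytes 11 and 22 are dropped from the body, but the signature still requires exactly two bytes between 00 and 3344.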

--Maarten


ged at jubileegroup

Aug 30, 2012, 4:21 AM

Post #7 of 10 (1417 views)
Re: Generating signatures for malware

Hello again,

On Thu, 30 Aug 2012, Maarten Broekman wrote:

> Some of the phishing content that I'm finding is resulting in hex
> dumps in the 10k+ character range and I think it's more dangerous to
> replace sections with '*' than to replace certain substrings with
> specific length wildcards.

This brings to mind a large proportion of our customers, who will
happily send us a four megabyte PDF file to order a pack of CDs.

I think it calls for a complete re-think.

It seems to me that if signatures are of that size there must be a
great deal of redundancy in them, and it might well be indicative of a
flaw in the process design.

I imagine that removing redundancy effectively will not be a matter of
tinkering with a few character strings, but of tackling the issue more
directly, possibly mathematically.
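The redundancy point can be given at least a rough number before anything is rewritten. One crude proxy, a swapped-in heuristic rather than anything ClamAV provides, is the ratio of compressed size to original size of the normalized file: a value far below 1.0 means the content repeats itself heavily. A sketch using Python's zlib, with a hypothetical helper name:

```python
import zlib


def redundancy_ratio(data: bytes) -> float:
    """Compressed size divided by original size (zlib, level 9).

    Only a rough proxy: values well below 1.0 mean the input repeats
    itself a lot, hinting that a shorter signature might do the job.
    """
    if not data:
        return 1.0
    return len(zlib.compress(data, 9)) / len(data)


# Heavily repetitive markup compresses to a tiny fraction of its size.
print(redundancy_ratio(b"<td>&nbsp;</td>" * 500))
```

A low ratio does not say which parts to drop, but it does suggest whether the 10k-character dumps are mostly repetition worth tackling more directly.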

Please would someone explain to me the use of "{7-8}"? I do not
recognize it as valid regular expression syntax.

According to the current ClamAV documentation (15 May 2012) repeat
character counts are not supported:

http://www.clamav.net/doc/latest/phishsigs_howto.pdf

--

73,
Ged.


mbroekman at maileig

Aug 30, 2012, 7:18 AM

Post #8 of 10 (1420 views)
Re: Generating signatures for malware

> -----Original Message-----
> > Some of the phishing content that I'm finding is resulting in hex
> > dumps in the 10k+ character range and I think it's more dangerous to
> > replace sections with '*' than to replace certain substrings with
> > specific length wildcards.
>
> Please would someone explain to me the use of "{7-8}"? I do not
> recognize it as valid regular expression syntax.
>
> According to the current ClamAV documentation (15 May 2012) repeat
> character counts are not supported:
>
> http://www.clamav.net/doc/latest/phishsigs_howto.pdf

I see where your confusion comes from. I'm not generating pdb
signatures. I'm generating ndb signatures via 'sigtool --hex-dump' on
the normalized output from clamscan --debug --leave-temps <filename>.

In the ndb file, {7-8} matches any 7 or 8 character string.
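For readers more used to regex or pdb syntax, it may help to spell out what each ndb gap form means in bytes. Assuming the gap forms listed in signatures.pdf (*, {n}, {-n}, {n-} and {n-m}), the hypothetical gap_bounds() below maps each one to the minimum and maximum number of bytes it skips, with None meaning unbounded. It makes the difference between '*' and {7-8} explicit.

```python
import re
from typing import Optional, Tuple


def gap_bounds(token: str) -> Tuple[int, Optional[int]]:
    """(min, max) bytes skipped by an ndb gap; max None means unbounded."""
    if token == "*":
        return 0, None
    exact = re.fullmatch(r"\{(\d+)\}", token)  # {n}
    if exact:
        n = int(exact.group(1))
        return n, n
    ranged = re.fullmatch(r"\{(\d*)-(\d*)\}", token)  # {-n}, {n-}, {n-m}
    if ranged and (ranged.group(1) or ranged.group(2)):
        lo = int(ranged.group(1) or 0)
        hi = int(ranged.group(2)) if ranged.group(2) else None
        if hi is not None and lo > hi:
            raise ValueError(f"empty range: {token!r}")
        return lo, hi
    raise ValueError(f"not an ndb gap wildcard: {token!r}")


for tok in ("*", "{7-8}", "{10}", "{-4}", "{16-}"):
    print(tok, gap_bounds(tok))
```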

--Maarten


dennispe at inetnw

Aug 30, 2012, 11:37 AM

Post #9 of 10 (1413 views)
Re: Generating signatures for malware

On 8/30/12 4:21 AM, G.W. Haywood wrote:

>
> Please would someone explain to me the use of "{7-8}"? I do not
> recognize it as valid regular expression syntax.

Here is an example used in a Sane Security signature:

http://sane.mxuptime.com/s.aspx?id=Sanesecurity.Phishing.Auction.1749

It is an offset range that, in this case, looks for a gif file preceded by a lot
of characters. There is an anchoring context in the pattern that, to a degree,
prevents the match from spilling out of the intended search realm into the body
of the message.

dp


ged at jubileegroup

Aug 31, 2012, 5:18 AM

Post #10 of 10 (1403 views)
Re: Generating signatures for malware

Hi there,

On Fri, 31 Aug 2012, Maarten Broekman wrote:

> I see where your confusion comes from. I'm not generating pdb
> signatures. I'm generating ndb signatures ...

Sorry, bit of a senior moment there. They seem to be creeping up on
me lately. :( I had to go back and read

http://www.clamav.net/doc/latest/signatures.pdf

again.

I'm still perplexed by the numbers here. You say that you have
signatures of the order of 8k characters, and you want to save O(10)
characters here and there in the signatures. It seems like you're
fighting an uphill battle; what else am I missing? Have you estimated
the gains you're going to be able to make? How many occurrences of
the target replacements do you expect to find in the signatures?
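That question can be answered before any signature is rewritten. A sketch with hypothetical helper names: count_aligned() counts non-overlapping occurrences of a hex string that start on a byte boundary, and estimate_savings() multiplies each count by the characters saved per replacement. It assumes the table's patterns never overlap one another in the input, which holds for the http:// and https:// pair.

```python
from typing import Dict


def count_aligned(hexsig: str, needle: str) -> int:
    """Count non-overlapping occurrences of `needle` starting on a byte boundary."""
    if not needle:
        return 0
    count = 0
    i = hexsig.find(needle)
    while i != -1:
        if i % 2 == 0:
            count += 1
            i = hexsig.find(needle, i + len(needle))
        else:
            i = hexsig.find(needle, i + 1)  # odd offset: straddles two bytes
    return count


def estimate_savings(hexsig: str, table: Dict[str, str]) -> int:
    """Total characters saved if every aligned match of each key were replaced."""
    return sum(count_aligned(hexsig, k) * (len(k) - len(v)) for k, v in table.items())


table = {b"http://".hex(): "{7-8}", b"https://".hex(): "{7-8}"}
print(estimate_savings(b"x http://a https://b http://c".hex(), table))
```

For the toy input this prints 29 (two http:// at 9 each plus one https:// at 11). Run against a real 10k-character dump, it shows whether the replacements buy a few dozen characters or a few hundred.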

A *long* time ago I was faced with something superficially similar, in
the context of trying to fit the descriptions for 50,000+ stationery
products into 40-character strings. Descriptions were abbreviated
ad hoc, apparently by careless staff for whom English was at best a
second language. A very large number of corrections was necessary.
It was a nightmare, and it needed to be done four times per annum, so
I wrote a simple parser in Perl. Amongst other things, it used a kind
of 'thesaurus' of text strings. Here's a brief extract:
...
*B/FILE
*BXFILE
*BOXFILE
BOX FILE
*BRACKETS
*BRCK
BRACKET
...

The asterisk is just a character which didn't often appear in the
input descriptions. Your thesaurus would probably look something like
...
*hyyp://
*hyyps://
{7-8}
...

It's a very simple idea. The input is a catalogue which contains tens
of thousands of single-line descriptions of products. A description
line is matched against the thesaurus. If the line contains one of the
thesaurus strings prefixed by an asterisk, that string is replaced by the
next string in the thesaurus that is not prefixed by an asterisk. It's an
easy thing to do in Perl, but if Perl isn't your second language you
might find it testing. If it's of interest please give me some more
examples of your replacement requirements and I'll dust off the code.
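The thesaurus idea is easy to port. A minimal Python sketch, not Ged's Perl code (which isn't shown): the hypothetical load_thesaurus() reads lines where '*'-prefixed entries are variants and the next unprefixed line is their replacement, and apply_thesaurus() replaces all variants in one left-to-right pass, trying longer variants first so that, say, BRACKETS is not partly rewritten by a shorter entry. For signature work the variants would be hex strings such as 687474703a2f2f, and a byte-alignment check would still have to be added, because a plain text substitution can match across a byte boundary in a hex dump.

```python
import re
from typing import Dict, List


def load_thesaurus(lines: List[str]) -> Dict[str, str]:
    """'*'-prefixed lines are variants; the next unprefixed line replaces them."""
    mapping: Dict[str, str] = {}
    pending: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            if line[1:]:
                pending.append(line[1:])
        else:
            for variant in pending:
                mapping[variant] = line
            pending = []
    if pending:
        raise ValueError(f"variants with no replacement: {pending}")
    return mapping


def apply_thesaurus(text: str, mapping: Dict[str, str]) -> str:
    """Replace all variants in one left-to-right pass, longest variant first."""
    if not mapping:
        return text
    variants = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in variants))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


thesaurus = load_thesaurus(
    ["*B/FILE", "*BXFILE", "*BOXFILE", "BOX FILE", "*BRACKETS", "*BRCK", "BRACKET"]
)
print(apply_thesaurus("A4 BXFILE, 2 BRCK", thesaurus))
```

This prints A4 BOX FILE, 2 BRACKET.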

--

73,
Ged.
