
Mailing List Archive: Analog: Help

Escaping Quotes in Logs


roberto.j.hoyle at dartmouth

Feb 26, 2008, 8:41 AM

Post #1 of 6 (495 views)
Escaping Quotes in Logs

I have a log entry of the form:

"Entry"

however, the Entry above may have escaped quotes (\") in it.

Is there a way to have Analog differentiate between escaped quotes and
regular quotes in its parsing?

I could change the quotes from the LogFormat entry in Apache, but
there really isn't an ASCII character that cannot appear in a query
string, so I'd just be shifting the problem around.

Thanks,

r.
+------------------------------------------------------------------------
| TO UNSUBSCRIBE from this list:
| http://lists.meer.net/mailman/listinfo/analog-help
|
| Analog Documentation: http://analog.cx/docs/Readme.html
| List archives: http://www.analog.cx/docs/mailing.html#listarchives
| Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
+------------------------------------------------------------------------


analog07 at eircom

Feb 26, 2008, 9:19 AM

Post #2 of 6 (464 views)
Re: Escaping Quotes in Logs [In reply to]

Roberto Hoyle <roberto.j.hoyle[at]dartmouth.edu> wrote:
> I have a log entry of the form:
>
> "Entry"
>
> however, the Entry above may have escaped quotes (\") in it.
>
> Is there a way to have Analog differentiate between escaped quotes and
> regular quotes in its parsing?

?? Can you expand on your description of the problem? I don't understand what you're trying to do, and what you expect to get, versus what you're actually getting.

Aengus



roberto.j.hoyle at dartmouth

Feb 26, 2008, 11:32 AM

Post #3 of 6 (466 views)
Re: Escaping Quotes in Logs [In reply to]

On Feb 26, 2008, at 12:19 PM, Aengus wrote:

> Roberto Hoyle <roberto.j.hoyle[at]dartmouth.edu> wrote:
>> I have a log entry of the form:
>>
>> "Entry"
>>
>> however, the Entry above may have escaped quotes (\") in it.
>>
>> Is there a way to have Analog differentiate between escaped quotes
>> and regular quotes in its parsing?
>
> ?? Can you expand on your description of the problem? I don't
> understand what you're trying to do, and what you expect to get,
> versus what you're actually getting.

A query string can have quotes in it, and the log file will contain
the quotes with a preceding '\':

"http://ry2ue4ek7d.search.serialssolutions.com/?sid=HWW:HUMAB&genre=article&pid=
<an>199728800301005</an>&aulast=Callahan&aufirst=John
+F.&issn=0041-462X&title=Twentieth+Century+Literature&stitle=Twentieth
+Century+Lit&atitle=F.+Scott+Fitzgerald's+evolving+American+dream:+the+
\"pursuit+of+happiness\"+in+Gatsby,+Tender+is+the+night,+and+The+last
+tycoon&volume=42&spage=374&epage=395&date=1996&ssn=fall"

Note that "pursuit of happiness" was entered as a search term, and
therefore gets escaped (\") and has the spaces replaced by '+'.

Is there a way for Analog to deal with the entire line, instead of
stopping at the first '"' character, if I try to analyze the query
parameters?

r.


analog-author at lists

Feb 26, 2008, 12:41 PM

Post #4 of 6 (465 views)
Re: Escaping Quotes in Logs [In reply to]

No, sorry, there's no way to tell analog that.

--
Stephen Turner


jeremy at 7simplemachines

Feb 26, 2008, 5:10 PM

Post #5 of 6 (462 views)
RE: Escaping Quotes in Logs [In reply to]

The format for these parameters is not typical for a web log. Usually
the query string is URL-escaped. In that case quote characters are
converted to their hex equivalent. In your example I would expect
something more like this:

F%2E+Scott+Fitzgerald%27s+evolving+American+dream:+the+
%22pursuit+of+happiness%22+in+Gatsby%2C+Tender+is+the+night

In this case Analog can parse the file just fine. For your files you
will probably need to pre-process the lines to convert them to something
Analog can support.
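
As a rough illustration of such a pre-processing step, here is a minimal
Python sketch. It assumes the log was written by Apache, which escapes a
literal quote as \" and a literal backslash as \\, and it rewrites those two
sequences as %22 and %5C so that no quote character remains inside a quoted
field. The function names (percent_encode_escapes, convert_stream) are
hypothetical, and other escapes such as \xhh are deliberately left alone.

```python
import re

# Matches a backslash followed by a quote or another backslash.
# Scanning left to right means an escaped backslash (\\) is consumed as a
# pair, so a real closing quote that follows it is not mistaken for \".
_ESCAPED = re.compile(r'\\(["\\])')

# Percent-encoded replacements for the two escaped characters.
_PERCENT = {'"': '%22', '\\': '%5C'}


def percent_encode_escapes(line: str) -> str:
    """Rewrite \\" as %22 and \\\\ as %5C; leave everything else untouched."""
    return _ESCAPED.sub(lambda m: _PERCENT[m.group(1)], line)


def convert_stream(src, dst) -> None:
    """Copy a log from src to dst, converting one line at a time."""
    for line in src:
        dst.write(percent_encode_escapes(line))
```

To use it as a filter, call convert_stream(sys.stdin, sys.stdout) after
importing sys, and point Analog at the converted file rather than the raw
log.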

Thanks,

Jeremy Wadsack
Seven Simple Machines




roberto.j.hoyle at dartmouth

Feb 27, 2008, 9:42 AM

Post #6 of 6 (452 views)
Re: Escaping Quotes in Logs [In reply to]

On Feb 26, 2008, at 8:10 PM, Jeremy Wadsack wrote:

> The format for these parameters is not typical for a web log. Usually
> the query string is URL-escaped. In that case quote characters are
> converted to their hex equivalent. In your example I would expect
> something more like this:
>
> F%2E+Scott+Fitzgerald%27s+evolving+American+dream:+the+
> %22pursuit+of+happiness%22+in+Gatsby%2C+Tender+is+the+night
>
> In this case Analog can parse the file just fine. For your files you
> will probably need to pre-process the lines to convert them to
> something
> Analog can support.

From the Apache 2.2 documentation:

Some Notes

For security reasons, starting with version 2.0.46, non-printable and
other special characters in %r, %i and %o are escaped using \xhh
sequences, where hh stands for the hexadecimal representation of the
raw byte. Exceptions from this rule are " and \, which are escaped by
prepending a backslash, and all whitespace characters, which are
written in their C-style notation (\n, \t, etc). In versions prior to
2.0.46, no escaping was performed on these strings so you had to be
quite careful when dealing with raw log files.

So, for Apache, at least, what I'm seeing is expected behavior.
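
For anyone who needs the original bytes back, the rules quoted above can be
reversed with something like the following Python sketch. The function name
unescape_log_field is hypothetical. The documentation only says "\n, \t,
etc", so the extra whitespace letters handled here (\r, \v, \b) are an
assumption. The field is also assumed to contain only characters that fit in
a single byte.

```python
import re

# Single-character escapes. \" and \\ come from the quoted documentation;
# the whitespace letters beyond \n and \t are an assumption.
_SIMPLE = {'n': '\n', 't': '\t', 'r': '\r', 'v': '\v', 'b': '\b',
           '"': '"', '\\': '\\'}

# Either \xhh (two hex digits) or a backslash followed by any one character.
_TOKEN = re.compile(r'\\(?:x([0-9A-Fa-f]{2})|(.))')


def unescape_log_field(field: str) -> bytes:
    """Reverse Apache's log escaping and return the raw bytes of the field."""
    def replace(match):
        hex_digits, char = match.groups()
        if hex_digits is not None:
            # \xhh stands for the raw byte with that hexadecimal value.
            return chr(int(hex_digits, 16))
        # Unknown escapes are kept exactly as they appeared.
        return _SIMPLE.get(char, match.group(0))

    return _TOKEN.sub(replace, field).encode('latin-1')
```

Applied to the example earlier in the thread, the \" sequences around
pursuit+of+happiness come back as plain quote characters.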
