talklists at newgeo
Oct 4, 2009, 4:34 PM
Post #5 of 6
Re: General questions, analog, summary, large log sets, incremental
Thanks! I think I need to just jump in and see how it works. Your
pointing me to the form-based, on-demand method is good. That may come in
handy, as then I am only using CPU cycles as they are needed. A few
On Oct 4, 2009, at 3:15 PM, Aengus wrote:
> On 10/4/2009 4:47 PM, Scott Haneda wrote:
>>>> Every Apache server we have logs all hits to one log, which is rolled
>>>> nightly. Summary used its built-in FTP to pull down only the new log
>>>> files. Is there a provision to get logs from remote machines, or do
>>>> I need to look at something like rsync to make this happen?
>>> Analog just analyzes the logfiles, it doesn't do any logfile
>>> "management", so you'd have to handle that yourself.
>> What would be the best way to manage this, then?
> How long is a piece of string? Different people will set it up in
> different ways.
Generally, my strings are pretty long, though I often end up cutting
them short, as I get tired of measuring them all the time :)
>> Consider a system where there are Apache access_logs from 10
>> machines. There is an 11th machine that will run analog. I have a
>> log rolling every 24 hours, which means I could rsync the remote log
>> directories of the 10 machines and keep all 24-hour log files up to
>> date. However, the live log, access_log, which is still in progress,
>> needs to come over just before analog runs. This, with
>> incremental mode, gives the client what appears to be near real time
> It really depends on the size of the logfiles. When the client is
> looking for "real time" stats, are they just interested in the last
> hour's worth of activity?
I am only familiar with Summary, and as I said, wanted to get away
from it. Not because it is doing anything wrong, but I strongly
believe that I should not have to pay for an update just to get new
user agent strings. Every time a new iPhone or browser comes out,
Summary will not know about it. Yes, I get some new features as well,
but the user agent and a few other things are such moving targets that
this really needs to be a file that can be user maintained.
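A user-maintained browser list can be as simple as an ordered table of patterns that anyone can edit without a software update. The sketch below is a generic Python illustration of that idea, not analog's or Summary's own mechanism; the tab-separated rule format, the `load_rules` and `classify` names, and the sample rules are all hypothetical.

```python
import re

# Hypothetical user-maintained rules: one "regex<TAB>label" per line,
# tried in order, first match wins. Order matters: Chrome's user agent
# also contains "Safari/", so the Chrome rule must come before Safari.
RULES_TEXT = """\
# pattern\tlabel
iPhone\tiPhone
Chrome/\tChrome
Safari/\tSafari
Firefox/\tFirefox
MSIE\tInternet Explorer
"""


def load_rules(text):
    """Parse rule lines into (compiled regex, label) pairs, skipping comments."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        pattern, label = line.split("\t", 1)
        rules.append((re.compile(pattern), label))
    return rules


def classify(user_agent, rules, default="Other"):
    """Return the label of the first rule whose pattern matches the user agent."""
    for regex, label in rules:
        if regex.search(user_agent):
            return label
    return default
```

In practice the rules would live in a plain text file (for example a hypothetical `browsers.txt` passed through `load_rules`), so supporting a new device means adding one line.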
When I say "real time", I mean something more like a 5-minute delay,
which is how I was able to make this work with Summary. I could turn on
incremental mode, and a 5-minute schedule would parse all the
log data in a very short time, well under the 5 minutes before
the next scheduled run was due.
Now, if I had to reprocess the entire batch near the end of the
year (I keep a year's worth live at all times), that would take
about an hour. Keep in mind this was on an older PowerPC
Power Mac with an 800 MHz CPU upgrade, so some pretty slow stuff.
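The gap between "about an hour" and "well under 5 minutes" comes from each incremental run reading only what was appended since the previous run. A year has 365 × 24 × 12 = 105,120 five-minute intervals, so one run sees only a tiny slice of the year's traffic. The following is a generic sketch of that idea, not how Summary or analog implement it: it remembers the byte offset (and inode) reached last time in a hypothetical JSON state file and returns only new, complete lines.

```python
import json
import os


def read_new_lines(log_path, state_path):
    """Return complete lines appended to log_path since the previous call.

    Progress is kept in state_path, a hypothetical JSON file holding the
    byte offset reached so far and the inode of the file it belongs to.
    """
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)

    inode = os.stat(log_path).st_ino
    offset = state.get("offset", 0)
    # A different inode or a shorter file means the log was rotated:
    # start again from the beginning of the new file.
    if state.get("inode") != inode or os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Consume only complete lines; a half-written last line waits
    # for the next run.
    end = data.rfind(b"\n")
    if end == -1:
        return []

    with open(state_path, "w") as f:
        json.dump({"offset": offset + end + 1, "inode": inode}, f)

    return data[: end + 1].decode("utf-8", errors="replace").splitlines()
```

Each scheduled run would then feed only the returned lines to the analysis step, so its cost tracks the traffic since the last run rather than the size of the whole year's logs.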
> Rather than having a machine churning away 24 hours a day generating
> "real time" charts that get over-written every 5 minutes, I'd be
> more inclined to use something like the Analog Form interface to
> allow the user to generate the report "on demand".
In the case of Summary, it was a relatively small CPU spike for a very
short time, but I do thank you for pointing me to this on-demand form
interface; in theory, it seems a much smarter way to deal with
this. As you mention below, not all my clients even use the stats,
but they all have the ability to, so there is no point in processing
stats for those who do not need them.
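The on-demand approach can also be paired with a short freshness window, so a report is regenerated only when someone asks for it and the last copy is older than, say, five minutes. Clients who never look at their stats then cost nothing. A minimal sketch, assuming a caller-supplied `generate` function (hypothetical; in practice it might run analog with that client's configuration) that writes the report to a file:

```python
import os
import time


def report_is_fresh(report_path, max_age_seconds, now=None):
    """True if report_path exists and was written less than max_age_seconds ago."""
    if not os.path.exists(report_path):
        return False
    now = time.time() if now is None else now
    return now - os.path.getmtime(report_path) < max_age_seconds


def get_report(report_path, generate, max_age_seconds=300):
    """Return the report, regenerating it only when requested and stale.

    `generate` is a hypothetical callable that writes a fresh report
    to report_path; it is only invoked when the cached copy is too old.
    """
    if not report_is_fresh(report_path, max_age_seconds):
        generate(report_path)
    with open(report_path) as f:
        return f.read()
```

Two requests within five minutes trigger one generation; the second is served from the file already on disk.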
>> I could run rsync every 4 minutes and have analog run every 5, but
>> this is a poor method, as times get out of sync, some logs are
>> larger than others, etc. I assume analog is triggered
>> by a scheduler?
> You can trigger it by a scheduler, or manually (through a CGI-type
> form, in this case - http://analog.cx/docs/form.html)
Perfect, thank you. I think the last time I used Analog was to parse
out mail server logs; it supported an obscure email server out of the
box as well. I am pretty sure I can configure analog to get to where
I need to be.
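The drift problem with separate 4- and 5-minute schedules goes away if a single scheduled job runs the steps in order: sync every web server, then run analog. This is a sketch under stated assumptions: the host names, paths and analog command line are placeholders, and `run` is injectable so the ordering can be checked without touching real machines.

```python
import subprocess

# Hypothetical host names and paths; substitute your own.
WEB_HOSTS = ["web01", "web02"]
REMOTE_LOG_DIR = "/var/log/apache2/"
LOCAL_ROOT = "/srv/logs"


def rsync_command(host, remote_dir=REMOTE_LOG_DIR, local_root=LOCAL_ROOT):
    """Build the rsync call that mirrors one host's log directory.

    rsync only sends what changed, so rotated logs that were already
    copied cost almost nothing, while the live access_log is brought
    up to date.
    """
    # -a: archive mode (recurse, keep timestamps); -z: compress in transit.
    return ["rsync", "-az", f"{host}:{remote_dir}", f"{local_root}/{host}/"]


def run_pipeline(analog_cmd, hosts=WEB_HOSTS, run=subprocess.run):
    """Sync every host, then run analog once, all in one scheduled job.

    analog starts only after every rsync has returned, so the two steps
    cannot drift apart the way independent schedules can. check=True
    makes a failed rsync raise, which skips the analog run.
    """
    for host in hosts:
        run(rsync_command(host), check=True)
    run(analog_cmd, check=True)
```

A single cron entry such as `*/5 * * * *` could then call something like `run_pipeline(["analog", "+g/etc/analog/facility.cfg"])`; the `+g` config-file option and that path are assumptions here, so check analog's command-line documentation. If a run can take longer than the interval, a lock file would keep two runs from overlapping.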
>> I understand analog is one of the most popular analyzers, though if it is not
>> a good fit for a large shared hosting environment, please let me
>> know. I have seen setups where logs are dropped into each virtual host's
>> client directory, and analog is set as an option to point to just
>> that user's files. I, however, prefer to parse out my entire
>> facility's worth of logs.
> Analog can generate reports for the whole facility from a set of
> "combined" logs, or from a bunch of "per host" logs - it's simply a
> matter of configuration. If you're going to allow the user to
> customize their own reports, there's less chance of inadvertently
> giving them access to someone else's log data if you generate
> separate logfiles, but it's really just a matter of preference.
I generally lock off a lot of reports, as otherwise I spend too much time
on tech support, explaining to users what each one means. They see "error" or
"hijacking" and think there is something wrong on my end, so it is
best to give them just what they need to be able to detect big
mistakes, and not too much, as it will be a burden on our support
team to answer basic questions.
I do need to make sure there is no leakage of one client's log data
into another's. I log the Host request header (the virtual host name), so as
long as I can limit the report by that, I should be fine. I will have
to find out how to lock that setting so the user cannot change it;
otherwise, if they were to guess a host name on the machine, they could
gain a lot of sensitive data.
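Aengus's suggestion of separate per-client logfiles also covers the host-guessing worry: if the host filter is decided server-side from the client's account, never from a form field, there is nothing for a user to change. A minimal sketch, assuming a LogFormat whose first field is the virtual host, optionally with a port (for example Apache's `%v:%p`, or `%{Host}i` placed first); the `CLIENT_HOSTS` mapping and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping kept by the provider, never taken from user input:
# the virtual host names each client account owns.
CLIENT_HOSTS = {
    "client_a": {"example.com", "www.example.com"},
    "client_b": {"example.org"},
}


def vhost_of(line):
    """Return the virtual host of a log line, lowercased, without :port.

    Assumes the LogFormat puts the host first, e.g. "%v:%p %h %l %u %t ...".
    """
    first = line.split(" ", 1)[0]
    return first.split(":", 1)[0].lower()


def split_by_client(lines, client_hosts=CLIENT_HOSTS):
    """Group log lines by the client that owns each line's host.

    Lines for hosts that no client owns are dropped, so a report built
    from one client's group can never contain another client's traffic.
    """
    owner_of = {host: client for client, hosts in client_hosts.items() for host in hosts}
    per_client = defaultdict(list)
    for line in lines:
        owner = owner_of.get(vhost_of(line))
        if owner is not None:
            per_client[owner].append(line)
    return dict(per_client)
```

Each client's lines would then be written to that client's own logfile, and the report configuration chosen from the logged-in account, so guessing another host name in the form gains nothing.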
I will start with the install, and a test log set, and see where I can
get with the docs. You have answered my basic set of questions, and your
answers tell me analog will work for my needs; it just depends on how long a
string I am willing to maintain.
> Analog is extremely flexible, and is often used in large hosted
> environments. But there isn't one "right" way to deploy it - it
> really does depend on what you want to achieve.
Thank you; time to spend some time in the docs.
Again, thank you for your time, especially over a weekend.
Scott * If you contact me off list replace talklists@ with scott@ *
| Analog Documentation: http://analog.cx/docs/Readme.html
| List archives: http://www.analog.cx/docs/mailing.html#listarchives
| Usenet version: news://news.gmane.org/gmane.comp.web.analog.general