
Mailing List Archive: ModPerl: Docs-dev

adding search?

 

 



stas at stason

Jan 25, 2002, 9:20 PM

Post #1 of 20
adding search?

Bill, I guess this is a question mainly for you, as you are somewhat
involved with the apache.org search. I said we should add the search widget
but keep it hidden for now, since we don't have a good search facility.
Maybe we can have a search facility? What do you say?


_____________________________________________________________________
Stas Bekman JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide http://perl.apache.org/guide
mailto:stas [at] stason http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/


---------------------------------------------------------------------
To unsubscribe, e-mail: docs-dev-unsubscribe [at] perl
For additional commands, e-mail: docs-dev-help [at] perl


moseley at hank

Jan 25, 2002, 11:40 PM

Post #2 of 20
Re: adding search? [In reply to]

At 12:20 PM 1/26/2002 +0800, Stas Bekman wrote:
>Bill, I guess this is a question mainly for you, as you are somewhat
>involved with the apache.org search. I said we should add the search widget
>but keep it hidden for now, since we don't have a good search facility.
>Maybe we can have a search facility? What do you say?

Sure, no problem. I've got a generic search script that works with TT, so
we should be able to use the same templates as the rest of the site.





Bill Moseley
mailto:moseley [at] hank



stas at stason

Jan 25, 2002, 11:57 PM

Post #3 of 20
Re: adding search? [In reply to]

Bill Moseley wrote:

> At 12:20 PM 1/26/2002 +0800, Stas Bekman wrote:
>
>>Bill, I guess this is a question mainly for you, as you are somewhat
>>involved with the apache.org search. I said we should add the search widget
>>but keep it hidden for now, since we don't have a good search facility.
>>Maybe we can have a search facility? What do you say?
>>
>
> Sure, no problem. I've got a generic search script that works with TT, so
> we should be able to use the same templates as the rest of the site.

You mean a real search engine, with indexing and all the good search
engine bells and whistles?




moseley at hank

Jan 26, 2002, 12:11 AM

Post #4 of 20
Re: adding search? [In reply to]

At 02:57 PM 1/26/2002 +0800, Stas Bekman wrote:
>Bill Moseley wrote:
>
>> At 12:20 PM 1/26/2002 +0800, Stas Bekman wrote:
>>
>>>Bill, I guess this is a question mainly for you, as you are somewhat
>>>involved with the apache.org search. I said we should add the search widget
>>>but keep it hidden for now, since we don't have a good search facility.
>>>Maybe we can have a search facility? What do you say?
>>>
>>
>> Sure, no problem. I've got a generic search script that works with TT, so
>> we should be able to use the same templates as the rest of the site.
>
>You mean a real search engine, with indexing and all the good search
>engine bells and whistles?

I think you have to pay a lot of money for a real search engine. But I can
get you an ok one for a reasonable price like free. What kind of bells and
whistles were you looking for?




Bill Moseley
mailto:moseley [at] hank



stas at stason

Jan 26, 2002, 1:09 AM

Post #5 of 20
Re: adding search? [In reply to]

Bill Moseley wrote:

> At 02:57 PM 1/26/2002 +0800, Stas Bekman wrote:
>
>>Bill Moseley wrote:
>>
>>
>>>At 12:20 PM 1/26/2002 +0800, Stas Bekman wrote:
>>>
>>>
>>>>Bill, I guess this is a question mainly for you, as you are somewhat
>>>>involved with the apache.org search. I said we should add the search widget
>>>>but keep it hidden for now, since we don't have a good search facility.
>>>>Maybe we can have a search facility? What do you say?
>>>>
>>>>
>>>Sure, no problem. I've got a generic search script that works with TT, so
>>>we should be able to use the same templates as the rest of the site.
>>>
>>You mean a real search engine, with indexing and all the good search
>>engine bells and whistles?
>>
>
> I think you have to pay a lot of money for a real search engine. But I can
> get you an ok one for a reasonable price like free. What kind of bells and
> whistles were you looking for?


Of course, Bill, I was just asking which search engine you are talking
about :)

If you remember, we've been through this discussion before, when we were
looking for a search engine for the guide. These two, nextrieve
and swish-e, were found to be the best options:
http://perl.apache.org/guide/#search

The main criterion was being able to search for Perl code. Well, you
remember this, right? Or we could dig up the thread from a year or so ago.



moseley at hank

Jan 26, 2002, 6:37 AM

Post #6 of 20
Re: adding search? [In reply to]

At 04:09 PM 1/26/2002 +0800, Stas Bekman wrote:

>Of course, Bill, I was just asking which search engine you are talking
>about :)

Yes, I know. ;)

>If you remember, we've been through this discussion before, when we were
>looking for a search engine for the guide. These two, nextrieve
>and swish-e, were found to be the best options:

>The main criterion was being able to search for Perl code. Well, you
>remember this, right? Or we could dig up the thread from a year or so ago.

I remember the discussion we had. You asked me to get the swish config
file from Randy and IIRC, it was just a standard setup.

With swish, you define at indexing time what makes up a word. Text is a
lot easier, of course, than code, especially if people use different coding
styles.

We could create a second index that uses white space only to separate
words, which might make searching perl code a bit easier. It would be
helpful to see what kind of things to search for.

But then if you were looking for $| you could find "$| = 1;" but not "$|++".
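To make that trade-off concrete, here is a minimal Python sketch of a whitespace-only index (an illustration, not swish's actual tokenizer; the function names are hypothetical). Each document is split on whitespace alone, and a query term matches only if it equals a whole token, so $| is found in "$| = 1;" but not in "$|++".

```python
def whitespace_tokens(text):
    """Split text on runs of whitespace only; punctuation stays attached."""
    return set(text.split())


def exact_match(query, text):
    """True if the query is exactly one of the whitespace-separated tokens."""
    return query in whitespace_tokens(text)


if __name__ == "__main__":
    for line in ("$| = 1;", "$|++"):
        print(f"{line!r}: {exact_match('$|', line)}")
```

The same rule also misses "$|=1;", which many of the docs use, since that whole string is a single token.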

Or, perhaps, have a mode that simply uses a Perl regular expression to do
a brute-force grep search. Slow, but the site is not that large, especially
if it was limited to just the docs section.
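A brute-force mode could look roughly like the Python sketch below: walk the docs tree, read each .pod file, and report every line that matches. The src directory and .pod suffix come from this thread. The function name and the choice to treat the query as a literal string by default (via re.escape, so $| is not parsed as regex syntax) are assumptions.

```python
import os
import re


def grep_docs(root, query, literal=True):
    """Brute-force search: yield (path, line_no, line) for each matching line.

    With literal=True the query is escaped, so "$|" matches the two
    characters "$" and "|" instead of being read as a regex.
    """
    pattern = re.compile(re.escape(query) if literal else query)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".pod"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for line_no, line in enumerate(fh, start=1):
                    if pattern.search(line):
                        yield path, line_no, line.rstrip("\n")


if __name__ == "__main__":
    for path, line_no, line in grep_docs("src", "$|"):
        print(f"{path}:{line_no}: {line}")
```

Because the scan is a plain substring/regex match, "$|++" and "$| = 1;" are both found without any decision about what makes up a word.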

All the reverse indexing engines will parse on indexing, so it will always
be an issue of defining what makes up a word.

Let me ask Avi Rappoport if there's something good for searching code.

Bill Moseley
mailto:moseley [at] hank



stas at stason

Jan 26, 2002, 9:00 AM

Post #7 of 20
Re: adding search? [In reply to]

>>If you remember, we've been through this discussion before, when we were
>>looking for a search engine for the guide. These two, nextrieve
>>and swish-e, were found to be the best options:
>>
>
>>The main criterion was being able to search for Perl code. Well, you
>>remember this, right? Or we could dig up the thread from a year or so ago.
>>
>
> I remember the discussion we had. You asked me to get the swish config
> file from Randy and IIRC, it was just a standard setup.


Yup.


> With swish, you define at indexing time what makes up a word. Text is a
> lot easier, of course, than code, especially if people use different coding
> styles.
>
> We could create a second index that uses white space only to separate
> words, which might make searching perl code a bit easier. It would be
> helpful to see what kind of things to search for.
>
> But then if you were looking for $| you could find "$| = 1;" but not "$|++".


That's not good, then.


> Or, perhaps, have a mode that simply uses a Perl regular expression to do
> a brute-force grep search. Slow, but the site is not that large, especially
> if it was limited to just the docs section.


I think you underestimate the size of the site:

% find src -name "*pod" | xargs du -c | grep total
3172    total
% find src -name "*pod" | wc -l
134

So we have about 3MB of source code in 134 files (and it will more
likely be 6MB, with 200+ files, once the 2.0 docs are done). Do you think it's
possible to grep through in a reasonable response time? Remember that
there will be a lot of IO for opening and closing many files.
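For reference, the two pipelines above can be approximated in Python as below. One caveat, hedged because it depends on the platform: du -c reports disk usage in blocks (1K blocks by default with GNU du, 512-byte blocks on some BSDs unless -k is given), so 3172 means roughly 3MB only with 1K blocks. This sketch sums apparent file sizes in bytes instead; pod_stats is a hypothetical name.

```python
import os


def pod_stats(root):
    """Return (file_count, total_bytes) for all files ending in "pod" under root.

    Sums apparent sizes via os.path.getsize, which is close to, but not
    the same as, the block usage that `du -c` reports.
    """
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith("pod"):  # mirrors find -name "*pod"
                count += 1
                total += os.path.getsize(os.path.join(dirpath, name))
    return count, total


if __name__ == "__main__":
    files, size = pod_stats("src")
    print(f"{files} files, {size / 1024 / 1024:.1f} MB")
```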


> All the reverse indexing engines will parse on indexing, so it will always
> be an issue of defining what makes up a word.
>
> Let me ask Avi Rappoport if there's something good for searching code.

I think that Randy's setup was quite satisfying, but nextrieve was even
better. What do you think about nextrieve?



moseley at hank

Jan 26, 2002, 10:10 AM

Post #8 of 20
Re: adding search? [In reply to]

At 12:00 AM 01/27/02 +0800, Stas Bekman wrote:
>So we have about 3MB of source code in 134 files (and it will more
>likely be 6MB, with 200+ files, once the 2.0 docs are done). Do you think it's
>possible to grep through in a reasonable response time? Remember that
>there will be a lot of IO for opening and closing many files.

It's not like mod_perl is a high-volume site. And it's running on a much
faster machine than mine:

~/modperl-docs > find src -name '*.pod' | wc -l
105

~/modperl-docs > time find src -name '*.pod' | xargs fgrep '$|' | wc -l
23

real 0m0.033s
user 0m0.030s
sys 0m0.010s

That seems reasonable enough, even if it was ten times slower.
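The same measurement can also be taken from inside a script. This Python sketch times a literal substring scan over every .pod file, using time.perf_counter for elapsed (wall-clock) time and time.process_time for the process's CPU time, which loosely correspond to the "real" and "user"+"sys" lines above. The helper names are hypothetical, and the numbers depend entirely on the machine and its load.

```python
import os
import time


def count_matches(root, needle):
    """Count lines in *.pod files under root that contain needle (like fgrep | wc -l)."""
    hits = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".pod"):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as fh:
                    hits += sum(1 for line in fh if needle in line)
    return hits


def timed(func, *args):
    """Run func(*args); return (result, wall_seconds, cpu_seconds)."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    result = func(*args)
    return result, time.perf_counter() - wall0, time.process_time() - cpu0


if __name__ == "__main__":
    hits, wall, cpu = timed(count_matches, "src", "$|")
    print(f"{hits} matching lines, real {wall:.3f}s, cpu {cpu:.3f}s")
```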


>> All the reverse indexing engines will parse on indexing, so it will always
>> be an issue of defining what makes up a word.
>>
>> Let me ask Avi Rappoport if there's something good for searching code.
>
>I think that Randy's setup was quite satisfying, but nextrieve was even
>better. What do you think about nextrieve?

I don't know much about it. It's not open source, and it's not free. I
really doubt it integrates with Template Toolkit.

Could we feed the pod source into Parse::RecDescent and get it to tokenize
perl code? That would be more fun.


--
Bill Moseley
mailto:moseley [at] hank



domm at zsi

Jan 26, 2002, 6:28 PM

Post #9 of 20
Re: adding search? [In reply to]

Hi!

I use SWISH for most of my search engines and find it very useful. I'd
suggest using it.

On Sat, Jan 26, 2002 at 05:37:09AM -0800, Bill Moseley wrote:
> But then if you were looking for $| you could find "$| = 1;" but not "$|++".
What about appending * to every query by default, so that a search for '$|'
gets turned into a search for '$|*', which should find both examples?
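That suggestion can be sketched on top of a whitespace-only tokenizer: a trailing * means "prefix match", so '$|*' matches any token that starts with '$|'. This is a Python illustration of prefix matching in general, not swish's query syntax or implementation, and the function names are hypothetical.

```python
def tokens(text):
    """Whitespace-only tokenization."""
    return text.split()


def matches(query, text):
    """Match a query against whitespace tokens.

    A trailing '*' means prefix match ('$|*' matches '$|++');
    otherwise the query must equal a whole token.
    """
    if query.endswith("*"):
        prefix = query[:-1]
        return any(tok.startswith(prefix) for tok in tokens(text))
    return query in tokens(text)


def search(query, text, auto_wildcard=True):
    """Append '*' to every query by default, as suggested above."""
    if auto_wildcard and not query.endswith("*"):
        query += "*"
    return matches(query, text)


if __name__ == "__main__":
    for line in ("$| = 1;", "$|++", "C<$|=1;>"):
        print(f"{line!r}: {search('$|', line)}")
```

One limitation shows up in the last case: a prefix match helps only when the term starts a token, so POD markup such as C<$|=1;> would still be missed unless the tokenizer also splits on characters like "<".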


--
D_OMM +----> http://domm.zsi.at <-----+
O_xyderkes | new: workplace |
M_echanen | http://domm.zsi.at/d/d162.html |
M_asteuei +--------------------------------+





stas at stason

Jan 26, 2002, 8:16 PM

Post #10 of 20
Re: adding search? [In reply to]

Bill Moseley wrote:

> At 12:00 AM 01/27/02 +0800, Stas Bekman wrote:
>
>>So we have about 3MB of source code in 134 files (and it will more
>>likely be 6MB, with 200+ files, once the 2.0 docs are done). Do you think it's
>>possible to grep through in a reasonable response time? Remember that
>>there will be a lot of IO for opening and closing many files.
>>
>
> It's not like mod_perl is a high-volume site. And it's running on a much
> faster machine than mine:
>
> ~/modperl-docs > find src -name '*.pod' | wc -l
> 105
>
> ~/modperl-docs > time find src -name '*.pod' | xargs fgrep '$|' | wc -l
> 23
>
> real 0m0.033s
> user 0m0.030s
> sys 0m0.010s
>
> That seems reasonable enough, even if it was ten times slower.


Hmm, you were trying this on an unloaded machine, right? If you have many
parallel searches and other tasks running this can be much much slower, no?

Also remember that users don't care about CPU time, only elapsed
wall-clock time.

Also, which OS/distro are you running this on? How does time work across
the pipe? It doesn't work for me. If I try:

time find src -name '*.pod' -exec fgrep -l '$|' {} \;
src/docs/2.0/devel/debug_c/debug_c.pod
src/docs/2.0/devel/testing/testing.pod
src/docs/1.0/faqs/cgi_to_mod_perl.pod
src/docs/1.0/guide/control.pod
src/docs/1.0/guide/debug.pod
src/docs/1.0/guide/perl.pod
src/docs/1.0/guide/performance.pod
src/docs/1.0/guide/porting.pod
src/docs/1.0/guide/scenario.pod
src/docs/1.0/guide.good/control.pod
src/docs/1.0/guide.good/debug.pod
src/docs/1.0/guide.good/perl.pod
src/docs/1.0/guide.good/performance.pod
src/docs/1.0/guide.good/porting.pod
src/docs/1.0/guide.good/scenario.pod
0.120u 0.170s 0:00.31 93.5% 0+0k 0+0io 18193pf+0w

as you can see it's much slower.
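The difference between elapsed and CPU time can be illustrated with a small Python sketch (an illustration only; the helper names are hypothetical): a task that mostly waits, whether on disk, on other processes, or here simply on sleep, accumulates wall-clock time but almost no CPU time. Separately, note that find ... -exec fgrep -l '$|' {} \; starts one fgrep process per file, whereas xargs hands many file names to each fgrep, so the two command lines in this thread are not timing quite the same work.

```python
import time


def timed(func, *args):
    """Return (result, wall_seconds, cpu_seconds) for one call of func(*args)."""
    wall_start = time.perf_counter()   # elapsed (wall-clock) time
    cpu_start = time.process_time()    # CPU time used by this process only
    result = func(*args)
    return (result,
            time.perf_counter() - wall_start,
            time.process_time() - cpu_start)


def busy(n):
    """CPU-bound work: sum of squares below n."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    # Waiting: wall-clock time grows, CPU time stays near zero.
    _, wall, cpu = timed(time.sleep, 0.5)
    print(f"sleep: real {wall:.3f}s  cpu {cpu:.3f}s")
    # Computing: wall-clock and CPU time grow together on an idle machine.
    _, wall, cpu = timed(busy, 2_000_000)
    print(f"busy:  real {wall:.3f}s  cpu {cpu:.3f}s")
```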


>>>All the reverse indexing engines will parse on indexing, so it will always
>>>be an issue of defining what makes up a word.
>>>
>>>Let me ask Avi Rappoport if there's something good for searching code.
>>>
>>I think that Randy's setup was quite satisfying, but nextrieve was even
>>better. What do you think about nextrieve?
>>
>
> I don't know much about it. It's not open source, and it's not free. I
> really doubt it integrates with Template Toolkit.


Ah, OK. I didn't know that.


> Could we feed the pod source into Parse::RecDescent and get it to tokenize
> perl code? That would be more fun.

I guess so, but from what I know, Parse::RecDescent is not good for
real-time processing because it's very slow. Remember that it stores
the parsed tree using Perl data structures, which is very inefficient. I
don't know if it has been rewritten to use C data structures since last year.



moseley at hank

Jan 26, 2002, 8:41 PM

Post #11 of 20
Re: adding search? [In reply to]

At 11:16 AM 01/27/02 +0800, Stas Bekman wrote:
>Hmm, you were trying this on an unloaded machine, right? If you have many
>parallel searches and other tasks running this can be much much slower, no?

I didn't see much difference on apache.org:

moseley [at] daedalu:/home/stas/apache.org/modperl-docs >time find src -name '*.pod' | xargs fgrep '$|'
src/docs/1.0/faqs/cgi_to_mod_perl.pod: local $| = 1;
src/docs/1.0/guide/control.pod: ($name) = $0 =~ m|([^/]+)$|;
src/docs/1.0/guide/debug.pod: $| = 1;
src/docs/1.0/guide/debug.pod: $|=1;
src/docs/1.0/guide/debug.pod:I<http://localhost/perl/debug/perl_trace.pl>, we have used C<$|=1;>
src/docs/1.0/guide/debug.pod: $|=1;
src/docs/1.0/guide/debug.pod:that I've made STDOUT unbuffered with C<$|=1;> so I will immediately
src/docs/1.0/guide/performance.pod: local $| = 1;
src/docs/1.0/guide/performance.pod: local $| = 1;
src/docs/1.0/guide/performance.pod:The localized setting of C<$|=1> unbuffers the STDERR stream, so we
src/docs/1.0/guide/performance.pod: local $| = 1;
src/docs/1.0/guide/performance.pod:=head2 Using $|=1 Under mod_perl and Better print() Techniques.
src/docs/1.0/guide/performance.pod:As you know, C<local $|=1;> disables the buffering of the currently
src/docs/1.0/guide/performance.pod:performance degradation with C<$|=1>. It also uses too many
src/docs/1.0/guide/performance.pod:Now let's go back to the C<$|=1> topic. I still disable buffering,
src/docs/1.0/guide/perl.pod:Special Perl variables like C<$|> (buffering), C<$^T> (script's start
src/docs/1.0/guide/porting.pod: local $| = 1;
src/docs/1.0/guide/scenario.pod: BEGIN{ $|=1 }
src/docs/1.0/guide/scenario.pod: $|=1;
src/docs/1.0/guide/scenario.pod:You must disable buffering. C<$|=1;> does the job. If you do not
src/docs/2.0/devel/debug_c/debug_c.pod: BEGIN { $| = 1; print "1..1\n"; }
src/docs/2.0/devel/testing/testing.pod: $response_test =~ s|t/[^/]+/Test([^/]+)/(.*).pm$|t/\L$1\E/$2.t|;
src/docs/2.0/devel/testing/testing.pod: $response_test =~ s|.*/([^/]+)/(.*).pm$|/$1::$2|;

real 0m0.048s
user 0m0.015s
sys 0m0.036s

But maybe I'm not using time correctly. Let's see...

moseley [at] daedalu:/home/stas/apache.org/modperl-docs >time find src -name '*.pod' | xargs fgrep '$|' | sleep 5

real 0m5.009s
user 0m0.033s
sys 0m0.012s

Looks like it's counting total time.


>Also remember that users don't care about CPU time, only elapsed
>wall-clock time.

That's "real" time above. Again, how many searches a second are you
expecting? We aren't running google loads here. ;)


>Also, which OS/distro are you running this on?

That's on my P550 Linux box with an old kernel. daedalus is always faster than
my machine, even when its load average is high. Dual 850s and fast SCSI
(and BSD's I/O) seem to do the trick.

>0.120u 0.170s 0:00.31 93.5% 0+0k 0+0io 18193pf+0w
>
>as you can see it's much slower.

>> Could we feed the pod source into Parse::RecDescent and get it to tokenize
>> perl code? That would be more fun.
>
>I guess so, but from what I know, Parse::RecDescent is not good for
>real-time processing because it's very slow. Remember that it stores
>the parsed tree using Perl data structures, which is very inefficient. I
>don't know if it has been rewritten to use C data structures since last year.

My idea was not to parse while searching, but while indexing.
Parsing a tiny query wouldn't take long. I was also half joking,
though it would be interesting.

I'm not sure what to parse into. Should $x++ get parsed into "$x", "$x++",
or "$x and ++"?
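One way to avoid choosing among those options is to index all of them. The Python sketch below uses a small regular expression (an assumption, nowhere near a real Perl parser) to emit, for each whitespace-separated chunk, the chunk itself plus its variable and operator pieces, so $x++ is indexed as "$x++", "$x", and "++", and $|++ also yields "$|".

```python
import re

# Rough Perl-ish lexer, illustrative only:
#   sigiled names ($x, @ISA, $Foo::Bar), punctuation variables ($|, $$),
#   barewords/numbers, and runs of operator characters (which stop at
#   $ and @ so a following variable is lexed separately).
TOKEN_RE = re.compile(r"[$@%&][A-Za-z_]\w*(?:::\w+)*|\$[^\w\s]|\w+|[^\w\s$@]+")


def index_terms(text):
    """Return the set of terms to index for text.

    Each whitespace-separated chunk is indexed whole, and so is each
    lexical piece inside it, e.g. '$x++' -> {'$x++', '$x', '++'}.
    """
    terms = set()
    for chunk in text.split():
        terms.add(chunk)
        terms.update(TOKEN_RE.findall(chunk))
    return terms


if __name__ == "__main__":
    for sample in ("$x++", "$|++", "C<$|=1;>"):
        print(sample, sorted(index_terms(sample)))
```

Indexing every variant makes the index larger, but it lets a query for $| hit "$|++", "$| = 1;" and POD like C<$|=1;> alike.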

Do a normal indexed search for most, but offer an option to search for perl
code and then use fgrep.
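That split, an indexed search by default plus an optional literal scan for Perl code, might be wired up as in this Python sketch. The in-memory inverted index, the mode names, and the function names are all illustrative assumptions; a real setup would call swish for the indexed path and fgrep (or similar) for the code path.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"\w+")


def build_index(docs):
    """Inverted index: lowercase word -> set of doc names (text-style words only)."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in WORD_RE.findall(text.lower()):
            index[word].add(name)
    return index


def search(query, docs, index, mode="text"):
    """mode='text' uses the index; mode='code' does a literal, fgrep-like scan."""
    if mode == "code":
        return sorted(name for name, text in docs.items() if query in text)
    words = WORD_RE.findall(query.lower())
    if not words:  # e.g. "$|" has no word characters to look up
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


if __name__ == "__main__":
    docs = {
        "debug.pod": "Unbuffer STDOUT with $| = 1;",
        "perf.pod": "Counting with $|++ is a trick",
        "intro.pod": "mod_perl speeds up Perl",
    }
    idx = build_index(docs)
    print(search("$|", docs, idx))               # text mode finds nothing
    print(search("$|", docs, idx, mode="code"))  # literal scan finds both
```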


--
Bill Moseley
mailto:moseley [at] hank



stas at stason

Jan 26, 2002, 10:08 PM

Post #12 of 20
Re: adding search? [In reply to]

Bill Moseley wrote:


> real 0m0.048s
> user 0m0.015s
> sys 0m0.036s
>
> But maybe I'm not using time correctly. Let's see...
>
> moseley [at] daedalu:/home/stas/apache.org/modperl-docs >time find src -name '*.pod' | xargs fgrep '$|' | sleep 5
>
> real 0m5.009s
> user 0m0.033s
> sys 0m0.012s
>
> Looks like it's counting total time.


Benchmark.pm?


>>Also remember that users don't care about CPU time, only elapsed
>>wall-clock time.
>>
>
> That's "real" time above. Again, how many searches a second are you
> expecting? We aren't running google loads here. ;)


Not many, but don't forget that apache.org gets overloaded every so
often. We don't have the whole machine dedicated to perl.apache.org.


>>>Could we feed the pod source into Parse::RecDescent and get it to tokenize
>>>perl code? That would be more fun.
>>>
>>I guess so, but from what I know, Parse::RecDescent is not good for
>>real-time processing because it's very slow. Remember that it stores
>>the parsed tree using Perl data structures, which is very inefficient. I
>>don't know if it has been rewritten to use C data structures since last year.
>>
>
> My idea was not to parse while searching, but while indexing.
> Parsing a tiny query wouldn't take long. I was also half joking,
> though it would be interesting.


Interesting, but not practical for real-time search. But as you said,
it's fine for pre-parsing.


> I'm not sure what to parse into. Should $x++ get parsed into "$x", "$x++",
> or "$x and ++"?


I don't know.


> Do a normal indexed search for most, but offer an option to search for perl
> code and then use fgrep.

OK, so how hard would it be to have a simple prototype to play with?





moseley at hank

Jan 27, 2002, 3:00 PM

Post #13 of 20
Re: adding search? [In reply to]

At 01:08 PM 01/27/02 +0800, Stas Bekman wrote:
>> Do a normal indexed search for most, but offer an option to search for perl
>> code and then use fgrep.
>
>OK, so how hard would it be to have a simple prototype to play with?

Which part?

Indexing the site with swish is easy and can be set up in a few minutes. I
just did it in my modperl-docs copy.

Once we have a script running we can worry about searching perl code.
fgrep might be a quick hack. I wouldn't worry about speed or machine load
at this time. For now, let's wait until we have a problem before we solve it.

The hard part, at least for me, is integrating with DocSet.

One problem, from what I can tell, will be that the template variables that
DocSet creates from parsing the source files are not available for CGI
scripts run outside of DocSet. So a CGI script will not be able to use the
same templates.

Also (minor point), without looking too hard, I noticed that DocSet didn't
copy dot files (.htaccess .swishcgi.conf), and also didn't keep permissions
(so CGI scripts lost their execute bit). At least that's what it looked
like. Hum, I'm sure there's a way to do that, but out of time now.

Some minor details:

I assume spidering is the way to go (instead of scanning the "src" directory).
It makes most sense to index what people can actually get to by following
links.
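Spidering means starting from the front page and following links, so only reachable pages get indexed. Below is a minimal breadth-first Python sketch built on the standard library's html.parser. The fetch callback (here backed by an in-memory dict standing in for HTTP requests), the same-prefix rule for staying on the site, and the function names are assumptions for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def spider(start_url, fetch):
    """Breadth-first crawl from start_url, staying under its directory prefix.

    fetch(url) returns HTML text or None; returns URLs in the order visited.
    """
    base = start_url.rsplit("/", 1)[0] + "/"
    seen, order, queue = {start_url}, [], deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        order.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            link, _fragment = urldefrag(urljoin(url, href))
            if link.startswith(base) and link not in seen:
                seen.add(link)
                queue.append(link)
    return order
```

A page that nothing links to is never visited, which is exactly the behaviour argued for above.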

Also, the templates should have processing directives for swish. This is
what I do:

...
<body>
<!-- SwishCommand noindex -->
...
<!-- SwishCommand index -->
[% content %]
<!-- SwishCommand noindex -->
...

Then swish indexes the <head> section, and after that only the content
(and not menu text and so on).
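To see which text those markers leave for the indexer, here is a hypothetical Python filter that keeps only the regions outside noindex/index pairs. It illustrates the effect only (swish handles these comments itself), and starting in "indexing on" mode is an assumption that matches the description above, where the <head> section gets indexed.

```python
import re

MARKER_RE = re.compile(r"<!--\s*SwishCommand\s+(noindex|index)\s*-->")


def indexable_text(html):
    """Return the parts of html that fall in 'index' regions.

    Indexing starts switched on; each SwishCommand comment switches it
    off (noindex) or back on (index).
    """
    parts = []
    indexing = True
    pos = 0
    for m in MARKER_RE.finditer(html):
        if indexing:
            parts.append(html[pos:m.start()])
        indexing = m.group(1) == "index"
        pos = m.end()
    if indexing:
        parts.append(html[pos:])
    return "".join(parts)
```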

Stas, if you know how to deal with integrating a CGI into the templates
with DocSet, cool. But, I don't see how, unless DocSet is writing a file
that can be used by TT (PRE_PROCESS) to give the CGI script the same
template variables available when DocSet is running.

<just an idea>
One idea for the HTML output part of DocSet would be to write an
intermediate site ready for use with ttree. That might allow easier
integration with CGI scripts or just plain content that works outside of
DocSet, since not everything will be content. I think using ttree is
rather flexible.

In other words, let DocSet build its data structures such as the overall
site menu (table of contents), with attached abstracts if needed, and write
that to a TT config file. Write the source docs out as if they were source
files for TT content -- or even as BLOCKs for use in other templates. Use
[% META %] tags at the top of each file to set UP, NEXT, PREV, breadcrumb,
and anything else that's page specific.

Then run ttree to generate the final site. The advantage is that the
DocSet site can be extended with source outside of the DocSet system. Not
to mention that things like CGI scripts can be processed with ttree, which
can PRE_PROCESS the config file created by DocSet so it can use the same
templates (and template variables) to format its output page the same way
DocSet pages would (i.e. with the same menu items). Also paths can be set
in the CGI script based on TT variables, instead of hard coded.

Or another idea: Don't you cache a lot of data? So maybe we just need a
plugin that reads that cache into template variables and use that with CGI
scripts outside of DocSet.

I like the first idea better, as it separates TT from DocSet. Plus it
allows a TT driven site to be enhanced by DocSet instead of trying to make
DocSet do everything.
</>
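As a rough illustration of the first idea, the Python generator below serialises navigation data into a Template Toolkit file of [% SET %] directives that a CGI script's templates could load via PRE_PROCESS. The menu structure, the variable names, and any output file name are hypothetical; DocSet's real data structures are not shown in this thread, and the sketch assumes dict keys are valid TT identifiers.

```python
def tt_quote(value):
    """Quote a string as a Template Toolkit single-quoted literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def tt_value(value):
    """Render nested dicts/lists/strings as TT data literals.

    Assumes dict keys are plain identifiers usable as TT hash keys.
    """
    if isinstance(value, dict):
        items = ", ".join(f"{k} => {tt_value(v)}" for k, v in value.items())
        return "{ " + items + " }"
    if isinstance(value, (list, tuple)):
        return "[ " + ", ".join(tt_value(v) for v in value) + " ]"
    return tt_quote(str(value))


def write_tt_config(variables):
    """Return TT source that SETs each top-level variable."""
    lines = ["[%# generated file: do not edit by hand %]"]
    for name, value in variables.items():
        lines.append(f"[% SET {name} = {tt_value(value)} %]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(write_tt_config({
        "menu": [{"title": "Guide", "href": "/docs/1.0/guide/"}],
    }), end="")
```

The design point is that the generated file is plain TT source, so DocSet pages and a standalone CGI script could share it without either needing to know about the other.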

But, of course, you probably have already thought how to solve this.


--
Bill Moseley
mailto:moseley [at] hank



stas at stason

Jan 27, 2002, 6:38 PM

Post #14 of 20
Re: adding search? [In reply to]

Bill Moseley wrote:

> At 01:08 PM 01/27/02 +0800, Stas Bekman wrote:
>
>>>Do a normal indexed search for most, but offer an option to search for perl
>>>code and then use fgrep.
>>>
>>OK, so how hard would it be to have a simple prototype to play with?
>>
>
> Which part?
>
> Indexing the site with swish is easy and can be set up in a few minutes. I
> just did it in my modperl-docs copy.


cool!


> Once we have a script running we can worry about searching perl code.
> fgrep might be a quick hack. I wouldn't worry about speed or machine load
> at this time. For now, let's wait until we have a problem before we solve it.


ok


> The hard part, at least for me, is integrating with DocSet.
>
> One problem, from what I can tell, will be that the template variables that
> DocSet creates from parsing the source files are not available for CGI
> scripts run outside of DocSet. So a CGI script will not be able to use the
> same templates.


I don't understand why you need the template variables. Why can't swish
work with the static content?


> Also (minor point), without looking too hard, I noticed that DocSet didn't
> copy dot files (.htaccess .swishcgi.conf),


Did you specify these files in the copy_glob attribute?

> and also didn't keep permissions
> (so CGI scripts lost their execute bit). At least that's what it looked
> like. Hum, I'm sure there's a way to do that, but out of time now.


That should be easily fixable. I use File::Copy::copy to copy the files,
so I think I need to switch to File::Copy::syscopy. Can you check whether it
does the trick? See DocSet/Util.pm.
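The actual fix belongs in the Perl code in DocSet/Util.pm. Purely to illustrate the distinction between copying a file's contents and copying its contents plus permission bits, here is the Python standard-library analogue: shutil.copyfile copies data only, so the new file gets default permissions, while shutil.copy also copies the mode bits, so a CGI script keeps its execute bit. This is not a statement about what File::Copy does; the helper name is hypothetical.

```python
import os
import shutil
import stat


def copy_keep_mode(src, dst):
    """Copy file contents and permission bits; return the destination's mode.

    shutil.copy is copyfile() followed by copymode(), so an executable
    CGI script stays executable after the copy.
    """
    shutil.copy(src, dst)
    return stat.S_IMODE(os.stat(dst).st_mode)
```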


> Some minor details:
>
> I assume spidering is the way to go (instead of scanning the "src" directory).
> It makes most sense to index what people can actually get to by following
> links.
>
> Also, the templates should have processing directives for swish. This is
> what I do:
>
> ...
> <body>
> <!-- SwishCommand noindex -->
> ...
> <!-- SwishCommand index -->
> [% content %]
> <!-- SwishCommand noindex -->
> ...
>
> Then swish indexes the <head> section, and after that only the content
> (and not menu text and so on).


That's very easy: just put these into tmpl/custom/html/page_body,
next to the "#render the content" comment.


> Stas, if you know how to deal with integrating a CGI into the templates
> with DocSet, cool. But, I don't see how, unless DocSet is writing a file
> that can be used by TT (PRE_PROCESS) to give the CGI script the same
> template variables available when DocSet is running.


I still don't understand what you are trying to do here.

Though, all the linkage, titles, abstract info is cached already and
used by the CacheNavigate to build all the index.html files, menus and
prev|next navigation. Without it you would have to rebuild the whole site
when you change one file.


[snipped the ideas, which we will return to once I understand what they are trying to solve :)]





moseley at hank

Jan 27, 2002, 6:57 PM

Post #15 of 20
Re: adding search? [In reply to]

At 09:38 AM 01/28/02 +0800, Stas Bekman wrote:
>I don't understand why you need the template variables. Why can't swish
>work with the static content?

If it's going to use the same templates as the site to build its page, then
it needs the template variables to build the menus, right? The menu is
generated by DocSet, no?

In DocSet each page is generated with TT, and DocSet sets the template
variables for TT to use. For example, top_level_menu needs data to
generate the menu, right? But if a CGI script is running, there's no data
to generate the menu -- that data is only known by DocSet.

So, for the CGI script to generate a page that fits into the site (with the
side menu) it will need to somehow load the same data into TT that DocSet
did. The two ways I can think of doing that are:

1) have DocSet write a TT config file that can be loaded by the CGI script
with a PRE_PROCESS, or

2) get a module/plugin that reads DocSet's cache and loads up the TT variables
needed to generate the page.

I like the first, but the 2nd might be easier to implement with your
current modules.

I say all this stuff not knowing DocSet at all, other than the previous
version that I use.

>Did you specify these files in the copy_glob attribute?

I did:

copy_glob => [
qw(
style.css
images/*
robots.txt
search/*
.htaccess
)
],

Where search is the directory.
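One plausible explanation, offered as a hypothesis: shell-style globs, including Perl's glob(), do not let * match a leading dot, so search/* picks up ordinary files in search/ but not search/.swishcgi.conf. Python's glob module follows the same convention, which makes it easy to demonstrate. The file names .swishcgi.conf and .htaccess come from the thread; search.cgi and the directory layout are assumptions.

```python
import glob
import os


def expand(patterns, root):
    """Expand glob patterns relative to root, the way a copy step might."""
    found = []
    for pattern in patterns:
        found.extend(sorted(glob.glob(os.path.join(root, pattern))))
    return [os.path.relpath(p, root) for p in found]
```

With that layout, ["search/*", ".htaccess"] yields search/search.cgi and .htaccess only; adding a pattern such as search/.* (or naming search/.swishcgi.conf explicitly) would be one way to include the hidden file.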



>> and also didn't keep permissions
>> (so CGI scripts lost their execute bit). At least that's what it looked
>> like. Hum, I'm sure there's a way to do that, but out of time now.
>
>
>That should be easily fixable. I use File::Copy::copy to copy the files,
>so I think I need to switch to File::Copy::syscopy. Can you check whether it
>does the trick? See DocSet/Util.pm.

OK, but I'll be mostly offline until about Tuesday (it's still
Sunday here!).

..
>> <!-- SwishCommand index -->
>> [% content %]
>> <!-- SwishCommand noindex -->
>> ...
>That's very easy: just put these into tmpl/custom/html/page_body,
>next to the "#render the content" comment.

Do you want diffs, or is that something you can easily do?


>I still don't understand what you are trying to do here.
>
>Though, all the linkage, titles and abstract info is cached already and
>used by the CacheNavigate to build all the index.html files, menus and
>prev|next navigation. Without it you would have to rebuild the whole site
>when you change one file.

Right, but how does a CGI script that is outside of DocSet make use of that
data?



--
Bill Moseley
mailto:moseley [at] hank



stas at stason

Jan 27, 2002, 7:44 PM

Post #16 of 20
Re: adding search?

Bill Moseley wrote:

> At 09:38 AM 01/28/02 +0800, Stas Bekman wrote:
>
>>I don't understand why do you need the template variables. Why swish
>>cannot work with the static content?
>>
>
> If it's going to use the same templates as the site to build its page, then
> it needs the template variables to build the menus, right? The menu is
> generated by DocSet, no?


Of course, silly me :)

But that's easy. We just prepare a special template during the
rendering, which will be an html with everything inside + a placeholder
for search results. Piece of cake. Right? We should need only one static
page for this as all the hits will have their links from /.


>>Did you specify these files in the copy_glob attribute?
>>
>
> I did:
>
> copy_glob => [
> qw(
> style.css
> images/*
> robots.txt
> search/*
> .htaccess
> )
> ],
>
> Where search is the directory.


so you say that it didn't copy the dot files from the search directory,
right?
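A note on the dot-file question: shell-style globs, including Perl's glob (File::Glob), conventionally do not let "*" match a name that starts with a dot, so a pattern like search/* would likely skip search/.htaccess. Whether DocSet expands copy_glob exactly this way is an assumption. The Python sketch below only demonstrates the convention; expand_copy_glob is a hypothetical helper, not DocSet code.

```python
import glob
import os
import tempfile


def expand_copy_glob(root, patterns):
    """Expand copy_glob-style patterns relative to root.

    Hypothetical helper for illustration only; it is not part of DocSet.
    """
    found = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(root, pattern)):
            found.add(os.path.relpath(path, root))
    return sorted(found)


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "search"))
    for name in ("search.cgi", ".htaccess"):
        open(os.path.join(root, "search", name), "w").close()

    # "*" does not match names that start with a dot ...
    print(expand_copy_glob(root, ["search/*"]))
    # ... so hidden files need a pattern of their own.
    print(expand_copy_glob(root, ["search/*", "search/.*"]))
```

If that is what's happening, adding an explicit search/.htaccess (or search/.*) entry to copy_glob would be the simplest workaround.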

BTW, update your cvs, I've added an .htaccess file at the root dir already.


>>>and also didn't keep permissions
>>>(so CGI scripts lost their execute bit). At least that's what it looked
>>>like. Hum, I'm sure there's a way to do that, but out of time now.
>>>
>>
>>That should be easily fixable. I use File::Copy::copy to copy the files,
>>so I think I need to move it to File::Copy::syscopy. Can you check whether
>>it does the trick? See DocSet/Util.pm.
>>
>
> Ok, but I'm about to go off line mostly until about Tuesday (It's still
> Sunday here!)


no prob, once it works for you, send me the patch or just tell me what
did work and I'll fix it. Whichever way is more convenient to you.
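On the permissions problem: the File::Copy documentation notes that on Unix syscopy is equivalent to plain copy, so that switch alone may not bring back the execute bit. One option is to copy the mode bits explicitly after copying the data (in Perl, something like chmod((stat $src)[2] & 07777, $dst)). The Python sketch below shows the same idea; copy_keep_mode is a hypothetical name and this is not DocSet's code.

```python
import os
import shutil
import stat
import tempfile


def copy_keep_mode(src, dst):
    """Copy src to dst and give dst the same permission bits as src.

    Hypothetical helper for illustration; DocSet itself is Perl.
    """
    shutil.copyfile(src, dst)  # copies the bytes only, not the mode
    mode = stat.S_IMODE(os.stat(src).st_mode)
    os.chmod(dst, mode)  # restores e.g. the execute bit on CGI scripts


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "search.cgi")
    with open(src, "w") as fh:
        fh.write("#!/usr/bin/perl\n")
    os.chmod(src, 0o755)
    dst = os.path.join(d, "copied.cgi")
    copy_keep_mode(src, dst)
    print(oct(stat.S_IMODE(os.stat(dst).st_mode)))  # 0o755 on POSIX systems
```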


>>> <!-- SwishCommand index -->
>>> [% content %]
>>> <!-- SwishCommand noindex -->
>>> ...
>>>
>>that's very easy, just put these into tmpl/custom/html/page_body
>>next to the #render the content comment.
>>
>
> Do you want diffs, or is that something you can easily do?


If all you need is this:

>>> <!-- SwishCommand index -->
>>> [% content %]
>>> <!-- SwishCommand noindex -->

I'll commit it immediately.


>>I still don't understand what you are trying to do here.
>>
>>Though, all the linkage, titles and abstract info is cached already and
>>used by the CacheNavigate to build all the index.html files, menus and
>>prev|next navigation. Without it you would have to rebuild the whole site
>>when you change one file.
>>
>
> Right, but how does a CGI script that is outside of DocSet make use of that
> data?

If my reply to the first para of your post above solves the problem,
there is no need for this.




moseley at hank

Jan 27, 2002, 8:25 PM

Post #17 of 20
Re: adding search?

At 10:44 AM 01/28/02 +0800, Stas Bekman wrote:
>Of course, silly me :)
>
>But that's easy. We just prepare a special template during the
>rendering, which will be an html with everything inside + a placeholder
>for search results. Piece of cake. Right? We should need only one static
>page for this as all the hits will have their links from /.

That's fine. If DocSet can build me a page with just a [% content %]
section where you want the results, that will be easy. My existing
template is designed to be used with WRAPPER, so that will make things simple.

We could always hard code the layout for the search script. Would be a
little faster than loading TT for each search. But it would be nice to use
the script.

So with DocSet, if I modify one page and run /bin/build, it knows to build
only that one page. But what if a change in a page modifies how the
sidebar menu will look? Will DocSet go and rebuild everything?

Maybe when you can't sleep some night think about separating DocSet from
the HTML generation. It might be nice. DocSet -> ttree -> html output.
DocSet might write a tt config file, or provide a plugin to read the cache.
You might end up with a more powerful solution where DocSet, CGI, and
mod_perl could be used together in a site. DocSet should also be able to
read a config. For example, if you wanted "Search" to be a menu item, but
Search is not part of "src" then it might be nice to have a way to tell
DocSet to include that.

>so you say that it didn't copy the dot files from the search directory,
>right?

Right. But odd. I did a cvs update and bin/build and now the .htaccess
you added was copied.

Also:

.../dst_html/.htaccess: Redirect to non-URL


>BTW, update your cvs, I've added an .htaccess file at the root dir already.

I thought I just updated a few hours ago. ... Oh, now there's the update.


> >>> <!-- SwishCommand index -->
> >>> [% content %]
> >>> <!-- SwishCommand noindex -->
>
>I'll commit it immediately.

Yep, that's it. Adding libxml2 to swish-e was a very nice thing. The old
HTML and XML parsers in swish really sucked.

But then I get "bug" reports where javascript was being indexed:

<script>
<!-- hide from old clients
some javascript here
// end of hide -->
</script>

The old swish HTML parser would ignore the JavaScript inside the
"comment", but libxml2 (correctly) sees that as CDATA, so it was indexed.
Libxml2 makes it easy to add

IgnoreTags script style

And fixed.
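The effect of "IgnoreTags script style" can be illustrated outside swish-e: skip any text inside <script> or <style> elements while collecting words to index. The Python sketch below uses the standard library's html.parser; it is an analogy for what the directive does, not swish-e's implementation, and indexable_words is a hypothetical name.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect indexable words, skipping <script> and <style> contents."""

    IGNORED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while inside an ignored element
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in self.IGNORED:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.IGNORED and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Script bodies arrive here as raw text (even the "<!-- hide" part),
        # so they are dropped only because of the depth check.
        if not self.depth:
            self.words.extend(data.split())


def indexable_words(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.words


page = ("<p>mod_perl guide</p><script>\n<!-- hide from old clients\n"
        "some javascript here\n// end of hide -->\n</script>")
print(indexable_words(page))  # ['mod_perl', 'guide']
```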

One more bug to fix before I sign off.

Let me know when you can generate a template for the search script to use.



--
Bill Moseley
mailto:moseley [at] hank



stas at stason

Jan 27, 2002, 8:55 PM

Post #18 of 20
Re: adding search?

Bill Moseley wrote:

> At 10:44 AM 01/28/02 +0800, Stas Bekman wrote:
>
>>Of course, silly me :)
>>
>>But that's easy. We just prepare a special template during the
>>rendering, which will be an html with everything inside + a placeholder
>>for search results. Piece of cake. Right? We should need only one static
>>page for this as all the hits will have their links from /.
>>
>
> That's fine. If DocSet can build me a page with just a [% content %]
> section where you want the results, that will be easy. My existing
> template is designed to be used with WRAPPER, so that will make things simple.


Yes, simply add an html page like src/404.html and add it to the hidden
section of src/config.cfg, as was done for 404.html. Call it
searchresults.html?

See cvs, I've just added this page.


> We could always hard code the layout for the search script. Would be a
> little faster than loading TT for each search. But it would be nice to use
> the script.


Yes, I think you don't need TT here at all. A simple s/// will be
faster. So we'd better use a different placeholder than [% results %],

e.g. [RESULTS]
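A minimal sketch of the s/// approach, written in Python with str.replace standing in for Perl's $page =~ s/\[RESULTS\]/$results_html/. The hit format (URL/title pairs) and render_results are assumptions for illustration, not the actual search script.

```python
import html

PLACEHOLDER = "[RESULTS]"  # placeholder proposed in the thread


def render_results(page, hits):
    """Splice search hits into the prebuilt searchresults.html page.

    `hits` is a hypothetical list of (url, title) pairs from the search backend.
    """
    items = "\n".join(
        '<li><a href="%s">%s</a></li>'
        % (html.escape(url, quote=True), html.escape(title))
        for url, title in hits
    )
    # Plain substitution, no template engine: replace the first placeholder only.
    return page.replace(PLACEHOLDER, "<ul>\n%s\n</ul>" % items, 1)


page = '<html><body><div class="menu">...</div>[RESULTS]</body></html>'
print(render_results(page, [("/docs/1.0/guide/", "mod_perl Guide")]))
```

Since the page with the menus is built once by DocSet, the per-request work is just one string substitution, which is the speed argument made above.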


> So with DocSet, if I modify one page and run /bin/build, it knows to build
> only that one page. But what if a change in a page modifies how the
> sidebar menu will look? Will DocSet go and rebuild everything?


The logic is more complex than that.

If you change a page, config file or any of the files that are copied as
is, the docset this changed object resides in gets rebuilt. And of
course all the parent level docsets are rebuilt as well.

It tries to do the right thing while doing the minimal amount of work.

If you change any of the templates, you should rebuild the whole thing
with -f (force). Since templates don't change a lot, we don't want to
add extra logic to the source modification detection algorithm.
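The rebuild rule described above (rebuild the docset that contains a changed object plus every parent docset, or everything with -f) can be sketched as a walk up a parent map. This is a hedged Python illustration; the changed and parent_of structures are hypothetical stand-ins for whatever DocSet keeps in its cache.

```python
def docsets_to_rebuild(changed, parent_of, force=False):
    """Return the set of docsets that need rebuilding.

    changed:   maps each modified file to the docset that contains it.
    parent_of: maps each docset to its parent docset (None for the root).
    force:     mimics build -f, e.g. after a template change.
    Both mappings are hypothetical stand-ins for DocSet's cache.
    """
    if force:
        return set(parent_of)
    dirty = set()
    for docset in changed.values():
        # Walk up to the root; stop early at a docset already marked,
        # because its ancestors were marked along with it.
        while docset is not None and docset not in dirty:
            dirty.add(docset)
            docset = parent_of[docset]
    return dirty


tree = {"root": None, "docs": "root", "guide": "docs", "api": "docs", "news": "root"}
print(sorted(docsets_to_rebuild({"guide/install.pod": "guide"}, tree)))
# ['docs', 'guide', 'root']  ("api" and "news" are left alone)
```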


> Maybe when you can't sleep some night think about separating DocSet from
> the HTML generation. It might be nice. DocSet -> ttree -> html output.
> DocSet might write a tt config file, or provide a plugin to read the cache.
> You might end up with a more powerful solution where DocSet, CGI, and
> mod_perl could be used together in a site.


Nope, not DocSet. That's how Andy's TT docs get built - in two passes. I
don't like this approach because his templates look like hell (the first
pass needs to generate templates for the second pass), and if you look at
the pods, they have loads of TT markup in there.

DocSet builds everything in one pass, but in two phases - first scan
everything and cache the wanted bits then render it all. No circus
acrobatics required.

> DocSet should also be able to
> read a config. For example, if you wanted "Search" to be a menu item, but
> Search is not part of "src" then it might be nice to have a way to tell
> DocSet to include that.


You can do this already: look at the src/config.cfg file and search for
news [at] take2, then do the same for search.

Didn't I tell you that DocSet is a dream app for the docs repository
manager? It does all you ever possibly want to do :) :) :)


>>so you say that it didn't copy the dot files from the search directory,
>>right?
>>
>
> Right. But odd. I did a cvs update and bin/build and now the .htaccess
> you added was copied.


Maybe you forgot to save the config file? I got it working the
moment I added it. Or there is a bug.


> Also:
>
> .../dst_html/.htaccess: Redirect to non-URL


what do you mean? This entry is wrong?

RedirectPermanent /guide/ /docs/1.0/guide/

How do I fix it? Just "Redirect"?


>>BTW, update your cvs, I've added an .htaccess file at the root dir already.
>>
>
> I thought I just updated a few hours ago. ... Oh, now there's the update.


:)


>>>>> <!-- SwishCommand index -->
>>>>> [% content %]
>>>>> <!-- SwishCommand noindex -->
>>>>>
>>I'll commit it immediately.
>>
>
> Yep, that's it.


Committed; see if something is missing, like an earlier 'noindex' tag.

> Adding libxml2 to swish-e was a very nice thing. The old
> HTML and XML parsers in swish really sucked.
>
> But then I get "bug" reports where javascript was being indexed:
>
> <script>
> <!-- hide from old clients
> some javascript here
> // end of hide -->
> </script>
>
> The old swish HTML parser would ignore the JavaScript inside the
> "comment", but libxml2 (correctly) sees that as CDATA, so it was indexed.
> Libxml2 makes it easy to add
>
> IgnoreTags script style
>
> And fixed.
>
> One more bug to fix before I sign off.


cool!


> Let me know when you can generate a template for the search script to use.

it's done.



moseley at hank

Jan 28, 2002, 7:37 AM

Post #19 of 20
Re: adding search?

At 11:55 AM 01/28/02 +0800, Stas Bekman wrote:
>> Maybe when you can't sleep some night think about separating DocSet from
>> the HTML generation. It might be nice. DocSet -> ttree -> html output.
>> DocSet might write a tt config file, or provide a plugin to read the cache.
>> You might end up with a more powerful solution where DocSet, CGI, and
>> mod_perl could be used together in a site.
>
>
>Nope, not DocSet. That's how Andy's TT docs get built - in two passes. I
>don't like this approach because his templates look like hell (the first
>pass needs to generate templates for the second pass), and if you look at
>the pods, they have loads of TT markup in there.

He recently suggested that I use that two-step process to build the
templates that are used as templates (TT builds TT's templates from
templates ;). Things get confusing with too many layers in the same
application.

>> DocSet should also be able to
>> read a config. For example, if you wanted "Search" to be a menu item, but
>> Search is not part of "src" then it might be nice to have a way to tell
>> DocSet to include that.
>
>You can do this already: look at the src/config.cfg file and search for
>news [at] take2, then do the same for search.

Of course. I wasn't thinking.


>> .../dst_html/.htaccess: Redirect to non-URL
>
>
>what do you mean? This entry is wrong?
>
>RedirectPermanent /guide/ /docs/1.0/guide/

This drives me crazy! The solution is:
RedirectMatch Permanent /guide/ /docs/1.0/guide/


Maybe it's my version of Apache?

Apache/1.3.20 (Unix) mod_perl/1.25_01


> cat .htaccess
RedirectPermanent /guide/ /docs/1.0/guide/

> HEAD -S http://localhost/guide
HEAD http://localhost/guide --> 500 Internal Server Error

> cat .htaccess
RedirectMatch Permanent /guide/ /docs/1.0/guide/

> HEAD -S http://localhost/guide
HEAD http://localhost/guide --> 404 Not Found



--
Bill Moseley
mailto:moseley [at] hank



stas at stason

Jan 28, 2002, 7:43 AM

Post #20 of 20
Re: adding search?

Bill Moseley wrote:

> At 11:55 AM 01/28/02 +0800, Stas Bekman wrote:
>
>>>Maybe when you can't sleep some night think about separating DocSet from
>>>the HTML generation. It might be nice. DocSet -> ttree -> html output.
>>>DocSet might write a tt config file, or provide a plugin to read the cache.
>>> You might end up with a more powerful solution where DocSet, CGI, and
>>>mod_perl could be used together in a site.
>>>
>>
>>Nope, not DocSet. That's how Andy's TT docs get built - in two passes. I
>>don't like this approach because his templates look like hell (the first
>>pass needs to generate templates for the second pass), and if you look at
>>the pods, they have loads of TT markup in there.
>>
>
> He recently suggested that I use that two-step process to build the
> templates that are used as templates (TT builds TT's templates from
> templates ;). Things get confusing with too many layers in the same
> application.


Well, I started going Andy's way, but then I was horrified by all the
escaping I had to do, so I took a different approach.

In any case DocSet *is* doing it in 2 passes, but the first one doesn't
need the templates.



> This drives me crazy! The solution is:
> RedirectMatch Permanent /guide/ /docs/1.0/guide/
>
>
> Maybe it's my version of Apache?
>
> Apache/1.3.20 (Unix) mod_perl/1.25_01

weird :(

