
alexmv at bestpractical
Feb 15, 2012, 12:08 PM
rt branch, 4.0/pg-fts-invalid-character, created. rt-4.0.5-62-g81df7e2
The branch, 4.0/pg-fts-invalid-character has been created
        at  81df7e2d07c35834b670e0e41adf677cd15affb5 (commit)

- Log -----------------------------------------------------------------
commit 12b0fded547c53c79db4f5a2e2f049b5f397d387
Author: Alex Vandiver <alexmv [at] bestpractical>
Date:   Wed Feb 15 15:01:05 2012 -0500

    With the Pg FTS, catch and skip attachments which contain invalid UTF8 bytes

diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index 7e31cac..652fde0 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -371,6 +371,8 @@ sub process_pg {
         unless ( $status ) {
             if ($dbh->errstr =~ /string is too long for tsvector/) {
                 warn "Attachment @{[$attachment->id]} not indexed, as it contains too many unique words to be indexed";
+            } elsif ($dbh->errstr =~ /invalid byte sequence/) {
+                warn "Attachment @{[$attachment->id]} cannot be indexed, as it contains invalid UTF8 bytes";
             } else {
                 die "error: ". $dbh->errstr;
             }

commit 19721b8012776f5ae523e27f07b6dac06ad1dded
Author: Alex Vandiver <alexmv [at] bestpractical>
Date:   Wed Feb 15 15:03:38 2012 -0500

    Strengthen wording about our ability (or lack thereof) to FTS index on Pg

diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index 652fde0..d978586 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -370,7 +370,7 @@ sub process_pg {
         my $status = eval { $dbh->do( $query, undef, $$text, $attachment->id ) };
         unless ( $status ) {
             if ($dbh->errstr =~ /string is too long for tsvector/) {
-                warn "Attachment @{[$attachment->id]} not indexed, as it contains too many unique words to be indexed";
+                warn "Attachment @{[$attachment->id]} cannot be indexed, as it contains too many unique words";
             } elsif ($dbh->errstr =~ /invalid byte sequence/) {
                 warn "Attachment @{[$attachment->id]} cannot be indexed, as it contains invalid UTF8 bytes";
             } else {

commit 81df7e2d07c35834b670e0e41adf677cd15affb5
Author: Alex Vandiver <alexmv [at] bestpractical>
Date:   Wed Feb 15 15:03:45 2012 -0500

    If we fail to index on Pg, ensure that we continue indexing past that point

    Previously, failure to index (because of invalid bytes, or too-long
    content) left the content index NULL. As our check for where to
    resume indexing is based on rows where the index IS NOT NULL, this
    could lead to a pessimal condition where a large number of failures
    to index in a row would prevent forward progress of the indexer.

diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index d978586..407afe0 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -376,6 +376,11 @@ sub process_pg {
             } else {
                 die "error: ". $dbh->errstr;
             }
+
+            # Insert an empty tsvector, so we count this row as "indexed"
+            # for purposes of knowing where to pick up
+            eval { $dbh->do( $query, undef, "", $attachment->id ) }
+                or die "Failed to insert empty tsvector: " . $dbh->errstr;
         }
     }
 

-----------------------------------------------------------------------

_______________________________________________
Rt-commit mailing list
Rt-commit [at] lists
http://lists.bestpractical.com/cgi-bin/mailman/listinfo/rt-commit
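The stall described in the third commit can be illustrated with a small, self-contained Perl model. This is a hypothetical sketch, not code from sbin/rt-fulltext-indexer.in. The helpers make_rows, resume_point and run_once, the batch size of 3, and the rule "resume after the highest id whose index is defined" are all assumptions standing in for the real SQL. In the model, undef plays the role of a NULL index column and an empty string stands in for the empty tsvector that the commit inserts.

```perl
use strict;
use warnings;

# Hypothetical model, NOT code from sbin/rt-fulltext-indexer.in.
# Each row stands in for an attachment: an id, whether indexing it fails,
# and its index column (undef plays the role of SQL NULL).
sub make_rows {
    my %bad = map { ($_ => 1) } @_;    # ids whose indexing will fail
    return [ map { +{ id => $_, fails => ($bad{$_} ? 1 : 0), index => undef } } 1 .. 10 ];
}

# Assumed resume rule: continue after the highest id whose index
# IS NOT NULL (0 if nothing has been indexed yet).
sub resume_point {
    my ($rows) = @_;
    my $max = 0;
    for my $row (@$rows) {
        $max = $row->{id} if defined $row->{index} && $row->{id} > $max;
    }
    return $max;
}

# One indexer run over at most $batch rows past the resume point.
# With $mark_failures set, a failed row gets "" (standing in for an
# empty tsvector) instead of staying undef. Returns the new resume point.
sub run_once {
    my ($rows, $batch, $mark_failures) = @_;
    my $start = resume_point($rows);
    my @todo  = grep { $_->{id} > $start } @$rows;
    splice @todo, $batch if @todo > $batch;    # keep only one batch
    for my $row (@todo) {
        if ($row->{fails}) {
            $row->{index} = "" if $mark_failures;
        }
        else {
            $row->{index} = "tsvector-$row->{id}";
        }
    }
    return resume_point($rows);
}

# Attachments 3, 4 and 5 fail; with a batch size of 3 the second run
# sees nothing but failures.
for my $mark (0, 1) {
    my $rows   = make_rows(3, 4, 5);
    my @points = map { run_once($rows, 3, $mark) } 1 .. 3;
    printf "%s: resume points %s\n",
        ($mark ? "with empty tsvector" : "leaving NULL"), join(", ", @points);
}
# Prints:
#   leaving NULL: resume points 2, 2, 2
#   with empty tsvector: resume points 3, 6, 9
```

Leaving failed rows NULL pins the resume point at 2 once a whole batch fails, so every later run retries the same failing rows. Marking them with an empty value lets each run advance. In this model an isolated failure followed by a success in the same batch does not stall anything; the problem only appears when failures fill an entire batch, which matches the "large number of failures to index in a row" the commit message describes.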