rt branch, 4.0/pg-fts-invalid-character, created. rt-4.0.5-62-g81df7e2

alexmv at bestpractical

Feb 15, 2012, 12:08 PM


The branch, 4.0/pg-fts-invalid-character has been created
at 81df7e2d07c35834b670e0e41adf677cd15affb5 (commit)

- Log -----------------------------------------------------------------
commit 12b0fded547c53c79db4f5a2e2f049b5f397d387
Author: Alex Vandiver <alexmv [at] bestpractical>
Date: Wed Feb 15 15:01:05 2012 -0500

With the Pg FTS, catch and skip attachments which contain invalid UTF8 bytes

diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index 7e31cac..652fde0 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -371,6 +371,8 @@ sub process_pg {
     unless ( $status ) {
         if ($dbh->errstr =~ /string is too long for tsvector/) {
             warn "Attachment @{[$attachment->id]} not indexed, as it contains too many unique words to be indexed";
+        } elsif ($dbh->errstr =~ /invalid byte sequence/) {
+            warn "Attachment @{[$attachment->id]} cannot be indexed, as it contains invalid UTF8 bytes";
         } else {
             die "error: ". $dbh->errstr;
         }
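
The hunk above decides what to do with a failed statement by matching the text of the database error. A minimal standalone sketch of that classification follows. The helper name classify_index_error, the returned tags, and the sample error strings are illustrative assumptions and are not part of rt-fulltext-indexer; only the two regular expressions are taken from the diff.

```perl
use strict;
use warnings;

# Hypothetical helper (not in RT): map a PostgreSQL error string to a tag
# describing how an indexer might react. The two patterns are the ones
# matched in the diff; everything else is treated as fatal.
sub classify_index_error {
    my ($errstr) = @_;
    return 'too_many_words' if $errstr =~ /string is too long for tsvector/;
    return 'invalid_bytes'  if $errstr =~ /invalid byte sequence/;
    return 'fatal';
}

# Illustrative error strings, shaped like the messages PostgreSQL emits.
for my $err (
    'ERROR:  string is too long for tsvector (2000000 bytes, max 1048575 bytes)',
    'ERROR:  invalid byte sequence for encoding "UTF8": 0xff',
    'ERROR:  relation "attachments" does not exist',
) {
    printf "%-14s <= %s\n", classify_index_error($err), $err;
}
```

Matching on message text is simple but depends on the server's wording and locale, so an unrecognized message deliberately falls through to the fatal case rather than being skipped silently.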

commit 19721b8012776f5ae523e27f07b6dac06ad1dded
Author: Alex Vandiver <alexmv [at] bestpractical>
Date: Wed Feb 15 15:03:38 2012 -0500

Strengthen wording about our ability (or lack thereof) to FTS index on Pg

diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index 652fde0..d978586 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -370,7 +370,7 @@ sub process_pg {
     my $status = eval { $dbh->do( $query, undef, $$text, $attachment->id ) };
     unless ( $status ) {
         if ($dbh->errstr =~ /string is too long for tsvector/) {
-            warn "Attachment @{[$attachment->id]} not indexed, as it contains too many unique words to be indexed";
+            warn "Attachment @{[$attachment->id]} cannot be indexed, as it contains too many unique words";
         } elsif ($dbh->errstr =~ /invalid byte sequence/) {
             warn "Attachment @{[$attachment->id]} cannot be indexed, as it contains invalid UTF8 bytes";
         } else {

commit 81df7e2d07c35834b670e0e41adf677cd15affb5
Author: Alex Vandiver <alexmv [at] bestpractical>
Date: Wed Feb 15 15:03:45 2012 -0500

If we fail to index on Pg, ensure that we continue indexing past that point

Previously, failure to index (because of invalid bytes, or too-long
content) left the content index NULL. As our check for where to resume
indexing is based on rows where the index IS NOT NULL, this could lead
to a pessimal condition where a large number of failures to index in a
row would prevent forward progress of the indexer.

diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index d978586..407afe0 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -376,6 +376,11 @@ sub process_pg {
         } else {
             die "error: ". $dbh->errstr;
         }
+
+        # Insert an empty tsvector, so we count this row as "indexed"
+        # for purposes of knowing where to pick up
+        eval { $dbh->do( $query, undef, "", $attachment->id ) }
+            or die "Failed to insert empty tsvector: " . $dbh->errstr;
     }
 }
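
The stall described in the last commit message can be reproduced with a small in-memory simulation. This is only a sketch under stated assumptions: the hash %index stands in for the content index column (undef plays the role of NULL), the batch size of 2 and the rule that a "\xFF" byte makes a document unindexable are invented for illustration, and run_indexer and simulate are hypothetical names, not functions from RT.

```perl
use strict;
use warnings;

# One indexer run: resume after the highest id whose index is not NULL,
# then process at most $batch documents in id order.
sub run_indexer {
    my ($index, $docs, $batch, $mark_failures) = @_;

    # Resume point: highest id with a defined (non-NULL) index entry.
    my ($last) = sort { $b <=> $a } grep { defined $index->{$_} } keys %$index;
    $last //= 0;

    my @todo = grep { $_ > $last } sort { $a <=> $b } keys %$docs;
    splice @todo, $batch if @todo > $batch;    # keep only the first $batch ids

    for my $id (@todo) {
        if ( $docs->{$id} =~ /\xFF/ ) {        # assumption: \xFF means "cannot index"
            # With the fix, store an empty value so the row counts as indexed.
            $index->{$id} = "" if $mark_failures;
            next;
        }
        $index->{$id} = lc $docs->{$id};       # stand-in for to_tsvector()
    }
    return $last;
}

# Run the indexer $runs times over five documents, three of them bad in a row.
sub simulate {
    my ($mark_failures, $runs) = @_;
    my %docs = (
        1 => "hello", 2 => "bad\xFF", 3 => "bad\xFF", 4 => "bad\xFF", 5 => "world",
    );
    my %index;
    run_indexer( \%index, \%docs, 2, $mark_failures ) for 1 .. $runs;
    return \%index;
}

for my $mark ( 0, 1 ) {
    my $index = simulate( $mark, 5 );
    printf "mark_failures=%d: attachment 5 %s\n", $mark,
        defined $index->{5} ? "indexed" : "never reached";
}
```

Without marking failures, every run after the first sees attachments 2 and 3 again, fails on both, and never advances the resume point. With the empty placeholder, each failed row moves the resume point forward and attachment 5 is eventually reached.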


-----------------------------------------------------------------------
_______________________________________________
Rt-commit mailing list
Rt-commit [at] lists
http://lists.bestpractical.com/cgi-bin/mailman/listinfo/rt-commit
