Gossamer Forum

UTF-8 in Links 2

For a few years, I have been helping a small non-profit org maintain a niche directory with Links 2. Over time, we have made numerous minor and major hacks. The customized Links 2 integrates well with the rest of the web site.

However, they have made a very sensible decision to bring the website into the 21st century. The next website update must use XHTML 1.0 Strict (i.e. the doctype <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">) and UTF-8 encoding on all pages except the Links 2 admin pages, where it is preferred but not required. This is now a non-negotiable contractual requirement, so that the site can support multiple languages. It is also the right thing to do.
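For illustration, here is a minimal Perl sketch of the start of a page that meets both requirements. It assumes the script prints its own CGI header, as Links 2's html_print_headers does. The sub name xhtml_utf8_preamble is hypothetical, and lang="en" is only a placeholder. The charset is declared in the HTTP Content-type header, which takes precedence in browsers, and again in a meta element for copies saved without the header. No XML declaration is emitted: it is optional for UTF-8, and Appendix C of the XHTML 1.0 spec warns that some browsers mishandle it.

```perl
use strict;
use warnings;

# Hypothetical helper: CGI header plus the start of an XHTML 1.0 Strict
# page that declares UTF-8. $title is assumed to be escaped already.
sub xhtml_utf8_preamble {
    my ($title) = @_;
    return "Content-type: text/html; charset=utf-8\n\n"
        . qq|<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n|
        . qq|  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n|
        # lang/xml:lang "en" is a placeholder; set it per page as needed.
        . qq|<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n|
        . qq|<head>\n|
        . qq|<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />\n|
        . qq|<title>$title</title>\n|
        . qq|</head>\n|;
}
```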

The customized Links 2 is already fully W3C-compliant with structural markup, so the switch from HTML 4.01 is trivial (I have already done it). But switching from 8-bit ISO-8859-1 to UTF-8 is decidedly non-trivial, especially since we have to hack the forms and search results. Right now I can see how to start, but I have no idea how long it might take, so I really don't know whether it is worth attempting.
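One part of that switch is a one-time conversion of the existing ISO-8859-1 data files. A minimal sketch using Perl's standard Encode module follows. The sub names latin1_to_utf8 and convert_file are hypothetical, and the sketch assumes every byte in the old file really is ISO-8859-1. If the data was typed in from Windows browsers, it may in fact be Windows-1252, with curly quotes and dashes in the 0x80-0x9F range; in that case 'cp1252' is the better source encoding. Run the conversion on a copy, and only once, because converting an already-converted file garbles every non-ASCII character.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: re-encodes one line of ISO-8859-1 text as UTF-8 bytes.
# ASCII bytes (including field delimiters) come out unchanged.
sub latin1_to_utf8 {
    my ($latin1_bytes) = @_;
    my $chars = decode('iso-8859-1', $latin1_bytes);   # one byte -> one character
    return encode('UTF-8', $chars);                      # characters -> UTF-8 bytes
}

# Hypothetical driver: converts a whole data file line by line.
sub convert_file {
    my ($in_path, $out_path) = @_;
    open(my $in,  '<:raw', $in_path)  or die "Cannot read $in_path: $!";
    open(my $out, '>:raw', $out_path) or die "Cannot write $out_path: $!";
    while (my $line = <$in>) {
        print {$out} latin1_to_utf8($line);
    }
    close($in);
    close($out) or die "Cannot close $out_path: $!";
    return 1;
}
```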

Has anyone else ever done this?
Any experiences to share?

Re: [YoYoYoYo] UTF-8 in Links 2
I have worked over the Links 2 code for my sites, and it is probably pretty close to what you need. There are numerous changes involved, but none are too difficult. Most of the work is in db_utils.pl, site_html_templates.pl and nph-build.cgi.

I did get the forms to build correctly, including the drop-down lists. Are you also switching to an all-CSS layout and tossing the tables aside? The admin section is still the original code, since it is non-public and would need considerable alteration to be compliant. I will post the changes after I look through my code and gather them up. Feel free to contact me directly, too.


Leonard
aka PerlFlunkie
Re: [PerlFlunkie] UTF-8 in Links 2
OK, here's a start...
Changes you need are in red; code you can ignore (specific to my site mods) is in purple. Many routines are trimmed to show only what needs changing, with missing code indicated by ***.

Code:

db_utils.pl

sub build_select_field {
***
# This section creates the XHTML.
$output = qq|<select name="$name"|;
$mult ?
($output .= qq| multiple="multiple" size="$altcatsize">\n|) :
($output .= qq|>\n|);
$output .= qq|<option value="---">---</option>\n|;

foreach $field (@fields) {
$values{$field} ?
($output .= qq|<option value="$field" selected="selected">$field</option>\n|) :
($output .= qq|<option value="$field">$field</option>\n|);
}
$output .= "</select>\n";
return $output;
}

---------------
This next one is a new routine; you need to add it.

sub build_select_field_clean {
# --------------------------------------------------------
# Builds an XHTML-valid select field for the add/modify forms.
# Only used for the Category and Altcategories fields.

my ($column, $value, $name, $mult) = @_;
my ($size, %values);

$name || ($name = $column);
$size || ($size = 1);
$altcatsize || ($altcatsize = 5);

# Subcat/Yahoo select mod >
$db_select_fields{'Mult-Subcats'} = join (",", &category_list);
# < Subcat/Yahoo select mod


if (! exists $db_select_fields{$column}) {
$db_select_fields{$db_cols[$db_category]} = $db_select_fields{'Mult-Related'} = join (",", &category_list);
}

# alt cat mod >
if (! exists $db_select_fields{$column} && $column eq "AltCategories" && exists $db_select_fields{'Category'}) {
$db_select_fields{$db_cols[$db_alt_cat]} = $db_select_fields{'Mult-AltCategories'} = $db_select_fields{'Category'}
}
elsif (! exists $db_select_fields{$column} && $column eq "AltCategories" && ! exists $db_select_fields{'Category'}) {
$db_select_fields{$db_cols[$db_alt_cat]} = $db_select_fields{'Mult-AltCategories'} = join (",", &category_list);
}
# < alt cat mod

if ($mult) {
@fields = split (/\,/, $db_select_fields{"Mult-$column"});
%values = map { $_ => 1 } split (/\Q$db_delim\E/, $value);
}
else {
@fields = split (/\,/, $db_select_fields{$column});
$values{$value}++;
}
($#fields >= 0) or return "error building select field: no select fields specified in config for field '$column'!";

# This section creates the XHTML.
$output = qq|<select name="$name"|;
$mult ?
($output .= qq| multiple="multiple" size="$altcatsize">\n|) :
($output .= qq|>\n|);
$output .= qq|<option value="---">---</option>\n|;

foreach $field (@fields) {
# These two subs clean up the entries --
my $field_clean = &build_clean($field);
my $field_url = &urlencode($field);

$values{$field} ?
($output .= qq|<option value="$field_url" selected="selected">$field_clean</option>\n|) :
($output .= qq|<option value="$field_url">$field_clean</option>\n|);
}
$output .= "</select>\n";
return $output;
}
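Because build_select_field_clean puts the URL-encoded name in each option's value, the form submits that encoded value rather than the real category name. urlencode cannot simply be reversed, since case is folded, & becomes "and" and !?# are dropped. Unless your other mods already handle this, one option is a lookup table built from the same category list. The sketch below assumes that approach. category_from_option is a hypothetical name, and urlencode is a copy of the routine later in this post, written with lc, which does the same job as its tr line. If two names encode to the same value, the later one wins, so such names should be avoided.

```perl
use strict;
use warnings;

# Copy of the urlencode routine from this post, for a self-contained example.
sub urlencode {
    my ($toencode) = @_;
    $toencode =~ s/&/and/g;       # '&' -> 'and'
    $toencode =~ tr/ /_/;         # space -> underscore
    $toencode = lc $toencode;     # lowercase
    $toencode =~ s/[!?#]//g;      # drop ! ? #
    $toencode =~ s/([^a-zA-Z0-9_\-.])/uc sprintf("%%%02x", ord($1))/eg;
    $toencode =~ s/\%2F/\//g;     # keep '/' as the category separator
    return $toencode;
}

# Hypothetical helper: maps a submitted option value back to the real
# category name by encoding every known name the same way.
sub category_from_option {
    my ($submitted, @categories) = @_;
    my %by_value = map { urlencode($_) => $_ } @categories;
    return $by_value{$submitted};    # undef if no category matches
}
```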


----------------

sub build_select_field_from_db {
***
# Make a select list out of those names.
$output = qq|<select name="$name"><option>---</option>|;
foreach $field (sort keys %selectfields) {
($field eq $value) ?
($output .= "<option value=\"$field\" selected=\"selected\">$field</option>\n") :
($output .= "<option value=\"$field\">$field</option>\n");
}
$output .= "</select>\n";

return $output;
}
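The `sort keys %selectfields` line sorts raw UTF-8 bytes. That is safe, because UTF-8 byte order matches Unicode code point order. However, the result is not alphabetical for accented or non-Latin names: "été" sorts after "zoo". If that matters, the core Unicode::Collate module implements the Unicode Collation Algorithm. The sketch below assumes the names are UTF-8 byte strings, as Links 2 reads them from disk, and sort_names_utf8 is a hypothetical name. It uses the default, untailored collation, so language-specific rules (for example, Spanish treating ñ as a separate letter) are not applied.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Collate;

my $collator = Unicode::Collate->new();   # default Unicode collation table

# Hypothetical helper: sorts UTF-8 byte strings alphabetically and returns
# them unchanged (still as bytes), so they can be printed as before.
sub sort_names_utf8 {
    my @names = @_;
    return map  { $_->[1] }                                  # original bytes
           sort { $collator->cmp($a->[0], $b->[0]) }         # compare as characters
           map  { [ decode('UTF-8', $_), $_ ] } @names;      # pair chars with bytes
}
```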

--------------

sub build_checkbox_field {
***
foreach $box (@boxes) {
(grep $_ eq $box, @values) ?
($output .= qq!<input type="checkbox" name="$name" value="$box" checked="checked" /> $box\n!) :
($output .= qq!<input type="checkbox" name="$name" value="$box" /> $box\n!);
}
return $output;
}

------------------

sub build_radio_field {
***
foreach $button (@buttons) {
($value eq $button) ?
($output .= qq|<input type="radio" name="$name" value="$button" checked="checked" /> $button \n|) :
($output .= qq|<input type="radio" name="$name" value="$button" /> $button \n|);
}
return $output;
}

---------------

These next two are modified for my mod that allows special characters in the category name,
which is posted elsewhere. They may not need to be changed for your site.
The part in red will change the category divider in the drop-down lists from a / to a ».

sub build_clean {
# --------------------------------------------------------
# Formats a category name for displaying as XHTML compliant.
# In order to enable use of the pound sign (#) in the cat name,
# a two-step jig is required...

my ($input) = shift;
$input =~ s/_/ /g; # Change '_' to spaces.
$input =~ s|#|\^|g; # Change '#' to '^' (step 1)
$input =~ s/&/&#38;/g; # Change '&' to '&#38;'
$input =~ s|\^|&#35;|g; # Change '^' to '&#35;' (step 2)
$input =~ s|/| &#187; |g; # Change '/' to ' &#187; ' (' » '). # Category divider
$input =~ s/\\/&#92;/g; # Change '\' to '&#92;'
$input =~ s/\?/&#63;/g; # Change '?' to '&#63;'
$input =~ s/!/&#33;/g; # Change '!' to '&#33;'

return $input;
}
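With pages served as UTF-8, characters such as » or non-Latin letters no longer need numeric references; they can be stored and printed as-is. The only characters that must be escaped in XHTML text and double-quoted attribute values are &, <, > and ". Below is a minimal general-purpose helper (the name xhtml_escape is hypothetical) that could sit beside build_clean. build_clean's own substitutions for _, # and / are specific to the category-name mod, so they are not repeated here.

```perl
use strict;
use warnings;

# Hypothetical helper: escapes the characters that are special in XHTML
# text and double-quoted attribute values. '&' is replaced first so the
# entities added afterwards are not escaped a second time. Non-ASCII
# UTF-8 bytes pass through untouched.
sub xhtml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}
```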

The red part changes all letters in the URL to lowercase.
This mod requires other changes too, which I have also posted previously.

sub urlencode {
# --------------------------------------------------------
# Escapes a string to make it suitable for printing as a URL.
my($toencode) = shift;
$toencode =~ s/&/and/g; #change '&' to 'and'
$toencode =~ tr/ /_/; #replace space with underscore
$toencode = lc $toencode; #change all letters to lowercase
$toencode =~ s/[!?#]//g; #remove these characters (!?#)
$toencode =~ s/([^a-zA-Z0-9_\-.])/uc sprintf("%%%02x",ord($1))/eg;
$toencode =~ s/\%2F/\//g;
return $toencode;
}
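On the UTF-8 question, urlencode works as long as the strings it receives are raw UTF-8 bytes, which is how Links 2 reads them from its data files. Each byte is escaped separately, which is exactly what a UTF-8 URL needs: М (U+041C) becomes %D0%9C. It breaks if a string has been decoded into Perl characters (for example with Encode), because ord then returns values above 255 and the "%02x" format produces something like %41C. The sketch below is a variant that handles both cases. utf8_urlencode is a hypothetical name. It leaves out the site-specific rules (lowercasing, & to "and", underscores) and keeps / as the original does.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical variant of urlencode for UTF-8 text. Percent-encoding works
# on bytes, so a decoded (character) string is first turned back into
# UTF-8 bytes; a string that is already bytes is used as-is.
sub utf8_urlencode {
    my ($text) = @_;
    my $bytes = utf8::is_utf8($text) ? encode('UTF-8', $text) : $text;
    $bytes =~ s/([^A-Za-z0-9_\-.\/])/sprintf("%%%02X", ord($1))/eg;
    return $bytes;
}
```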

-----------

site_html_templates.pl

sub site_html_add_form {
# --------------------------------------------------------
# This routine determines what the add form page will look like.
#
&html_print_headers;
my $category = shift;

$category ?
($category = qq~$category <input type="hidden" name="Category" value="$category" />~) :
($category = &build_select_field_clean ("Category", "$in{'Category'}"));
my $altcategories = &build_select_field_clean ("AltCategories","$in{'AltCategories'}","AltCategories","multiple Size=3");

print &load_template ('add.html', {
Category => $category,
AltCategories => $altcategories,
%in,
%globals
});
}

----------------

sub site_html_add_failure {
# --------------------------------------------------------
# This routine determines what the add failure page will look like.
my ($errormsg) = shift;
my $category = &build_select_field_clean ("Category", "$in{'Category'}");
delete $in{'Category'};

my $altcategories = &build_select_field_clean ("AltCategories","$in{'AltCategories'}","AltCategories","multiple Size=3");
delete $in{'AltCategories'};


&html_print_headers;
print &load_template ('add_error.html', {
error => $errormsg,
Category => $category,
AltCategories => $altcategories,
%in,
%globals
});
}
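On the form side, the usual approach is to serve the add and modify forms as UTF-8, optionally adding accept-charset="utf-8" to the form element, so that browsers submit UTF-8. It can still be worth rejecting input that is not well-formed UTF-8 (for example, from an old bookmark or a hand-made request) before it reaches the database. The sketch below uses Encode. is_valid_utf8 is a hypothetical name; calling it on each value of %in before the add or modify routine accepts the record is one possible place to use it.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical check for submitted form fields: returns 1 only if the
# bytes are well-formed UTF-8, otherwise 0.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its input
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
}
```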

--------------
In the modify template (modify.html), change the Category field to this:
Category <%Cat%>

sub site_html_modify_form {
# --------------------------------------------------------
# This routine determines what the modify form page will look like.
my (%record) = @_;
&html_print_headers;
my $cat = &build_select_field_clean ("Category", "$record{'Category'}");
my $altcategories = &build_select_field_clean ("AltCategories","$in{'AltCategories'}","AltCategories","MULTIPLE Size=3");

print &load_template ('modify.html', {
Cat => $cat,
Category => $category,
AltCategories => $altcategories,
%in,
%record,
%globals
});
}

------------

This routine is highly modified, but the general idea is still the same...

sub site_html_print_cat {
# --------------------------------------------------------
# This routine determines how the list of categories will look.
my (@subcat) = @_;
my ($url, $numlinks, $mod, $subcat, $category_name, $description, $output, $i, $columns, $subcatsub);

my ($half) = int (($#subcat+2) / 2);

# Print Header.


$output = qq|<div class="float_container"><div class="float_left">\n|;

# > category sort mod
sub byfield { $category{$a}[$cat_sort_field] <=> $category{$b}[$cat_sort_field] };
foreach $subcat (sort byfield @subcat) {
# foreach $subcat (sort @subcat) { # < original
# < category sort mod


($description) = @{$category{$subcat}}[2];
# First let's get the name, number of links, and last modified date...
$url = "$build_root_url/" . &urlencode($subcat) . "/";
if ($subcat =~ m,.*/([^/]+)$,) { $category_name = &build_clean($1); } else { $category_name = &build_clean($subcat); }
$numlinks = $stats{"$subcat"}[0];
$mod = $stats{"$subcat"}[1];
# Check whether we are halfway through; if so, close the left-hand column div
# and open the right-hand one (this puts the category names in two columns).
if ($i == $half) {
$output .= qq~</div>\n~;
$output .= qq~<div class="float_right">\n~;
$i = 0;
}


# Then print out the linked category name and its number of links.
$output .= qq|<div class="link_top">\n|;
$output .= qq|<a class="cat2" href="$url">$category_name</a>
<span class="smalltype">($numlinks)</span>\n|;
$output .= qq|</div>|;


# > Yahoo-style mod
if ($#{$subcategories{$subcat}} >= 0) {
$v = 0;
$output .= qq~<div class="link_bottom">~;

# > category sort mod and added '$subcatsub'
foreach $subcatsub (sort byfield @{$subcategories{$subcat}}) {
#foreach $subcatsub (sort @{$subcategories{$subcat}}) {
# < category sort mod

$suburl = "$build_root_url/" . &urlencode($subcatsub) . "/";
if ($subcatsub =~ m,.*/([^/]+)$,) {$subcategory_name = &build_clean($1);
}
else {$subcategory_name = &build_clean($subcatsub);
}

$output .= qq~<a class="cat2" href="$suburl">$subcategory_name</a>~ if ($v <= 5);
$output .= qq~,&#160;\n~ if ($v ne $#{$subcategories{$subcat}} && $v <= 5);
$output .= qq~<a class="cat2" href="$url"> more&#187;</a>\n~ if ($v eq "5");
$v++;
}
$output .= qq~</div>~;
}
$i++;
}
## < Yahoo-style mod


$output .= "</div></div>\n";

return $output;
}
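A note on the Yahoo-style part: the loop shows at most six subcategories ($v from 0 to 5) and prints the more» link when $v reaches 5. That also happens when a category has exactly six subcategories and nothing more to show. The sketch below produces the same output, but adds the link only when something was left out. subcat_preview is a hypothetical name. It takes ready-made <a> elements, so the URL and build_clean handling stays as above.

```perl
use strict;
use warnings;

# Hypothetical helper: joins the first $max subcategory links with
# ",&#160;" as the mod above does, and adds the "more" link only when
# some subcategories were left out.
sub subcat_preview {
    my ($max, $more_url, @links) = @_;    # @links: ready-made <a> elements
    my $hidden = @links > $max;           # true if not everything fits
    my @shown  = $hidden ? @links[0 .. $max - 1] : @links;
    my $html   = join(",&#160;\n", @shown);
    $html .= qq~,&#160;\n<a class="cat2" href="$more_url"> more&#187;</a>\n~ if $hidden;
    return $html;
}
```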


There are also changes in nph-build.cgi, but they are pretty minor. Much alteration is required in the templates, of course. Let me know how this works, and what else needs correcting.
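The post doesn't list those nph-build.cgi changes. The following is one general point for any UTF-8 build script, offered as a sketch rather than as the changes in question. If every string stays as raw UTF-8 bytes, as in stock Links 2, pages can be written exactly as before. If any text is decoded into Perl characters, however, the output file needs an :encoding(UTF-8) layer. Otherwise Perl warns "Wide character in print", and characters in the 128-255 range come out as Latin-1 bytes. write_page_utf8 is a hypothetical name, and it expects decoded text only: UTF-8 bytes passed to it would be encoded a second time.

```perl
use strict;
use warnings;

# Hypothetical helper for a build script: writes one generated page.
# $html must be a decoded character string; the :encoding(UTF-8) layer
# converts it to UTF-8 bytes on the way out.
sub write_page_utf8 {
    my ($path, $html) = @_;
    open(my $fh, '>:encoding(UTF-8)', $path) or die "Cannot write $path: $!";
    print {$fh} $html;
    close($fh) or die "Cannot close $path: $!";
    return 1;
}
```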


Leonard
aka PerlFlunkie
Re: [PerlFlunkie] UTF-8 in Links 2
Thanks, PerlFlunkie...

I've got off to a good start. It has been easier than I thought, but it is taking a long time. After a 12-hour marathon, I have hacked Links 2 so that I can now have links in English, Spanish, Russian, Arabic and Turkish all on the same page, with every page passing the W3C Validator at http://validator.w3.org/.

I will report here in a few days when I have recovered.