Gossamer Forum

The content_language of a site

The content_language of a site
Does anyone know how I can get the content language of a site, so I can tell whether it is English, Dutch, and so on? I tried $res->content_language, but that comes back blank.
Here $res = $ua->request($req); and I think you know what $ua and $req are. Any help would be appreciated.
Re: The content_language of a site
Try something like:
# The header may list several languages, so match the leading tag only.
if (($ENV{'HTTP_ACCEPT_LANGUAGE'} || '') =~ /^en-us\b/i) {
    # do English stuff
    ...
}
else {
    # do other-language stuff
    ...
}

-- Gordon --


------------------
$blah='82:84:70:77';
print chr($_) foreach (split/:/,$blah);
Re: The content_language of a site
Yeah, I know how I would translate "en-us" to "English" with an if statement, but HTTP_ACCEPT_LANGUAGE only gives the language of whoever is calling my CGI, which would make every site it fetches look English. I want the code to look at the languages of other URLs.

btw...
are you the new moderator of this forum?

[This message has been edited by Bmxer (edited November 01, 1999).]
Re: The content_language of a site
Sorry, misunderstood. S'what I get for answering too early in the morning :) Why don't you show more of your code so we don't have to guess at what is causing the problem?
Yep, I am the moderator.
-- Gordon --


Re: The content_language of a site
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Headers;

# Fetch the URL submitted in the form.
$url = $in{'URL'};
my ($ua, $req, $res);
$ua = LWP::UserAgent->new();
$ua->timeout(30);
$ua->agent("UrlScope");
$req = HTTP::Request->new(GET => $url);
$res = $ua->request($req);
$code = $res->code;
$message = $res->message;

# Language: try the MS.LOCALE meta tag first, then the Content-Language header.
$in{'Language'} = $res->header('X-Meta-MS.LOCALE');
if (!$in{'Language'}) {
    $in{'Language'} = $res->content_language;
}

# Title, truncated to 50 characters.
$in{'Title'} = $res->header('Title');
$in{'Title'} = substr($in{'Title'},0,50) . ".." if (length($in{'Title'}) > 50);
if (!$in{'Title'}) {
    $in{'Title'} = "No Title";
}

# Description, truncated to 125 characters.
$in{'Description'} = $res->header('X-Meta-Description');
$in{'Description'} = substr($in{'Description'},0,125) . "..." if (length($in{'Description'}) > 125);
if (!$in{'Description'}) {
    $in{'Description'} = "No Description";
}

# Keywords, truncated to 145 characters.
$in{'Keywords'} = $res->header('X-Meta-Keywords');
$in{'Keywords'} = substr($in{'Keywords'},0,145) . "..." if (length($in{'Keywords'}) > 145);
if (!$in{'Keywords'}) {
    $in{'Keywords'} = $in{'Ceywords'};
}

See, I have $res->content_language, which should work through HTTP::Headers, but it returns nothing.
Re: The content_language of a site
You might be able to use JavaScript,
but I don't know how to pass the result to Perl...
Maybe you need to create a start page that checks the language and then forwards to another page...
Re: The content_language of a site
From the World Wide Web Consortium:
Quote:
content-language
semantics
The natural language of the resource content. This is used mainly by the NegotiatedFrame to negotiate among its set of variant resources. The value of this attribute can be either extracted from the resource content (e.g. if it is an HTML file that includes some appropriate META tag), or provided for informational purposes.
type
This attribute is an editable LanguageAttribute
default value
This attribute defaults to null.
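
The quoted text mentions that the value can be extracted from the HTML itself. A minimal sketch of that idea follows, assuming the page body is already in a string (for example from $res->content). The helper name declared_language is hypothetical, not part of LWP, and the regexes are deliberately simple: they only handle an http-equiv attribute that appears before content, and they are no substitute for a real HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not an LWP function): return the language a page
# declares in its own markup, or undef if it declares none.
sub declared_language {
    my ($html) = @_;
    return unless defined $html;

    # Case 1: <html lang="nl"> or <html xml:lang="nl">
    if ($html =~ /<html\b[^>]*?\blang\s*=\s*["']?([A-Za-z][A-Za-z0-9-]*)/i) {
        return lc $1;
    }

    # Case 2: <meta http-equiv="Content-Language" content="nl">
    # (assumes http-equiv comes before content inside the tag)
    if ($html =~ /<meta\b[^>]*http-equiv\s*=\s*["']?content-language["']?[^>]*\bcontent\s*=\s*["']?([A-Za-z][A-Za-z0-9-]*)/i) {
        return lc $1;
    }

    return;    # nothing declared
}
```

With the code posted earlier, this could serve as a last fallback, e.g. $in{'Language'} = declared_language($res->content) when both header lookups are empty. Pages that declare nothing will still come back blank.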

Looks like you're in a tough spot...
The only thing I could think of would be to create some keyword-checking routine, a character-type check, or a combination of both. If anyone else has any insight, please let me know :)
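
A keyword-checking routine along those lines might look like the rough sketch below. Everything in it is an assumption made for illustration: the guess_language name, the two tiny word lists, and the idea of simply counting common words. It expects plain text with the HTML tags already stripped, and it only looks at unaccented ASCII letters.

```perl
use strict;
use warnings;

# Tiny illustrative word lists; a usable checker would need far larger ones.
my %STOPWORDS = (
    English => [qw(the and is of to that it with for this)],
    Dutch   => [qw(de het een en van is dat niet met voor)],
);

# Hypothetical helper: return the language whose common words occur
# most often in $text, or undef if none of them occur at all.
sub guess_language {
    my ($text) = @_;

    # Count every ASCII word, case-insensitively.
    my %count;
    $count{lc $_}++ for $text =~ /([A-Za-z]+)/g;

    my ($best, $best_score) = (undef, 0);
    for my $lang (sort keys %STOPWORDS) {
        my $score = 0;
        $score += $count{$_} || 0 for @{ $STOPWORDS{$lang} };
        ($best, $best_score) = ($lang, $score) if $score > $best_score;
    }
    return $best;
}
```

Short pages, mixed-language pages, and languages missing from the lists will be misjudged, so the result is better treated as a hint than as an answer.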
-- Gordon --



[This message has been edited by GClemmons (edited November 04, 1999).]
Re: The content_language of a site
Thanks GClemmons. That makes me wonder how AltaVista does it, unless they visit the sites themselves and work out the language.
Re: The content_language of a site
AltaVista licensed Arthur Dent's Babel fish, so there really is no way to duplicate it.



--mark

------------------
Due to repeated requests for support via ICQ, I am once again offering support online.

The ONLY number I will accept requests on is #53788453. Messages sent to my other numbers will not pass through to me, as I have security settings set to ignore messages from users not on that list.

I don't know how often I will be active on this number, as it is for only when I am free to offer assistance.

Could this signature be any longer? :)
Re: The content_language of a site
OK, thanks Mark. Now I see it was a lost cause: though I could parse a page for characters using LWP, it's too tedious. Thanks all.