Gossamer Forum

XML Parser - Open Source Script/Plugin

Hello all,
I'm posting the following script in the hope of keeping it free and open source. I need some help with the html escape (URL encoding) in it: when adding a link with an "&" in the URL, the URL that reaches the add form stops at the "&".
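A likely cause, assuming the link URL is pasted unencoded into the query string of another URL: a literal "&" inside the value is read as the start of the next parameter, so the receiving form only sees the part before it. The sketch below percent-encodes a value before it goes into a query string. The helper name escape_query_value and the example.com addresses are made up for illustration, and the helper assumes byte strings rather than wide characters. CGI.pm's escape() or URI::Escape's uri_escape() should do the same job.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: percent-encode everything except unreserved characters
# (letters, digits, "-", "_", ".", "~"). Assumes a byte string.
sub escape_query_value {
    my ($value) = @_;
    $value = '' unless defined $value;
    $value =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
    return $value;
}

# Example feed link containing "&" (made-up address).
my $link = 'http://example.com/page?a=1&b=2';

# The "&" inside the value becomes %26, so it no longer splits the parameter.
my $add_url = 'http://example.com/admin/admin.cgi?do=add_form&URL='
            . escape_query_value($link);

print "$add_url\n";
```

The same call would be applied to every value placed in the query string (title, description and so on), not only the URL.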

What the script does: It's a modified version of the script found at mikeshea.net that lets you enter the URL of an .xml (RSS) file into a form and then pops up the list of links in it. This version lets you quickly add them to your Links SQL database via the admin.cgi add form. I'm hoping for some help with the html escape thing, and if anyone would like to take a shot at turning it into a plugin, that would be great.

How to install: I have it in my admin directory. Create a writable (chmod 777) directory called "cache" in the same directory as the script (the script uses ./cache/ and needs it to cache the feed it parses). Upload xml2links.cgi to your admin directory or any executable directory, chmod it to 755 and run it.
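If the script dies with a write error, the cache directory is the first thing to check. A minimal sketch, assuming the cache lives at ./cache relative to where the script runs; the helper name cache_dir_ready is hypothetical and not part of the script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: make sure the cache directory exists and is writable
# by the user the script runs as. Returns 1 on success, 0 otherwise.
sub cache_dir_ready {
    my ($dir) = @_;
    unless (-d $dir) {
        mkdir $dir, 0755 or return 0;   # create it if it is missing
    }
    return (-w $dir) ? 1 : 0;
}

my $cache_dir = './cache';   # same relative path the script uses
if (cache_dir_ready($cache_dir)) {
    print "cache directory is ready\n";
} else {
    print "cache directory is missing or not writable: $cache_dir\n";
}
```

Run it from the directory that holds xml2links.cgi, as the same user the web server uses. If that user differs from your login user, the directory may still need wider permissions.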

Hope you can use it, and once again, if anyone would be willing to put it into plugin format, that would be great (it must remain open source under the GNU), but I really need help with the html escape thing.

Thanks

Code:

#!/usr/bin/perl
# RSS parser written by Mike Shea in May 2003.
# Contact Mike at mike@mikeshea.net
# Description: This script takes in an RSS file and outputs a simple XHTML list.
# Passing it a "rss_url=" CGI variable will return an XHTML list of items.
# It works with both RSS 0.91 files as well as 2.0 files.
# To Do:
# - Add in support for <xhtml:body> in RSS 2.0 files
# - Perhaps wire this into a newsfeed system for the setup, fetching, and display
# of various RSS files.
# - Automatically create cached versions of the HTML output and offer up a URL?
#
# example use: http://mikeshea.net/xml/rss2html.pl?rss_url=http://mikeshea.net/articles.xml
#


#JAN 2006 Modified by Dinky to work with Links SQL. Still under GNU opensource.

#Change your full path and information in the ADD field below. Don't change anything else, unless you can make it better.


# Load modules, available via cpan.org
use SOAP::Lite;

use CGI qw(:standard);
use XML::RSS;
use LWP::Simple;

# start an HTML header to the client
print header;

# if this is called without a parameter, display a form to enter an RSS url
if (!param()) {
print start_html('RSS Parser'),
h1('RSS Parser');
print "<p>Enter RSS URL</p>";
print "
<form action=\"xml2links.cgi\">
<input type=\"text\" name=\"rss_url\" size=\"40\" />
<input type=\"submit\" />
</form>
";
print end_html;
exit (0);

} else {
# If this is called with a parameter, we launch the parser and output HTML

# load up the RSS url
my $rss_url = param('rss_url');

#RSS Filename is the URL without http:// in it and replacing "/" with "-" and within the ./cache/ directory.
my $rss_filename = $rss_url;
$rss_filename =~ s/http:\/\///ig; # get rid of "http://"
$rss_filename =~ s/\//-/ig; # change "/" to "-"
$rss_filename = "./cache/".$rss_filename; # add ./cache/ to the directory

# Fetch the RSS feed.
# check for a local version of the file:
# if it exists, check to see if it is under 30 minutes old

# get the file's modification time (undef if the file does not exist yet)
my $mtime = (stat($rss_filename))[9];

my $rss_source;

if (defined $mtime && $mtime >= time() - 30 * 60) {
# if it exists and is under 30 minutes old, use this file and put it in our var.
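open (RSSFILE, '<', $rss_filename) or die "cannot open file $rss_filename: $!"; # three-argument open, so odd characters in the name are not treated as a mode or pipe
local $/; # slurp mode: lets a single variable take in the whole file, from Perl Cookbook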
$rss_source = <RSSFILE>; # suck in the file into $rss_source
close (RSSFILE);
# print "file found"; # Check to make sure caching works
} else {
# print "file not found, fetching URL"; # Check to make sure caching works
# if it either does not exist or is older than 30 minutes fetch the file
$rss_source = get("$rss_url");
# and save it locally.
open (RSSFILE, '>', $rss_filename) or die "unable to write file $rss_filename: $!";

my $rss = new XML::RSS; # begin a new XML RSS instance
$rss->parse($rss_source) or die "cannot parse XML file"; # Parse it

# Start displaying the HTML result. This output is XHTML with semantic tagging, use a stylesheet to make it look good.
print RSSFILE "<h2><a href=\"$rss->{'channel'}->{'link'}\" title=\"$rss->{'channel'}->{'description'}\">$rss->{'channel'}->{'title'}</a></h2>\n";
print RSSFILE "<ul>\n";

# Print each item
foreach my $item (@{$rss->{'items'}}) {
next unless defined($item->{'title'}) && defined($item->{'link'});
# $item->{'description'} = "";
$item->{'description'} =~ s/[^A-Za-z0-9 \*,\.:'\/\\-]//ig; # Parse out weird characters
print RSSFILE "<li><a href=\"$item->{'link'}\">$item->{'title'}</a><br>$item->{'description'}</li>\n";

print qq|

<div align="center">
<center>
<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" id="AutoNumber1" width="574">
<tr>
<td width="93"><font face="Tahoma" size="2">URL:</font></td>
<td width="481"><font face="Tahoma" size="2"><a href="$item->{'link'}" target="_blank">$item->{'link'}</a></font></td>
</tr>
<tr>
<td width="93"><font face="Tahoma" size="2">Title:</font></td>
<td width="481"><font face="Tahoma" size="2">$item->{'title'}</font></td>
</tr>
<tr>
<td width="93"><font face="Tahoma" size="2">Description:</font></td>
<td width="481"><font face="Tahoma" size="2">$item->{'description'}</font></td>
</tr>
<tr>
<td width="93"><font face="Tahoma" size="2">Author (owner):</font></td>
<td width="481"><font face="Tahoma" size="2">$author</font></td>
</tr>
<tr>
<td width="93"><font face="Tahoma" size="2">Email (owner):</font></td>
<td width="481"><font face="Tahoma" size="2">$email</font></td>
</tr>
<tr>
<td width="93"><font face="Tahoma" size="2">Options:</font></td>
<td width="481"><font face="Tahoma" size="2"><b>



<a href="HTTP://CHANGETOYOURFULLHTMLPATHTOLINKSADMIN/admin/admin.cgi?do=add_form&ID-opt=%3D&Username-opt=%3D&db=Links&todo=add_form&keyid=&URL=@{[ CGI::escape($item->{'link'}) ]}&Description=@{[ CGI::escape($item->{'description'}) ]}&LinkOwner=ADMIN&Contact_Name=admin&Contact_Email=admin\@YOURSITE.COM&Title=@{[ CGI::escape($item->{'title'}) ]}&CatLinks.CategoryID=" target="_blank">Add</a>
</b></font>
</td>
</tr>
<tr>
<td width="93"><font face="Tahoma" size="2">Status:</font></td>
<td width="481"><font face="Tahoma" size="2"><b>$status</b></font></td>
</tr>
<tr>
<td colspan="2" width="574">
<hr color="#000000" width="90%">
</td>
</tr>
</table>
</center>
</div>


|;


}
print RSSFILE "</ul>\n";
close (RSSFILE);
open (RSSFILE, '<', $rss_filename) or die "cannot open file $rss_filename: $!";
local $/; # slurp mode: lets a single variable take in the whole file, from Perl Cookbook
$rss_source = <RSSFILE>; # suck in the file into $rss_source
close (RSSFILE);
}

print $rss_source;

exit (0);
}
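
The caching steps in the script (turn the feed URL into a file name under ./cache/, then reuse that file if it is recent enough) can be tried in isolation. This is only a sketch under assumptions: the helper names cache_path and is_fresh and the example.com URL are made up, and the file name rule is stricter than the script's, replacing every character outside a small safe set with "-" rather than only "/".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map a feed URL to a file name inside the cache
# directory. Anything outside a conservative set of characters becomes "-".
sub cache_path {
    my ($url, $cache_dir) = @_;
    (my $name = $url) =~ s{^https?://}{}i;   # drop the scheme
    $name =~ s/[^A-Za-z0-9._-]/-/g;          # "/", "?", "&" and so on become "-"
    return "$cache_dir/$name";
}

# Hypothetical helper: true if the file exists and was modified within
# the last $max_age seconds.
sub is_fresh {
    my ($path, $max_age) = @_;
    my $mtime = (stat($path))[9];            # undef if the file is missing
    return 0 unless defined $mtime;
    return (time() - $mtime <= $max_age) ? 1 : 0;
}

my $path = cache_path('http://example.com/feeds/news.xml', './cache');
print "$path\n";
print is_fresh($path, 30 * 60) ? "use cached copy\n" : "fetch feed again\n";
```

Keeping the age limit as a parameter makes it easy to keep the comments and the actual cut-off in agreement.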

Also uploaded it...

</not a clue>
Re: [Dinky] XML Parser - Open Source Script/Plugin In reply to
We have been offered an XML feed (with 20,000 records/jobs, updated daily) for our Gossamer Links powered site (our site is a little unusual with regard to its use of Glinks):

Jobsin.com

However, I have no idea how to implement this feed, but this script looks like something close to what we need.

Q: Would this be suitable, and how would I use it?

Thanks
Colin Thompson
Re: [colintho] XML Parser - Open Source Script/Plugin In reply to
Hi Colin,

I have a plugin for this. With it you can specify the feed URL, the cache, and the length of the output.

PM me if you are interested.

Cheers,

Dat

Programming and creating plugins and templates
Blog