Gossamer Forum

Importing data

Hi there,

I have a question about importing rather large CSV files.

Is there an easy way to import these files (different CSV files that are updated throughout the day)?

Or is there some sort of plugin or mod available?

Thanks in advance.

Cheers,

RR
Re: [RoadRunner] Importing data In reply to
Hi. I've done/do quite a bit of large XML/CSV data imports. How large roughly will these files be?

Cheers

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy] Importing data In reply to
Hi Andy,

thanks for your reply.

The CSV/XML files are anywhere between 2 and 200 MB.

Cheers,

RR
Re: [RoadRunner] Importing data In reply to
As for the CSV files, it's pretty simple to write an import script (you wouldn't believe how many I've had to do :p).

For the smaller CSV files, you could convert them to pipe- or tab-delimited files, and then import them via my Data_Import plugin (see the links in my sig). This makes it easy to import anything up to 30,000 links/products in one go (it's only limited by your computer, and how much an IE text-area can hold in memory ... mine isn't that powerful, so it crashes on me a lot :'( ).
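
As a rough illustration of the conversion step only (sketched in Python rather than Perl, and not how the Data_Import plugin itself works), a comma-separated file can be rewritten as pipe-delimited one row at a time. The function name and sample data are made up for the example:

```python
import csv
import io

def csv_to_pipe(src, dst):
    """Copy rows from a comma-separated stream to a pipe-delimited one.

    Rows are handled one at a time, so memory use stays flat
    regardless of file size. Returns the number of rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, delimiter="|", lineterminator="\n")
    count = 0
    for row in reader:
        writer.writerow(row)
        count += 1
    return count

# In-memory demo; for real files, pass handles from open(path, newline="").
sample = io.StringIO('id,title,url\n1,"Widgets, large",http://example.com/w\n')
out = io.StringIO()
rows = csv_to_pipe(sample, out)
```

Going through a CSV parser rather than a plain search-and-replace matters because quoted fields may contain commas themselves.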

Regarding the larger XML files ... these are more of a pain. You have to run a while { } loop over the file to find each product's details, and then use XML::Simple::XMLin() to parse that piece of XML into a usable hash/array. Unfortunately, with large XML files it's quite hard to slurp it all into XML::Simple in one go (i.e. without slicing it up first), as it's quite a CPU/RAM-gobbling process :/
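
A minimal sketch of that slicing idea, written in Python instead of Perl and assuming a hypothetical feed where every record is wrapped in <product> ... </product> with no attributes on the opening tag and no nesting. Only one record is held in memory at a time:

```python
import io
import xml.etree.ElementTree as ET

def iter_records(stream, tag="product"):
    """Yield one dict per <tag>...</tag> block, reading line by line.

    Assumes the opening tag starts a line, the closing tag ends a
    line, and records are not nested; real feeds may need a
    sturdier splitter.
    """
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    chunk = None  # lines of the record currently being collected
    for line in stream:
        stripped = line.strip()
        if chunk is None and stripped.startswith(open_tag):
            chunk = []
        if chunk is not None:
            chunk.append(line)
            if stripped.endswith(close_tag):
                # Parse just this small slice, not the whole file.
                element = ET.fromstring("".join(chunk))
                yield {child.tag: (child.text or "") for child in element}
                chunk = None

feed = io.StringIO(
    "<products>\n"
    "<product>\n<id>1</id>\n<name>Widget</name>\n</product>\n"
    "<product>\n<id>2</id>\n<name>Gadget</name>\n</product>\n"
    "</products>\n"
)
records = list(iter_records(feed))
```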

Cheers

Andy (mod)
Re: [Andy] Importing data In reply to
Probably better to use PerlSAX (or similar) instead of XML::Simple as this reads the XML file line by line instead of keeping the whole thing in memory.
Re: [afinlr] Importing data In reply to
Quote:
instead of XML::Simple as this reads the XML file line by line instead of keeping the whole thing in memory.

You try loading more than a few hundred MB (or 20-30 MB on some servers) into memory. Trust me... it's a PITA :'(

Normally, when working with XML, I've found the easiest way is to go through the XML file with a while { } loop and write it out to a custom .db file (pipe-delimited, or similar). Then, when it comes to the import itself, everything is in flatfile format, which makes it a lot nicer to import :) (and faster, as you don't have to do loads of processing along the way).
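
To make the two-stage idea concrete, here is a small sketch in Python (not Perl), under the assumption that the records have already been pulled out of the XML as dicts. The column layout in FIELDS and both function names are hypothetical:

```python
import csv
import io

FIELDS = ["id", "name", "price"]  # hypothetical column layout

def write_flatfile(records, dst):
    """Stage 1: dump extracted records to a pipe-delimited flat file."""
    writer = csv.DictWriter(
        dst, fieldnames=FIELDS, delimiter="|",
        lineterminator="\n", extrasaction="ignore",
    )
    writer.writeheader()
    count = 0
    for record in records:
        writer.writerow(record)  # missing fields are written as empty
        count += 1
    return count

def read_flatfile(src):
    """Stage 2: the importer only ever sees simple delimited rows."""
    return csv.DictReader(src, delimiter="|")

db = io.StringIO()
written = write_flatfile(
    [{"id": "1", "name": "Widget", "price": "9.99"},
     {"id": "2", "name": "Gadget"}],
    db,
)
db.seek(0)
imported = list(read_flatfile(db))
```

Splitting the work this way means the slow XML handling happens once, and the import can be re-run against the flat file without touching the XML again.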

Just my opinion though =)

Cheers

Andy (mod)
Re: [Andy] Importing data In reply to
Quote:
You try loading more than a few hundred Mb (or 20/30mb on some servers) into memory. Trust me... its a PITA :'(

I agree - that's why I don't use XML::Simple. ;)
Re: [afinlr] Importing data In reply to
Quote:
I agree - that's why I don't use XML::Simple

... but for something to be read, surely *everything* needs to be read (i.e. to find the ending tags, etc.)? ... or am I just missing something simple :/

Cheers

Andy (mod)
Re: [Andy] Importing data In reply to
XML::Simple works like a DOM-style (tree) parser: it builds the whole document in memory before you can use any of it.

Here's a quick explanation of the difference between SAX and DOM, taken from http://www.webreference.com/xml/column11/3.html

Quote:

SAX vs. DOM
The DOM is a quite convenient way to access and manipulate XML data, but it comes with a price to pay:
  • The underlying XML needs to be fully parsed before processing can occur. As most DOM implementations are purely memory-based, this limits the amount of XML data that can be processed this way. Also the possibility for pipelining various stages of processing is limited.
  • The DOM structure only defines generic nodes, whereas most languages are strongly typed so that one might want to map specific nodes to specific classes of say Java code.

SAX is more useful in cases where:
  • Huge amounts of XML need to be processed, but the information needed is highly local, meaning only a small amount of data needs to be stored. This is usually the case in transforming linear documents, where little cross-linking occurs.
  • Various stages of XML processing are interconnected to form a pipeline. The next stage can then begin its work as soon as the first character comes out of the previous one, instead of having to wait for the full document to be converted into objects.


I'm sure there are better explanations. There are lots of SAX parsers but they are all more complicated to program than XML::Simple. However, I think it is worth getting into if you are doing a lot of XML parsing.
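
For a feel of what the SAX style looks like, here is a small sketch using Python's standard xml.sax module (not PerlSAX, though the callback idea is the same). The element names and the ProductHandler class are made up for the example. The parser fires a callback for each start tag, text run and end tag, and the handler keeps only the record it is currently building:

```python
import xml.sax

class ProductHandler(xml.sax.ContentHandler):
    """Collect each <product> into a dict and hand it to a callback."""

    def __init__(self, on_record, record_tag="product"):
        super().__init__()
        self.on_record = on_record
        self.record_tag = record_tag
        self.record = None  # dict for the record being built, if any
        self.field = None   # name of the child element being read
        self.text = []      # text pieces for the current field

    def startElement(self, name, attrs):
        if name == self.record_tag:
            self.record = {}
        elif self.record is not None:
            self.field = name
            self.text = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == self.record_tag:
            self.on_record(self.record)  # record done; nothing else is kept
            self.record = None
        elif self.record is not None and name == self.field:
            self.record[name] = "".join(self.text).strip()
            self.field = None

xml_data = (
    b"<products>"
    b"<product><id>1</id><name>Widget</name></product>"
    b"<product><id>2</id><name>Gadget</name></product>"
    b"</products>"
)
seen = []
xml.sax.parseString(xml_data, ProductHandler(seen.append))
```

For a file on disk, xml.sax.parse(path, handler) streams through it the same way, so the end tags are found as the parser reaches them rather than by loading everything first.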
Re: [afinlr] Importing data In reply to
Thanks ... I may have to play with that at some point :)

Cheers

Andy (mod)