beema.shafreen at gmail
Jun 14, 2008, 11:38 PM
Post #7 of 7
Thanks a lot for your valuable suggestions.
On Sun, Jun 15, 2008 at 4:04 AM, Dennis Lee Bieber <wlfraed [at] ix> wrote:
> On Sat, 14 Jun 2008 12:45:47 +0530, "Beema shafreen"
> <beema.shafreen [at] gmail> declaimed the following in
> Strange: I don't recall seeing this on comp.lang.py, just the first
> responder's reply; and a search on the message ID only found it on gmane...
> > Hi all,
> > I have a file with three columns. I need to sort the file with respect to
> > the third column. How do I do it using Python? I used the Linux sort
> > command to do this, but I am not able to do it in Python.
> > Can anybody suggest something?
> Question 1: Will the file fit completely within the memory of a running
> Python program?
> Question 2: How are the columns defined? Fixed width (known in advance),
> tab separated, or comma separated?
> If #1 is true, I'd read the file into a list of tuples/sublists (if the
> lines have fixed-width columns, read each line and manually split it on
> the column widths; if TSV or CSV, use the proper options with the csv
> module to read the file).
> Define a sort key function to extract the key column and use the
> built-in list sort method:
> data.sort(key=lambda x: x[2])  # third column; warning, I'm not skilled at lambda
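A minimal sketch of that in-memory approach, assuming the file is delimited (tab by default) and that the key is the third column; the function name `sort_rows_by_column` and its parameters are hypothetical, not from the thread. It uses the standard `csv` module to split the lines, then the built-in `list.sort` with a key function.

```python
import csv


def sort_rows_by_column(lines, column=2, delimiter="\t", numeric=False):
    """Parse delimited lines into rows and return them sorted on one column.

    `column` is zero-based, so the default of 2 selects the third column.
    With numeric=True the column is compared as a number, not as text.
    """
    # csv.reader splits each line on the delimiter; blank lines give []
    # and are skipped so they cannot raise IndexError in the key function.
    rows = [row for row in csv.reader(lines, delimiter=delimiter) if row]
    if numeric:
        rows.sort(key=lambda row: float(row[column]))
    else:
        rows.sort(key=lambda row: row[column])
    return rows
```

With a real file, open it with `newline=""` (as the csv documentation recommends) and pass the file object, e.g. `sort_rows_by_column(open("data.txt", newline=""))`, where `data.txt` is a placeholder name. The `numeric` flag matters because, compared as text, "10" sorts before "9".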
> Actually, if text sort order (not numeric value order) is okay, and the
> lines are fixed width columns, no need to manually split the columns
> into tuples; just read all lines into a list and define a key function
> that picks out the columns needed:
> data.sort(key=lambda x: x[colstart:colend])
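A sketch of the fixed-width variant, where whole lines are sorted without splitting them. The layout (three 10-character columns, so the third column is the slice 20:30) and the helper name `sort_fixed_width` are assumptions for illustration only.

```python
def sort_fixed_width(lines, colstart, colend):
    """Sort whole lines on the text in character positions colstart:colend.

    The lines are never split into fields; the slice alone is the sort key,
    so the comparison is plain text order, not numeric order.
    """
    return sorted(lines, key=lambda line: line[colstart:colend])


# Hypothetical layout: three 10-character columns, so the third column
# is the slice 20:30.
records = [
    f"{'alpha':<10}{'one':<10}{'zeta':<10}\n",
    f"{'beta':<10}{'two':<10}{'delta':<10}\n",
]
for line in sort_fixed_width(records, 20, 30):
    print(line, end="")
```

Because the key is only a slice, this stays cheap even for wide records; if numeric order were needed, the key would have to convert the slice, e.g. `int(line[20:30])`.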
> If #1 is FALSE (too big for memory), you will need to create a sort-merge
> procedure: read n lines of the file, sort them, and write them to a
> temporary file, alternating among 2+ temporary files and keeping the same
> n lines per batch (except for the last batch). Then merge the 2+
> temporaries batch by batch into a new temporary file; after the first
> n-line batches have been merged (giving a batch of n*2+ lines), switch to
> another temporary file for the next batch... When all the original batches
> are merged, repeat the merge using batches of size n*2+... Repeat until
> only one temporary file is left (i.e., only one long merge batch remains).
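A minimal external-sort sketch in Python, assuming tab-separated lines keyed on the third field. It is not the exact procedure described above: it swaps the repeated two-way merge passes for a single k-way merge using the standard library's `heapq.merge`, which is simpler to write. If the number of runs exceeded the operating system's limit on open files, you would fall back to merging in several passes as described. The names `external_sort`, `third_tab_field` and `lines_per_run` are hypothetical.

```python
import heapq
import itertools
import tempfile


def third_tab_field(line):
    """Hypothetical key: the third tab-separated field of a line, as text."""
    return line.rstrip("\n").split("\t")[2]


def external_sort(lines, key, lines_per_run=100_000):
    """Yield `lines` in sorted order without loading them all into memory.

    Phase 1 reads up to lines_per_run lines at a time, sorts them, and
    writes each sorted run to its own temporary file.
    Phase 2 merges every run in one k-way pass with heapq.merge, which
    keeps only one pending line per run in memory.
    """
    runs = []
    try:
        source = iter(lines)
        while True:
            chunk = list(itertools.islice(source, lines_per_run))
            if not chunk:
                break
            # Ensure every line ends in a newline so lines from different
            # runs cannot run together when the merged output is written.
            chunk = [ln if ln.endswith("\n") else ln + "\n" for ln in chunk]
            chunk.sort(key=key)
            run = tempfile.TemporaryFile(mode="w+")
            run.writelines(chunk)
            run.seek(0)  # rewind so the merge phase reads from the start
            runs.append(run)
        yield from heapq.merge(*runs, key=key)
    finally:
        for run in runs:
            run.close()  # TemporaryFile deletes itself on close
```

For real files (the names here are placeholders): `with open("big.txt") as src, open("sorted.txt", "w") as dst: dst.writelines(external_sort(src, third_tab_field))`.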
> Or figure out how to call whatever system sort command is available
> with whatever parameters are needed -- after all, why reinvent the wheel
> if you can reach outside the snake and grab one that is already in the snake
> pit ("outside the snake" => os.system(...); "snake pit" => the OS
> environment). Even WinXP has a command-line sort command; as long as you
> don't need a multikey sort, it can handle simple text-record sorting,
> with a limit on the memory size to use.
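A sketch of handing the job to the system sort, assuming a Unix-like system with a POSIX/GNU `sort` on the PATH. It uses `subprocess.run` (the modern replacement for `os.system`) rather than `os.system` itself; the function name `system_sort_by_column` is hypothetical. The WinXP `sort` command mentioned above takes different switches, so this sketch does not cover it.

```python
import os
import subprocess


def system_sort_by_column(path, column=3, delimiter="\t", numeric=False):
    """Sort a delimited text file with the external Unix sort command.

    `column` is 1-based, as sort's -k option expects. Returns sorted text.
    """
    # -k N,N restricts the key to field N alone; a trailing "n" compares
    # that key numerically instead of as text.
    key = f"{column},{column}" + ("n" if numeric else "")
    cmd = ["sort", "-t", delimiter, "-k", key, path]
    # LC_ALL=C asks for plain byte-order comparison, not locale collation.
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(cmd, capture_output=True, text=True,
                            check=True, env=env)
    return result.stdout
```

Passing the command as a list avoids the shell, so the tab delimiter needs no quoting. Returning the output as a string holds the whole result in memory; for files too large for that, adding sort's `-o outfile` option lets it write the result straight to disk.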
> Wulfraed Dennis Lee Bieber KD6MOG
> wlfraed [at] ix wulfraed [at] bestiaria
> (Bestiaria Support Staff: web-asst [at] bestiaria)