Gossamer Forum

Need to split large text file into smaller ones...

I have a 28 MB text file that I want to split into smaller text files. The file consists of names, one per line; there are approximately 1.9 million names listed. I'd like the smaller files to be something more manageable, around 500 KB to 1 MB each.

Anybody know of a script that's available to do this? Cutting and pasting is not my idea of a way to spend Friday night :) TIA!
Re: Need to split large text file into smaller ones...
This could probably be optimized a little better, but it should work for you just fine. Just set $max_records to however many records (not KB) you want in each file. Set the file path at the top, and there are two spots where you need to specify a filename (the input file and the output file; on the output file, don't remove the $num portion).
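For sizing $max_records, going by the figures in the original post (rough numbers only): 28 MB over roughly 1.9 million names averages out to about 15 bytes per line, so something around 35,000 records per file should land near 500 KB, and around 70,000 near 1 MB.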


Code:
#!/usr/bin/perl -w

use strict;

print "Content-type: text/html\n\n";

my $max_records = 1000; # Maximum number of records in each file
my $file_path   = '/path/to/directory/where/file/is';
my @elements;

my $count   = 0;
my $file_no = 1;

open (FILE, "$file_path/test.dat") or die "Can't open file. Reason: $!";
while (<FILE>) {
    push (@elements, $_);
    $count++;
    if ($count == $max_records) {
        write_file($file_no, @elements);
        $file_no++;
        @elements = ();
        $count    = 0;
    }
}
close (FILE) or die "Can't close file. Reason: $!";
write_file($file_no, @elements) if @elements; # Write the leftovers

sub write_file {
    my ($num, @items) = @_;

    print "Writing File #$num...\n";
    open (FILE, ">$file_path/output$num.dat") or die "Can't open file. Reason: $!";
    print FILE foreach @items;
    close (FILE) or die "Can't close file. Reason: $!";
    print "Finished Writing File #$num...\n\n";
}

--mark

Re: Need to split large text file into smaller ones...
Forgot to mention... your best bet is to run this through the command line rather than the web. It will work through the web, but you won't see any progress, as I didn't make it an nph script.

Also... if you want the new files numbered as output0001.dat, output0002.dat, etc. (zero-padded, so they sort in order), change:

open (FILE, ">$file_path/output$num.dat") or die "Can't open file. Reason: $!";

to:

open (FILE, ">$file_path/output" . sprintf("%04d", $num) . ".dat") or die "Can't open file. Reason: $!";

--mark

Re: Need to split large text file into smaller ones...
Wow! Great! Will give it a try and let you know how I make out. THANKS!
Re: Need to split large text file into smaller ones...
I am getting the following error:

Read on closed filehandle <FILE> at ./filesplit.cgi line 14.
Can't close file. Reason: Bad file descriptor at ./filesplit.cgi line 25.

Any ideas? Thanks
Re: Need to split large text file into smaller ones...
In the write_file sub, change FILE to something else, like DAT.


Dan :)
Re: Need to split large text file into smaller ones...
Could you elaborate more? I am not that familiar with this.
Re: Need to split large text file into smaller ones...
Change the sub to:

Code:
sub write_file {
    my ($num, @items) = @_;

    print "Writing File #$num...\n";
    open (DAT, ">$file_path/output$num.dat") or die "Can't open file. Reason: $!";
    print DAT foreach @items;
    close (DAT) or die "Can't close file. Reason: $!";
    print "Finished Writing File #$num...\n\n";
}
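With DAT in there, the two opens no longer fight over the same handle. Just as a sanity check (hypothetical numbers, not from a real run): on a 2,500-line test.dat with $max_records at 1000, you'd get output1.dat, output2.dat and output3.dat, and after the Content-type header the progress output would be:

Code:
Writing File #1...
Finished Writing File #1...

Writing File #2...
Finished Writing File #2...

Writing File #3...
Finished Writing File #3...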


Dan :)
Re: Need to split large text file into smaller ones...
Worked like a charm! Thanks to both of you for your help!
Re: Need to split large text file into smaller ones...
Oops, that should have been MYFILE, not FILE.
A little "conflict of interest" there, so to speak.
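To spell out what was happening: bareword filehandles like FILE are global, so re-opening FILE inside write_file silently closed the handle the while loop was still reading from. A stripped-down sketch of the same collision (made-up filenames, not the actual script):

Code:
open (FILE, "in.txt") or die "Can't open file. Reason: $!";
while (<FILE>) {                 # the first read works fine...
    open (FILE, ">out.txt");     # ...but this implicitly closes the read handle
    close (FILE);
    # the next <FILE> is now a "Read on closed filehandle"
}
close (FILE);                    # and this close fails: Bad file descriptor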

My bad, sorry! :)

Good catch, Dan.

--mark

Re: Need to split large text file into smaller ones...
I just went to use the script again, on another server, and I am getting the following error:

syntax error at ./filesplit.cgi line 32, near "DAT foreach "
Execution of ./filesplit.cgi aborted due to compilation errors.

Code:
sub write_file {
    my ($num, @items) = @_;

    print "Writing File #$num...\n";
    open (DAT, ">$file_path/output$num.txt") or die "Can't open file. Reason: $!";
    print DAT foreach @items;   # <-- line 32
    close (DAT) or die "Can't close file. Reason: $!";
    print "Finished Writing File #$num...\n\n";
}

I had no problems the last time I used it, and I'm not aware of any changes other than the different server.

Again, any ideas?

Re: Need to split large text file into smaller ones...
 
It looks like the same problem someone had in the "encryption Algorithm" thread.

You are probably using a Windows version of Perl, maybe CPAN's 5.004; it seems to want the foreach command to be followed by brackets. It fails on my Perl 5.004 too, but I took Mark's advice and downloaded build 522 from ActiveState, so I can test both flavours.

The code needs altering slightly for your Perl.

rog

Re: Need to split large text file into smaller ones...
If that's the case, then foreach (@items) should solve it. It had better not be the case, though, because that's going to ruin many a game of Perl Golf :)
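And if that perl still complains even with the parentheses, a plain foreach block avoids the statement-modifier question entirely. A minimal sketch of write_file rewritten that way (same handle and naming as Dan's version; I haven't tested it on 5.004, so treat it as a sketch):

Code:
sub write_file {
    my ($num, @items) = @_;

    print "Writing File #$num...\n";
    open (DAT, ">$file_path/output$num.dat") or die "Can't open file. Reason: $!";
    foreach my $item (@items) {   # explicit loop instead of a statement modifier
        print DAT $item;
    }
    close (DAT) or die "Can't close file. Reason: $!";
    print "Finished Writing File #$num...\n\n";
}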

--mark
