
Mailing List Archive: Python: Python

Query regarding set([])?

 

 



vox2000 at gmail

Jul 10, 2009, 4:46 AM

Post #1 of 8 (391 views)
Query regarding set([])?

Hi,
I'm constructing a simple compare script and thought I would use
set([]) to generate the difference output. But I'm obviously doing
something wrong.

file1 contains 410 rows.
file2 contains 386 rows.
I want to know what rows are in file1 but not in file2.

This is my script:
s1 = set(open("file1"))
s2 = set(open("file2"))
s3 = set([])
s1temp = set([])
s2temp = set([])

s1temp = set(i.strip() for i in s1)
s2temp = set(i.strip() for i in s2)
s3 = s1temp-s2temp

print len(s3)

Output is 119. AFAIK 410-386=24. What am I doing wrong here?

BR,
Andy
--
http://mail.python.org/mailman/listinfo/python-list


__peter__ at web

Jul 10, 2009, 5:04 AM

Post #2 of 8 (372 views)
Re: Query regarding set([])?

vox wrote:

> I'm constructing a simple compare script and thought I would use
> set([]) to generate the difference output. But I'm obviously doing
> something wrong.
>
> file1 contains 410 rows.
> file2 contains 386 rows.
> I want to know what rows are in file1 but not in file2.
>
> This is my script:
> s1 = set(open("file1"))
> s2 = set(open("file2"))

Remove the following three lines:

> s3 = set([])
> s1temp = set([])
> s2temp = set([])


> s1temp = set(i.strip() for i in s1)
> s2temp = set(i.strip() for i in s2)
> s3 = s1temp-s2temp
>
> print len(s3)
>
> Output is 119. AFAIK 410-386=24. What am I doing wrong here?

You are probably misinterpreting len(s3). s3 contains lines occurring in
"file1" but not in "file2". Duplicate lines are only counted once, and the
order doesn't matter.

So there are 119 distinct lines that occur at least once in "file1", but
not in "file2".

If that is not what you want you have to tell us what exactly you are
looking for.
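
A small sketch of why len(s3) need not match the difference of the row counts. The two lists are made-up stand-ins for the stripped lines of "file1" and "file2" (they are not the poster's data), and the syntax assumes Python 3:

```python
# Hypothetical stand-ins for the stripped lines of file1 and file2.
lines1 = ["a", "b", "b", "c", "d"]   # 5 rows, "b" appears twice
lines2 = ["a", "x", "y"]             # 3 rows, two of them not in lines1

# A set keeps each distinct line once and ignores order.
only_in_1 = set(lines1) - set(lines2)

row_count_difference = len(lines1) - len(lines2)

print(sorted(only_in_1))      # ['b', 'c', 'd']
print(len(only_in_1))         # 3
print(row_count_difference)   # 2
```

The duplicate "b" is counted once, and "x" and "y" do not reduce the result because they never occurred in the first list.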

Peter



vox2000 at gmail

Jul 10, 2009, 5:52 AM

Post #3 of 8 (375 views)
Re: Query regarding set([])?

On Jul 10, 2:04 pm, Peter Otten <__pete...@web.de> wrote:
> You are probably misinterpreting len(s3). s3 contains lines occurring in
> "file1" but not in "file2". Duplicate lines are only counted once, and the
> order doesn't matter.
>
> So there are 119 distinct lines that occur at least once in "file1", but
> not in "file2".
>
> If that is not what you want you have to tell us what exactly you are
> looking for.
>
> Peter

Hi,
Thanks for the answer.

I am looking for a script that compares file1 and file2: for each line
in file1, check whether the line is present in file2. If a line from file1
is not present in file2, print that line or write it to file3, because I
have to know which lines to add to file2.

BR,
Andy





drobinow at gmail

Jul 10, 2009, 6:10 AM

Post #4 of 8 (383 views)
Re: Query regarding set([])?

On Fri, Jul 10, 2009 at 8:52 AM, vox<vox2000 [at] gmail> wrote:
> I am looking for a script that compares file1 and file2, for each line
> in file1, check if line is present in file2. If the line from file1 is
> not present in file2, print that line/write it to file3, because I
> have to know what lines to add to file2.
Just copy file1 to file2.
(I'm pretty sure that's not what you want, but in explaining why it
should become clearer what you're trying to do.)


davea at ieee

Jul 10, 2009, 7:17 AM

Post #5 of 8 (367 views)
Re: Query regarding set([])?

vox wrote:
> On Jul 10, 2:04 pm, Peter Otten <__pete...@web.de> wrote:
>
>> You are probably misinterpreting len(s3). s3 contains lines occurring in
>> "file1" but not in "file2". Duplicate lines are only counted once, and the
>> order doesn't matter.
>>
>> So there are 119 distinct lines that occur at least once in "file1", but
>> not in "file2".
>>
>> If that is not what you want you have to tell us what exactly you are
>> looking for.
>>
>> Peter
>>
>
> Hi,
> Thanks for the answer.
>
> I am looking for a script that compares file1 and file2, for each line
> in file1, check if line is present in file2. If the line from file1 is
> not present in file2, print that line/write it to file3, because I
> have to know what lines to add to file2.
>
> BR,
> Andy
>
>
>
There's no more detail in that response. To the level of detail you
provide, the program works perfectly. Just loop through the set and
write the members to the file.
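
A minimal sketch of "loop through the set and write the members to the file", assuming Python 3. The function name write_missing and its parameters are made up for illustration; the logic is the same set difference as in the original script:

```python
def write_missing(path1, path2, out_path):
    """Write each distinct line of path1 that is absent from path2 to out_path.

    Returns the number of lines written.
    """
    with open(path1) as f1, open(path2) as f2:
        missing = set(line.strip() for line in f1) - set(line.strip() for line in f2)
    with open(out_path, "w") as out:
        # Sets are unordered; sorting only makes the output repeatable.
        for line in sorted(missing):
            out.write(line + "\n")
    return len(missing)
```

It could be called as write_missing("file1", "file2", "file3"). The output is sorted rather than in file1's order, and duplicates in file1 are written only once.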

But you have some unspecified assumptions:
1) order doesn't matter
2) duplicates are impossible in the input file, or at least not
meaningful. So the correct output file could very well be smaller than
either of the input files.

And a few others that might matter:
3) the two files are both text files, with identical line endings
matching your OS default
4) the two files are ASCII, or at least 8 bit encoded, using the
same encoding (such as both UTF-8)
5) the last line of each file DOES have a trailing newline sequence
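
If some of those assumptions do not hold, a variant along these lines might help. It is only a sketch, assuming Python 3; the name missing_lines and the default encoding of UTF-8 are illustrative choices, not something stated in this thread. It keeps file1's order and duplicates, names the encoding explicitly, and tolerates a missing final newline:

```python
def missing_lines(path1, path2, encoding="utf-8"):
    """Return the lines of path1, in order, that never occur in path2.

    Unlike the set difference, duplicates in path1 are kept. Only the
    line ending is removed, so leading and trailing spaces still count.
    """
    with open(path2, encoding=encoding) as f2:
        seen = set(line.rstrip("\r\n") for line in f2)
    with open(path1, encoding=encoding) as f1:
        stripped = (line.rstrip("\r\n") for line in f1)
        return [line for line in stripped if line not in seen]
```

In Python 3 text mode, "\r\n" and "\n" line endings are both read as "\n" by default, so the two files need not share a line-ending convention.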





vox2000 at gmail

Jul 10, 2009, 7:28 AM

Post #6 of 8 (366 views)
Re: Query regarding set([])?

On Jul 10, 4:17 pm, Dave Angel <da...@ieee.org> wrote:
> vox wrote:
> > On Jul 10, 2:04 pm, Peter Otten <__pete...@web.de> wrote:
>
> >> You are probably misinterpreting len(s3). s3 contains lines occurring in
> >> "file1" but not in "file2". Duplicate lines are only counted once, and the
> >> order doesn't matter.
>
> >> So there are 119 distinct lines that occur at least once in "file1", but
> >> not in "file2".
>
> >> If that is not what you want you have to tell us what exactly you are
> >> looking for.
>
> >> Peter
>
> > Hi,
> > Thanks for the answer.
>
> > I am looking for a script that compares file1 and file2, for each line
> > in file1, check if line is present in file2. If the line from file1 is
> > not present in file2, print that line/write it to file3, because I
> > have to know what lines to add to file2.
>
> > BR,
> > Andy
>
> There's no more detail in that response.  To the level of detail you
> provide, the program works perfectly.  Just loop through the set and
> write the members to the file.
>
> But you have some unspecified assumptions:
>     1) order doesn't matter
>     2) duplicates are impossible in the input file, or at least not
> meaningful.  So the correct output file could very well be smaller than
> either of the input files.
>
> And a few others that might matter:
>     3) the two files are both text files, with identical line endings
> matching your OS default
>     4) the two files are ASCII, or at least 8 bit encoded, using the
> same encoding  (such as both UTF-8)
>     5) the last line of each file DOES have a trailing newline sequence

Thanks all for the input!
I guess I have to think it through a couple more times. :)

BR,
Andy


__peter__ at web

Jul 10, 2009, 7:59 AM

Post #7 of 8 (365 views)
Re: Query regarding set([])?

vox wrote:

> On Jul 10, 4:17 pm, Dave Angel <da...@ieee.org> wrote:
>> vox wrote:
>> > On Jul 10, 2:04 pm, Peter Otten <__pete...@web.de> wrote:
>>
>> >> You are probably misinterpreting len(s3). s3 contains lines occurring
>> >> in "file1" but not in "file2". Duplicate lines are only counted once,
>> >> and the order doesn't matter.
>>
>> >> So there are 119 distinct lines that occur at least once in "file1", but
>> >> not in "file2".
>>
>> >> If that is not what you want you have to tell us what exactly you are
>> >> looking for.
>>
>> >> Peter
>>
>> > Hi,
>> > Thanks for the answer.
>>
>> > I am looking for a script that compares file1 and file2, for each line
>> > in file1, check if line is present in file2. If the line from file1 is
>> > not present in file2, print that line/write it to file3, because I
>> > have to know what lines to add to file2.
>>
>> > BR,
>> > Andy
>>
>> There's no more detail in that response. To the level of detail you
>> provide, the program works perfectly. Just loop through the set and
>> write the members to the file.
>>
>> But you have some unspecified assumptions:
>> 1) order doesn't matter
>> 2) duplicates are impossible in the input file, or at least not
>> meaningful. So the correct output file could very well be smaller than
>> either of the input files.
>>
>> And a few others that might matter:
>> 3) the two files are both text files, with identical line endings
>> matching your OS default
>> 4) the two files are ASCII, or at least 8 bit encoded, using the
>> same encoding (such as both UTF-8)
>> 5) the last line of each file DOES have a trailing newline sequence
>
> Thanks all for the input!
> I guess I have to think it through a couple more times. :)

Indeed. Note that others thinking through related problems have come up with

http://docs.python.org/library/difflib.html
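
As a rough illustration of what difflib offers here, the sketch below uses difflib.ndiff to list the lines that appear only in the second sequence. The helper name lines_to_add and the sample lists are made up, and the syntax assumes Python 3:

```python
import difflib


def lines_to_add(old_lines, new_lines):
    """Return the lines that ndiff marks with '+ ', i.e. present only in new_lines."""
    return [entry[2:] for entry in difflib.ndiff(old_lines, new_lines)
            if entry.startswith("+ ")]


# Hypothetical stand-ins for the stripped lines of file2 and file1.
file2_lines = ["a", "c"]
file1_lines = ["a", "b", "c", "d"]

print(lines_to_add(file2_lines, file1_lines))   # ['b', 'd']
```

Unlike the set approach, difflib compares sequences, so order and duplicates matter: a line that merely moved to a different position can be reported as both removed and added.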

Peter



tjreedy at udel

Jul 10, 2009, 1:47 PM

Post #8 of 8 (371 views)
Re: Query regarding set([])?

vox wrote:
> Hi,
> I'm constructing a simple compare script and thought I would use
> set([]) to generate the difference output. But I'm obviously doing
> something wrong.
>
> file1 contains 410 rows.
> file2 contains 386 rows.
> I want to know what rows are in file1 but not in file2.
>
> This is my script:
> s1 = set(open("file1"))
> s2 = set(open("file2"))
> s3 = set([])
> s1temp = set([])
> s2temp = set([])
>
> s1temp = set(i.strip() for i in s1)
> s2temp = set(i.strip() for i in s2)
> s3 = s1temp-s2temp
>
> print len(s3)
>
> Output is 119. AFAIK 410-386=24. What am I doing wrong here?

That arithmetic only works if every line in s2 is also in s1. If there are
lines in s2 that are not in s1, then the number of lines in s1 not in s2
will be larger than 24. s1 - s2 subtracts only the intersection of s1 and
s2, not all of s2.
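
The relationship can be checked on a toy example. The two sets are invented for illustration and the syntax assumes Python 3:

```python
# Hypothetical sets; "x" and "y" are in s2 but not in s1.
s1 = {"a", "b", "c", "d", "e"}
s2 = {"d", "e", "x", "y"}

difference = s1 - s2
common = s1 & s2

# The size of the difference depends on the intersection, not on len(s2).
assert len(difference) == len(s1) - len(common)

print(len(s1) - len(s2))   # 1
print(len(difference))     # 3
```

Only when s2 is a subset of s1 does len(s1 & s2) equal len(s2), which is the one case where subtracting the two sizes gives the right answer.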

