| |
 Search this list this category for: (Advanced)

Mailing List Archive: Python: Python

# How to compute a delta: the difference between lists of strings

jmwebaze at gmail

May 5, 2012, 5:12 AM

Post #1 of 5 (154 views)
 How to compute a delta: the difference between lists of strings
This is out of curiosity, i know this can be done with python diffllib
module, but been figuring out how to compute the delta, Consider two lists
below.

s1 = ['e', 'f', 'g', 'A', 'B', 'C', 'D', 'C']
s2 =['e', 'A', 'B', 'f', 'g', 'C', 'D', 'z']

This is the result should be

[.' e', '+ A', '+ B', ' f', ' g', '- A', '- B', ' C', ' D', '- C', '+
z']

ideas on how to approach this.. ?

--
*Mob UG: +256 (0) 70 1735800 | NL +31 (0) 6 852 841 38 | Gtalk: jmwebaze |
skype: mwebazej | URL: www.astro.rug.nl/~jmwebaze

/* Life runs on code */*

rosuav at gmail

May 5, 2012, 5:39 AM

Post #2 of 5 (149 views)
 Re: How to compute a delta: the difference between lists of strings [In reply to]
On Sat, May 5, 2012 at 10:12 PM, J. Mwebaze <jmwebaze [at] gmail> wrote:
> This is out of curiosity, i know this can be done with python diffllib
> module, but been figuring out how to compute the delta, Consider two lists
> below.
>
> s1 = ['e', 'f', 'g', 'A', 'B', 'C', 'D', 'C']
> s2 =['e', 'A', 'B', 'f', 'g', 'C', 'D', 'z']
>
> This is the result should be
>
> [.'  e', '+ A', '+ B', '  f', '  g', '- A', '- B', '  C', '  D', '- C', '+
> z']
>
> ideas on how to approach this.. ?

Here's a simple algorithm that produces sorta-mostly-reasonable
results most of the time:

Set your current-position-index to the beginning of each lists. (Call
them 'pos1' and 'pos2'.)
If s1[pos1] and s2[pos2] are identical, match - increment and iterate.
Otherwise, increment pos2 until either it matches pos1 or you reach
the end of s2.
If you find a match, report the insertion into s2, increment both
pointers past the match, and carry on.
If you hit the end of s2, increment pos1 once and report an insertion
into s1, then go back to looking for a match.

It helps to append a sentinel to each list; that way, you don't need
to separately check for additional content at the end of either list
(as the sentinel won't match any of the strings).

This is the algorithm I used for writing a simple file diff tool ages
ago. It's not as good as diff(1), but it was fun to do the exercise.
It's quite inefficient at handling large insertions into s1, and needs
to be modified for most file handling (for instance, requiring a
2-line or 3-line rematch after a difference, to avoid matching on
blank lines), but it's a basis.

Producing beautiful or minimal diffs is a more complex task. :)

ChrisA
--
http://mail.python.org/mailman/listinfo/python-list

jmwebaze at gmail

May 5, 2012, 6:13 AM

Post #3 of 5 (150 views)
 Re: How to compute a delta: the difference between lists of strings [In reply to]
thank Chris..

On Sat, May 5, 2012 at 2:39 PM, Chris Angelico <rosuav [at] gmail> wrote:k

> On Sat, May 5, 2012 at 10:12 PM, J. Mwebaze <jmwebaze [at] gmail> wrote:
> > This is out of curiosity, i know this can be done with python diffllib
> > module, but been figuring out how to compute the delta, Consider two
> lists
> > below.
> >
> > s1 = ['e', 'f', 'g', 'A', 'B', 'C', 'D', 'C']
> > s2 =['e', 'A', 'B', 'f', 'g', 'C', 'D', 'z']
> >
> > This is the result should be
> >
> > [.' e', '+ A', '+ B', ' f', ' g', '- A', '- B', ' C', ' D', '- C', '+
> > z']
> >
> > ideas on how to approach this.. ?
>
> Here's a simple algorithm that produces sorta-mostly-reasonable
> results most of the time:
>
> Set your current-position-index to the beginning of each lists. (Call
> them 'pos1' and 'pos2'.)
> If s1[pos1] and s2[pos2] are identical, match - increment and iterate.
> Otherwise, increment pos2 until either it matches pos1 or you reach
> the end of s2.
> If you find a match, report the insertion into s2, increment both
> pointers past the match, and carry on.
> If you hit the end of s2, increment pos1 once and report an insertion
> into s1, then go back to looking for a match.
>
> It helps to append a sentinel to each list; that way, you don't need
> to separately check for additional content at the end of either list
> (as the sentinel won't match any of the strings).
>
> This is the algorithm I used for writing a simple file diff tool ages
> ago. It's not as good as diff(1), but it was fun to do the exercise.
> It's quite inefficient at handling large insertions into s1, and needs
> to be modified for most file handling (for instance, requiring a
> 2-line or 3-line rematch after a difference, to avoid matching on
> blank lines), but it's a basis.
>
> Producing beautiful or minimal diffs is a more complex task. :)
>
> ChrisA
> --
> http://mail.python.org/mailman/listinfo/python-list
>

--
*Mob UG: +256 (0) 70 1735800 | NL +31 (0) 6 852 841 38 | Gtalk: jmwebaze |
skype: mwebazej | URL: www.astro.rug.nl/~jmwebaze

/* Life runs on code */*

emile at fenx

May 5, 2012, 8:50 AM

Post #4 of 5 (146 views)
 Re: How to compute a delta: the difference between lists of strings [In reply to]
On 5/5/2012 5:12 AM J. Mwebaze said...
> This is out of curiosity, i know this can be done with python diffllib
> module, but been figuring out how to compute the delta, Consider two
> lists below.
>
> s1 = ['e', 'f', 'g', 'A', 'B', 'C', 'D', 'C']
> s2 =['e', 'A', 'B', 'f', 'g', 'C', 'D', 'z']
>
> This is the result should be
>
> [.' e', '+ A', '+ B', ' f', ' g', '- A', '- B', ' C', ' D', '- C',
> '+ z']
>
> ideas on how to approach this.. ?
>

use difflib:

from difflib import unified_diff as ud

s1 = ['e', 'f', 'g', 'A', 'B', 'C', 'D', 'C']
s2 = ['e', 'A', 'B', 'f', 'g', 'C', 'D', 'z']

[ r for r in ud(s1,s2,lineterm="|") if not r.endswith("|") ]

HTH,

Emile

--
http://mail.python.org/mailman/listinfo/python-list

vito.detullio at gmail

May 5, 2012, 9:16 AM

Post #5 of 5 (144 views)
 Re: How to compute a delta: the difference between lists of strings [In reply to]
J. Mwebaze wrote:

> This is out of curiosity, i know this can be done with python diffllib
> module, but been figuring out how to compute the delta, Consider two lists
> below.
>
> s1 = ['e', 'f', 'g', 'A', 'B', 'C', 'D', 'C']
> s2 =['e', 'A', 'B', 'f', 'g', 'C', 'D', 'z']
>
> This is the result should be
>
> [.' e', '+ A', '+ B', ' f', ' g', '- A', '- B', ' C', ' D', '- C', '+
> z']
>
> ideas on how to approach this.. ?

http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

--
ZeD

--
http://mail.python.org/mailman/listinfo/python-list