jmwebaze at gmail
May 5, 2012, 6:13 AM
Post #3 of 5
Re: How to compute a delta: the difference between lists of strings
[In reply to]
On Sat, May 5, 2012 at 2:39 PM, Chris Angelico <rosuav [at] gmail> wrote:k
> On Sat, May 5, 2012 at 10:12 PM, J. Mwebaze <jmwebaze [at] gmail> wrote:
> > This is out of curiosity, i know this can be done with python diffllib
> > module, but been figuring out how to compute the delta, Consider two
> > below.
> > s1 = ['e', 'f', 'g', 'A', 'B', 'C', 'D', 'C']
> > s2 =['e', 'A', 'B', 'f', 'g', 'C', 'D', 'z']
> > This is the result should be
> > [.' e', '+ A', '+ B', ' f', ' g', '- A', '- B', ' C', ' D', '- C', '+
> > z']
> > ideas on how to approach this.. ?
> Here's a simple algorithm that produces sorta-mostly-reasonable
> results most of the time:
> Set your current-position-index to the beginning of each lists. (Call
> them 'pos1' and 'pos2'.)
> If s1[pos1] and s2[pos2] are identical, match - increment and iterate.
> Otherwise, increment pos2 until either it matches pos1 or you reach
> the end of s2.
> If you find a match, report the insertion into s2, increment both
> pointers past the match, and carry on.
> If you hit the end of s2, increment pos1 once and report an insertion
> into s1, then go back to looking for a match.
> It helps to append a sentinel to each list; that way, you don't need
> to separately check for additional content at the end of either list
> (as the sentinel won't match any of the strings).
> This is the algorithm I used for writing a simple file diff tool ages
> ago. It's not as good as diff(1), but it was fun to do the exercise.
> It's quite inefficient at handling large insertions into s1, and needs
> to be modified for most file handling (for instance, requiring a
> 2-line or 3-line rematch after a difference, to avoid matching on
> blank lines), but it's a basis.
> Producing beautiful or minimal diffs is a more complex task. :)
*Mob UG: +256 (0) 70 1735800 | NL +31 (0) 6 852 841 38 | Gtalk: jmwebaze |
skype: mwebazej | URL: www.astro.rug.nl/~jmwebaze
/* Life runs on code */*