Treeline on top of a snow covered hill

Diff in elisp
A simple & lispy Myers difference algorithm

Doing some org-mode interface hacking recently, I needed to know what changes happend between two versions of a list. This called for a diff algorithm.

With the help of some websearching I found that I actually wanted to find the longest common subsequence (lcs @ wikipedia) between the two lists. The lcs is what stayed the same between both lists and can be massaged into their diff. I then found the lcs and diff implemented in common lisp and also a rosetta code page. Translating these findings to elisp yields:

(defun greater (a b)
  (if (> (length a) (length b)) a b))

(defun lcs (a b)
  (when (and a b)
    (if (eq (car a) (car b))
        (cons (car a) (lcs (cdr a) (cdr b)))
      (greater (lcs a (cdr b))
               (lcs (cdr a) b)))))

(lcs '(1 2 3)
     '(1 2 3 4))
;; => (1 2 3)

(lcs '(1 2 22 21 3)
     '(2 3 3 3 4 5))
;; => (2 3)

This finds the lcs using recursion (what the wikipedia page calls dynamic programming). To get the diff from this our reference (and wikipedia) recommend walking the lcs and the two lists again to imperatively construct their diff. Since our algorithm already walked all combinations of the two lists, let us construct their diff at the same time. Also throw in memoization with the newish with-memoization macro. This makes for a rather huge runtime difference of around four magnitudes.

(let ((memory (make-hash-table :test 'equal)))
  (defun list-diff (a b)
    "Diff A and B.
Recursivly computes the longest common subsequence and while
doing so assembles the returned diff."
    ;; f:first r:rest
    (with-memoization (gethash (cons a b) memory)
      (cl-labels
          ((choose (a b)
             ;; choose diff with longer lcs
             (let ((la (seq-count #'atom a))
                   (lb (seq-count #'atom b)))
               (if (> la lb) a b)))
           (merge (x)
             ;; merge consecutive add or rem
             (pcase x
               (`((,op ,f) (,op . ,r) . ,d)
                `((,op ,f . ,r) . ,d))
               (d d))))
        (cond
         ((and a b)
          (pcase-let ((`(,af . ,ar) a)
                      (`(,bf . ,br) b))
            (if (eq af bf)
                ;; same
                (cons af (list-diff ar br))
              ;; different
              (merge (choose (cons (list '+ bf) (list-diff a br))
                             (cons (list '- af) (list-diff ar b)))))))
         ;; rest
         (a `((- . ,a)))
         (b `((+ . ,b))))))))

(list-diff '(1 2 3)
           '(1 2 3 4))
;; => (1 2 3 (+ 4))

(list-diff '(1 2 22 21 3)
           '(2 3 3 3 4 5))
;; => ((- 1) 2 (- 22 21) 3 (+ 3 3 4 5))

list-diff assumes lists of atoms which are comparable with equal. But it could be adapted to other types of data as well.

list-diff is a bit more complex than lcs at first glance. But it does more and integrates the previously seperate greater. Let's have a closer look:

with-memoization
uses a hashtable to remember previously computed results
choose
compares two diffs and returns the lcs - the list with more same items
merge
combines consecutive additions or removals
the body
is mostly similar to the previous lcs. It uses pcase-let to get rid of the cars and cdrs. It also has to account for possible rests of both lists that follow their lcs.

So long and merry hacking :)