Hunt-Szymanski diff

A port of the Hunt-Szymanski diff algorithm to Elixir, with a focus on performance and memory efficiency.

This library is particularly suited for longer texts such as wiki pages or blog articles.

Credits

This library was generated with assistance from ChatGPT.
While I guided its development and reviewed the implementation, the core algorithm was recreated by AI.

Usage

left = """
Roses are red,
Violets are blue,
Sugar is sweet,
And so are you.
"""

right = """
Roses are blue,
Violets are blue,
I love fluffy clouds,
And so are you.
"""

# 1. Diff two strings

diff = HSDiff.diff(left, right)

# => 
# [
#   del: ["Roses are red,"],
#   ins: ["Roses are blue,"],
#   eq: ["Violets are blue,"],
#   del: ["Sugar is sweet,"],
#   ins: ["I love fluffy clouds,"],
#   eq: ["And so are you."],
#   eq: [""]
# ]


# 2. Optimize the diff result

optimized = HSDiff.optimize(diff)

# => 
# [
#   del: 1,
#   ins: ["Roses are blue,"],
#   eq: 1,
#   del: 1,
#   ins: ["I love fluffy clouds,"],
#   eq: 2
# ]


# 3. Patch 

restored = HSDiff.patch(left, optimized) 

# => "Roses are blue,\nViolets are blue,\nI love fluffy clouds,\nAnd so are you.\n"

Why Use Hunt-Szymanski Diff?

Elixir’s standard library already provides String.myers_difference/2 to generate diffs between strings (and lists). However:

In short, if you need a diff algorithm for big textual data, Hunt–Szymanski can help you avoid the heavy memory footprint that comes with other approaches.

Installation

Add hunt_szymanski_diff to your list of dependencies in mix.exs:

def deps do
  [
    {:hunt_szymanski_diff, "~> 0.1.0"}
  ]
end