ThreadIndex

Encode and decode the Outlook Thread-Index email header — also known as the MAPI PidTagConversationIndex property and the Microsoft Graph conversationIndex field — in pure Elixir, with a self-contained Python reference script included.

This library handles both on-the-wire variants of the format (desktop Outlook and Exchange Online/OWA/Graph), recovers correct reply dates from child blocks — including the Exchange-generated blocks that no public parser decoded correctly before — and encodes reply blocks byte-compatible with what Outlook/Exchange themselves produce, so threads keep grouping and ordering correctly in Outlook.

{:ok, index} = ThreadIndex.decode("AQHWLRNo4NaOjvXU8EODe0ZotrA8B6itzaxf")
index.format            #=> :modern
index.date              #=> ~U[2020-05-18 12:54:02.646732Z]
index.guid              #=> <<224, 214, 142, 142, ...>> (16-byte conversation GUID)
hd(index.replies).date  #=> ~U[2020-05-18 12:57:39.754393Z]

root = ThreadIndex.encode_root(time: ~U[2025-01-01 10:00:00Z])
reply = ThreadIndex.encode_reply(root, time: ~U[2025-01-01 11:00:00Z])

Installation

def deps do
  [
    {:thread_index, "~> 0.1.0"}
  ]
end

Docs: https://hexdocs.pm/thread_index

Not an Elixir shop? Copy examples/thread_index.py, a dependency-free Python port of the same algorithm with a small CLI:

$ python examples/thread_index.py decode "AQHWLRNo4NaOjvXU8EODe0ZotrA8B6itzaxf"
format: modern
date: 2020-05-18 12:54:02.646732 UTC
guid: e0d68e8ef5d4f043837b4668b6b03c07
reply[ 0] 2020-05-18 12:57:39.754393 UTC dc=1 random=0x5f

Why this library exists

If you build email automation against Microsoft Graph or raw SMTP and want your replies to thread correctly in Outlook, you need to emit a Thread-Index header that extends the parent's conversation index. The format is only partially documented, in three ways. First, the official spec (MS-OXOMSG 2.2.1.3) describes one header layout while shipping Outlook clients use another, which the spec admits only in Appendix A footnote <2>. Second, the child-block "time delta" prose contradicts what Microsoft's own encoders do. Third, Exchange-generated child blocks (footnote <3>: "Exchange 2013, Exchange 2016, and Exchange 2019 set the Delta Code field to 1 and do not calculate the Time Delta field based on TimeDiff") decode to nonsense dates with every published parser we could find, including the forensic tools that documented the format in the first place.

This library is the result of reverse-engineering the actual behavior from Microsoft's own implementations (the MAPI SDK sample cindex.c and the decompiled Microsoft.Exchange.Data.Storage.ConversationIndex) and validating it against published forensic ground-truth vectors and live Exchange Online threads. The complete findings are below.

The format

A conversation index is a 22-byte header block followed by one 5-byte child block per reply. In the Thread-Index MIME header it is base64-encoded; in MAPI it is the raw binary, and Graph serves the same bytes base64-encoded as conversationIndex.

+--------------------------- header (22 bytes) ----------------------------+
| 6 bytes time-derived | 16 bytes conversation GUID                        |
+--------------------------------------------------------------------------+
| 5-byte child block per reply: [1 bit DC | 31 bits delta | 8 bits random] |
+--------------------------------------------------------------------------+
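As a rough illustration of the layout above, the following Python sketch splits a base64 Thread-Index value into its parts. The names split_index and split_child are hypothetical (they are not this library's API), and the big-endian field order is an assumption that agrees with the sample value used in this README.

```python
import base64

HEADER_LEN = 22  # 6 time-derived bytes + 16-byte conversation GUID
CHILD_LEN = 5    # 1 DC bit + 31 delta bits + 1 uniqueness byte


def split_index(b64: str):
    """Split a base64 Thread-Index into (time_bytes, guid, child_blocks)."""
    raw = base64.b64decode(b64)
    if len(raw) < HEADER_LEN or (len(raw) - HEADER_LEN) % CHILD_LEN:
        raise ValueError(f"unexpected conversation index length: {len(raw)}")
    time_bytes, guid = raw[:6], raw[6:HEADER_LEN]
    children = [raw[i:i + CHILD_LEN]
                for i in range(HEADER_LEN, len(raw), CHILD_LEN)]
    return time_bytes, guid, children


def split_child(block: bytes):
    """Return (dc_bit, delta_31_bits, uniqueness_byte) for one child block."""
    word = int.from_bytes(block[:4], "big")  # assumed big-endian
    return word >> 31, word & 0x7FFFFFFF, block[4]


if __name__ == "__main__":
    _, _, kids = split_index("AQHWLRNo4NaOjvXU8EODe0ZotrA8B6itzaxf")
    dc, delta, uniq = split_child(kids[0])
    print(f"children={len(kids)} dc={dc} random={uniq:#04x}")
```

For the sample index this reports one child block with dc=1 and random=0x5f, matching the CLI output shown earlier.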

Finding 1 — there are two header layouts in the wild

| | :classic | :modern |
|---|---|---|
| Producers | desktop Outlook 2007–2019, Exchange 2007–2010 | Exchange 2013+, Exchange Online, OWA, Graph |
| Layout | bytes 0–5 = FILETIME >> 16 | byte 0 = 0x01, bytes 1–5 = FILETIME >> 24 |
| Precision | 6.55 ms | 1.68 s |
| Base64 starts with | Ac / Ad / Ae | AQ (typically AQH) |
| Documented in | MS-OXOMSG Appendix A footnote <2> | MS-OXOMSG 2.2.1.3 main body |

(FILETIME = 100 ns ticks since 1601-01-01 UTC. Its top byte is 0x01 for all dates between 1829 and 2057, which is why the classic layout's first byte looks like the documented "reserved 0x01 byte" — it is actually part of the timestamp.)

Detection: if byte 0 is 0x01 and byte 1 is small (< 0x80), it is :modern; under the classic reading byte 1 holds FILETIME bits 55–48, which is ≥ 0xB0 for any date after 1990, so the two layouts cannot collide. Reading a :modern header with the classic rule yields a date around year 1831 — that misread is the root cause of most broken parsers (and, it turns out, part of how the format actually works — see finding 3).
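The detection rule and the two header readings can be sketched in a few lines of Python. The names header_filetime and filetime_to_datetime are illustrative, not part of the library; the sample bytes are the first six header bytes of the index decoded at the top of this README.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(ticks: int) -> datetime:
    """Convert 100 ns FILETIME ticks to UTC (truncated to microseconds)."""
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def header_filetime(time_bytes: bytes):
    """Return (layout, filetime_ticks) for the first 6 header bytes."""
    if time_bytes[0] == 0x01 and time_bytes[1] < 0x80:
        # modern: byte 0 is a literal 0x01, bytes 1-5 hold FILETIME >> 24
        return "modern", int.from_bytes(time_bytes[1:6], "big") << 24
    # classic: bytes 0-5 hold FILETIME >> 16
    return "classic", int.from_bytes(time_bytes[:6], "big") << 16


if __name__ == "__main__":
    sample = bytes.fromhex("0101d62d1368")
    layout, ticks = header_filetime(sample)
    print(layout, filetime_to_datetime(ticks))
    # The classic misread of the same bytes, for comparison:
    print(filetime_to_datetime(int.from_bytes(sample, "big") << 16))
```

The first line of output is the header date quoted in the introduction (2020-05-18 12:54:02.646732 UTC); the second is the early-1830s misread described above.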

Finding 2 — child deltas are cumulative

Each child block stores a time delta. The spec prose says it is "the difference between the current time and the time stored in the conversation index header". It is not: both of Microsoft's own encoders compute each new delta against the accumulated time of all previous child blocks (see ExtractLastFileTime in cindex.c and GetLastFileTime in Exchange's ConversationIndex, both of which run a cumulative sum). The Meridian Discovery forensic analysis and Joachim Metz's libfmapi documentation observed the same empirically.

The delta encoding itself (from the delta in FILETIME ticks):

| DC bit | Stored value | Unit | Max range |
|---|---|---|---|
| 0 | delta >> 18, 31 bits | 26.2 ms | ~1.78 years |
| 1 | delta >> 23, 31 bits | 0.84 s | ~57.09 years |

DC is 0 when the delta is below 2⁴⁹ ticks (the spec phrases this as testing the high dword against 0x00FE0000). The 5th byte is an implementation-specific uniqueness value (legacy MAPI docs describe it as 4 random bits + 4 sequence bits; Exchange uses the low byte of the message timestamp).
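A minimal Python sketch of the DC rule and the two shifts, under the assumption that the DC bit is the top bit of a 32-bit word followed by the 31 stored bits. The names pack_delta and unpack_delta are hypothetical, and the uniqueness byte is left out.

```python
DC1_THRESHOLD = 1 << 49   # deltas at or above this use the coarse unit
MASK_31 = 0x7FFFFFFF


def pack_delta(delta_ticks: int) -> int:
    """Pack a non-negative FILETIME delta into the 32-bit DC + delta word."""
    if delta_ticks < DC1_THRESHOLD:
        return (delta_ticks >> 18) & MASK_31              # DC = 0
    return 0x80000000 | ((delta_ticks >> 23) & MASK_31)   # DC = 1


def unpack_delta(word: int) -> int:
    """Recover the (truncated) delta in ticks from a DC + delta word."""
    shift = 23 if word >> 31 else 18
    return (word & MASK_31) << shift


if __name__ == "__main__":
    one_hour = 3600 * 10_000_000          # ticks
    word = pack_delta(one_hour)
    lost = one_hour - unpack_delta(word)  # always below 2**18 ticks for DC = 0
    print(hex(word), lost)
```

Round-tripping loses only the bits shifted out, which is where the 26.2 ms and 0.84 s units in the table come from.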

Finding 3 — modern child blocks wrap mod 2³¹ (the previously unsolved part)

Here is the subtle one. When any Microsoft implementation (desktop Outlook or Exchange) appends a child block, it computes the chain anchor by reading header bytes 0–5 the classic way, even when the header is :modern. For a modern header that read lands around year 1831, so the first reply's delta is a ~195-year value. It does not fit in 31 bits after >> 23, so it silently truncates mod 2³¹, which wraps the stored time modulo ~57.09 years. That is why child blocks under a :modern header carry DC = 1 and decode to nonsense dates in any parser that does not undo the wrap.

Crucially, the truncation is harmless for Microsoft's encode-only usage — a multiple of 2⁵⁶ ticks vanishes mod 2³¹ after either shift, so every client computes identical bytes regardless of how it (mis)reads the header. Only decoders ever notice.

Decoding fix: run the same chain arithmetic (classic-read anchor + cumulative raw deltas), then for each child add the minimal k × 2^(31+shift) ticks needed to land at or after the true header date (minus ~107 s of slack for truncation effects). This recovers correct reply dates for both variants with one code path.
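The decoding rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the library's code: reply_filetimes is a hypothetical name, the slack is taken as 2³⁰ ticks (about 107 s), and fields are read big-endian.

```python
import base64
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
SLACK = 1 << 30  # ~107 s in FILETIME ticks (assumed value for the slack)


def reply_filetimes(raw: bytes):
    """Yield one wrap-corrected FILETIME per 5-byte child block."""
    modern = raw[0] == 0x01 and raw[1] < 0x80
    true_header = (int.from_bytes(raw[1:6], "big") << 24 if modern
                   else int.from_bytes(raw[:6], "big") << 16)
    # The chain anchor is always the classic read, even for modern headers.
    chain = int.from_bytes(raw[:6], "big") << 16
    for i in range(22, len(raw), 5):
        word = int.from_bytes(raw[i:i + 4], "big")
        shift = 23 if word >> 31 else 18
        chain += (word & 0x7FFFFFFF) << shift   # cumulative raw delta
        period = 1 << (31 + shift)              # one full wrap, in ticks
        # Smallest k >= 0 with chain + k * period >= true_header - SLACK.
        k = max(0, -(-(true_header - SLACK - chain) // period))
        yield chain + k * period


if __name__ == "__main__":
    raw = base64.b64decode("AQHWLRNo4NaOjvXU8EODe0ZotrA8B6itzaxf")
    for ticks in reply_filetimes(raw):
        print(EPOCH + timedelta(microseconds=ticks // 10))
```

For the sample index the single child needs k = 3 and comes out as 2020-05-18 12:57:39.754393 UTC, the reply date quoted in the introduction.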

Encoding fix: none needed — compute the delta against the classic-read cumulative anchor and mask to 31 bits, exactly like Outlook/Exchange. This library's encode_reply/2 reproduces real Exchange Online and desktop Outlook child blocks byte-for-byte (it's in the test suite: strip the last child block of a captured index, re-encode it from the decoded timestamp, compare).
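The same check can be sketched in Python: strip the last child block of the sample index, re-encode it from its decoded timestamp, and compare bytes. encode_child is a hypothetical helper rather than the library's encode_reply/2, and treating the subtraction as unsigned 64-bit arithmetic is an assumption.

```python
import base64


def encode_child(index: bytes, now_ticks: int, uniq: int = 0) -> bytes:
    """Build the 5-byte child block that extends `index` at `now_ticks`."""
    # Anchor: header bytes 0-5 read the classic way, regardless of layout.
    chain = int.from_bytes(index[:6], "big") << 16
    # Add the raw deltas of all existing child blocks (cumulative chain).
    for i in range(22, len(index), 5):
        word = int.from_bytes(index[i:i + 4], "big")
        chain += (word & 0x7FFFFFFF) << (23 if word >> 31 else 18)
    delta = (now_ticks - chain) & ((1 << 64) - 1)  # assumed unsigned 64-bit
    if delta < 1 << 49:
        word = (delta >> 18) & 0x7FFFFFFF               # DC = 0
    else:
        word = 0x80000000 | ((delta >> 23) & 0x7FFFFFFF)  # DC = 1, wraps mod 2**31
    return word.to_bytes(4, "big") + bytes([uniq & 0xFF])


if __name__ == "__main__":
    raw = base64.b64decode("AQHWLRNo4NaOjvXU8EODe0ZotrA8B6itzaxf")
    parent, captured = raw[:-5], raw[-5:]
    # FILETIME of the decoded reply date, 2020-05-18 12:57:39.754393 UTC
    reply_ticks = 0x01D62D13E9680000
    rebuilt = encode_child(parent, reply_ticks, uniq=captured[4])
    print(rebuilt.hex(), rebuilt == captured)
```

The uniqueness byte is copied from the captured block here because its source is implementation-specific; only the four delta bytes are actually being re-derived.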

What the timestamps mean

Child block timestamps record when each reply was composed (the moment the reply draft was created — "Reply" clicked, or createReply called), not when it was sent or delivered. Expect decoded reply dates to precede the times displayed in Outlook by the composition duration (typically minutes). Forensic examiners use exactly this gap to estimate how long a reply took to write. All times are UTC; the encoder's local clock is the source, so cross-machine skew shows up as-is. Deltas are unsigned — a reply composed "before" the previous block (clock skew) wraps into a bogus positive delta, in this library exactly as in Outlook.

API

Validation

The test suite pins:

Provenance

The research and this prototype were built by Claude Fable (Anthropic's Claude Fable 5 model, working in Claude Code): reverse-engineering the format from Microsoft's own implementations, cross-checking the forensic literature, deriving the wrap model from live Exchange Online thread indexes, and producing the Elixir library, the Python port, and the test suite.

An earlier prototype of the same idea was written with OpenAI Codex and failed in ways that are instructive about this format:

Fable's rework replaced the guesswork with the findings documented above — the dual header layout, the cumulative chain anchored at the classic read, and the mod-2³¹ wrap recovery — each pinned by published ground-truth vectors and byte-identical re-encoding tests against real Outlook and Exchange output.

References

License

MIT — see LICENSE.