MarkdownLd
High-performance Markdown processing with SIMD optimizations and JSON-LD integration for Elixir.
MarkdownLd is built for production systems that require extreme performance and reliability. Leveraging Rust SIMD optimizations, memory pooling, and advanced parsing algorithms, it delivers 10-50x faster markdown processing compared to traditional pure Elixir solutions.
β‘ Performance Highlights
- π 10-50x faster than pure Elixir markdown parsers
- π₯ SIMD-optimized string processing (Apple Silicon NEON, x86 AVX2)
- β‘ Zero-copy binary processing for maximum efficiency
- π§ Memory pools to minimize allocation overhead
- π Parallel processing with configurable concurrency
- π Built-in performance tracking and metrics
π Quick Start
Add to your mix.exs:
def deps do
[
{:markdown_ld, "~> 0.4.0"}
]
endBasic usage:
# Parse markdown content
{:ok, result} = MarkdownLd.parse("""
# Hello World
This is **bold** text with a [link](https://example.com).
def hello, do: :world
- [ ] Todo item
- [x] Completed item
""")
# Result contains structured data
IO.inspect(result.headings)
# [%{level: 1, text: "Hello World", line: 1}]
IO.inspect(result.links)
# [%{text: "link", url: "https://example.com", line: 3}]
IO.inspect(result.code_blocks)
# [%{language: "elixir", content: "def hello, do: :world", line: 5}]
IO.inspect(result.tasks)
# [%{completed: false, text: "Todo item", line: 9},
# %{completed: true, text: "Completed item", line: 10}]π Features
Core Parsing
- Headings - All levels (H1-H6) with line numbers
- Links - Markdown and reference-style links
- Code blocks - Fenced and indented with language detection
- Task lists - GitHub-style checkboxes
- Word counting - SIMD-optimized text analysis
Diff & Merge (Foundations)
- Diff data model for structure-aware markdown changes
- JSON-LD semantic ops types (add/remove/update triples)
- Three-way merge skeleton with conflict detection
- Streaming event schema for real-time patching
- Inline diff ops for text updates within blocks
- JSON-LD stub extractor with triple-level diff
-
Supports code fences (
json-ld) and simple frontmatter `jsonld: { ... }` - Basic context expansion: `@vocab`, prefixes, and term definitions map to IRIs ### Performance Optimizations - **Zero-copy processing** - Direct binary manipulation - **SIMD acceleration** - Vectorized pattern matching - **Memory pooling** - Reduced allocation overhead - **Pattern caching** - LRU cache for repeated structures - **Parallel batch processing** - Both Elixir and Rust concurrency ### Batch Processingelixir
-
Supports code fences (
Process multiple documents in parallel
documents = ["# Doc 1", "# Doc 2", "# Doc 3"]
Elixir-side parallel processing
= MarkdownLd.parse_batch(documents, max_workers: 4)
Rust-side parallel processing (fastest)
= MarkdownLd.parse_batch_rust(documents)
Stream processing with backpressure
results = MarkdownLd.parse_stream(document_stream, max_workers: 8)
### Performance Tracking
Get performance metrics
= MarkdownLd.get_performance_stats() IO.inspect(stats)
%{
"simd_operations" => 1_250_000,
"cache_hit_rate" => 85.2,
"memory_pool_usage" => 2_048_576,
"pattern_cache_size" => 128
}
Reset counters
MarkdownLd.reset_performance_stats()
## π Diffing
old = """
Title
Hello world
JSONLD: post:1, schema:name, Hello """
new = """
Title
Hello brave new world
JSONLD: post:1, schema:name, Hello World JSONLD: post:1, schema:author, Alice """
= MarkdownLd.diff(old, new, similarity_threshold: 0.5) IO.inspect(Enum.map(patch.changes, & &1.kind))
[:update_block, :jsonld_update, :jsonld_add]
### Streaming Diffs
old = """
Title
Para one
JSONLD: post:1, schema:name, Hello """
new = """
Title
Para one updated
JSONLD: post:1, schema:name, Hello World """
events = MarkdownLd.Diff.Stream.emit(old, new, max_paragraphs: 2)
=> [%StreamEvent{type: :init_snapshot, ...}, %StreamEvent{type: :chunk_patch, ...}, %StreamEvent{type: :complete, ...}]
= MarkdownLd.Diff.Stream.apply_events(old, events, max_paragraphs: 2)
Heading-Level Chunking
# Chunk by H1 sections (default), align by stable heading IDs
events = MarkdownLd.Diff.Stream.emit(old, new,
chunk_strategy: :headings,
heading_level: 1,
rename_match_threshold: 0.7 # fuzzy-match renamed headings
)
# Chunk by H2 subsections
events = MarkdownLd.Diff.Stream.emit(old, new,
chunk_strategy: :headings,
heading_level: 2
)π§© Chunking Strategies
-
Paragraphs (default):
chunk_strategy: :paragraphs,max_paragraphs: 8. -
Headings:
chunk_strategy: :headingsto start a new chunk at each heading; events include a stable_id derived from the heading text.
βοΈ Conflict Formatting
merge = MarkdownLd.Diff.three_way_merge(base_patch, our_patch, their_patch)
if merge.merged == nil do
# Present conflicts in UI
messages = MarkdownLd.Diff.Format.to_text(merge.conflicts)
maps = MarkdownLd.Diff.Format.to_maps(merge.conflicts)
end⨠Inline Preview
# Given an update_block payload with inline_ops from the diff
ops = [{:keep, "Hello"}, {:delete, "brave"}, {:insert, "bold"}, {:keep, "world"}]
MarkdownLd.Diff.Preview.render_ops(ops)
# "Hello {-brave-} {+bold+} world"
# ANSI style
MarkdownLd.Diff.Preview.render_ops(ops, style: :ansi)
## βοΈ Configuration
Configure default options in your `config.exs`:
config :markdown_ld, # Performance options parallel: true, max_workers: System.schedulers_online(),
# Optimization options
cache_patterns: true,
track_performance: true,
memory_pool_size: 1024 * 1024, # 1MB
pattern_cache_size: 500,
# Processing options enable_tables: true, enable_strikethrough: true, enable_footnotes: true,
# SIMD options (auto-detected) simd_enabled: true, simd_features: [:neon, :avx2], # Auto-detected based on CPU
# Batch processing batch_size: 100, batch_timeout: 5_000, # 5 seconds
# Development options debug_performance: false, log_slow_operations: true, slow_operation_threshold: 1000 # microseconds
### Runtime Configuration
You can also configure options at runtime:
Per-operation configuration
= MarkdownLd.parse(content, parallel: false, cache_patterns: true, track_performance: true, max_workers: 2 )
Application-wide configuration
Application.put_env(:markdown_ld, :max_workers, 8)
## ποΈ Advanced Build System
MarkdownLd includes a comprehensive build system with multiple optimization profiles:
Development build (fast compilation)
make dev
Production build (maximum optimization)
make prod
Benchmark build (with profiling symbols)
make bench
Profile-Guided Optimization
make pgo
Run comprehensive benchmarks
make bench
### Build Profiles
- **`dev`** - Fast compilation with some optimization
- **`prod`** - Full LTO, maximum optimization, stripped binaries
- **`bench`** - Optimized with debug symbols for profiling
- **`pgo`** - Profile-Guided Optimization for additional 10-20% gains
## π Benchmarks
Based on comprehensive benchmarking:
| Document Size | Processing Time | Throughput | vs Pure Elixir |
|---------------|----------------|------------|-----------------|
| Small (1KB) | 3-7ΞΌs | 150MB/s | 10-20x faster |
| Medium (10KB) | 5-10ΞΌs | 1GB/s | 10-20x faster |
| Large (100KB) | 15-35ΞΌs | 3GB/s | 10-25x faster |
### Extraction Functions
- **Word Count**: 226,027 KB/s
- **Link Extraction**: 875,855 KB/s
- **Heading Extraction**: 333,659 KB/s
- **Code Block Extraction**: 3,503,418 KB/s
Run benchmarks yourself:mix run bench/turbo_benchmark.exs
## π¦ Production Usage
MarkdownLd is designed for high-throughput production systems:
### Scalability
- **Thousands of documents per second**
- **Configurable concurrency** (Elixir processes + Rust threads)
- **Memory-efficient** with pooled allocations
- **Graceful degradation** under load
### Reliability
- **Comprehensive error handling**
- **Memory safety** (Rust + Elixir supervision)
- **Performance monitoring** with built-in metrics
- **Extensive test coverage**
### Integration
- **Zero dependencies** on external parsers
- **Compatible** with Phoenix, LiveView, GenServer
- **Streamable** for large document processing
- **Configurable** for different performance profiles
## π¬ Architecture
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ β Elixir API βββββΆβ Rust NIF Core βββββΆβ SIMD Optimized β β β β β β Operations β β β’ Batch Proc. β β β’ Memory Pools β β β’ Pattern Match β β β’ Streaming β β β’ Pattern Cache β β β’ String Ops β β β’ Error Handle β β β’ Parallel Proc. β β β’ Word Count β βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
## π οΈ Development
Install dependencies
make install
Run tests
make test
Format code
make format
Lint code
make lint
Run benchmarks
make bench
Generate documentation
make docs
## π Documentation
- **[HexDocs](https://hexdocs.pm/markdown_ld)** - Complete API documentation
- **[Performance Report](PERFORMANCE_REPORT.md)** - Detailed benchmark results
- **[Build System](Makefile)** - Advanced build configuration
## π License
MIT License - see [LICENSE](LICENSE) for details.
## π€ Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes with tests
4. Run the full test suite (`make ci`)
5. Submit a pull request
### Development Guidelines
- **Performance first** - All changes should maintain or improve performance
- **Comprehensive tests** - Include benchmarks for performance-critical code
- **Documentation** - Update docs for API changes
- **Backwards compatibility** - Follow semantic versioning
---
**MarkdownLd** - Built for production systems that demand extreme performance.
Developed with β€οΈ for the Elixir community.
## π Quick Links
- Diff API: `MarkdownLd.diff/3`
- Merge API: `MarkdownLd.Merge.merge_texts/4`
- Streaming: `MarkdownLd.Diff.Stream.emit/3` and `apply_events/3`
- Inline Preview: `MarkdownLd.Diff.Preview.render_ops/2`
- QCPrompt: `QCP.parse/1`, `QCP.Stream.process/1`
- Spec: `SPEC.md` (MarkdownβLD Profile v0.1)