MarkdownLd
High-performance Markdown processing with SIMD optimizations and JSON-LD integration for Elixir.
MarkdownLd is built for production systems that require extreme performance and reliability. Leveraging Rust SIMD optimizations, memory pooling, and advanced parsing algorithms, it delivers 10-50x faster markdown processing compared to traditional pure Elixir solutions.
Performance Highlights
- 10-50x faster than pure Elixir markdown parsers
- SIMD-optimized string processing (Apple Silicon NEON, x86 AVX2)
- Zero-copy binary processing for maximum efficiency
- Memory pools to minimize allocation overhead
- Parallel processing with configurable concurrency
- Built-in performance tracking and metrics
Quick Start
Add to your mix.exs:
def deps do
  [
    {:markdown_ld, "~> 0.3.0"}
  ]
end
Basic usage:
# Parse markdown content
{:ok, result} = MarkdownLd.parse("""
# Hello World
This is **bold** text with a [link](https://example.com).
def hello, do: :world
- [ ] Todo item
- [x] Completed item
""")
# Result contains structured data
IO.inspect(result.headings)
# [%{level: 1, text: "Hello World", line: 1}]
IO.inspect(result.links)
# [%{text: "link", url: "https://example.com", line: 3}]
IO.inspect(result.code_blocks)
# [%{language: "elixir", content: "def hello, do: :world", line: 5}]
IO.inspect(result.tasks)
# [%{completed: false, text: "Todo item", line: 9},
#  %{completed: true, text: "Completed item", line: 10}]
Features
Core Parsing
- Headings - All levels (H1-H6) with line numbers
- Links - Markdown and reference-style links
- Code blocks - Fenced and indented with language detection
- Task lists - GitHub-style checkboxes
- Word counting - SIMD-optimized text analysis
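To make the shape of the task-list output concrete, here is a minimal pure-Elixir sketch that extracts GitHub-style checkboxes with 1-based line numbers. It is not MarkdownLd's implementation, which the project describes as Rust and SIMD based. The module name `TaskListSketch` and its regular expression are assumptions chosen for illustration.

```elixir
defmodule TaskListSketch do
  # Hypothetical reference extractor, not part of MarkdownLd.

  @doc "Returns maps shaped like the `result.tasks` entries in Basic usage."
  def extract(markdown) when is_binary(markdown) do
    # Matches "- [ ] text" and "- [x] text" (also "*" or "+" bullets).
    task = ~r/^\s*[-*+]\s+\[([ xX])\]\s+(.*)$/

    markdown
    |> String.split("\n")
    |> Enum.with_index(1)
    |> Enum.flat_map(fn {line, number} ->
      case Regex.run(task, line) do
        [_, mark, text] ->
          [%{completed: mark in ["x", "X"], text: String.trim(text), line: number}]

        nil ->
          []
      end
    end)
  end
end

TaskListSketch.extract("# Hello\n\n- [ ] Todo item\n- [x] Completed item")
# => [%{completed: false, text: "Todo item", line: 3},
#     %{completed: true, text: "Completed item", line: 4}]
```

A line-by-line scan like this is the kind of pure Elixir baseline the speedup figures are presumably measured against.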
Performance Optimizations
- Zero-copy processing - Direct binary manipulation
- SIMD acceleration - Vectorized pattern matching
- Memory pooling - Reduced allocation overhead
- Pattern caching - LRU cache for repeated structures
- Parallel batch processing - Both Elixir and Rust concurrency
Batch Processing
# Process multiple documents in parallel
documents = ["# Doc 1", "# Doc 2", "# Doc 3"]
# Elixir-side parallel processing
{:ok, results} = MarkdownLd.parse_batch(documents, max_workers: 4)
# Rust-side parallel processing (fastest)
{:ok, results} = MarkdownLd.parse_batch_rust(documents)
# Stream processing with backpressure
results = MarkdownLd.parse_stream(document_stream, max_workers: 8)
Performance Tracking
# Get performance metrics
{:ok, stats} = MarkdownLd.get_performance_stats()
IO.inspect(stats)
# %{
# "simd_operations" => 1_250_000,
# "cache_hit_rate" => 85.2,
# "memory_pool_usage" => 2_048_576,
# "pattern_cache_size" => 128
# }
# Reset counters
MarkdownLd.reset_performance_stats()
Configuration
Configure default options in your config.exs:
config :markdown_ld,
  # Performance options
  parallel: true,
  max_workers: System.schedulers_online(),
  # Optimization options
  cache_patterns: true,
  track_performance: true,
  memory_pool_size: 1024 * 1024, # 1MB
  pattern_cache_size: 500,
  # Processing options
  enable_tables: true,
  enable_strikethrough: true,
  enable_footnotes: true,
  # SIMD options (auto-detected)
  simd_enabled: true,
  simd_features: [:neon, :avx2], # Auto-detected based on CPU
  # Batch processing
  batch_size: 100,
  batch_timeout: 5_000, # 5 seconds
  # Development options
  debug_performance: false,
  log_slow_operations: true,
  slow_operation_threshold: 1000 # microseconds
Runtime Configuration
You can also configure options at runtime:
# Per-operation configuration
{:ok, result} = MarkdownLd.parse(content,
  parallel: false,
  cache_patterns: true,
  track_performance: true,
  max_workers: 2
)
# Application-wide configuration
Application.put_env(:markdown_ld, :max_workers, 8)
Advanced Build System
MarkdownLd includes a comprehensive build system with multiple optimization profiles:
# Development build (fast compilation)
make dev
# Production build (maximum optimization)
make prod
# Benchmark build (with profiling symbols)
make bench
# Profile-Guided Optimization
make pgo
# Run comprehensive benchmarks
make bench
Build Profiles
- dev - Fast compilation with some optimization
- prod - Full LTO, maximum optimization, stripped binaries
- bench - Optimized with debug symbols for profiling
- pgo - Profile-Guided Optimization for additional 10-20% gains
Benchmarks
Based on comprehensive benchmarking:
| Document Size | Processing Time | Throughput | vs Pure Elixir |
|---|---|---|---|
| Small (1KB) | 3-7μs | 150MB/s | 10-20x faster |
| Medium (10KB) | 5-10μs | 1GB/s | 10-20x faster |
| Large (100KB) | 15-35μs | 3GB/s | 10-25x faster |
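The throughput column follows from document size divided by processing time. As a sanity check, the sketch below recomputes it from the table's sizes and the upper end of each time range. It assumes 1KB = 1024 bytes and decimal MB (10^6 bytes), neither of which the table states, and `ThroughputCheck` is a hypothetical helper rather than part of MarkdownLd.

```elixir
defmodule ThroughputCheck do
  # Hypothetical helper, not part of MarkdownLd.

  @doc "Throughput in MB/s (10^6 bytes per second) for `bytes` processed in `micros` microseconds."
  def mb_per_s(bytes, micros) when micros > 0 do
    # Bytes per microsecond is numerically equal to MB per second.
    Float.round(bytes / micros, 1)
  end
end

for {label, bytes, micros} <- [
      {"Small (1KB)", 1024, 7},
      {"Medium (10KB)", 10 * 1024, 10},
      {"Large (100KB)", 100 * 1024, 35}
    ] do
  IO.puts("#{label}: #{ThroughputCheck.mb_per_s(bytes, micros)} MB/s")
end

# Small (1KB): 146.3 MB/s
# Medium (10KB): 1024.0 MB/s
# Large (100KB): 2925.7 MB/s
```

These values are close to the listed 150MB/s, 1GB/s, and 3GB/s, which suggests the throughput column corresponds to the slow end of each time range.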
Extraction Functions
- Word Count: 226,027 KB/s
- Link Extraction: 875,855 KB/s
- Heading Extraction: 333,659 KB/s
- Code Block Extraction: 3,503,418 KB/s
Run benchmarks yourself:
mix run bench/turbo_benchmark.exs
Production Usage
MarkdownLd is designed for high-throughput production systems:
Scalability
- Thousands of documents per second
- Configurable concurrency (Elixir processes + Rust threads)
- Memory-efficient with pooled allocations
- Graceful degradation under load
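Configurable concurrency on the Elixir side can be sketched with the standard library's `Task.async_stream/3`. This is a minimal illustration, not how `parse_batch/2` is implemented. `BatchSketch`, its `parse_fun` argument, and the stand-in heading counter are assumptions so the snippet runs without the NIF; in a real system one would presumably pass `&MarkdownLd.parse/1`.

```elixir
defmodule BatchSketch do
  # Hypothetical helper, not part of MarkdownLd.

  @doc """
  Applies `parse_fun` to each document with at most `max_workers`
  concurrent tasks. Results keep input order. A task that times out is
  killed and reported as `{:error, :timeout}`; a task that raises still
  crashes the caller, because the tasks are linked to it.
  """
  def parse_all(documents, parse_fun, opts \\ []) do
    max_workers = Keyword.get(opts, :max_workers, System.schedulers_online())
    timeout = Keyword.get(opts, :timeout, 5_000)

    documents
    |> Task.async_stream(parse_fun,
      max_concurrency: max_workers,
      timeout: timeout,
      on_timeout: :kill_task
    )
    |> Enum.map(fn
      {:ok, result} -> result
      {:exit, reason} -> {:error, reason}
    end)
  end
end

# Stand-in parser: counts lines that start with "#". Swap in the real parser.
count_headings = fn doc ->
  count = doc |> String.split("\n") |> Enum.count(&String.starts_with?(&1, "#"))
  {:ok, %{heading_count: count}}
end

BatchSketch.parse_all(["# Doc 1", "# Doc 2\n## Sub", "plain"], count_headings, max_workers: 2)
# => [{:ok, %{heading_count: 1}}, {:ok, %{heading_count: 2}}, {:ok, %{heading_count: 0}}]
```

Because `Task.async_stream/3` pulls from its input lazily, passing a `Stream` instead of a list bounds the number of in-flight documents to `max_workers`, which is one simple form of backpressure.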
Reliability
- Comprehensive error handling
- Memory safety (Rust + Elixir supervision)
- Performance monitoring with built-in metrics
- Extensive test coverage
Integration
- Zero dependencies on external parsers
- Compatible with Phoenix, LiveView, GenServer
- Streamable for large document processing
- Configurable for different performance profiles
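One way to think about performance profiles is as layered options: built-in defaults, then application config, then per-call overrides. The sketch below shows that layering with `Keyword.merge/2`. The precedence order, the `OptionsSketch` module, and its default values are assumptions for illustration; the source does not document how MarkdownLd resolves options internally.

```elixir
defmodule OptionsSketch do
  # Hypothetical helper, not part of MarkdownLd.
  # Assumed built-in defaults; the real defaults may differ.
  @builtin [parallel: true, max_workers: 4, cache_patterns: true]

  @doc "Merges built-in defaults, application env, and per-call options (later wins)."
  def resolve(call_opts, app \\ :markdown_ld) do
    @builtin
    |> Keyword.merge(Application.get_all_env(app))
    |> Keyword.merge(call_opts)
  end
end

# Application-wide setting, as in the Runtime Configuration example.
Application.put_env(:markdown_ld, :max_workers, 8)

opts = OptionsSketch.resolve(parallel: false)
IO.inspect(Keyword.fetch!(opts, :max_workers)) # 8, from application env
IO.inspect(Keyword.fetch!(opts, :parallel))    # false, from the per-call option
```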
Architecture
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Elixir API    │───▶│  Rust NIF Core   │───▶│ SIMD Optimized  │
│                 │    │                  │    │   Operations    │
│ • Batch Proc.   │    │ • Memory Pools   │    │ • Pattern Match │
│ • Streaming     │    │ • Pattern Cache  │    │ • String Ops    │
│ • Error Handle  │    │ • Parallel Proc. │    │ • Word Count    │
└─────────────────┘    └──────────────────┘    └─────────────────┘
Development
# Install dependencies
make install
# Run tests
make test
# Format code
make format
# Lint code
make lint
# Run benchmarks
make bench
# Generate documentation
make docs
Documentation
- HexDocs - Complete API documentation
- Performance Report - Detailed benchmark results
- Build System - Advanced build configuration
License
MIT License - see LICENSE for details.
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes with tests
- Run the full test suite (make ci)
- Submit a pull request
Development Guidelines
- Performance first - All changes should maintain or improve performance
- Comprehensive tests - Include benchmarks for performance-critical code
- Documentation - Update docs for API changes
- Backwards compatibility - Follow semantic versioning
MarkdownLd - Built for production systems that demand extreme performance.
Developed with ❤️ for the Elixir community.