Saxy
Saxy (Sá xị) is an XML SAX parser in Elixir that focuses on speed and standards compliance.
It complies with Extensible Markup Language (XML) 1.0 (Fifth Edition).
Features
- SAX parsing for XML 1.0.
- Large file parsing with native Elixir streams.
- XML Simple DOM.
- Early return during parsing.
- Manual conversion of entity references.
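The feature list does not spell out how entity reference conversion is exposed, so the following is only a minimal sketch of what converting references by hand could look like. It is independent of Saxy's API: the module name `EntitySketch` and the function `decode/1` are hypothetical, and only the five entity references predefined by XML 1.0 are handled.

```elixir
defmodule EntitySketch do
  # The five entity references predefined by XML 1.0.
  @predefined %{
    "&lt;" => "<",
    "&gt;" => ">",
    "&amp;" => "&",
    "&quot;" => "\"",
    "&apos;" => "'"
  }

  # Hypothetical helper, not part of Saxy.
  # Replaces predefined references in a single pass, so "&amp;lt;" becomes
  # "&lt;" rather than "<". Custom entities and character references are
  # left untouched.
  def decode(text) when is_binary(text) do
    Regex.replace(~r/&(?:lt|gt|amp|quot|apos);/, text, fn ref ->
      Map.fetch!(@predefined, ref)
    end)
  end
end
```

For example, `EntitySketch.decode("Jack &amp; Rose")` turns the escaped ampersand back into a literal `&`.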
Installation
Add :saxy to the dependencies in your mix.exs.
def deps do
  [{:saxy, "~> 0.5.0"}]
end
Overview
Full documentation is available on HexDocs.
SAX Parser
A SAX event handler implementation is required before starting parsing.
defmodule MyEventHandler do
  @behaviour Saxy.Handler

  # Each clause logs the event and prepends it to the accumulated state.
  def handle_event(:start_document, prolog, state) do
    IO.inspect "Start parsing document"
    [{:start_document, prolog} | state]
  end

  def handle_event(:end_document, _data, state) do
    IO.inspect "Finish parsing document"
    [{:end_document} | state]
  end

  def handle_event(:start_element, {name, attributes}, state) do
    IO.inspect "Start parsing element #{name} with attributes #{inspect(attributes)}"
    [{:start_element, name, attributes} | state]
  end

  def handle_event(:end_element, {name}, state) do
    IO.inspect "Finish parsing element #{name}"
    [{:end_element, name} | state]
  end

  def handle_event(:characters, chars, state) do
    IO.inspect "Receive characters #{chars}"
    [{:characters, chars} | state]
  end
end
Then parse your XML with:
initial_state = []
Saxy.parse_string(data, MyEventHandler, initial_state)
Streaming parsing
Saxy’s SAX parser accepts a file stream as input.
stream = File.stream!("/path/to/file")
Saxy.parse_stream(stream, MyEventHandler, initial_state)
It also accepts an ordinary stream.
stream = File.stream!("/path/to/file") |> Stream.filter(&(&1 != "\n"))
Saxy.parse_stream(stream, MyEventHandler, initial_state)
Simple form parsing
Saxy also supports parsing XML documents into simple-form format.
Saxy.SimpleForm.parse_string(data)
{"menu", [],
 [
   {"movie",
    [{"id", "tt0120338"}, {"url", "https://www.imdb.com/title/tt0120338/"}],
    [{"name", [], ["Titanic"]}, {"characters", [], ["Jack & Rose"]}]},
   {"movie",
    [{"id", "tt0109830"}, {"url", "https://www.imdb.com/title/tt0109830/"}],
    [
      {"name", [], ["Forest Gump"]},
      {"characters", [], ["Forest & Jenny"]}
    ]}
 ]}
Benchmarking
Performance varies from document to document and depends on the complexity of the XML. Saxy is often about 1.4x faster than erlsom, and for some large documents it can be 4x faster.
The benchmark suite can be found in this repository.
Limitations
- XSD is not supported.
- DTD is not supported. When the parser encounters a <!DOCTYPE, it simply stops parsing.
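Because a document type declaration stops the parser, it may be useful to detect one before handing a document over. The sketch below is a hypothetical helper, not part of Saxy: the module name `DoctypeCheck` and the function `doctype?/1` are assumptions, and it works only on documents that are already in memory as a string.

```elixir
defmodule DoctypeCheck do
  # Hypothetical helper, not part of Saxy: reports whether the given binary
  # contains a <!DOCTYPE declaration anywhere.
  # Deliberately naive: it also matches "<!DOCTYPE" inside comments or CDATA
  # sections, so a true result means "possibly unsupported".
  def doctype?(xml) when is_binary(xml) do
    String.contains?(xml, "<!DOCTYPE")
  end
end
```

A caller could use the result to reject the document early or to strip the declaration before parsing. Which of those is appropriate depends on the application.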
Where does the name come from?
👆 Sa xi is an awesome soft drink made by Chuong Duong.
Contributing
If you have any issues or ideas, feel free to open an issue at https://github.com/qcam/saxy/issues.
To start developing:
- Fork the repository.
- Write your code and related tests.
- Create a pull request at https://github.com/qcam/saxy/pulls.