Microformats2
A Microformats2 parser for Elixir.
Installation
This parser is available in Hex:
Add microformats2 to your list of dependencies in mix.exs:
def deps do
  [{:microformats2, "~> 0.3.1"}]
end
If you want to directly parse from URLs, add tesla to your list of dependencies in mix.exs:
def deps do
  [{:microformats2, "~> 0.3.1"}, {:tesla, "~> 1.3.0"}]
end
Usage
Give the parser an HTML string and the URL it was fetched from:
Microformats2.parse("""
<div class="h-card">
<img class="u-photo" alt="photo of Mitchell"
src="https://webfwd.org/content/about-experts/300.mitchellbaker/mentor_mbaker.jpg"/>
<a class="p-name u-url"
href="http://blog.lizardwrangler.com/">Mitchell Baker</a>
(<a class="u-url" href="https://twitter.com/MitchellBaker">@MitchellBaker</a>)
<span class="p-org">Mozilla Foundation</span>
<p class="p-note">
Mitchell is responsible for setting the direction and scope of the Mozilla Foundation and its activities.
</p>
<span class="p-category">Strategy</span>
<span class="p-category">Leadership</span>
</div>
""", "http://example.org")
It parses the document into a structure like this:
%{
"items" => [
%{
"properties" => %{
"category" => ["Strategy", "Leadership"],
"name" => ["Mitchell Baker"],
"note" => ["Mitchell is responsible for setting the direction and scope of the Mozilla Foundation and its activities."],
"org" => ["Mozilla Foundation"],
"photo" => [
%{
"alt" => "photo of Mitchell",
"value" => "https://webfwd.org/content/about-experts/300.mitchellbaker/mentor_mbaker.jpg"
}
],
"url" => ["http://blog.lizardwrangler.com/",
"https://twitter.com/MitchellBaker"]
},
"type" => ["h-card"]
}
],
"rel-urls" => %{},
"rels" => %{}
}
You can also provide HTML trees already parsed with Floki:
Microformats2.parse(Floki.parse("<div class=\"h-card\">...</div>"), "http://example.org")
Or URLs if you have Tesla installed:
Microformats2.parse("http://example.org")
Dependencies
Floki is required for HTML parsing. Tesla is optional and only needed for fetching URLs.
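Reading the parsed result needs no dependency beyond these, because the result is a plain Elixir map with string keys. The following is a minimal sketch of pulling values out of it, assuming the structure shown in the Usage section. The `HCardReader` module and its functions are hypothetical helpers written for illustration and are not part of the microformats2 package. The `result` map is a trimmed copy of the Usage output, so the snippet runs without calling the parser.

```elixir
defmodule HCardReader do
  # Hypothetical helper module, not part of the microformats2 package.

  # Returns every top-level item whose "type" list includes `type`.
  def items_of_type(%{"items" => items}, type) do
    Enum.filter(items, fn item -> type in Map.get(item, "type", []) end)
  end

  # Returns the first value of a property, or `default` when the
  # property is missing or empty. Property values are always lists.
  def first(item, property, default \\ nil) do
    case get_in(item, ["properties", property]) do
      [value | _] -> value
      _ -> default
    end
  end
end

# Trimmed copy of the result shown in the Usage section.
result = %{
  "items" => [
    %{
      "properties" => %{
        "category" => ["Strategy", "Leadership"],
        "name" => ["Mitchell Baker"],
        "org" => ["Mozilla Foundation"],
        "url" => [
          "http://blog.lizardwrangler.com/",
          "https://twitter.com/MitchellBaker"
        ]
      },
      "type" => ["h-card"]
    }
  ],
  "rel-urls" => %{},
  "rels" => %{}
}

[card] = HCardReader.items_of_type(result, "h-card")

IO.puts(HCardReader.first(card, "name"))
IO.inspect(get_in(card, ["properties", "category"]))
```

Every property value is a list, even when only one value was found, so a helper like `first/3` avoids repeating the same pattern match at each call site.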
Features
Implemented:
- parsing depth first, doc order
- parsing a p- property
- parsing a u- property
- parsing a dt- property
- parsing an e- property
- parsing implied properties
- nested properties
- nested microformat with associated property
- dynamic creation of properties
- rel
- nested microformat without associated property
- normalize u-* property values
- value-class-pattern
- recognition of vendor extensions
Not implemented:
- include-pattern
- backwards compatible support for microformats v1
License
This software is licensed under the MIT license.