ejpet
Matching JSON nodes in Erlang.
What for?
A kind of regular expression applied to JSON documents.
- Find if a JSON document has some structural properties, and possibly extract some information.
- Useful to extract small data pieces from large JSON documents.
- Efficient filtering of JSON nodes in real time.
Backends for jsone, jsx, jiffy and mochijson2.
Quick start
Obtain ejpet
Add it to your project
Add a dependency to ejpet, and possibly to a supported JSON codec, in your project dependency set.
- With rebar3, in the rebar.config file
{deps, [
  %% ...
  {ejpet, ".*", {git, "git://github.com/nmichel/ejpet.git", {tag, "0.7.0"}}},
  {jsx, ".*", {git, "https://github.com/talentdeficit/jsx.git", {tag, "v2.8.3"}}},
  %% ...
]}.
- With mix, in the mix.exs file
defmodule MyProject.Mixfile do
  use Mix.Project
  def project do
    [
      # ...
      deps: deps()
      # ...
    ]
  end
  defp deps() do
    [
      # ...
      {:ejpet, "~> 0.7.0"},
      {:jsx, "~> 2.8"},
      # ...
    ]
  end
end
From source
Clone
$ git clone git@github.com:nmichel/ejpet.git
Build
$ cd ejpet
$ ./rebar get-deps
$ make && make test
Start Erlang shell
$ erl -pz ./ebin ./deps/*/ebin
Start (m)using
Read some JSON data
1> {ok, Data} = file:read_file("./test/channels_list.json").
{ok,<<239,187,191,91,13,10,32,32,32,32,123,13,10,32,32,
32,32,32,32,32,32,34,110,117,109,98,101,...>>}
Decode JSON using, say, jsx (provided you have jsx in your load path)
2> Node = jsx:decode(Data).
[[{<<"number">>,1},
{<<"lcn">>,2},
{<<"name">>,<<"France 2">>},
{<<"sap_group">>,<<>>},
{<<"ip_multicast">>,<<"239.100.10.1">>},
{<<"port_multicast">>,1234},
{<<"num_clients">>,0},
{<<"scrambling_ratio">>,0},
{<<"is_up">>,1},
{<<"pcr_pid">>,120},
{<<"pmt_version">>,4},
{<<"unicast_port">>,0},
{<<"service_id">>,257},
{<<"service_type">>,
<<"Please report : Unknown service type doc : EN 30"...>>},
{<<"pids_num">>,7},
{<<"pids">>,
...
Ok. Now define what we are looking for, and what we want to get.
Find somewhere in a list, an object with
* a {"ip_multicast", "239.100.10.4"} pair
* a key "pcr_pid", whatever its value, captured in variable "pcr"
* a key "pids", whose value is either a list or an object containing
  * an object with
    * a key "language" whose value matches regex "^fr"
    * a key "number", whatever its value, captured in variable "apid"
    * a key "type", whatever its value, captured in variable "acodec"
  * an object with
    * a key "type", whose value matches regex "Video", captured in variable "vcodec"
    * a key "number", whatever its value, captured in variable "vpid"
3> O = ejpet:compile("[*, {\"ip_multicast\":\"239.100.10.4\",
                           \"pcr_pid\":(?<pcr>_),
                           \"pids\":<{\"language\": #\"^fr\",
                                     \"number\": (?<apid>_),
                                     \"type\": (?<acodec>_)},
                                    {\"type\": (?<vcodec>#\"Video\"),
                                     \"number\": (?<vpid>_)}>}, *]", jsx).
{ejpet,jsx,#Fun<ejpet_jsx_generators.9.11467207>}
Run and seek...
4> ejpet:run(Node, O).
Here you are!
{true,[{"vpid",520},
{"vcodec",[<<"Video (MPEG2)">>]},
{"acodec",[<<"Audio (MPEG1)">>]},
{"apid",530},
{"pcr",520}]}
How?
Express what you want to match using a simple expression language.
Expression syntax
| pattern | matches | Notes |
|---|---|---|
| true | true | |
| false | false | |
| null | null | |
| "string" | the string "string" | UTF-8 encoded string (with escaping) |
| #"regex" | any string matching regex "regex" | UTF-8 encoded string (no escaping) |
| number | the number number (e.g. 42, 3.14159, -3395.1264e-22) | |
| { kv* } | object for which all kv (key/value) patterns are matched | Order does not matter |
| [ item* (, *)?] | list for which all item patterns are matched | Order DOES matter |
| < value* > | value set (list, or object values) for which all value patterns are matched | Order does not matter |
| < value* >/g | same as previous but search for ALL matches. Useful only when capturing | Order does not matter |
| <! value* !> | same as < value* > but search deep | |
| <! value* !>/g | same as previous but search for ALL matches. Useful only when capturing | |
| (?<name>expr) | capture expression expr in return value name | Every JSON expression may be captured |
| (!<name>type) | match JSON value of type type against parameter named name | |
kv may be one of the form
- _:pattern
- "key":_
- "key":pattern
item may be one of the form
- *, pattern
- pattern
value is a pattern.
kv, item and value are separated by ,.
In parameter injection, type may be
- number
- boolean
- string
- regex
Notes
Numbers
Number matching may be strict or loose, depending on an option passed at compile time.
1> ejpet:match(<<"42.0">>, "42").
{true,<<"{}">>}
2> ejpet:match(<<"42.0">>, "42", [{number_strict_match, true}]).
{false,<<"{}">>}
Strings and Regex
string and regex are UTF-8 encoded byte streams.
They may contain escape sequences, such as "\\b" or "\u00E9". In a string, these sequences are interpreted by default (set option string_apply_escape_sequence to false to leave them as-is). In a regex, they are not interpreted.
3> ejpet:match(<<"\"\x{00E9}\""/utf8>>, <<"\"\\u00E9\""/utf8>>, [{string_apply_escape_sequence, true}]).
{true,<<"{}">>}
4> ejpet:match(<<"\"\x{00E9}\""/utf8>>, <<"\"\\u00E9\""/utf8>>, [{string_apply_escape_sequence, false}]).
{false,<<"{}">>}
5> ejpet:match(<<"\"\\\\u00E9\""/utf8>>, <<"\"\\u00E9\""/utf8>>, [{string_apply_escape_sequence, false}]).
{true,<<"{}">>}
The codepoint produced by evaluating an escape sequence of the form \uABCD is NOT checked. One can insert any codepoint, valid or not, in a string or regex.
Captures
Every pattern p can be captured by simply substituting it with (?<variable_name>p). Captures are returned as a JSON object, where each variable_name is a key, and the list of captures found for that variable is the value.
This JSON object is built with respect to the backend indicated when compiling the pattern.
Warning: if there are no captures to return, the empty JSON object {} will be returned. But its actual form depends on the backend.
- jsx: [{}]
- jiffy: {[]}
- mochijson2: {struct, []}
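
Because the empty capture object differs per backend, code that handles several backends may want a single check for "no captures". The snippet below is a minimal sketch of such a check, built only from the three representations listed above. `IsEmptyCaptureSet` is a hypothetical name and is not part of ejpet.

```erlang
%% Hypothetical helper, not part of ejpet: true when Captures is the
%% empty JSON object in the representation used by the given backend.
IsEmptyCaptureSet =
    fun([{}], jsx)                -> true;   % jsx encodes {} as [{}]
       ({[]}, jiffy)              -> true;   % jiffy encodes {} as {[]}
       ({struct, []}, mochijson2) -> true;   % mochijson2 encodes {} as {struct, []}
       (_Captures, _Backend)      -> false   % anything else holds captures
    end.

true  = IsEmptyCaptureSet([{}], jsx).
true  = IsEmptyCaptureSet({[]}, jiffy).
true  = IsEmptyCaptureSet({struct, []}, mochijson2).
false = IsEmptyCaptureSet([{"subnode", [42]}], jsx).
```

The API section also lists ejpet:empty_capture_set/1, so comparing a capture set against its return value is presumably an alternative to hard-coding the three forms.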
One may wonder why captures are returned as an encoded JSON object. There are two reasons:
- captured objects are captured "as is" from the parsed document, i.e. in their encoded form. Using the backend encoding for the result is more coherent;
- the capture JSON object can itself be pattern matched.
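
The second reason can be illustrated with a short sketch. It assumes ejpet and its default jsx backend are on the code path, and uses only ejpet:match/2 as described in the API section. The JSON input and the capture name `n` are made up for the illustration.

```erlang
%% Assumption: ejpet and the jsx backend are available in the shell.
%% First match: capture the value 42 from a list, under the name "n".
{true, Captures} = ejpet:match(<<"[41, 42, 43]">>, "<(?<n>42)>").

%% Captures is serialized JSON, so it can be passed to ejpet:match/2 again.
%% Here we check that the key "n" holds a list whose first item is 42.
{true, _} = ejpet:match(Captures, "{\"n\": [42, *]}").
```

With ejpet:run/2 the captures come back as a backend term instead of JSON text, so the same idea would apply by running a compiled pattern on that term.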
Parameters Injection
It is possible to provide some matching values at match-time, through parameter injection forms like (!<param_name>param_type), where param_type may be number, string, boolean or regex.
At match-time, produced matching functions will look for an entry named param_name in the provided parameters list. See ejpet:run/3 and ejpet:match/4.
Note that string values should be binaries, and regex values MUST be mp() opaque objects returned by re:compile/2.
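
A minimal sketch of parameter injection follows, assuming ejpet and the jsx backend are on the code path. It compiles a pattern once and runs it with different parameter lists, using only functions listed in the API section (ejpet:compile/2, ejpet:decode/2, ejpet:run/3, ejpet:get_status/1). The parameter names `what` and `lang` and the JSON inputs are chosen for the illustration.

```erlang
%% Assumption: ejpet and the jsx backend are available in the shell.
%% Compile once; the value of "what" is supplied at match time.
EPM  = ejpet:compile("<(?<subnode>(!<what>number))>", jsx).
Node = ejpet:decode(<<"[41, 42, 43]">>, jsx).

%% 42 is in the list, 44 is not.
true  = ejpet:get_status(ejpet:run(Node, EPM, [{<<"what">>, 42}])).
false = ejpet:get_status(ejpet:run(Node, EPM, [{<<"what">>, 44}])).

%% Regex parameters: re:compile/2 returns {ok, MP}; pass MP itself,
%% not the {ok, MP} tuple and not the regex source string.
{ok, FrRegex} = re:compile(<<"^fr">>, []).
LangEPM  = ejpet:compile("<(!<lang>regex)>", jsx).
LangNode = ejpet:decode(<<"[\"fr-FR\", \"en-GB\"]">>, jsx).
LangRes  = ejpet:run(LangNode, LangEPM, [{<<"lang">>, FrRegex}]).
```

Keeping the compiled pattern and varying only the parameter list avoids recompiling the expression for each value to look for.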
API
backend() = jsx | jiffy | mochijson2
epm() = {ejpet, term(), term()}
expr_src() = string()
compile_option() = {string_apply_escape_sequence, boolean()}
| {number_strict_match, boolean()}
json_input() = string() | binary()
json_src() = binary()
json_term() = jsx_term() | jiffy_term() | mochijson2_term()
run_param_name() = binary()
run_param_value() = boolean() | number() | binary() | re:mp()
run_param() = {run_param_name(), run_param_value()}
run_res() = {match_stat(), json_term()}
match_res() = {match_stat(), json_src()}
match_stat() = true | false
ejpet:decode(JSONText, Backend) -> json_term()
JSONText = json_input()
Backend = backend()
ejpet:encode(JSONTerm, Backend) -> json_term()
JSONTerm = json_term()
Backend = backend()
ejpet:compile(Expr, Backend, Options) -> epm()
Expr = expr_src()
Backend = backend()
Options = [Option]
Option = compile_option()
ejpet:compile(Expr, Backend) -> epm()
Same as ejpet:compile(Expr, Backend, [])
ejpet:compile(Expr) -> epm()
Same as ejpet:compile(Expr, jsx, [])
ejpet:backend(EPM) -> backend()
EPM = epm()
ejpet:run(JSONTerm, EPM, Params) -> run_res()
EPM = epm()
JSONTerm = json_term()
Params = [Param]
Param = run_param()
ejpet:run(JSONTerm, EPM) -> run_res()
Same as ejpet:run(JSONTerm, EPM, [])
ejpet:match(JSONText, Expr, Options, Params) -> match_res()
JSONText = json_input()
Expr = expr_src() | epm()
Options = [Option]
Option = compile_option()
Params = [Param]
Param = run_param()
ejpet:match(JSONText, Expr, Options) -> match_res()
Same as ejpet:match(JSONText, Expr, Options, [])
ejpet:match(JSONText, Expr) -> match_res()
Same as ejpet:match(JSONText, Expr, [], [])
ejpet:get_status(Res) -> match_stat()
Res = run_res() | match_res()
ejpet:get_captures(Res) -> json_term()
Res = run_res() | match_res()
ejpet:get_capture(Res, Name) -> {ok, json_term()} | not_found
Same as ejpet:get_capture(Res, Name, jsx)
ejpet:get_capture(Res, Name, Backend) -> {ok, json_term()} | not_found
Res = run_res()
Name = string() | binary()
Backend = backend()
ejpet:empty_capture_set() -> json_term()
Same as ejpet:empty_capture_set(jsx)
ejpet:empty_capture_set(Backend) -> json_term()
Backend = backend()
Examples
Basics
| Expression | Match | No match | Code snippet |
|---|---|---|---|
| 42 | 42 | "42", [42], {"key": 42} | ejpet:match(<<"42">>, "42"). |
| "42" | "42" | 42, ["42"], {"key": "42"} | ejpet:match(<<"\"42\"">>, "\"42\""). |
| true | true | "true", [true] | ejpet:match(<<"true">>, "true"). |
| false | false | "false", [false] | ejpet:match(<<"false">>, "false"). |
| null | null | "null", [null] | ejpet:match(<<"null">>, "null"). |
| #"foo" | "foobar", "barfoo" | "barfo" | ejpet:match(<<"\"foobar\"">>, "#\"foo\""). |
| #"^foo" | "foobar" | "barfoo" | ejpet:match(<<"\"foobar\"">>, "#\"^foo\""). |
| #"bar$" | "foobar" | "barfoo" | ejpet:match(<<"\"foobar\"">>, "#\"bar$\""). |
Objects
| Expression | Match | No match | Code snippet |
|---|---|---|---|
| {_:42} | {"bar": 42}, {"bar": 47, "foo": 42} | {"bar": 47}, {"foo": "42"} | ejpet:match(<<"{\"foo\": 42}">>, "{_:42}"). |
| {"foo":_} | {"foo": 42}, {"bar": 42, "foo": {}} | {"bar": "foo"} | ejpet:match(<<"{\"foo\": 42}">>, "{\"foo\":_}"). |
| {"foo":42} | {"foo": 42}, {"bar": "42", "foo": 42} | {"bar": 42, "foo": "42"} | ejpet:match(<<"{\"foo\": 42}">>, "{\"foo\":42}").|
| {_:{"foo": 42}, "bar": {_:#"bar"}} | {"neh": {"foo": 42}, "bar": {"nimp": "foobar"}} | {"neh": {"notfoo": 42}, "bar": {"nimp": "foobar"}} | ejpet:match(<<"{\"neh\": {\"foo\": 42}, \"bar\": {\"nimp\": \"foobar\"}}">>, "{_:{\"foo\": 42}, \"bar\": {_:#\"bar\"}}").|
Lists
| Expression | Match | No match | Code snippet |
|---|---|---|---|
| ["42"] | ["42"] | {"bar": "42"}, {"foo": 42}, [42], ["42", "42"] | ejpet:match(<<"[\"42\"]">>, "[\"42\"]"). |
| [*, "42"] | ["42"], ["42", "42"], [true, "42"] | {"bar": "42"}, {"foo": 42}, [42], ["42", true] | ejpet:match(<<"[true, \"42\"]">>, "[*, \"42\"]"). |
| [*, "42", *] | ["42"], ["42", "42"], [true, "42"], ["42", true], [{}, "42", true] | {"bar": "42"}, {"foo": 42}, [42] | ejpet:match(<<"[true, \"42\", {}]">>, "[*, \"42\", *]"). |
| [[42]] | [[42]] | [42], [[42], 42] | ejpet:match(<<"[[42]]">>, "[[42]]"). |
| [*, [42]] | [[42]], ["42", [42]] | [[42], 42] | ejpet:match(<<"[\"42\", [42]]">>, "[*, [42]]"). |
| [[42], *] | [[42]], [[42], 42] | ["42", [42]] | ejpet:match(<<"[[42], \"42\"]">>, "[[42], *]"). |
Value sets (lists or object value set)
| Expression | Match | No match | Code snippet |
|---|---|---|---|
| <42> | [42], {"key": 42} | 42, "42" | ejpet:match(<<"{\"key\": 42}">>, "<42>"). |
| <"42"> | ["42"], {"bar": "42"}, [42, "42"], ["42", 42] | [42], {"bar": 47}, {"foo": 42} | ejpet:match(<<"{\"bar\": \"42\"}">>, "<\"42\">"). |
| <!"42"!> | ["42"], [true, "42"], ["foo", ["42", true], {}], [{}, {"foo": "42"}, true], {"bar": "42"}, {"bar": {"foo": "42"}} | "42", {"foo": 42}, [42] | ejpet:match(<<"[true, [null, {\"foo\": \"42\"}, \"bar\"], {}]">>, "<!\"42\"!>"). |
| <!<!"42"!>!> | [["42"]], [{}, {"foo": "42"}, true], {"bar": {"foo": "42"}} | ["42"], {"bar": "42"} | ejpet:match(<<"[{\"foo\":\"42\"}]">>, "<!<!\"42\"!>!>"). |
Captures
| Expression | Test | Capture(s) | Code snippet |
|---|---|---|---|
| <!(?<subnode>{_:42})!> | [{"foo": null}, {"foo": 42, "bar": {}}] | subnode: [{"foo":42,"bar":{}}] | ejpet:match(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>, "<!(?<subnode>{_:42})!>"). |
| (?<all><!(?<subnode>{_:42})!>) | [{"foo": null}, {"foo": 42, "bar": {}}] | all: [[{"foo":null},{"foo":42,"bar":{}}]], subnode: [{"foo":42,"bar":{}}] | ejpet:match(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>, "(?<all><!(?<subnode>{_:42})!>)"). |
Global captures
| Expression | Test | Capture(s) | Code snippet |
|---|---|---|---|
| <(?<node>{"codec":_, "lang":(?<lang>_)})>/g | [{"codec": "audio", "lang": "fr"}, {"codec": "video", "lang": "en"}, {"codec": "foo", "lang": "it"}] | node: [{"codec":"audio","lang":"fr"}, {"codec":"video","lang":"en"}, {"codec":"foo","lang":"it"}], lang: ["fr", "en", "it"] | ejpet:match(<<"[{\"codec\": \"audio\", \"lang\": \"fr\"}, {\"codec\":\"video\", \"lang\": \"en\"}, {\"codec\": \"foo\", \"lang\": \"it\"}]">>, <<"<(?<node>{\"codec\":_, \"lang\":(?<lang>_)})>/g">>). |
Injections
| Expression | Test | Parameters | Capture(s) | Code snippet |
|---|---|---|---|---|
| <(?<subnode>(!<what>number))> | [41, 42, 43] | [{<<"what">>, 42}] | subnode: [42] | ejpet:match(<<"[41, 42, 43]">>, "<(?<subnode>(!<what>number))>", [], [{<<"what">>, 42}]). |
Notes
In the tables above, captured values are shown as "abstract JSON nodes", for illustration purposes. As explained previously, the actual capture result depends on the API function used, and may be:
- serialized JSON nodes (as in the "Code snippet" column), with ejpet:match()
1> ejpet:match(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>, "(?<all><!(?<subnode>{_:42})!>)").
{true,<<"{\"all\":[[{\"foo\":null},{\"foo\":42,\"bar\":{}}]],\"subnode\":[{\"foo\":42,\"bar\":{}}]}">>}
- a (jsx | jiffy | mochijson2) JSON value, depending on the backend, for easier further processing, with ejpet:run()
1> JSX = ejpet:compile("(?<all><!(?<subnode>{_:42})!>)", jsx, []).
{ejpet,jsx,#Fun<ejpet_jsx_generators.19.98422695>}
2> ejpet:run((ejpet:backend(JSX)):decode(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>), JSX).
{true,[{"all",
[[[{<<"foo">>,null}],[{<<"foo">>,42},{<<"bar">>,[{}]}]]]},
{"subnode",[[{<<"foo">>,42},{<<"bar">>,[{}]}]]}]}
39> Mochi = ejpet:compile("(?<all><!(?<subnode>{_:42})!>)", mochijson2, []).
{ejpet,mochijson2,
#Fun<ejpet_mochijson2_generators.19.110863078>}
40> ejpet:run((ejpet:backend(Mochi)):decode(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>), Mochi).
{true,{struct,[{<<"all">>,
[[{struct,[{<<"foo">>,null}]},
{struct,[{<<"foo">>,42},{<<"bar">>,{struct,[]}}]}]]},
{<<"subnode">>,
[{struct,[{<<"foo">>,42},{<<"bar">>,{struct,[]}}]}]}]}}