gpb is a compiler for Google protocol buffer definition files, for Erlang.

See https://developers.google.com/protocol-buffers/ for further information on the Google protocol buffers.

Basic example of using gpb

Let's say we have a protobuf file, x.proto

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
}

We can generate code for this definition in a number of different ways. Here we use the command line tool. For info on integration with rebar, see further down.

# .../gpb/bin/protoc-erl -I. x.proto

Now we've got x.erl and x.hrl. First we compile it and then we can try it out in the Erlang shell:

# erlc -I.../gpb/include x.erl
# erl
Erlang/OTP 19 [erts-8.0.3] [source] [64-bit] [smp:12:12] [async-threads:10] [kernel-poll:false]
Eshell V8.0.3 (abort with ^G)
1> rr("x.hrl").
['Person']
2> x:encode_msg(#'Person'{name="abc def", id=345, email="a@example.com"}).
<<10,7,97,98,99,32,100,101,102,16,217,2,26,13,97,64,101,
120,97,109,112,108,101,46,99,111,109>>
3> Bin = v(-1).
<<10,7,97,98,99,32,100,101,102,16,217,2,26,13,97,64,101,
120,97,109,112,108,101,46,99,111,109>>
4> x:decode_msg(Bin, 'Person').
#'Person'{name = "abc def",id = 345,email = "a@example.com"}

In the Erlang shell, rr("x.hrl") reads record definitions, and v(-1) refers to the value one step earlier in the history.
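To see where the bytes in the encoded binary come from, here is a hand-written sketch of the protobuf wire format for the Person message. It is not the code gpb generates; the module name wire_sketch and its functions are made up for illustration, and the sketch only handles strings and non-negative integers.

```erlang
-module(wire_sketch).
-export([person/3, varint/1]).

%% Encode a non-negative integer as a base-128 varint:
%% 7 bits per byte, least significant group first,
%% high bit set on every byte except the last.
varint(N) when N >= 0, N < 128 ->
    <<N>>;
varint(N) when N >= 128 ->
    <<1:1, (N band 127):7, (varint(N bsr 7))/binary>>.

%% A field key is (FieldNumber bsl 3) bor WireType, itself a varint.
key(FNum, WireType) ->
    varint((FNum bsl 3) bor WireType).

%% Length-delimited field (wire type 2), used here for strings.
string_field(FNum, Str) ->
    Utf8 = unicode:characters_to_binary(Str),
    <<(key(FNum, 2))/binary, (varint(byte_size(Utf8)))/binary, Utf8/binary>>.

%% Varint field (wire type 0); non-negative values only in this sketch.
varint_field(FNum, N) ->
    <<(key(FNum, 0))/binary, (varint(N))/binary>>.

%% The three fields of Person, in field-number order.
person(Name, Id, Email) ->
    <<(string_field(1, Name))/binary,
      (varint_field(2, Id))/binary,
      (string_field(3, Email))/binary>>.
```

Calling wire_sketch:person("abc def", 345, "a@example.com") gives the same binary as x:encode_msg in the shell session above: 10 is the key for field 1 with wire type 2, 7 is the length of "abc def", and 345 becomes the two varint bytes 217,2.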

Mapping of protocol buffer datatypes to Erlang

Protobuf type         Erlang type
--------------------  --------------------------------------------------------
double, float         float() | infinity | '-infinity' | nan
                      When encoding, integers, too, are accepted
int32, int64,         integer()
uint32, uint64,
sint32, sint64,
fixed32, fixed64,
sfixed32, sfixed64
bool                  true | false
                      When encoding, the integers 1 and 0, too, are accepted
enum                  atom()
                      Unknown enums decode to integer()
message               record (thus tuple())
                      or map() if the maps (-maps) option is specified
string                unicode string, thus list of integers
                      or binary() if the strings_as_binaries (-strbin) option
                      is specified
                      When encoding, iolists, too, are accepted
bytes                 binary()
                      When encoding, iolists, too, are accepted
oneof                 {ChosenFieldName, Value}
map<_,_>              An unordered list of 2-tuples, [{Key,Value}]
                      or a map, if the maps (-maps) option is specified

Repeated fields are represented as lists.

Optional fields are represented as either the value or undefined if not set. However, for maps, if the option maps_unset_optional is set to omitted, then unset optional values are omitted from the map, instead of being set to undefined.

Examples of Erlang format for protocol buffer messages

Repeated and required fields

message m1 {
  repeated uint32 i   = 1;
  required bool   b   = 2;
  required eee    e   = 3;
  required submsg sub = 4;
}
message submsg {
  required string s = 1;
  required bytes  b = 2;
}
enum eee {
  INACTIVE = 0;
  ACTIVE   = 1;
}

Corresponding Erlang

#m1{i = [17, 4711],
    b = true,
    e = 'ACTIVE',
    sub = #submsg{s = "abc",
                  b = <<0,1,2,3,255>>}}

%% If compiled with the maps option:
#{i => [17, 4711],
  b => true,
  e => 'ACTIVE',
  sub => #{s => "abc",
           b => <<0,1,2,3,255>>}}

Optional fields

message m2 {
  optional uint32 i1 = 1;
  optional uint32 i2 = 2;
}

Corresponding Erlang

#m2{i1 = 17} % i2 is implicitly set to undefined

%% With the maps option
#{i1 => 17,
  i2 => undefined}

%% With the maps option and the maps_unset_optional set to omitted:
#{i1 => 17}
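With maps, code that reads an optional field has to cope with whichever maps_unset_optional setting the module was compiled with. A minimal sketch of a reader that treats a missing key and an explicit undefined the same way is shown here; the module optional_sketch and the function get_optional/3 are hypothetical helpers, not part of gpb.

```erlang
-module(optional_sketch).
-export([get_optional/3]).

%% Read an optional field from a decoded message map.
%% A missing key (maps_unset_optional = omitted) and an explicit
%% undefined (present_undefined) are both treated as "not set",
%% in which case Default is returned.
get_optional(Field, Msg, Default) when is_map(Msg) ->
    case maps:get(Field, Msg, undefined) of
        undefined -> Default;
        Value -> Value
    end.
```

For example, optional_sketch:get_optional(i2, #{i1 => 17}, 0) and optional_sketch:get_optional(i2, #{i1 => 17, i2 => undefined}, 0) both return 0.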

Oneof fields

This construct first appeared in Google protobuf version 2.6.0.

message m3 {
  oneof u {
    int32 a = 1;
    string b = 2;
  }
}
Corresponding Erlang

A oneof field is automatically always optional.

#m3{u = {a, 17}}
#m3{u = {b, "hello"}}
#m3{} % u is implicitly set to undefined
%% With the maps option
#{u => {a, 17}}
#{u => {b, "hello"}}
#{u => undefined} % If maps_unset_optional = present_undefined (default)
#{} % With the maps_unset_optional set to omitted
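Since a oneof value is a {ChosenFieldName, Value} tuple, it can be taken apart with ordinary pattern matching. The sketch below assumes the record representation; the module oneof_sketch and the function describe/1 are made up for illustration, and the record is declared locally as a stand-in for the one that would normally come from the generated .hrl file.

```erlang
-module(oneof_sketch).
-export([describe/1]).

%% Stand-in for the record gpb would generate for message m3.
-record(m3, {u}).

%% Dispatch on which alternative of the oneof is set, if any.
describe(#m3{u = {a, Int}}) -> {int32, Int};
describe(#m3{u = {b, Str}}) -> {string, Str};
describe(#m3{u = undefined}) -> unset.
```

Matching on undefined in the last clause covers the case where none of the alternatives was present in the decoded binary.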

Map fields

Not to be confused with Erlang maps. This construct first appeared in Google protobuf version 3.0.0 (for both the proto2 and the proto3 syntax)

message m4 {
  map<uint32,string> f = 1;
}
Corresponding Erlang

For records, the order of items is undefined when decoding.

#m4{f = []}
#m4{f = [{1, "a"}, {2, "b"}, {13, "hello"}]}
%% With the maps option
#{f => #{}}
#{f => #{1 => "a", 2 => "b", 13 => "hello"}}
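Because the order of items is undefined when decoding to records, comparing two decoded map fields with =:= directly can fail even when they hold the same entries. A small sketch of two ways around this, using only the standard lists and maps modules; the module name mapfield_sketch and both functions are hypothetical helpers.

```erlang
-module(mapfield_sketch).
-export([to_map/1, same_entries/2]).

%% Turn a decoded map<_,_> field, a list of {Key, Value} tuples,
%% into an Erlang map for keyed lookup.
to_map(KVs) when is_list(KVs) ->
    maps:from_list(KVs).

%% Compare two decoded map<_,_> fields without depending on item order.
same_entries(KVs1, KVs2) ->
    lists:sort(KVs1) =:= lists:sort(KVs2).
```

For instance, mapfield_sketch:same_entries([{1, "a"}, {2, "b"}], [{2, "b"}, {1, "a"}]) returns true.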

Unset optionals and the default option

For proto2 syntax

This describes how decoding works for optional fields that are not present in the binary-to-decode.

The documentation for Google protobuf says these decode to the default value if specified, or else to the field's type-specific default. The code generated by Google's protobuf compiler also contains has_<field>() methods so one can examine whether a field was actually present or not.

However, in Erlang, the natural way to set and read fields is to just use the syntax for records (or maps), and this leaves no good way to at the same time both convey whether a field was present or not and to read the defaults.

So the approach in gpb is that you have to choose one or the other. Normally, it is possible to see whether an optional field is present or not, e.g. by checking if the value is undefined. But there are options to the compiler to instead decode to defaults, in which case you lose the ability to see whether a field is present or not. The options are defaults_for_omitted_optionals and type_defaults_for_omitted_optionals, for decoding to default=<x> values, or to type-specific defaults respectively.

It works this way:

message o1 {
  optional uint32 a = 1 [default=33];
  optional uint32 b = 2; // the type-specific default is 0
}

Given binary data <<>>, that is, neither field a nor b is present, then the call decode_msg(Input, o1) results in:

#o1{a=undefined, b=undefined} % None of the options
#o1{a=33, b=undefined}        % with option defaults_for_omitted_optionals
#o1{a=33, b=0}                % with both defaults_for_omitted_optionals
                              % and type_defaults_for_omitted_optionals
#o1{a=0, b=0}                 % with only type_defaults_for_omitted_optionals

The last of the alternatives is perhaps not very useful, but still possible, and implemented for completeness.
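If an application needs both presence information and default values, one possible approach (an application-level pattern, not something gpb provides) is to decode with neither option and apply the defaults at the point of reading. The sketch below assumes the record representation of o1; the module defaults_sketch and its accessor functions are hypothetical, and the record is declared locally as a stand-in for the generated one.

```erlang
-module(defaults_sketch).
-export([a/1, b/1, has_a/1]).

%% Stand-in for the record gpb would generate for message o1.
-record(o1, {a, b}).

%% Presence is still visible, since nothing was filled in when decoding.
has_a(#o1{a = A}) -> A =/= undefined.

%% Apply the defaults when reading instead.
a(#o1{a = undefined}) -> 33;   % default=33 from the .proto
a(#o1{a = A}) -> A.

b(#o1{b = undefined}) -> 0;    % type-specific default for uint32
b(#o1{b = B}) -> B.
```

The cost of this approach is that the default values are repeated in hand-written code and must be kept in sync with the .proto file.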

Google's Reference

For proto3 syntax

For proto3, there is neither required nor optional nor default=<x> for fields. Instead, all fields are implicitly optional, and if missing in the binary to decode, they always decode to the type-specific default value. Also, it is not possible to determine whether a value was present (with the type-specific default value) or absent; no has_<field>() methods are generated (at least for scalars). If you need detection of "missing" data, you must define has_<field> boolean fields and set them appropriately.

This maps directly and naturally to Erlang.

Features of gpb

Interaction with rebar

For info on how to use gpb with rebar3, see https://www.rebar3.org/docs/using-available-plugins#section-using-gpb

In rebar, there is support for gpb since version 2.6.0. See the proto compiler section of the rebar.config.sample file at https://github.com/rebar/rebar/blob/master/rebar.config.sample

For older versions of rebar (prior to 2.6.0), the text below outlines how to proceed:

Place the .proto files in, for instance, a proto/ subdirectory. Any subdirectory other than src/ is fine, since rebar will try to use another protobuf compiler for any .proto it finds in the src/ subdirectory. Here are some lines for the rebar.config file:

%% -*- erlang -*-
{pre_hooks,
 [{compile, "mkdir -p include"}, %% ensure the include dir exists
  {compile,
   %% adjacent string literals are concatenated, so the second
   %% one starts with a space to keep the arguments apart
   "/path/to/gpb/bin/protoc-erl -I`pwd`/proto"
   " -o-erl src -o-hrl include `pwd`/proto/*.proto"
  }]}.
{post_hooks,
 [{clean,
   "bash -c 'for f in proto/*.proto; "
   "do "
   "  rm -f src/$(basename $f .proto).erl; "
   "  rm -f include/$(basename $f .proto).hrl; "
   "done'"}
 ]}.
{erl_opts, [{i, "/path/to/gpb/include"}]}.

Performance

Here is a comparison between gpb (interpreted by the Erlang VM) and the C++, Python and Java serializers/deserializers of protobuf-2.6.1rc1:

[MB/s]        |  gpb  |pb/c++ |pb/c++ | pb/c++ | pb/py |pb/java| pb/java|
              |       |(speed)|(size) | (lite) |       |(size) | (speed)|
--------------+-------+-------+-------+--------+-------+-------+--------+
small msgs    |       |       |       |        |       |       |        |
  serialize   |   52  | 1240  |   85  |   750  |  6.5  |   68  |  1290  |
  deserialize |   63  |  880  |   85  |   950  |  5.5  |   90  |   450  |
--------------+-------+-------+-------+--------+-------+-------+--------+
large msgs    |       |       |       |        |       |       |        |
  serialize   |   36  |  950  |   72  |   670  |  4.5  |   55  |   670  |
  deserialize |   54  |  620  |   71  |   480  |  4.0  |   60  |   360  |
--------------+-------+-------+-------+--------+-------+-------+--------+

Performance is measured as the number of processed MB/s, counted in serialized form. Higher values mean better performance.

The benchmarks are run with small and large messages (228 and 84584 bytes, respectively, in serialized form)
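For readers who want to take a comparable measurement of their own messages, a throughput figure of this kind can be computed roughly as sketched below. This is not the benchmark harness used for the table above; the module throughput_sketch and the function mb_per_s/3 are hypothetical, and the sketch assumes that 1 MB means 10^6 bytes.

```erlang
-module(throughput_sketch).
-export([mb_per_s/3]).

%% Run Fun(Bin) Iterations times and return the throughput in MB/s,
%% where Bin is the serialized form of the message.
mb_per_s(Fun, Bin, Iterations) when is_function(Fun, 1), is_binary(Bin),
                                    is_integer(Iterations), Iterations > 0 ->
    {Micros, ok} = timer:tc(fun() -> repeat(Fun, Bin, Iterations) end),
    %% Bytes per microsecond equals 10^6 bytes per second.
    %% max/2 guards against a zero elapsed time on very short runs.
    (byte_size(Bin) * Iterations) / max(Micros, 1).

repeat(_Fun, _Bin, 0) -> ok;
repeat(Fun, Bin, N) -> Fun(Bin), repeat(Fun, Bin, N - 1).
```

To time decoding of the earlier example, Fun could be fun(B) -> x:decode_msg(B, 'Person') end. A single short run like this is noisy, so treat the result as a rough indication only.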

The Java benchmark is run with optimization both for code size and for speed. The Python implementation cannot optimize for speed.

SW: Python 2.7.11, Java 1.8.0_77 (Oracle JDK), Erlang/OTP 18.3, g++ 5.3.1
Linux kernel 4.4, Debian (in 64 bit mode), protobuf-2.6.1rc1
HW: Intel Core i7 5820k, 3.3GHz, 6x256 kB L2 cache, 15MB L3 cache
(CPU frequency pinned to 3.3 GHz)

The benchmarks are all done with the exact same message files and proto files. The source of the benchmarks was found in Google protobuf's svn repository. gpb does not support groups, but the benchmarks in protobuf used groups, so I converted the google_message*.dat to use sub message structures instead. For protobuf, that change was only barely noticeable.

For performance, the generated Erlang code avoids creating sub binaries as far as possible. It has to for sub messages, strings and bytes, but for the rest of the types, it avoids creating sub binaries, both during encoding and decoding (for info, compile with the bin_opt_info option)

The Erlang code ran in the smp emulator, though only one CPU core was utilized.

The generated C++ code was compiled with -O3.

Version numbering

The gpb version number is fetched from the latest git tag matching N.M, where N and M are integers. This version is inserted into the gpb.app file as well as into include/gpb_version.hrl. The version is the result of the command

git describe --always --tags --match '[0-9]*.[0-9]*'

Thus, the git tag is the single source from which the version is fetched, and creating a new version of gpb means creating a new tag. (If you are importing gpb into a version control system other than git, or using a build tool other than rebar, you might have to adapt rebar.config and src/gpb.app.src accordingly.)

The version number on the master branch of the gpb on github is intended to always be only integers with dots, in order to be compatible with reltool. In other words, each push to github is considered a release, and the version number is bumped. To ensure this, there is a pre-push git hook and two scripts, install-git-hooks and tag-next-minor-vsn, in the helpers subdirectory. The ChangeLog file will not necessarily reflect all minor version bumps, only important updates.
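When HEAD is not exactly on a tag, git describe appends a suffix with the number of commits since the tag and an abbreviated commit hash, so the result is no longer only integers with dots. A sketch of a check for the plain form is shown below; the module vsn_sketch and the function is_plain_vsn/1 are made up for illustration and are not part of the gpb helper scripts, and the version strings in the usage note are invented examples.

```erlang
-module(vsn_sketch).
-export([is_plain_vsn/1]).

%% True if Vsn consists only of integers separated by dots,
%% with at least one dot, for example "4.1.3".
is_plain_vsn(Vsn) when is_list(Vsn) ->
    re:run(Vsn, "^[0-9]+(\\.[0-9]+)+$",
           [{capture, none}, dollar_endonly]) =:= match.
```

Here vsn_sketch:is_plain_vsn("4.1.3") returns true, while a describe result such as "4.1.3-2-g1a2b3c4" gives false.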

Places to update when making a new version:

Contributing

Contributions are welcome, preferably as pull requests, git patches or git fetch requests. Here are some guidelines: