gpb is a compiler for Google protocol buffer definition files for Erlang.

See https://code.google.com/p/protobuf/ for further information on Google protocol buffers.

Features of gpb

Performance

Here is a comparison between gpb (interpreted by the Erlang VM) and the C++, Python and Java serializers/deserializers of protobuf-2.6.1:

[MB/s]        | gpb   |pb/c++ |pb/c++ | pb/c++ | pb/py |pb/java| pb/java|
              |       |(speed)|(size) | (lite) |       |(size) | (speed)|
--------------+-------+-------+-------+--------+-------+-------+--------+
small msgs    |       |       |       |        |       |       |        |
  serialize   | 44.32 | 863.4 | 54.67 |  510.8 |  6.77 | 60.28 |  907.9 |
  deserialize | 59.62 | 534.1 | 52.89 |  532.5 |  5.75 | 81.36 |  337.2 |
--------------+-------+-------+-------+--------+-------+-------+--------+
large msgs    |       |       |       |        |       |       |        |
  serialize   | 28.76 | 667.9 | 45.71 |  396.5 |  4.71 | 58.28 |  562.9 |
  deserialize | 37.01 | 396.3 | 44.84 |  276.3 |  4.07 | 55.33 |  356.8 |
--------------+-------+-------+-------+--------+-------+-------+--------+

Performance is measured as the number of megabytes processed per second, counting the data in serialized form. Higher values mean better performance.

The benchmarks are run with small and large messages (228 and 84584 bytes, respectively, in serialized form).
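To relate the MB/s figures to message rates, divide the throughput by the serialized message size. Below is a minimal Erlang sketch, assuming 1 MB = 1000000 bytes (the benchmark may instead use 2^20 bytes, which would raise the results by about 5%); the module and function names are hypothetical.

```erlang
-module(throughput_sketch).
-export([msgs_per_sec/2]).

%% Convert a throughput in MB/s (serialized form) into messages per
%% second, for messages that are MsgBytes bytes each when serialized.
%% Assumption: 1 MB = 1000000 bytes.
msgs_per_sec(MBPerSec, MsgBytes) ->
    round(MBPerSec * 1000000 / MsgBytes).
```

With the gpb figures from the table, throughput_sketch:msgs_per_sec(44.32, 228) gives roughly 194 thousand small messages serialized per second, and throughput_sketch:msgs_per_sec(28.76, 84584) about 340 large messages per second.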

The C++ and Java benchmarks are run with optimization both for code size and for speed; the C++ benchmark is additionally run with the lite runtime. The Python implementation cannot optimize for speed.

SW: Python 2.7.9, Java 1.7.0_75 (OpenJDK), Erlang/OTP 17.3, g++ 4.9.2,
    Linux kernel 3.16, Debian (in 32-bit mode), protobuf-2.6.1
HW: Intel Core i7-5820K, 3.3 GHz, 6x256 kB L2 cache, 15 MB L3 cache
    (CPU frequency pinned to 3.3 GHz)

The benchmarks all use exactly the same message files and proto files. The benchmark sources were found in Google protobuf's svn repository. gpb does not support groups, but the protobuf benchmarks used groups, so I converted the google_message*.dat files to use sub-message structures instead. For protobuf, that change was barely noticeable.

For performance, the generated Erlang code avoids creating sub binaries as far as possible. It must create them for sub messages, strings and bytes, but for all other types it avoids them, both during encoding and decoding. (To inspect this, compile the generated code with the bin_opt_info option.)
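The sketch below is not gpb's generated code; it only illustrates the general technique of decoding without intermediate sub binaries. It decodes a protobuf base-128 varint by passing the remaining binary straight into the next recursive match, which allows the compiler to reuse the match context instead of building a sub binary per byte; a sub binary is needed only for the Rest returned at the end. The module name is hypothetical. Compiling it with erlc +bin_opt_info varint_sketch.erl makes the compiler report on its binary-matching optimizations.

```erlang
-module(varint_sketch).
-export([decode/1]).

%% Decode one base-128 varint from the front of Bin.
%% Returns {Value, Rest}.
decode(Bin) ->
    decode(Bin, 0, 0).

%% Continuation bit set: accumulate the 7 payload bits and keep matching
%% on the tail. Passing Rest directly into the next match is what lets
%% the compiler avoid creating a sub binary for every byte.
decode(<<1:1, X:7, Rest/binary>>, Shift, Acc) ->
    decode(Rest, Shift + 7, Acc bor (X bsl Shift));
%% Continuation bit clear: this is the last byte of the varint.
decode(<<0:1, X:7, Rest/binary>>, Shift, Acc) ->
    {Acc bor (X bsl Shift), Rest}.
```

For example, varint_sketch:decode(<<172, 2>>) returns {300, <<>>}, since 300 is encoded as the two bytes 16#AC 16#02.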

The Erlang code ran in the SMP emulator, though only one CPU core was utilized.

The generated C++ code was compiled with -O3.

Mapping of protocol buffer datatypes to Erlang

.proto type           Erlang type
----------------------------------------------------------------
double, float         floating point number
                      (integers are also accepted when encoding)
----------------------------------------------------------------
int32, int64,
uint32, uint64,
sint32, sint64,
fixed32, fixed64,
sfixed32, sfixed64    integer
----------------------------------------------------------------
bool                  true | false
----------------------------------------------------------------
enum                  atom
----------------------------------------------------------------
message               record (thus tuple)
----------------------------------------------------------------
string                Unicode string, thus a list of integers
----------------------------------------------------------------
bytes                 binary
----------------------------------------------------------------
oneof                 {ChosenFieldName, Value}
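As an illustration of the mapping, here is a hypothetical proto2 message and a corresponding Erlang value. The record definition is written by hand to mimic what gpb generates into the .hrl file; the message, field and enum names are made up for this example.

```erlang
-module(mapping_sketch).
-export([example/0]).

%% Hand-written stand-in for a gpb-generated record, for this message:
%%
%%   enum Kind { USER = 1; ADMIN = 2; }
%%   message Person {
%%     required string name   = 1;
%%     required int32  id     = 2;
%%     optional double score  = 3;
%%     optional bool   active = 4;
%%     optional Kind   kind   = 5;
%%     optional bytes  avatar = 6;
%%     oneof contact {
%%       string email = 7;
%%       uint32 phone = 8;
%%     }
%%   }
-record('Person', {name, id, score, active, kind, avatar, contact}).

example() ->
    #'Person'{name    = "Ada",               % string -> list of integers
              id      = 42,                  % int32  -> integer
              score   = 9.5,                 % double -> float
              active  = true,                % bool   -> true | false
              kind    = 'ADMIN',             % enum   -> atom
              avatar  = <<137, 80, 78, 71>>, % bytes  -> binary
              %% oneof -> {ChosenFieldName, Value}
              contact = {email, "ada@example.com"}}.
```

Since a record is a tuple, example() returns {'Person', "Ada", 42, 9.5, true, 'ADMIN', <<137,80,78,71>>, {email, "ada@example.com"}}.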

Interaction with rebar

Place the .proto files in, for instance, a proto/ subdirectory. Any subdirectory other than src/ is fine, since rebar will try to use another protobuf compiler for any .proto files it finds in src/. Here are some lines for the rebar.config file:

%% -*- erlang -*-
{pre_hooks,
 [{compile, "mkdir -p include"}, %% ensure the include dir exists
  {compile,
   "/path/to/gpb/bin/protoc-erl -I`pwd`/proto"
   " -o-erl src -o-hrl include `pwd`/proto/*.proto"
  }]}.

{post_hooks,
 [{clean,
   "bash -c 'for f in proto/*.proto; "
   "do "
   "  rm -f src/$(basename $f .proto).erl; "
   "  rm -f include/$(basename $f .proto).hrl; "
   "done'"}
 ]}.

{erl_opts, [{i, "/path/to/gpb/include"}]}.

Version numbering

The gpb version number is fetched from the latest git tag matching N.M, where N and M are integers. This version is inserted into the gpb.app file as well as into include/gpb_version.hrl. The version is the result of the command

git describe --always --tags --match '[0-9]*.[0-9]*'

Thus, the git tag is the single source of the version number: to create a new version of gpb, create a new tag. (If you import gpb into a version control system other than git, or use a build tool other than rebar, you might have to adapt rebar.config and src/gpb.app.src accordingly.)

The version number of gpb on GitHub is intended to always consist only of integers and dots, in order to be compatible with reltool. In other words, each push to GitHub is considered a release, and the version number is bumped. To ensure this, there is a pre-push git hook and two scripts, install-git-hooks and tag-next-minor-vsn, in the helpers subdirectory. The ChangeLog file does not necessarily reflect every minor version bump, only important updates.
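The reltool-compatible version format and a minor-version bump can be sketched as follows. This is not the code of the scripts in the helpers subdirectory; the module and function names are hypothetical, and next_minor/1 only handles versions of the form N.M.

```erlang
-module(vsn_sketch).
-export([is_plain_version/1, next_minor/1]).

%% True if Vsn consists only of integers separated by dots, e.g. "3.17".
%% git describe output for a commit after the latest tag, such as
%% "3.17-4-gabc1234", is rejected.
is_plain_version(Vsn) ->
    re:run(Vsn, "^[0-9]+(\\.[0-9]+)*$", [{capture, none}]) =:= match.

%% Bump the minor number of an N.M version: "3.17" -> "3.18".
next_minor(Vsn) ->
    [Major, Minor] = string:tokens(Vsn, "."),
    Major ++ "." ++ integer_to_list(list_to_integer(Minor) + 1).
```

Note that the minor number is incremented as an integer, so "2.9" is followed by "2.10", not "3.0".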

Places to update when making a new version:

Contributing

Contributions are welcome, preferably as pull requests, git patches or git fetch requests. Here are some guidelines: