It’s been a while since my last post, where I explained how to build protobuf with zlib support. In this post I’ll explain how you can use zlib-based streams in C++ to serialize and deserialize protobuf messages. We’ll use a small Mona Lisa bitmap for this example, since its uncompressed format will compress well with zlib’s default Deflate lossless compression algorithm.
Take a look at Simverge/howto-protobuf-zlib on GitHub for a more complete implementation of this example.
Define a Protobuf Message Schema
First, let’s create a file called blob.proto that defines a simple Protobuf message to store an arbitrary blob of data along with a string identifying its source:
    syntax = "proto3";

    package simverge;

    message Blob {
        string source = 1;
        bytes data = 2;
    }
Compile blob.proto with the Protobuf compiler (protoc) using the instructions from the official Protobuf C++ tutorial. For example, the following command will generate the blob.pb.cc C++ source file and the blob.pb.h C++ header file in the same directory as blob.proto:
    protoc -I=. --cpp_out=. blob.proto
You can instead use the FindProtobuf module if you are building with CMake, for example:
    find_package(Protobuf REQUIRED)

    # Generated files will be in PROTOBUF_SOURCES and PROTOBUF_HEADERS
    protobuf_generate_cpp(PROTOBUF_SOURCES PROTOBUF_HEADERS blob.proto)

    # Remainder of CMake build script ...
Once blob.proto is compiled into C++ files and blob.pb.cc is included in your build, you can include blob.pb.h to create a simverge::Blob Protobuf message, set its source and data fields, and determine the uncompressed Protobuf message size:
    #include "blob.pb.h"

    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <string>

    int main(int argc, char **argv) {
        GOOGLE_PROTOBUF_VERIFY_VERSION;

        std::string source("Mona_Lisa.bmp");
        std::ifstream in(source, std::ios::binary);

        if (in) {
            simverge::Blob blob;
            *blob.mutable_source() = source;
            auto data = blob.mutable_data();

            // Compute stream length to reserve bytes in blob's data field
            in.unsetf(std::ios::skipws);
            in.seekg(0, std::ios::end);
            data->reserve(in.tellg());
            in.seekg(0, std::ios::beg);

            // Copy the entire file into the data field
            data->assign(std::istreambuf_iterator<char>(in),
                std::istreambuf_iterator<char>());
            in.close();

            std::cout << "Creating Protobuf message with " << data->size()
                << " bytes from " << source << std::endl;

            std::cout << "Total uncompressed Protobuf size is "
                << blob.ByteSizeLong() << " bytes" << std::endl;
        }

        return 0;
    }
Running this program reveals the size of the Mona Lisa bitmap and the Protobuf message that wraps it:
    Creating Protobuf message with 366414 bytes from Mona_Lisa.bmp
    Total uncompressed Protobuf size is 366433 bytes
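The 19-byte difference between the bitmap and the message comes from the Protobuf wire format: each string or bytes field is written as a tag, a varint-encoded length, and then the payload. Here is a minimal sketch of that arithmetic, assuming both fields are non-empty (proto3 skips empty fields). The helper names are my own for illustration and are not part of the Protobuf API.

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes a base-128 varint needs to encode `value`.
std::size_t varint_size(std::uint64_t value) {
    std::size_t bytes = 1;
    while (value >= 0x80) {
        value >>= 7;
        ++bytes;
    }
    return bytes;
}

// Encoded size of one non-empty length-delimited field (string or bytes):
// tag varint + length varint + payload.
std::size_t length_delimited_field_size(std::uint32_t field_number,
                                        std::size_t payload_bytes) {
    // Wire type 2 (length-delimited) occupies the low three bits of the tag.
    const std::uint64_t tag =
        (static_cast<std::uint64_t>(field_number) << 3) | 2u;
    return varint_size(tag) + varint_size(payload_bytes) + payload_bytes;
}

// Expected encoded size of a Blob whose source (field 1) and data (field 2)
// have the given lengths. Hypothetical helper, not generated by protoc.
std::size_t blob_wire_size(std::size_t source_bytes, std::size_t data_bytes) {
    return length_delimited_field_size(1, source_bytes) +
           length_delimited_field_size(2, data_bytes);
}
```

For this example, blob_wire_size(13, 366414) accounts for the 13 characters of "Mona_Lisa.bmp" plus 2 bytes of overhead, and the 366414 bitmap bytes plus 4 bytes of overhead (the length needs a 3-byte varint), which adds up to the 366433 bytes reported above.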
Compress and Write the Protobuf Message to Disk
We can then use a GzipOutputStream from the google::protobuf::io namespace to serialize and compress the message. The constructor for this class requires a pointer to a ZeroCopyOutputStream substream that receives the compressed data. We will use an ArrayOutputStream to determine the size of the compressed data and then an std::ofstream to write the buffer to disk. We will also check that there are no zlib errors before writing the compressed Protobuf message to disk:
    #include "blob.pb.h"

    #include <google/protobuf/io/zero_copy_stream_impl.h>
    #include <google/protobuf/io/zero_copy_stream_impl_lite.h>
    #include <google/protobuf/io/gzip_stream.h>

    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <memory>
    #include <string>

    int main(int argc, char **argv) {
        GOOGLE_PROTOBUF_VERIFY_VERSION;

        std::string source("Mona_Lisa.bmp");
        std::string target("Mona_Lisa.pbz");
        std::ifstream in(source, std::ios::binary);

        if (in) {
            simverge::Blob blob;
            *blob.mutable_source() = source;
            auto data = blob.mutable_data();

            // Compute stream length to reserve bytes in blob's data field
            in.unsetf(std::ios::skipws);
            in.seekg(0, std::ios::end);
            data->reserve(in.tellg());
            in.seekg(0, std::ios::beg);

            // Copy the entire file into the data field
            data->assign(std::istreambuf_iterator<char>(in),
                std::istreambuf_iterator<char>());
            in.close();

            std::cout << "Creating Protobuf message with " << data->size()
                << " bytes from " << source << std::endl;

            auto uncompressedBytes = blob.ByteSizeLong();
            std::cout << "Total uncompressed Protobuf size is "
                << uncompressedBytes << " bytes" << std::endl;

            // The buffer is sized for the uncompressed message, so this
            // example assumes the compressed output is not larger than that
            std::unique_ptr<char[]> buffer(new char[uncompressedBytes]);
            google::protobuf::io::ArrayOutputStream aos(buffer.get(),
                (int) uncompressedBytes);
            google::protobuf::io::GzipOutputStream gos(&aos);

            if (blob.SerializeToZeroCopyStream(&gos)) {
                // Close() flushes the remaining compressed data into the
                // substream; zlib reports errors as negative codes
                if (gos.Close() && gos.ZlibErrorCode() >= 0) {
                    auto compressedBytes = aos.ByteCount();
                    std::ofstream out(target, std::ios::binary);
                    out.write(buffer.get(), compressedBytes);

                    std::cout << "Wrote compressed Protobuf message of size "
                        << compressedBytes << " bytes ("
                        << (100.0 * (uncompressedBytes - compressedBytes) / uncompressedBytes)
                        << "% compression ratio): " << target << std::endl;
                }
            }
        }

        return 0;
    }
Running the updated program reveals that the default GzipOutputStream settings shrink the Mona Lisa message by roughly 22.6% (the program labels this percentage its compression ratio). You can tweak the zlib compression settings by passing a GzipOutputStream::Options object to the GzipOutputStream constructor.
    Creating Protobuf message with 366414 bytes from Mona_Lisa.bmp
    Total uncompressed Protobuf size is 366433 bytes
    Wrote compressed Protobuf message of size 283638 bytes (22.5949% compression ratio): Mona_Lisa.pbz
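The percentage printed by the program is the share of bytes saved, which is a different number from the conventional compression ratio (uncompressed size divided by compressed size). The sketch below computes both from the two sizes reported above. The function names are hypothetical, and the values are converted to double before subtracting so that a compressed size larger than the original would yield a negative percentage rather than wrapping around as an unsigned value.

```cpp
#include <cstdint>

// Percentage of bytes saved: 100 * (uncompressed - compressed) / uncompressed.
double space_savings_percent(std::uint64_t uncompressed_bytes,
                             std::uint64_t compressed_bytes) {
    if (uncompressed_bytes == 0) {
        return 0.0;  // Nothing to compress, so nothing saved
    }
    const double before = static_cast<double>(uncompressed_bytes);
    const double after = static_cast<double>(compressed_bytes);
    return 100.0 * (before - after) / before;
}

// Conventional compression ratio: uncompressed size / compressed size.
double compression_ratio(std::uint64_t uncompressed_bytes,
                         std::uint64_t compressed_bytes) {
    if (compressed_bytes == 0) {
        return 0.0;  // Undefined for empty output; 0 is used as a sentinel
    }
    return static_cast<double>(uncompressed_bytes) /
           static_cast<double>(compressed_bytes);
}
```

With the sizes from this run, space_savings_percent(366433, 283638) gives roughly 22.59, while compression_ratio(366433, 283638) is roughly 1.29.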
A more straightforward way to serialize the compressed data is to use an OstreamOutputStream or a FileOutputStream instead of an ArrayOutputStream as the substream.
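A quick way to sanity-check the file we just wrote is to look at its first two bytes. As far as I know, GzipOutputStream writes the gzip format unless its Options select the zlib format, and each format has a recognizable header. The following sketch is independent of Protobuf; the enum and function names are made up for this post.

```cpp
#include <string>

enum class DeflateContainer { Gzip, Zlib, Unknown };

// Guess the container format from the first two bytes of a compressed blob.
DeflateContainer detect_container(const std::string &bytes) {
    if (bytes.size() < 2) {
        return DeflateContainer::Unknown;
    }

    const unsigned b0 = static_cast<unsigned char>(bytes[0]);
    const unsigned b1 = static_cast<unsigned char>(bytes[1]);

    // RFC 1952: a gzip member starts with the magic bytes 0x1f 0x8b.
    if (b0 == 0x1f && b1 == 0x8b) {
        return DeflateContainer::Gzip;
    }

    // RFC 1950: a zlib stream starts with CMF and FLG bytes, where the low
    // nibble of CMF is 8 (deflate) and (CMF * 256 + FLG) is a multiple of 31.
    if ((b0 & 0x0f) == 8 && ((b0 << 8) | b1) % 31 == 0) {
        return DeflateContainer::Zlib;
    }

    return DeflateContainer::Unknown;
}
```

Passing the first bytes of Mona_Lisa.pbz to detect_container should report a gzip container if the default options were used, while the original bitmap, which starts with the characters "BM", is reported as unknown.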
Read and Decompress Protobuf Message from Disk
We will use the GzipInputStream class from the google::protobuf::io namespace to read and decompress the message. The constructor for this class requires a pointer to a ZeroCopyInputStream substream that supplies the data before decompression. We will use an IstreamInputStream as the substream (which requires a pointer to an std::istream) since we already used an array-based stream in the output example. Just like in the output example, we will check whether there are any zlib errors before processing the decompressed data.
    #include "blob.pb.h"

    #include <google/protobuf/io/zero_copy_stream_impl.h>
    #include <google/protobuf/io/zero_copy_stream_impl_lite.h>
    #include <google/protobuf/io/gzip_stream.h>

    #include <fstream>
    #include <iostream>
    #include <string>

    int main(int argc, char **argv) {
        GOOGLE_PROTOBUF_VERIFY_VERSION;

        std::string protobufPath("Mona_Lisa.pbz");
        std::string bitmapPath("Mona_Lisa_decompressed.bmp");
        std::ifstream in(protobufPath, std::ios::binary);

        if (in) {
            // Report the size of the compressed file, then rewind
            in.unsetf(std::ios::skipws);
            in.seekg(0, std::ios::end);
            std::cout << "Read file with " << in.tellg() << " bytes: "
                << protobufPath << std::endl;
            in.seekg(0, std::ios::beg);

            google::protobuf::io::IstreamInputStream iss(&in);
            google::protobuf::io::GzipInputStream gis(&iss);

            simverge::Blob blob;

            // zlib reports errors as negative codes
            if (blob.ParseFromZeroCopyStream(&gis) && gis.ZlibErrorCode() >= 0) {
                std::cout << "Decompressed and parsed Protobuf message: "
                    << blob.ByteSizeLong() << " bytes" << std::endl;
                std::cout << "Message contains " << blob.data().size()
                    << " bytes from " << blob.source() << std::endl;

                std::ofstream out(bitmapPath, std::ios::binary);
                out.write(blob.data().c_str(), blob.data().size());
                std::cout << "Wrote Protobuf message data to " << bitmapPath
                    << std::endl;
            }
        }

        return 0;
    }
Running this program shows that the size of the decompressed and parsed Protobuf message matches the size reported by the writer, and that the bitmap data size matches as well.
    Read file with 283638 bytes: Mona_Lisa.pbz
    Decompressed and parsed Protobuf message: 366433 bytes
    Message contains 366414 bytes from Mona_Lisa.bmp
    Wrote Protobuf message data to Mona_Lisa_decompressed.bmp
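Matching sizes are a good sign, but they do not prove that the round trip preserved every byte. One way to confirm it is a byte-for-byte comparison of Mona_Lisa.bmp and Mona_Lisa_decompressed.bmp. This is a minimal sketch using only the standard library; files_identical is a hypothetical helper and not part of the example repository.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Returns true only if both files can be opened and hold identical bytes.
bool files_identical(const std::string &first_path,
                     const std::string &second_path) {
    std::ifstream first(first_path, std::ios::binary);
    std::ifstream second(second_path, std::ios::binary);
    if (!first || !second) {
        return false;
    }

    std::istreambuf_iterator<char> first_it(first);
    std::istreambuf_iterator<char> second_it(second);
    const std::istreambuf_iterator<char> end;

    // Walk both files in lockstep and stop at the first mismatch
    while (first_it != end && second_it != end) {
        if (*first_it != *second_it) {
            return false;
        }
        ++first_it;
        ++second_it;
    }

    // Identical only if both files ended at the same position
    return first_it == end && second_it == end;
}
```

Calling files_identical("Mona_Lisa.bmp", "Mona_Lisa_decompressed.bmp") after running the writer and the reader should return true if the compression round trip was lossless.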
This work by Simverge Software LLC is licensed under a Creative Commons Attribution 4.0 International License.