Tuesday 4 August 2015

Protocol Buffers & Lists

Introduction
In this post I use the following .proto definition to look at serializing large repeated data sets in C#.

.proto
 syntax = "proto2";  
   
 package pcdf;  
   
 message Boiler  
 {  
   required string BoilerId = 1;  
   required int32 TableNumber = 2;  
   required string TableName = 3;  
   required string BoilerData = 4;  
   required string BrandName = 5;  
   required string Qualifier = 6;  
   required string ModelName = 7;  
 }  
   
 message AllBoilers  
 {  
   repeated Boiler Boiler = 1;  
 }  

Serialization Code
 var listBuilder = new AllBoilers.Builder();  
   
 for (var i = 0; i < 7000; i++)  
 {  
   var builder = new Boiler.Builder  
   {  
     BoilerId = "ID",  
     TableNumber = 2,  
     TableName = "Name",  
     BoilerData = "Data",  
     BrandName = "Brand",  
     ModelName = "Model",  
     Qualifier = "Qualifier"  
   };  
   
   listBuilder.AddBoiler(builder);  
 }  
   
 var list = listBuilder.Build();  
 var bytes = list.ToByteArray();  
   
 // Dispose the stream so the bytes are flushed and the file handle is released.  
 using (var fileStream = new FileStream(OUTPUT_FILE, FileMode.Create))  
 {  
   fileStream.Write(bytes, 0, bytes.Length);  
 }  

For the actual tests, real test data was used.

Deserialization Code
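 // Dispose the stream once the whole list has been parsed.  
 using (var fileStream = new FileStream(OUTPUT_FILE, FileMode.Open))  
 {  
   var builder = new AllBoilers.Builder();  
   builder.MergeFrom(fileStream);  
   var boilers = builder.Build();  
 }  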


Comparison
For a list of over 7,000 items, Json.NET is only about 10 ms slower than Protocol Buffers on my machine, which is not quite what I expected from what is supposed to be a super fast binary serializer. Upon further reading, it appears Protocol Buffers are not designed to handle large messages (such as a single message holding the whole list), but this can be worked around by serializing the items individually.
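The post does not show how the timings were taken, so the following is only a minimal sketch of one way to measure them, assuming a hypothetical helper named Timing.AverageMilliseconds. It runs the action once as a warm-up (so JIT compilation is not counted) and then averages over several iterations using Stopwatch.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Hypothetical helper, not from the original post.
    // Returns the average wall-clock time of one call to 'action' in milliseconds.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        // Warm-up call: excluded from the measurement.
        action();

        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

Each serializer would be wrapped in its own lambda and passed to the helper, so both are measured the same way on the same data.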

Fix
To fix this, we serialize each item individually but write them all to the same stream, with each item preceded by its length.
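The idea is independent of Protocol Buffers, so here is a minimal sketch of length-prefixed framing over plain byte arrays. The class and method names (LengthPrefixedFraming, WriteAll, ReadAll) are hypothetical and only illustrate the technique; in the real code each payload would be the bytes of one serialized Boiler.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LengthPrefixedFraming
{
    // Writes each payload as a 4-byte length followed by the payload bytes.
    public static void WriteAll(Stream output, IEnumerable<byte[]> payloads)
    {
        foreach (var payload in payloads)
        {
            var prefix = BitConverter.GetBytes(payload.Length);
            output.Write(prefix, 0, prefix.Length);
            output.Write(payload, 0, payload.Length);
        }
    }

    // Reads payloads back until the stream ends on a record boundary.
    public static List<byte[]> ReadAll(Stream input)
    {
        var result = new List<byte[]>();
        var prefix = new byte[sizeof(int)];

        while (TryFill(input, prefix))
        {
            var payload = new byte[BitConverter.ToInt32(prefix, 0)];
            if (!TryFill(input, payload))
            {
                throw new EndOfStreamException("Stream ended after a length prefix.");
            }
            result.Add(payload);
        }

        return result;
    }

    // Fills the whole buffer. Returns false only when the stream is already
    // at its end and the buffer is non-empty; throws if it ends part-way.
    private static bool TryFill(Stream input, byte[] buffer)
    {
        var offset = 0;
        while (offset < buffer.Length)
        {
            var read = input.Read(buffer, offset, buffer.Length - offset);
            if (read == 0)
            {
                if (offset == 0) return false;
                throw new EndOfStreamException("Stream ended in the middle of a record.");
            }
            offset += read;
        }
        return true;
    }
}
```

Because every record carries its own length, the reader never needs to hold more than one item in memory at a time.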

Fixed Serialize
 // Dispose the stream so everything is flushed and the file handle is released.  
 using (var fileStream = new FileStream(OUTPUT_FILE, FileMode.Create))  
 {  
   for (var i = 0; i < 7000; i++)  
   {  
     var builder = new Boiler.Builder  
     {  
       BoilerId = "ID",  
       TableNumber = 2,  
       TableName = "Name",  
       BoilerData = "Data",  
       BrandName = "Brand",  
       ModelName = "Model",  
       Qualifier = "Qualifier"  
     };  
   
     var bytes = builder.Build().ToByteArray();  
   
     // 4-byte length prefix; its byte order follows the machine (see BitConverter.IsLittleEndian).  
     var intBytes = BitConverter.GetBytes(bytes.Length);  
   
     fileStream.Write(intBytes, 0, intBytes.Length);  
     fileStream.Write(bytes, 0, bytes.Length);  
   }  
 }  
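One caveat with the length prefix: BitConverter.GetBytes uses the byte order of the machine it runs on, so a file written on one architecture may not read back correctly on another. If that matters, a fixed byte order can be used instead. The sketch below assumes little-endian and uses a hypothetical helper class named LittleEndianInt32.

```csharp
public static class LittleEndianInt32
{
    // Encodes 'value' as 4 bytes, least significant byte first,
    // regardless of the machine's native byte order.
    public static byte[] GetBytes(int value)
    {
        unchecked
        {
            return new[]
            {
                (byte)value,
                (byte)(value >> 8),
                (byte)(value >> 16),
                (byte)(value >> 24)
            };
        }
    }

    // Decodes 4 little-endian bytes starting at 'startIndex'.
    public static int ToInt32(byte[] bytes, int startIndex)
    {
        return bytes[startIndex]
             | (bytes[startIndex + 1] << 8)
             | (bytes[startIndex + 2] << 16)
             | (bytes[startIndex + 3] << 24);
    }
}
```

For example, LittleEndianInt32.GetBytes(7000) gives the bytes 58 1B 00 00 on any machine.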

Fixed Deserialize
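 var list = new List<Boiler>();  
   
 using (var fileStream = new FileStream(OUTPUT_FILE, FileMode.Open))  
 {  
   // GetSize is not shown in this post; it is assumed to read the 4-byte  
   // length prefix and return -1 at the end of the stream.  
   int size;  
   while ((size = GetSize(fileStream)) != -1)  
   {  
     var bytes = new byte[size];  
   
     // Stream.Read may return fewer bytes than requested, so loop until the item is complete.  
     var offset = 0;  
     while (offset < size)  
     {  
       var read = fileStream.Read(bytes, offset, size - offset);  
       if (read == 0) throw new EndOfStreamException();  
       offset += read;  
     }  
   
     var boiler = new Boiler.Builder();  
     boiler.MergeFrom(bytes);  
   
     list.Add(boiler.Build());  
   }  
 }  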
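The GetSize helper used in the loop is not shown in the post. A minimal sketch of what it might look like, assuming it reads the 4-byte prefix written by BitConverter.GetBytes and returns -1 once the stream is exhausted (the containing class name FramingHelpers is made up for this example):

```csharp
using System;
using System.IO;

public static class FramingHelpers
{
    // Assumed behaviour, inferred from how the loop uses it:
    // returns the next length prefix, or -1 at a clean end of stream.
    public static int GetSize(Stream stream)
    {
        var buffer = new byte[sizeof(int)];
        var offset = 0;

        // Stream.Read may deliver the 4 bytes in more than one call.
        while (offset < buffer.Length)
        {
            var read = stream.Read(buffer, offset, buffer.Length - offset);
            if (read == 0)
            {
                if (offset == 0) return -1; // nothing left to read
                throw new EndOfStreamException("Stream ended inside a length prefix.");
            }
            offset += read;
        }

        return BitConverter.ToInt32(buffer, 0);
    }
}
```

Throwing on a partial prefix, rather than returning -1, makes a truncated file fail loudly instead of silently dropping the last item.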

With this updated code there is a significant difference (160 ms -> 25 ms) between Json.NET and Protocol Buffers.

References
https://developers.google.com/protocol-buffers/
https://developers.google.com/protocol-buffers/docs/techniques#large-data
https://github.com/google/protobuf
