Reducing IoT data bills using Serialization & Compression - Part 1
The Internet of Things (IoT) is a rapidly growing field. Industry projections anticipate that by 2025, IoT devices will generate an astonishing 79.4 zettabytes (ZB) of data. To put this in perspective, transmitting that much data over a cellular network would cost more than a trillion dollars. This creates a clear need for efficient data transmission.
Serialization and compression techniques can help reduce data size by roughly 80% in some cases, which can save a lot on your IoT data bills. In Part 1 of this multipart guide, we explain serialization and compression algorithms, walk through examples of each, and discuss why they matter in IoT. So let's get started.
What is Serialization?
Serialization is the process of converting in-memory data into a format that can be transmitted over a network or stored in a file. Binary serialization converts it into a compact binary format that can be transmitted and stored efficiently. Text-based formats like JSON or XML, by contrast, are more verbose and less efficient.
For instance, data stored as JSON can be re-encoded in a binary format that takes up less storage space and can be transmitted faster. As an illustration, if 1 MB of JSON data serialized to 300 KB of binary data, that would be a 70% reduction in size; the actual savings depend on the data and the format. To understand this better, let's look at an example using a serialization format called Protocol Buffers (protobuf).
json_string = '{"light_state_1": 0, "temp": "27.7", "light_state_2": 0, "sensor_profile": 5, "lux": "652.5", "battery": 100, "humid": "49.8", "device_id": "ABC123467890", "timestamp": 1683059017}'
JSON object size: 229 bytes
The original JSON object is 229 bytes.
Protobuf message size: 43 bytes
Compared with the original JSON object, the Protobuf message is only 43 bytes. That's a reduction of roughly 81% in data size (1 − 43/229 ≈ 0.81).
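To see where the savings come from, here is a minimal sketch that hand-encodes the same reading in the Protocol Buffers wire format using only the Python standard library. The schema (field numbers and types, such as `float` for `temp`) is hypothetical, since the original `.proto` file is not shown, so the size it produces (39 bytes) differs slightly from the 43 bytes above. Each field is sent as a small numeric key plus a compactly encoded value, zero-valued fields are omitted as in proto3, and field names never go over the wire. Measured with `len()`, the JSON string itself is 180 bytes; `sys.getsizeof()` would also count Python's per-object overhead. In a real project you would compile a `.proto` file and call `SerializeToString()` on the generated message class instead of encoding by hand.

```python
import json
import struct

# Hypothetical schema (the article's actual .proto file is not shown):
# message SensorReading {
#   uint32 light_state_1 = 1;  float  temp      = 2;  uint32 light_state_2 = 3;
#   uint32 sensor_profile = 4; float  lux       = 5;  uint32 battery       = 6;
#   float  humid = 7;          string device_id = 8;  uint64 timestamp     = 9;
# }

VARINT, LEN, I32 = 0, 2, 5  # protobuf wire types used below


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def field_key(field_number: int, wire_type: int) -> bytes:
    """Every field starts with a key: (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_reading(r: dict) -> bytes:
    """Encode one reading according to the hypothetical schema above."""
    out = bytearray()

    def put_uint(field_number: int, value: int) -> None:
        if value != 0:  # proto3 omits scalar fields holding the default value
            out.extend(field_key(field_number, VARINT) + encode_varint(value))

    def put_float(field_number: int, value: float) -> None:
        if value != 0.0:
            out.extend(field_key(field_number, I32) + struct.pack("<f", value))

    def put_string(field_number: int, value: str) -> None:
        if value:
            data = value.encode("utf-8")
            out.extend(field_key(field_number, LEN) + encode_varint(len(data)) + data)

    put_uint(1, r["light_state_1"])
    put_float(2, float(r["temp"]))
    put_uint(3, r["light_state_2"])
    put_uint(4, r["sensor_profile"])
    put_float(5, float(r["lux"]))
    put_uint(6, r["battery"])
    put_float(7, float(r["humid"]))
    put_string(8, r["device_id"])
    put_uint(9, r["timestamp"])
    return bytes(out)


json_string = '{"light_state_1": 0, "temp": "27.7", "light_state_2": 0, "sensor_profile": 5, "lux": "652.5", "battery": 100, "humid": "49.8", "device_id": "ABC123467890", "timestamp": 1683059017}'
encoded = encode_reading(json.loads(json_string))

print("JSON payload:    ", len(json_string.encode("utf-8")), "bytes")  # 180
print("Protobuf payload:", len(encoded), "bytes")  # 39 with this schema
```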
Importance of Serialization in IoT
Binary serialization is particularly important in IoT because it reduces the amount of data that needs to be transmitted, which improves transmission efficiency and reduces bandwidth usage. Smaller messages also lower latency and allow higher throughput. Because IoT devices spend most of their power on establishing network connections and transmitting data, compact serialization can also reduce battery consumption. Finally, a common binary serialization format makes it easier to exchange data between different devices and platforms, so IoT devices can communicate more easily with each other and with cloud-based services.
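To make the bandwidth savings concrete, the short calculation below estimates one device's monthly payload volume using the 229-byte JSON and 43-byte Protobuf sizes reported above. The 10-second reporting interval and 30-day month are hypothetical, and protocol overhead (MQTT, TCP/IP, or cellular framing) is ignored, so real traffic will be higher for both formats.

```python
# Hypothetical reporting schedule: one reading every 10 seconds for a 30-day month.
REPORT_INTERVAL_S = 10
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

# Message sizes as reported in the example above (protocol headers ignored).
JSON_BYTES = 229
PROTOBUF_BYTES = 43


def monthly_payload_bytes(message_bytes: int, interval_s: int = REPORT_INTERVAL_S) -> int:
    """Application payload bytes one device sends per month."""
    messages_per_month = SECONDS_PER_MONTH // interval_s
    return messages_per_month * message_bytes


json_month = monthly_payload_bytes(JSON_BYTES)
proto_month = monthly_payload_bytes(PROTOBUF_BYTES)

print(f"JSON:     {json_month / 1e6:.1f} MB per device per month")   # 59.4 MB
print(f"Protobuf: {proto_month / 1e6:.1f} MB per device per month")  # 11.1 MB
```

Multiplied across a fleet of devices, the difference per device is what shows up on the data bill.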
Examples of Binary Serialization Formats in IoT Applications
Serialization formats are broadly classified into two categories: schemaless serialization and schema-based serialization.
Schemaless Serialization
Schemaless serialization is a type of serialization where the data being serialized does not have a predefined schema or structure. Instead, each message carries its own structure (for example, its field names), and the deserializer determines that structure at runtime, allowing for greater flexibility and adaptability.
Examples of Schemaless Serialization
- JSON: JSON (JavaScript Object Notation) is a lightweight, text-based data-interchange format.
- BSON: BSON (Binary JSON) is a binary-encoded serialization of JSON-like documents.
- CBOR: Concise Binary Object Representation (CBOR) is a schemaless binary data format with a relatively small message size.
- MessagePack: MessagePack is an efficient binary serialization format. Like JSON, it lets you exchange data among multiple languages, but its messages are smaller.
- Pickle: Python's pickle module implements binary protocols for serializing and de-serializing a Python object structure.
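The sketch below illustrates what "schemaless" means in practice, using two formats from this list that ship with Python: `json` and `pickle`. No schema is shared in advance, so each encoded message carries its own field names, and the receiver can rebuild the dictionary from the bytes alone. Note that pickle is Python-specific and should only be used to load data from trusted sources, since unpickling can execute arbitrary code.

```python
import json
import pickle

reading = {
    "light_state_1": 0, "temp": "27.7", "light_state_2": 0, "sensor_profile": 5,
    "lux": "652.5", "battery": 100, "humid": "49.8",
    "device_id": "ABC123467890", "timestamp": 1683059017,
}

# Schemaless: nothing is agreed in advance, so the keys travel inside every message.
json_payload = json.dumps(reading).encode("utf-8")
pickle_payload = pickle.dumps(reading, protocol=pickle.HIGHEST_PROTOCOL)

for name, payload in (("JSON", json_payload), ("pickle", pickle_payload)):
    print(f"{name}: {len(payload)} bytes, field names embedded: {b'device_id' in payload}")

# The receiver reconstructs the same structure without any schema.
assert pickle.loads(pickle_payload) == reading
assert json.loads(json_payload) == reading
```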
Schema Based Serialization
Schema-based serialization is a type of binary serialization where the structure and layout of the data are defined in advance by a schema. The schema describes the type, format, and structure of the data, and the serializer uses it to encode the data in a binary format. Because both sides share the schema, field names do not need to be sent with every message.
Examples of Schema-based Serialization
- Cap’n Proto: Cap'n Proto is a data serialization format and Remote Procedure Call (RPC) framework.
- FlatBuffers: FlatBuffers is an efficient cross-platform serialization library developed by Google.
- Protocol Buffers: Protocol Buffers are a language-neutral, platform-neutral, extensible mechanism for serializing structured data.
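As a simplified illustration of the schema-based idea, the sketch below uses Python's standard `struct` module with a fixed binary layout that the device and server are assumed to agree on in advance. This is not Cap'n Proto, FlatBuffers, or Protocol Buffers, and the field types (for example, float32 for `temp` and a 12-byte `device_id`) are hypothetical choices. Because the layout is shared, no field names are sent, and the reading fits in 32 bytes.

```python
import json
import struct

# Hypothetical shared "schema": field order plus a binary type for each field.
# "<" = little-endian, no padding; B = uint8, f = float32, 12s = 12 raw bytes, I = uint32
READING_FORMAT = "<BfBBfBf12sI"
FIELDS = ("light_state_1", "temp", "light_state_2", "sensor_profile",
          "lux", "battery", "humid", "device_id", "timestamp")


def encode(r: dict) -> bytes:
    """Pack values in schema order; field names are never written."""
    return struct.pack(
        READING_FORMAT,
        r["light_state_1"], float(r["temp"]), r["light_state_2"],
        r["sensor_profile"], float(r["lux"]), r["battery"],
        float(r["humid"]),
        r["device_id"].encode("ascii"),  # "12s" pads shorter IDs with zero bytes, truncates longer ones
        r["timestamp"],
    )


def decode(payload: bytes) -> dict:
    """The receiver uses the same format string to recover the fields."""
    return dict(zip(FIELDS, struct.unpack(READING_FORMAT, payload)))


reading = json.loads('{"light_state_1": 0, "temp": "27.7", "light_state_2": 0, "sensor_profile": 5, "lux": "652.5", "battery": 100, "humid": "49.8", "device_id": "ABC123467890", "timestamp": 1683059017}')
payload = encode(reading)
decoded = decode(payload)

print(len(payload), "bytes")  # 32
print(decoded["device_id"], decoded["timestamp"])
```

The trade-off is rigidity: float32 stores `27.7` as roughly `27.7000008`, and adding or reordering fields breaks older readers, which is the kind of schema evolution that formats like Protocol Buffers are designed to handle.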
What are Compression Algorithms?
Compression algorithms are techniques that reduce data size by encoding redundant information more compactly. The algorithms below are all lossless: the original data can be reconstructed exactly after decompression.
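The sketch below illustrates both points with Python's built-in `zlib`, used here only because it ships with Python: a lossless round trip restores the exact bytes, and a batch of readings with repeated keys compresses far better than a single small message. The batch of 50 readings, with only the timestamp changing, is hypothetical.

```python
import json
import zlib

reading = {
    "light_state_1": 0, "temp": "27.7", "light_state_2": 0, "sensor_profile": 5,
    "lux": "652.5", "battery": 100, "humid": "49.8",
    "device_id": "ABC123467890", "timestamp": 1683059017,
}

single = json.dumps(reading).encode("utf-8")
# Hypothetical batch: 50 readings, one every 10 s, identical except for the timestamp.
batch = json.dumps(
    [dict(reading, timestamp=reading["timestamp"] + 10 * i) for i in range(50)]
).encode("utf-8")

ratios = {}
for name, data in (("single reading", single), ("batch of 50", batch)):
    packed = zlib.compress(data, 6)
    assert zlib.decompress(packed) == data  # lossless: the exact bytes come back
    ratios[name] = len(packed) / len(data)
    print(f"{name}: {len(data)} -> {len(packed)} bytes ({ratios[name]:.0%} of original)")
```

A single small message offers little redundancy to exploit, so batching readings before compressing is often where most of the savings come from.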
Examples of Compression Algorithms in IoT
- LZ4: LZ4 is a lossless compression algorithm known for its very high compression and decompression speeds. It is commonly used in real-time applications where speed is critical. The LZ4 project's benchmarks report a compression speed of about 780 MB/s and a decompression speed of about 4970 MB/s on a single desktop CPU core.
- Snappy: Snappy is a compression algorithm developed by Google that is designed for speed and efficiency. It is particularly well-suited to applications where high compression ratios are not essential, but fast processing times are crucial.
- Zlib: Zlib is a widely used compression library based on the DEFLATE algorithm. It is slower than LZ4 or Snappy but typically achieves higher compression ratios, which makes it a common choice where a high compression ratio is essential, such as when storing data on disk.
- Zstd: Zstd (short for "Zstandard") is a fast lossless compression algorithm developed by Facebook. It aims to combine high compression ratios with fast processing; in its published benchmarks it typically compresses better than zlib while running considerably faster.
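To keep the sketch dependency-free, it uses the standard library's `zlib` (at two levels), `bz2`, and `lzma` as stand-ins for the algorithms above and compares them on a hypothetical batch of readings. The point is the trade-off described in the list, compressed size versus compression time, not the specific numbers, which will vary with hardware and data.

```python
import bz2
import json
import lzma
import time
import zlib

reading = {
    "light_state_1": 0, "temp": "27.7", "light_state_2": 0, "sensor_profile": 5,
    "lux": "652.5", "battery": 100, "humid": "49.8",
    "device_id": "ABC123467890", "timestamp": 1683059017,
}
# Hypothetical batch: 50 readings that differ only in their timestamp.
batch = json.dumps(
    [dict(reading, timestamp=reading["timestamp"] + 10 * i) for i in range(50)]
).encode("utf-8")

# Standard-library stand-ins: name -> (compress, decompress).
CODECS = {
    "zlib level 1": (lambda d: zlib.compress(d, 1), zlib.decompress),
    "zlib level 9": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

results = {}
for name, (compress, decompress) in CODECS.items():
    start = time.perf_counter()
    packed = compress(batch)
    elapsed = time.perf_counter() - start
    assert decompress(packed) == batch  # every codec here is lossless
    results[name] = (len(packed), elapsed)
    print(f"{name:<13} {len(packed):>6} bytes  {elapsed * 1e6:>9.1f} µs")
```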
Let's take Snappy as an example.
# Example JSON
input_json = '{"light_state_1": 0, "temp": "27.7", "light_state_2": 0, "sensor_profile": 5, "lux": "652.5", "battery": 100, "humid": "49.8", "device_id": "ABC123467890", "timestamp": 1683059017}'
JSON object size: 229 bytes
The original JSON object is 229 bytes.
Compressed input size: 161 bytes
Compression time: 2.7418136596679688e-05 seconds
Decompressed input size: 180 bytes
Decompression time: 6.198883056640625e-06 seconds
So, using Snappy, it takes around 27 µs to compress the JSON payload and around 6 µs to decompress it.
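The timings above appear to come from a single run, and at the microsecond scale a single measurement is noisy. A minimal sketch of a more robust measurement averages many runs with `timeit` and measures the payload with `len()` of the encoded bytes (180 bytes for this JSON string; `sys.getsizeof()` would add Python's object overhead). It uses the standard library's `zlib` in place of Snappy so that it runs without extra packages; with the third-party python-snappy package installed, `snappy.compress` and `snappy.decompress` could be swapped in.

```python
import timeit
import zlib

input_json = '{"light_state_1": 0, "temp": "27.7", "light_state_2": 0, "sensor_profile": 5, "lux": "652.5", "battery": 100, "humid": "49.8", "device_id": "ABC123467890", "timestamp": 1683059017}'
raw = input_json.encode("utf-8")  # the bytes actually transmitted

compressed = zlib.compress(raw)
assert zlib.decompress(compressed) == raw  # lossless round trip

RUNS = 1_000
# timeit returns the total seconds for RUNS calls; divide to get the mean per call.
t_compress = timeit.timeit(lambda: zlib.compress(raw), number=RUNS) / RUNS
t_decompress = timeit.timeit(lambda: zlib.decompress(compressed), number=RUNS) / RUNS

print(f"Payload size:       {len(raw)} bytes")
print(f"Compressed size:    {len(compressed)} bytes")
print(f"Compression time:   {t_compress * 1e6:.1f} µs (mean of {RUNS} runs)")
print(f"Decompression time: {t_decompress * 1e6:.1f} µs (mean of {RUNS} runs)")
```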
Conclusion
In conclusion, serialization and compression algorithms can greatly improve the efficiency of IoT systems. These techniques can improve network performance, reduce bandwidth requirements, and enable more efficient data analytics.
We will be releasing Part 2 of this guide shortly, where we will compare different serialization formats combined with compression algorithms.
I hope you gained valuable insights from this comparison summary. As we continue to come up with more interesting tutorials and maker content, we invite you to explore Bytebeam further and see how our platform can empower your projects.
Stay tuned!