Fast, spec-compliant CBOR encoding and decoding for mruby.
CBOR (Concise Binary Object Representation, RFC 8949) is a binary data format like JSON, msgpack, or Protocol Buffers. It's small, fast, and human-readable in diagnostic notation. Use it when you need:
- Wire efficiency: Encode numbers, arrays, and maps in the fewest bytes possible
- Language-agnostic data exchange: Send structs between mruby and Python, Node.js, Go, Rust, etc.
- Self-describing types: Distinguish between integers and floats, byte strings and text strings
- Recursive data structures: Encode cyclic arrays and shared references without duplication
- Custom types: Extend CBOR with application-defined tags for domain objects
This gem gives you ~30% faster encoding than msgpack, 1.3–3× faster selective decoding than simdjson (lazy mode), and zero security surprises—depth limits, overflow protection, and deterministic behavior by default.
- Quick Start
- Installation
- Core Features
- Usage Examples
- Advanced Topics
- Determinism Guarantees
- Performance & Tuning
- Error Handling
- Specification & Interoperability
# Encode
data = { "users" => [{ "id" => 1, "name" => "Alice" }], "count" => 1 }
buffer = CBOR.encode(data)
puts "Encoded #{buffer.bytesize} bytes"
# Decode (eager)
result = CBOR.decode(buffer)
p result # => {"users"=>[{"id"=>1, "name"=>"Alice"}], "count"=>1}
# Decode (lazy — access only what you need)
lazy = CBOR.decode_lazy(buffer)
first_user_name = lazy["users"][0]["name"].value
puts "First user: #{first_user_name}"
# Shared references (deduplication + cyclic structures)
shared = [1, 2, 3]
obj = { "x" => shared, "y" => shared }
buf = CBOR.encode(obj, sharedrefs: true)
result = CBOR.decode(buf)
result["x"].equal?(result["y"]) # => true (same object)Add to your mruby build_config.rb:
MRuby::Build.new do |conf|
# ... other config ...
conf.gem mgem: "mruby-cbor"
endThen build:
cd /path/to/mruby
rake compileTo run tests:
rake test| Type | Support | Notes |
|---|---|---|
| Integers | ✅ Full | Fixnum + Bigint (Tag 2/3 RFC 8949) |
| Floats | ✅ Full | Preferred serialization: f16→f32→f64 (smallest lossless width) |
| Strings | ✅ Full | UTF-8 text and binary byte strings |
| Arrays | ✅ Full | Nested, empty, arbitrary size |
| Maps/Hashes | ✅ Full | Any keys, arbitrary nesting |
| Booleans & Nil | ✅ Full | true, false, nil |
| Symbols | ✅ Full | Three modes: strip, tag+string, tag+uint32 (presym) |
| Classes & Modules | ✅ Full | Auto round-trip via Tag 49999 |
| Custom Classes | ✅ Full | Via native_ext_type DSL or register_tag proc |
| Shared Objects | ✅ Full | Tag 28/29 for deduplication and cycles |
Both eager and lazy decoding operate directly on the input buffer without copying:
Eager decoding: Strings and byte strings reference the original buffer via views (no copy):
buf = CBOR.encode("hello world")
result = CBOR.decode(buf)
result.bytesize # => 11 (string doesn't own its data, references buf)Lazy decoding (CBOR::Lazy) wraps the wire buffer without decoding:
lazy = CBOR.decode_lazy(huge_buffer)
field = lazy["metadata"]["version"].value # Parse only what you accessConstant-time repeated access via built-in caching. No intermediate allocations.
Same input → same output, always. Encoding is deterministic:
- Float width determined by bit-pattern, not floating-point rounding
- Hash field order follows insertion order (mruby's hash implementation)
- NaN always encodes as canonical
0xF97E00(quiet f16 NaN per RFC 8949) - Negation of large bignums computed in integer-only arithmetic, no FP edge cases
See Determinism Guarantees for full details.
- Recursion depth limits (configurable, profile-dependent default 32–128)
- Integer overflow protection (explicit bounds checks, no wraparound)
- Buffer bounds checking on every read (no out-of-bounds access)
- UTF-8 validation (required for text strings; binary strings are untouched)
- Type safety (schema-based validation for registered tags)
- No indefinite-length items (use streaming instead if needed)
Decode CBOR sequences from strings, files, or sockets:
# From a string of concatenated CBOR documents
CBOR.stream(buf) { |doc| process(doc.value) }
# From a file (with readahead for large documents)
File.open("data.cbor") { |f| CBOR.stream(f) { |doc| ... } }
# From a socket (incremental recv loop)
CBOR.stream(socket) { |doc| ... }# Integers
CBOR.decode(CBOR.encode(42)) # => 42
CBOR.decode(CBOR.encode(-1)) # => -1
CBOR.decode(CBOR.encode(2**100)) # => 1267650600228229401496703205376 (bigint)
# Floats
CBOR.decode(CBOR.encode(1.5)) # => 1.5
CBOR.decode(CBOR.encode(Float::INFINITY)) # => Infinity
CBOR.decode(CBOR.encode(Float::NAN)) # => NaN
# Strings
CBOR.decode(CBOR.encode("hello")) # => "hello"
CBOR.decode(CBOR.encode("Ñoño")) # => "Ñoño" (UTF-8 preserved)
# Arrays & nested structures
data = [1, [2, [3, 4]], "text"]
CBOR.decode(CBOR.encode(data)) == data # => true
# Maps
h = { "x" => 10, "y" => { "nested" => true } }
CBOR.decode(CBOR.encode(h)) == h # => trueParse only what you access. Perfect for large documents where you only need a few fields:
# Wire buffer (not decoded yet)
lazy = CBOR.decode_lazy(big_payload)
# Navigate by chaining `[]` or `dig`:
value = lazy["response"]["data"][0]["id"].value
# Only the path you access is decoded; rest stays compressed
# `dig` is safe (returns nil for missing keys):
status = lazy.dig("response", "data", "status").value
status = lazy.dig("nonexistent", "path").value # => nil
# Repeated access uses cache (O(1) after first access):
id1 = lazy["response"]["data"][0]["id"].value
id2 = lazy["response"]["data"][0]["id"].value # Same object, no re-parse
id1.equal?(id2) # => true
# Negative array indices work:
last = lazy["items"][-1].value # Last itemPerformance: O(n) only in skipped elements, not the full document. For a 10 MB payload where you access 1% of fields, lazy decoding pays off immediately.
Eliminate duplication. Represent cycles without infinite loops:
# Two variables pointing to the same array
shared_array = [1, 2, 3]
obj = { "x" => shared_array, "y" => shared_array }
# Encode with deduplication (Tag 28/29)
buf = CBOR.encode(obj, sharedrefs: true)
# Decode: identity is preserved
decoded = CBOR.decode(buf)
decoded["x"].equal?(decoded["y"]) # => true ✓ Same object
# Cyclic structures (array containing itself)
cyclic = []
cyclic << cyclic
buf = CBOR.encode(cyclic, sharedrefs: true)
result = CBOR.decode(buf)
result.equal?(result[0]) # => true (self-referential)
# Works with lazy decoding too:
lazy = CBOR.decode_lazy(buf)
decoded = lazy.value
decoded.equal?(decoded[0]) # => trueHow it works: First occurrence is tagged with Tag 28 (shareable). Subsequent references use Tag 29 (shared ref) with an index into the shareable table. On decode, the index is resolved to the already-decoded object, preserving identity.
Define a schema for your classes using the native_ext_type DSL:
class Address
native_ext_type :@street, String
native_ext_type :@city, String
native_ext_type :@zip, String
end
class Person
native_ext_type :@name, String
native_ext_type :@age, Integer
native_ext_type :@address, Address # Nested class!
native_ext_type :@active, TrueClass, FalseClass # Multiple types OK
def _after_decode
puts "Person #{@name} loaded"
self
end
def _before_encode
@age += 1 if @age < 18 # Modify before encoding
self
end
end
# Register with a tag number
CBOR.register_tag(1000, Person)
CBOR.register_tag(1001, Address)
# Encode
addr = Address.new
addr.instance_variable_set(:@street, "Main St")
addr.instance_variable_set(:@city, "Berlin")
addr.instance_variable_set(:@zip, "10115")
person = Person.new
person.instance_variable_set(:@name, "Alice")
person.instance_variable_set(:@age, 30)
person.instance_variable_set(:@address, addr)
person.instance_variable_set(:@active, true)
encoded = CBOR.encode(person)
# Decode (hooks are called automatically)
decoded = CBOR.decode(encoded) # _after_decode called
decoded.instance_variable_get(:@name) # => "Alice"
decoded.instance_variable_get(:@address).instance_variable_get(:@city) # => "Berlin"Type constraints use standard Ruby classes: String, Integer, Float, Array, Hash, TrueClass, FalseClass, NilClass, and any registered class. Inheritance works: Numeric matches both Integer and Float.
Security: Only declared ivars are populated (allowlist model). Extra fields in the payload are silently ignored.
Decode CBOR sequences (multiple concatenated documents):
# From a string
buf = CBOR.encode("hello") + CBOR.encode("world") + CBOR.encode([1,2,3])
results = []
CBOR.stream(buf) { |doc| results << doc.value }
# => ["hello", "world", [1,2,3]]
# As an enumerator (no block)
docs = CBOR.stream(buf).map(&:value)
# From a file (with intelligent readahead)
File.open("data.cbor", "rb") do |f|
CBOR.stream(f) { |doc| process(doc.value) }
end
# From a socket (event-driven)
socket = TCPSocket.new("host", port)
CBOR.stream(socket) { |doc| handle(doc.value) }
# Manual socket handling (for async frameworks)
decoder = CBOR::StreamDecoder.new { |doc| handle(doc.value) }
while chunk = socket.recv(4096)
decoder.feed(chunk)
endDispatch rules: Automatically detects string (via bytesize/byteslice), file (via seek/read), or socket (via recv), and handles buffering transparently.
Human-readable output for debugging and logging (RFC 8949 §8.1):
# Integers and basic types
CBOR.diagnose(CBOR.encode(1)) # => "1"
CBOR.diagnose(CBOR.encode(true)) # => "true"
CBOR.diagnose(CBOR.encode(nil)) # => "null"
# Floats with width suffix (_1=f16, _2=f32, _3=f64)
CBOR.diagnose(CBOR.encode(1.0)) # => "1.0_1" (f16, 3 bytes)
CBOR.diagnose(CBOR.encode(1.0e10)) # => "10000000000.0_2" (f32, 5 bytes)
CBOR.diagnose(CBOR.encode(3.14)) # => "3.14_3" (f64, 9 bytes)
# NaN and Infinity
CBOR.diagnose(CBOR.encode(Float::NAN)) # => "NaN_1"
CBOR.diagnose(CBOR.encode(Float::INFINITY)) # => "Infinity_1"
# Strings and bytes
CBOR.diagnose(CBOR.encode("hello")) # => '"hello"'
CBOR.diagnose(CBOR.encode("\x01\x02\x03".b)) # => "h'010203'"
# Containers
CBOR.diagnose(CBOR.encode([1, [2, 3]])) # => "[1,[2,3]]"
CBOR.diagnose(CBOR.encode({"a" => 1})) # => '{"a":1}'
# Tags
CBOR.diagnose("\xc1\x1a\x51\x4b\x67\xb0") # => "1(1363896240)"
# Indefinite-length items (from external sources)
CBOR.diagnose("\x9f\x01\x02\xff") # => "[_ 1,2]"
CBOR.diagnose("\xbf\x61a\x01\xff") # => "{_ \"a\":1}"MRB_NO_FLOAT builds: Use exact rational arithmetic for f16/f32 and hex-float notation for f64 (RFC 8610 Appendix G), avoiding floating-point operations entirely.
Three strategies, each with tradeoffs:
# Strategy 1: Strip symbols (default, no round-trip)
CBOR.no_symbols
sym = :hello
encoded = CBOR.encode(sym)
decoded = CBOR.decode(encoded) # => "hello" (lost the symbol!)
# Strategy 2: Tag 39 + string (RFC 8949, interoperable)
CBOR.symbols_as_string
sym = :hello
encoded = CBOR.encode(sym)
decoded = CBOR.decode(encoded) # => :hello (preserved!)
# Works with other implementations that recognize tag 39
# Strategy 3: Tag 39 + uint32 (mruby-only, fastest)
CBOR.symbols_as_uint32
sym = :hello
encoded = CBOR.encode(sym)
decoded = CBOR.decode(encoded) # => :hello
# Faster: just an integer ID, but requires same mruby build and presym
# In data structures
CBOR.symbols_as_string
h = { name: "Alice", age: 30 }
decoded = CBOR.decode(CBOR.encode(h))
decoded[:name] # => "Alice"| Mode | Encoding | Interop | Round-trip | Speed |
|---|---|---|---|---|
no_symbols |
Plain string | ✅ All | ❌ No | Fast |
symbols_as_string |
Tag 39 + string | ✅ All | ✅ Yes | Good |
symbols_as_uint32 |
Tag 39 + uint32 | ❌ mruby only | ✅ Yes | Fastest |
symbols_as_uint32 requires:
- Same mruby build (encoder and decoder must use identical
libmruby.a) - Compile-time symbols (presym enabled)
- No dynamic symbol creation during decoding
Use symbols_as_string for external/untrusted data. Use symbols_as_uint32 only for internal mruby-to-mruby IPC.
For types that can't use native_ext_type (e.g., Exception, Time, or built-in C types), register encode/decode procs:
# Exception serialization (encode + decode)
CBOR.register_tag(50000) do
encode Exception do |e|
[e.class, e.message, e.backtrace]
end
decode Array do |a|
exc = a[0].new(a[1])
exc.set_backtrace(a[2]) if a[2]
exc
end
end
# Now encode and re-raise exceptions:
buf = begin
raise ArgumentError, "something went wrong"
rescue => e
CBOR.encode(e)
end
exc = CBOR.decode(buf)
raise exc # Re-raises ArgumentError with original message & backtrace
# Time serialization (if mruby-time available)
CBOR.register_tag(1) do # Tag 1 is RFC 8949 epoch-based time
encode Time do |t|
t.to_f # Seconds since Unix epoch as float
end
decode Float do |v|
Time.at(v)
end
end
CBOR.encode(Time.now) # => tag(1, <float>)
CBOR.decode(buf) # => Time objectEncode type matching uses kind_of?: Register Exception and it matches StandardError, ArgumentError, and all subclasses. Natively-encoded types (String, Integer, Array, Hash, etc.) are rejected as encode types (they're handled by the core encoder and would be ignored).
Classes and modules round-trip automatically via a private tag (49999):
CBOR.encode(String) # => tag(49999, "String")
CBOR.decode(buf) # => String (the class itself)
CBOR.encode(CBOR::UnhandledTag) # => tag(49999, "CBOR::UnhandledTag")
# In structures
h = { "exception_type" => StandardError, "code" => 123 }
decoded = CBOR.decode(CBOR.encode(h))
decoded["exception_type"] # => StandardError
# Anonymous classes raise ArgumentError
CBOR.encode(Class.new) # Error: "cannot encode anonymous class/module"mruby-to-mruby only: The constant path syntax (:: separator) is Ruby-specific. Other runtimes won't know how to resolve StandardError from the string path.
Floats use the smallest CBOR float width that represents them losslessly:
CBOR.encode(0.0).bytesize # => 3 bytes (f16)
CBOR.encode(1.0).bytesize # => 3 bytes (f16)
CBOR.encode(1.5).bytesize # => 3 bytes (f16)
CBOR.encode(1.0e10).bytesize # => 5 bytes (f32)
CBOR.encode(3.14).bytesize # => 9 bytes (f64)Algorithm: Pure bit-pattern arithmetic, zero floating-point operations:
| Value | Encoding | Rationale |
|---|---|---|
| NaN (any payload) | f16 0x7E00 |
Canonical per RFC 8949 Appendix B |
| ±Inf, ±0 | f16 | Always representable |
| f16 normal | f16 if exp32 ∈ [113..142] and low 13 mant bits = 0 | Lossless in f16 |
| f16 subnormal | f16 if exp32 ∈ [103..112] and shift is exact | Lossless in f16 |
| Fits in f32 | f32 if low 29 f64 mant bits = 0 and exp in range | Lossless in f32 |
| Everything else | f64 | Need full precision |
MRB_USE_FLOAT32 builds: Start at f32 and try f16, skipping f64 entirely.
CBOR documents may contain tags your code doesn't recognize. Rather than failing, unknown tags decode as CBOR::UnhandledTag objects:
# Decode a tag(100, 42) that you didn't register
result = CBOR.decode("\xD8\x64\x18\x2A")
result.is_a?(CBOR::UnhandledTag) # => true
result.tag # => 100
result.value # => 42
# In nested structures
result = CBOR.decode("\xD8\x64\x83\x01\x02\x03")
result.is_a?(CBOR::UnhandledTag) # => true
result.value # => [1, 2, 3]begin
CBOR.decode(malformed_buffer)
rescue RangeError => e
# Out-of-bounds access, integer overflow, length too large
puts "Bounds error: #{e.message}"
rescue RuntimeError => e
# Truncated buffer, invalid encoding, nesting too deep
puts "Runtime error: #{e.message}"
rescue TypeError => e
# Type mismatch in registered tags
puts "Type error: #{e.message}"
rescue KeyError => e
# Missing key in lazy map access (use dig for safe access)
puts "Missing key: #{e.message}"
rescue NameError => e
# Tag 49999 resolves to unknown constant
puts "Unknown class: #{e.message}"
endThis section answers: Will the same input always produce the same output?
-
Integer encoding (all bases: 2, 10, 16, etc.)
- Fixnum (
mrb_int): Encoded in varint-style (shortest form, no padding) - Bignum (
mrb_bigint): Converted to big-endian byte string (Tag 2/3), no lossy conversions - Negative integers: Computed via
-1 - nrule, not two's complement tricks - Same integer value → same wire bytes, always, regardless of how the integer was created
- Example:
2**100and(1 << 100)encode identically, even across mruby builds
- Fixnum (
-
Float width selection (bit-pattern only)
- A float's wire encoding depends solely on its bit-pattern, not FP rounding
- NaN always → canonical f16
0xF97E00, regardless of NaN payload or sign bit - Same float value → same wire bytes, always
-
Hash field order
- Encoding follows insertion order (guaranteed in mruby)
- Hashes are always encoded in the order fields are created, forever
- Reproducible across encodes, mruby versions, and across different machines
-
Bignum negation
- Negative bignums computed in pure integer arithmetic (no FP edge cases)
-n=-1 - (n-1)for correctness; no rounding, no overflow surprises
-
Integer overflow protection
- Explicit bounds checks prevent wraparound
- Out-of-range values raise
RangeErrorconsistently
| Factor | Impact | Details |
|---|---|---|
| mruby build config | Symbols, float width range | MRB_USE_FLOAT32, presym settings affect encoding choices |
| Compile flags | Might affect numeric representation | Different CFLAGS could theoretically affect float behavior (though unlikely in practice) |
| Symbol IDs | Non-portable across mruby binaries | Presym IDs differ between mruby builds; use symbols_as_string for portability |
Practical: Same mruby binary + same input = same output, forever. For cross-machine reproducibility, use symbols_as_string (portable) instead of symbols_as_uint32 (binary-specific).
This implementation strictly follows RFC 8949:
- §3.1 (Unsigned integers): Encoded in shortest form (varint-style)
- §3.2 (Negative integers): Follows
-1 - nrule consistently - §3.3 (Byte strings): Uninterpreted binary; no UTF-8 check
- §3.4 (Text strings): Validated UTF-8; major type 3
- §3.5 (Arrays & maps): Definite length only (no indefinite, use streaming instead)
- §4.1 (Float preferred serialization): Smallest lossless width (f16→f32→f64)
- §4.2 (Simplicity values):
false,true,nullonly (notundefined) - §3.4.3 (Bignums/Tags 2&3): RFC 8949 zero-length payload rule respected
- §3.4.1 (Tags 28&29): Shared references and identity-preserving decoding
- Appendix B (CBOR Canonical CBOR, CTAP2): NaN always
0xF97E00
| Operation | Time | vs. msgpack | vs. simdjson |
|---|---|---|---|
| Encode small struct | 1× | ~30% faster | N/A |
| Encode large array | 1× | ~25% faster | N/A |
| Decode (eager) full | 1× | ~20% faster | N/A |
| Decode (lazy) selective access | 1× | N/A | 1.3–3× faster |
Lazy decoding shines: When you only need a few fields from a 10 MB payload, lazy is 10–100× faster than eager.
Default limits depend on mruby profile:
MRB_PROFILE_MAIN / MRB_PROFILE_HIGH → CBOR_MAX_DEPTH = 128
MRB_PROFILE_BASELINE → CBOR_MAX_DEPTH = 64
Constrained / other → CBOR_MAX_DEPTH = 32Override at compile time:
cd /path/to/mruby
MRUBY_CFLAGS="-DCBOR_MAX_DEPTH=256" rake compileExceeding depth raises RuntimeError: "CBOR nesting depth exceeded".
Encoding uses a small-buffer optimization (SBO) with 16 KB stack buffer:
- Small documents (< 16 KB) → stack only, no allocation
- Larger → spills to heap, grows geometrically
- No unnecessary copies; final result is the heap string
File streaming uses adaptive readahead with doubling strategy:
- First read: 9 bytes (minimum needed to parse any CBOR header)
- Use
doc_end()to find where the document ends by skipping its contents - If the buffer contains the full document: yield it, move to next
- If not complete: double the read size, re-read from the same offset, retry
- Continue doubling until the full document is buffered
- Then read exactly the remaining bytes needed (if any) to avoid over-reading
Why doubling? CBOR documents can be arbitrarily nested (arrays, maps, tags wrapping each other). Finding the document boundary requires parsing the structure, not just reading a length header. The doubling strategy balances:
- Most documents (< 16 KB) fit in 1–2 reads
- Large documents don't require excessive seeks
- No fixed buffer size that wastes memory or fails on edge cases
| Error | Cause | Example |
|---|---|---|
ArgumentError |
Invalid option or reserved tag number | CBOR.encode(obj, invalid: true) |
RangeError |
Integer overflow or invalid length | Encoding bigint larger than wire allows |
RuntimeError |
Malformed CBOR, depth exceeded, or unimplemented | Truncated buffer, nested too deep, indefinite-length |
TypeError |
Type mismatch in registered tag or unknown source | Field is Array but schema expects String |
KeyError |
Lazy map access to missing key | lazy["nonexistent"] (use .dig for safe access) |
IndexError |
Lazy array out of bounds or invalid shared ref | lazy[999] on 3-item array |
NameError |
Unknown constant in Tag 49999 | Decoding a class name not defined on this side |
NotImplementedError |
Indefinite-length or presym unavailable | CBOR.symbols_as_uint32 on non-presym mruby |
Full RFC 8949 compliance. Official specification: https://tools.ietf.org/html/rfc8949
- RFC 8949 Section 1–6: Complete support (except indefinite-length)
- RFC 8949 Appendix A–D: Diagnostic notation, CDDL, CBOR in JSON, test vectors
Tested against:
- Python:
cbor2library (comprehensive test suite viainterop.py)
RFC 8949 compliance: This implementation strictly adheres to RFC 8949, so it should interoperate with any spec-compliant CBOR decoder in any language.
Guaranteed portability via symbols_as_string: If you use string-based symbols (Tag 39), your CBOR documents are readable by any RFC 8949-compliant decoder in any language.
Official RFC 8949 test vectors (Appendix A) included in /test-vectors/. Run:
# Validate against official vectors
rake testApache License 2.0. See LICENSE file.
Issues, PRs, and bug reports welcome. See interop.py for testing against other implementations.
1. Extract sign, exponent, mantissa from f64 bit-pattern
2. If NaN: emit canonical f16 0xF97E00, done
3. If can fit in f32 (29 low mant bits = 0, exp in range): continue to f16 check
4. If can fit in f16 (subnormal or normal range checks): emit f16
5. If fits in f32: emit f32
6. Else: emit f64
Key insight: No float rounding—entire algorithm is integer bit manipulation.
Encoding: When sharedrefs: true, maintain a hash of seen objects by mrb_obj_id:
1. Pre-pass (skip_cbor): Walk the object graph, tagging each unique object once with Tag 28
2. Encode: When re-encountering an object, emit Tag 29 + index into shareable table
3. Identity preserved: Decoder maintains a parallel array, resolving indexes to same decoded objects
Decoding:
1. Tag 28 (Shareable): Decode the wrapped value, push to shareable table
2. Tag 29 (Shared ref): Read index, fetch from shareable table, return that object
3. Result: `decoded["x"].equal?(decoded["y"])` is true if `obj["x"]` and `obj["y"]` were identical
Key insight: Lazy object is a (buffer, offset, key-cache) triple
1. cbor_lazy_new: Create Lazy wrapping buffer + offset, cache empty
2. lazy["key"]: Seek to offset, decode just enough to find key, create new Lazy for value, cache result
3. lazy.value: Decode from current offset, return fully-realized value, cache in vcache
4. Repeated access: Fetch from cache (O(1))
Buffer is not copied; each Lazy view just changes the offset. Perfect for partial extraction from large files.
RFC 8949 §3.4.3: Bignums outside int64 range encoded as:
- Tag 2: Positive bigint as byte string (big-endian)
- Tag 3: Negative bigint as byte string (|magnitude| - 1, big-endian)
Example: (1 << 200) + 1 → Tag 2 wrapping 26-byte hex string
Zero-length payloads: tag(2, h'') = 0, tag(3, h'') = -1 (edge case handled per RFC 8949)
| Mode | Wire Format | Lookup Speed | Interop |
|---|---|---|---|
no_symbols |
Plain string | N/A | Universal, loses type |
symbols_as_string |
Tag 39 + string | O(string compare) | RFC 8949 compatible |
symbols_as_uint32 |
Tag 39 + uint32 | O(array index) | mruby-only, requires presym |
Presym IDs are non-portable: Symbol ID 42 on your mruby might be ID 100 on another mruby built with different --enable-presym-inline settings.
Each decode call increments a per-Reader depth counter. At CBOR_MAX_DEPTH, raises RuntimeError. Prevents:
- Deeply nested arrays: [[[[[...]]]]]]
- Deeply nested maps: {"a": {"a": {"a": ...}}}
- Circular references without sharedrefs (would loop forever)
Limit is configurable at compile time and profile-aware.
When encoding, strings are automatically classified:
# UTF-8 string → encoded as text (major type 3)
CBOR.encode("hello") # => major 3 (text string)
# Binary string → encoded as bytes (major type 2)
CBOR.encode("\x00\xFF\xFE".b) # => major 2 (byte string)The encoder checks each string's UTF-8 validity at encode time and chooses the appropriate major type (3 for text, 2 for binary).
When decoding, text strings (major type 3) are validated as UTF-8 when mruby is compiled with MRB_UTF8_STRING. If mruby was compiled without UTF-8 string support, the validation is skipped (the strings are still decoded, just not validated).
Byte strings (major type 2) are never validated or touched—they're uninterpreted binary, regardless of compile flags.
This matches RFC 8949 (which requires UTF-8 for text strings) and prevents UTF-8 injection attacks when validation is enabled.
Built with ❤️ for mruby. Fast. Reliable. Deterministic.