
z5labs/avro-go


avro

A library for working with Apache Avro encoded data in Go.

Packages

Package                              Description
github.com/z5labs/avro-go            Binary encoding and decoding of Avro primitives
github.com/z5labs/avro-go/canonical  Parsing Canonical Form schema types with JSON marshaling
github.com/z5labs/avro-go/idl        Avro IDL tokenizer, parser, and printer

avro — Binary Encoding

Marshaling

Implement BinaryMarshaler on your type and call MarshalBinary to encode it:

type Message struct {
    Content string
}

func (m Message) MarshalAvroBinary(w *avro.BinaryWriter) error {
    return w.WriteString(m.Content)
}

var buf bytes.Buffer
err := avro.MarshalBinary(&buf, Message{Content: "hello"})

BinaryWriter exposes one write method per Avro primitive:

Method                Avro type
WriteBool(bool)       boolean
WriteInt(int32)       int
WriteLong(int64)      long
WriteFloat(float32)   float
WriteDouble(float64)  double
WriteBytes([]byte)    bytes
WriteFixed([]byte)    fixed
WriteString(string)   string
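
For readers unfamiliar with the wire format, the following is a minimal standard-library sketch of how the Avro specification encodes a long and a string. The helper names appendLong and appendString are hypothetical and are not part of this library; the sketch only illustrates the bytes that methods such as WriteLong and WriteString are expected to produce.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendLong appends v in Avro's variable-length zig-zag encoding.
// binary.AppendVarint uses the same zig-zag varint scheme.
// Hypothetical helper, not part of avro-go.
func appendLong(dst []byte, v int64) []byte {
	return binary.AppendVarint(dst, v)
}

// appendString appends s as an Avro string: the byte length encoded
// as a long, followed by the UTF-8 bytes.
func appendString(dst []byte, s string) []byte {
	dst = appendLong(dst, int64(len(s)))
	return append(dst, s...)
}

func main() {
	fmt.Printf("% x\n", appendLong(nil, -1))      // 01
	fmt.Printf("% x\n", appendLong(nil, 64))      // 80 01
	fmt.Printf("% x\n", appendString(nil, "foo")) // 06 66 6f 6f
}
```

Small magnitudes, positive or negative, stay short on the wire, which is why Avro uses zig-zag rather than plain two's complement for int and long.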

Unmarshaling

Implement BinaryUnmarshaler on your type and call UnmarshalBinary to decode it:

func (m *Message) UnmarshalAvroBinary(r *avro.BinaryReader) error {
    var err error
    m.Content, err = r.ReadString()
    return err
}

var msg Message
err := avro.UnmarshalBinary(bytes.NewReader(data), &msg)

BinaryReader mirrors BinaryWriter with corresponding Read* methods.

Single-Object Encoding

The Avro single-object encoding prepends a 2-byte magic header and an 8-byte schema fingerprint to the binary payload, allowing readers to identify the schema at runtime.

Implement SingleObjectMarshaler (embeds BinaryMarshaler plus a Fingerprint() [8]byte method) and call MarshalSingleObject:

func (m Message) Fingerprint() [8]byte {
    var fp [8]byte
    binary.LittleEndian.PutUint64(fp[:], avro.Fingerprint64([]byte(`"string"`)))
    return fp
}

var buf bytes.Buffer
err := avro.MarshalSingleObject(&buf, msg)

Decode with SingleObjectUnmarshaler and UnmarshalSingleObject:

var msg Message
err := avro.UnmarshalSingleObject(r, &msg)

UnmarshalSingleObject returns ErrBadMagic when the header is invalid and ErrFingerprintMismatch when the schema fingerprint in the payload does not match the one returned by Fingerprint().
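
The framing described above can be sketched with the standard library alone. This assumes the layout from the Avro specification (magic bytes C3 01, then the 8-byte fingerprint, then the payload). The names frame, unframe, and the lowercase error values are hypothetical stand-ins, not this library's implementation.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// magic is the two-byte single-object marker from the Avro specification.
var magic = [2]byte{0xC3, 0x01}

// Hypothetical errors standing in for ErrBadMagic and ErrFingerprintMismatch.
var (
	errTruncated           = errors.New("message shorter than 10-byte header")
	errBadMagic            = errors.New("bad magic")
	errFingerprintMismatch = errors.New("fingerprint mismatch")
)

// frame prepends the magic bytes and fingerprint to an encoded payload.
func frame(fp [8]byte, payload []byte) []byte {
	out := make([]byte, 0, 10+len(payload))
	out = append(out, magic[:]...)
	out = append(out, fp[:]...)
	return append(out, payload...)
}

// unframe validates the header against the expected fingerprint and
// returns the remaining payload bytes.
func unframe(msg []byte, want [8]byte) ([]byte, error) {
	if len(msg) < 10 {
		return nil, errTruncated
	}
	if !bytes.Equal(msg[:2], magic[:]) {
		return nil, errBadMagic
	}
	if !bytes.Equal(msg[2:10], want[:]) {
		return nil, errFingerprintMismatch
	}
	return msg[10:], nil
}

func main() {
	fp := [8]byte{1, 2, 3, 4, 5, 6, 7, 8} // placeholder fingerprint for illustration
	msg := frame(fp, []byte{0x06, 'f', 'o', 'o'})
	payload, err := unframe(msg, fp)
	fmt.Printf("% x %v\n", payload, err) // 06 66 6f 6f <nil>
}
```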

Schema Fingerprinting

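Fingerprint64 computes the 64-bit Rabin fingerprint (CRC-64-AVRO), as defined in the Avro specification, of the schema JSON bytes passed to it. The specification defines fingerprints over a schema's Parsing Canonical Form, so pass canonical JSON to get values that other Avro implementations will agree with: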

fp := avro.Fingerprint64([]byte(`"string"`))
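
For reference, the CRC-64-AVRO algorithm itself is small. The sketch below is a Go transliteration of the reference pseudocode in the Avro specification; fingerprint64 here is a hypothetical standalone function and may differ from how this library implements Fingerprint64 internally.

```go
package main

import "fmt"

// empty is the fingerprint of the empty input and doubles as the
// polynomial constant in the Avro specification.
const empty uint64 = 0xc15d213aa4d7a795

// table holds the 256 precomputed per-byte remainders.
var table = func() [256]uint64 {
	var t [256]uint64
	for i := range t {
		fp := uint64(i)
		for j := 0; j < 8; j++ {
			// -(fp & 1) is all ones when the low bit is set, else zero.
			fp = (fp >> 1) ^ (empty & -(fp & 1))
		}
		t[i] = fp
	}
	return t
}()

// fingerprint64 computes the CRC-64-AVRO fingerprint of data.
func fingerprint64(data []byte) uint64 {
	fp := empty
	for _, b := range data {
		fp = (fp >> 8) ^ table[byte(fp)^b]
	}
	return fp
}

func main() {
	fmt.Printf("%016x\n", fingerprint64([]byte(`"string"`)))
}
```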

canonical — Parsing Canonical Form

The canonical package provides typed Go representations of Avro schemas in Parsing Canonical Form. The top-level Schema type implements json.Marshaler and json.Unmarshaler, producing canonical JSON with correct field ordering and no extra whitespace.

Creating schemas

Use the constructor functions to build schemas:

s := canonical.RecordSchema(canonical.Record{
    Name: "com.example.Person",
    Fields: []canonical.Field{
        {Name: "name", Type: canonical.PrimitiveSchema(canonical.String)},
        {Name: "age", Type: canonical.PrimitiveSchema(canonical.Int)},
    },
})

Primitive constants are provided for all Avro primitives: Null, Boolean, Int, Long, Float, Double, Bytes, String.

Marshaling to canonical JSON

b, err := json.Marshal(s)
// {"name":"com.example.Person","type":"record","fields":[{"name":"name","type":"string"},{"name":"age","type":"int"}]}
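
To show why the output above has a fixed key order, here is a standard-library sketch. encoding/json emits struct fields in declaration order with no extra whitespace, so declaring name, type, fields in that order yields the same canonical string. The record and field types below are hypothetical and far simpler than the canonical package's own types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// field and record are hypothetical illustration types; the key order
// in the output follows the order of the struct fields.
type field struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

type record struct {
	Name   string  `json:"name"`
	Type   string  `json:"type"`
	Fields []field `json:"fields"`
}

// marshalPerson returns the canonical JSON for the Person record.
func marshalPerson() (string, error) {
	b, err := json.Marshal(record{
		Name: "com.example.Person",
		Type: "record",
		Fields: []field{
			{Name: "name", Type: "string"},
			{Name: "age", Type: "int"},
		},
	})
	return string(b), err
}

func main() {
	s, err := marshalPerson()
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```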

Unmarshaling from canonical JSON

var s canonical.Schema
err := json.Unmarshal(data, &s)

r, ok := s.Record()
if ok {
    fmt.Println(r.Name) // com.example.Person
}

Accessor methods (Primitive(), Record(), Enum(), Array(), Map(), Union(), Fixed()) provide type-safe access to the underlying concrete type.
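
The accessor style can be illustrated with a small tagged-union sketch. This is only an assumption about the general pattern (a kind tag plus comma-ok accessors); the schema, record, and constructor names below are hypothetical and do not reflect the package's internals.

```go
package main

import "fmt"

type kind int

const (
	kindPrimitive kind = iota
	kindRecord
)

// record is a hypothetical stand-in for a record definition.
type record struct{ Name string }

// schema holds exactly one variant, selected by kind.
type schema struct {
	kind      kind
	primitive string
	record    record
}

func primitiveSchema(name string) schema { return schema{kind: kindPrimitive, primitive: name} }
func recordSchema(r record) schema       { return schema{kind: kindRecord, record: r} }

// Primitive reports the primitive name and whether s is a primitive.
func (s schema) Primitive() (string, bool) { return s.primitive, s.kind == kindPrimitive }

// Record reports the record and whether s is a record.
func (s schema) Record() (record, bool) { return s.record, s.kind == kindRecord }

func main() {
	s := recordSchema(record{Name: "com.example.Person"})
	if r, ok := s.Record(); ok {
		fmt.Println(r.Name) // com.example.Person
	}
	_, ok := s.Primitive()
	fmt.Println(ok) // false
}
```

Compared with a type switch on an interface, the comma-ok form keeps the set of variants closed and lets callers check for one variant without handling the rest.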

Converting from IDL

SchemaFrom converts a parsed idl.Schema into canonical form:

f, err := idl.Parse(strings.NewReader(`
    namespace com.example;
    schema int;
    record Person {
        string name;
        int    age;
    }
`))

schemas, err := canonical.SchemaFrom(f.Schema)

b, err := json.Marshal(schemas[1])
// {"name":"com.example.Person","type":"record","fields":[{"name":"name","type":"string"},{"name":"age","type":"int"}]}

The function returns a slice because an IDL schema can define multiple named types. Namespace qualification, type references, and all structural schema information are preserved; non-canonical attributes (doc comments, aliases, defaults) are stripped.


idl — Avro IDL

The idl package parses Avro IDL source files into an AST and can print an AST back to IDL text.

Parsing

Parse reads an Avro IDL source from any io.Reader and returns a *File AST:

f, err := idl.Parse(strings.NewReader(`
    namespace com.example;
    schema record User {
        string name;
        int    age;
    }
`))

The File struct contains either a *Schema or a *Protocol. A *Schema holds the top-level named types (Record, Enum, Fixed) and primitive type identifiers.

Printing

Print formats a *File AST back to Avro IDL text:

var buf bytes.Buffer
err := idl.Print(&buf, f)
fmt.Println(buf.String())

Tokenizing

Tokenize exposes the low-level lexer as an iter.Seq2[Token, error] iterator (Go 1.23+):

for tok, err := range idl.Tokenize(r) {
    if err != nil {
        // handle the error, then stop: tok is not meaningful here
        break
    }
    fmt.Println(tok)
}

Token types include TokenComment, TokenDocComment, TokenIdentifier, TokenSymbol, TokenString, TokenNumber, and TokenAnnotation.


License

Released under the MIT License.
