[Go] How to approach implementing the Parquet file format with Parquet-Go #54

@hkpeaks

Describe the usage question you have. Please include as many useful details as possible.

Based on current testing, my Golang dataframe project achieves outstanding performance when the data source is a CSV file. As a next step, I want to support the Parquet file format, as it is becoming very popular. After some basic research on this topic, I am still not clear on how to implement Parquet support using your Golang library, because I can find very few code samples in the community.

I have built an in-memory and streaming engine that uses a simple byte-array data structure similar to CSV. I have developed a set of algorithms that support direct byte-to-byte conversion for ETL functions such as Read/Write file, Distinct, GroupBy, JoinTable, and Select column/row. The engine also handles JoinTable on billion-row data using 32GB of memory.
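
To make the byte-array approach concrete, here is a minimal sketch of a byte-level Select column over CSV-like data. It is not the engine's actual code: the function name `selectColumn` and the simplifications (unquoted fields, `\n` line endings) are assumptions made for illustration only.

```go
package main

import (
	"bytes"
	"fmt"
)

// selectColumn (hypothetical helper) returns the bytes of field col (0-based)
// from every line of data, each followed by a newline.
// Assumptions: fields are unquoted and lines end with '\n'.
func selectColumn(data []byte, col int, sep byte) []byte {
	var out []byte
	for len(data) > 0 {
		// Cut one line off the front of data.
		line := data
		if i := bytes.IndexByte(data, '\n'); i >= 0 {
			line, data = data[:i], data[i+1:]
		} else {
			data = nil
		}
		// Walk to the requested field without converting to strings.
		field := line
		for c := 0; ; c++ {
			j := bytes.IndexByte(field, sep)
			if c == col {
				if j >= 0 {
					field = field[:j]
				}
				break
			}
			if j < 0 {
				field = nil // the row has fewer fields than col
				break
			}
			field = field[j+1:]
		}
		out = append(out, field...)
		out = append(out, '\n')
	}
	return out
}

func main() {
	csv := []byte("id,name,qty\n1,apple,3\n2,pear,5\n")
	fmt.Printf("%s", selectColumn(csv, 1, ','))
}
```

The point of the sketch is that the selected column stays a `[]byte` from input to output, which is the shape of data I would like to get from a Parquet column as well.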

My requirement for Parquet support is simple: I need to know how to read the byte array of particular columns from a Parquet file. Once I can read, I will of course also need to know how to write to disk.
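
For context on what reading the bytes of particular columns involves: an unencrypted Parquet file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes before the trailing magic hold the little-endian length of the footer metadata, which records where each column chunk lives. The sketch below only locates that footer in a byte slice using the standard library. The names `footerRange` and `buildImage` are hypothetical, and decoding the Thrift-encoded metadata and the column pages is left to a Parquet library.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

var magic = []byte("PAR1")

// footerRange (hypothetical helper) returns the [start, end) offsets of the
// footer metadata inside a complete, unencrypted Parquet file image.
func footerRange(file []byte) (start, end int, err error) {
	n := len(file)
	if n < 12 { // leading magic + footer length + trailing magic
		return 0, 0, errors.New("too short to be a Parquet file")
	}
	if !bytes.Equal(file[:4], magic) || !bytes.Equal(file[n-4:], magic) {
		return 0, 0, errors.New("missing PAR1 magic")
	}
	footerLen := int(binary.LittleEndian.Uint32(file[n-8 : n-4]))
	end = n - 8
	start = end - footerLen
	if start < 4 {
		return 0, 0, errors.New("footer length exceeds file size")
	}
	return start, end, nil
}

// buildImage (hypothetical helper) assembles a synthetic file image with
// placeholder bytes. It has the Parquet framing but is not a valid Parquet file.
func buildImage(body, meta []byte) []byte {
	var lenBuf [4]byte
	binary.LittleEndian.PutUint32(lenBuf[:], uint32(len(meta)))
	img := append([]byte{}, magic...)
	img = append(img, body...)
	img = append(img, meta...)
	img = append(img, lenBuf[:]...)
	return append(img, magic...)
}

func main() {
	img := buildImage([]byte("column-chunk-bytes"), []byte("meta"))
	start, end, err := footerRange(img)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("footer bytes: %q\n", img[start:end])
}
```

What I am really asking is which library calls take over from here: given the footer, how do I get the values of selected columns as byte arrays without converting the whole file?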

Today I published the pre-release 64-bit runtime of my project for Windows/Linux at https://github.com/hkpeaks/peaks-consolidation/releases

Component(s)

Go
