Is your feature request related to a problem? Please describe.
NeuG currently has limited support for external data sources. Users cannot read or import data from diverse file formats (e.g., GraphAR, Iceberg), cannot access files on remote storage systems (S3, OSS, HTTP), and cannot export query results to external formats like Parquet. These gaps block basic data ingestion workflows before any graph computation can take place.
Describe the solution you'd like
This is a tracking issue for the full External Data Support roadmap. It covers the following sub-tasks:
Sub-Issues
- [Feature] Support more external data formats (Parquet, GraphAR)
  - Parquet: basic support is largely in place; remaining gaps to be addressed.
  - GraphAR: support reading GraphAR-formatted graph data as external tables (graph) via LOAD FROM.
- [Feature] Support remote filesystem access (S3, OSS, HTTP)
  - Enable LOAD FROM to access files on Amazon S3, Alibaba Cloud OSS, and plain HTTP endpoints.
  - Implement a pluggable filesystem abstraction layer to support multiple remote backends.
- [Feature] Support data lake format: Apache Iceberg
  - Allow reading Iceberg tables as external data sources in LOAD FROM queries.
  - Support schema inference and snapshot-level reads.
- [Feature] Query optimization for external data
  - Partition pruning on Iceberg: skip irrelevant partitions based on query predicates.
  - Predicate pushdown on GraphAR / Parquet: push filter conditions into the scan layer to reduce I/O and improve performance.
  - Other scan-level optimizations as formats are added.
- [Feature] Export query results to Parquet
  - Support COPY ... TO '...' (FORMAT PARQUET) or equivalent syntax for exporting query results as Parquet files.
  - Enable writing to both local filesystem and remote storage (S3, OSS, HTTP).
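
To make the "pluggable filesystem abstraction layer" item more concrete, here is a minimal sketch of one possible shape: backends register under a URI scheme, and the loader resolves a path to a backend before reading. Every name here (`FileSystem`, `FileSystemRegistry`, `InMemoryFileSystem`) is hypothetical and chosen for illustration only; none of it is NeuG's actual API, and the in-memory backend merely stands in for a real S3/OSS/HTTP client so the example runs offline.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class FileSystem(ABC):
    """Hypothetical interface that each storage backend would implement."""

    @abstractmethod
    def read_bytes(self, path: str) -> bytes:
        """Return the full contents of the object at `path`."""


class LocalFileSystem(FileSystem):
    def read_bytes(self, path: str) -> bytes:
        with open(path, "rb") as f:
            return f.read()


class InMemoryFileSystem(FileSystem):
    """Stand-in for a remote backend (S3, OSS, HTTP); holds objects in a dict."""

    def __init__(self, objects: dict):
        self._objects = dict(objects)

    def read_bytes(self, path: str) -> bytes:
        return self._objects[path]


class FileSystemRegistry:
    """Maps a URI scheme ("s3", "oss", "http", "file") to a backend."""

    def __init__(self):
        self._backends = {}

    def register(self, scheme: str, fs: FileSystem) -> None:
        self._backends[scheme.lower()] = fs

    def resolve(self, uri: str):
        """Return (backend, backend-relative path) for a URI or plain path."""
        parsed = urlparse(uri)
        # A path without a scheme is treated as a local file.
        scheme = parsed.scheme.lower() or "file"
        if scheme not in self._backends:
            raise ValueError(f"no filesystem registered for scheme '{scheme}'")
        if scheme == "file":
            return self._backends[scheme], parsed.path
        # For remote schemes, keep the bucket/host as part of the object path.
        return self._backends[scheme], parsed.netloc + parsed.path
```

With this shape, adding a new remote backend would only require implementing `FileSystem` and registering it under its scheme; the LOAD FROM code path would call `resolve` and stay unaware of which backend serves the bytes.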
Describe alternatives you've considered
- Requiring users to manually import all external data into NeuG before querying — this is the current workaround but adds friction and storage overhead.
- Using external ETL pipelines to pre-convert data before loading — shifts format conversion complexity entirely to the user.
Additional context
- Parquet read support via extension is already partially implemented.
- Remote filesystem abstraction work has started (see extension/s3/).
- All sub-features above should be tracked as individual child issues linked to this parent issue.