Summary
fetch_dataset_items accumulates every page into a single Vec<T> with no upper bound, so fetching a large Apify dataset (millions of items) can exhaust memory and abort the process (OOM).
Location
- File: src/lib.rs
- Line(s): 274–301
Severity
Medium
Details
    let mut out: Vec<T> = Vec::new();
    loop {
        let chunk: Vec<T> = resp.json().await?;
        out.extend(chunk); // unbounded growth
        if n < limit { break; }
    }
Because out only grows, memory use scales with the total dataset size, and a large enough dataset can exhaust available memory. Callers cannot cap the item count or process items as a stream.
Suggested Fix
Add a max_items parameter:
    pub async fn fetch_dataset_items_limited<T: DeserializeOwned>(
        &self, max_items: usize
    ) -> Result<Vec<T>> {
        // break when out.len() >= max_items
    }
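The cap logic can be sketched without the HTTP client. This is a minimal, synchronous illustration, not the crate's code: fetch_page and fetch_items_limited are hypothetical names, and an in-memory slice stands in for the remote dataset. It shrinks the last request to the remaining budget so that out never exceeds max_items.

```rust
/// Hypothetical stand-in for one paginated request: returns up to `limit`
/// items starting at `offset` from an in-memory "dataset".
fn fetch_page(dataset: &[u32], offset: usize, limit: usize) -> Vec<u32> {
    dataset.iter().skip(offset).take(limit).copied().collect()
}

/// Pages through `dataset`, stopping as soon as `max_items` are collected.
fn fetch_items_limited(dataset: &[u32], page_size: usize, max_items: usize) -> Vec<u32> {
    let mut out = Vec::new();
    let mut offset = 0;
    while out.len() < max_items {
        // Never request more than the remaining budget.
        let limit = page_size.min(max_items - out.len());
        let chunk = fetch_page(dataset, offset, limit);
        let n = chunk.len();
        offset += n;
        out.extend(chunk);
        // A short or empty page means the dataset is exhausted.
        if n < limit || n == 0 {
            break;
        }
    }
    out
}
```

With a ten-item dataset, a page size of 4, and max_items set to 6, this makes two requests (for 4 and then 2 items) and returns the first six items.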
Or provide a streaming/async-iterator interface.
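The streaming alternative can be illustrated with a plain std Iterator that buffers only one page at a time. This is a sketch under assumptions: PagedItems and its fetch_page closure are hypothetical, and the closure is synchronous. An async version would need a stream abstraction instead of Iterator.

```rust
/// Lazily yields items page by page; only the current page is held in memory.
struct PagedItems<T, F> {
    fetch_page: F, // hypothetical page fetcher: (offset, limit) -> items
    buffer: std::vec::IntoIter<T>,
    offset: usize,
    page_size: usize,
    done: bool,
}

impl<T, F: FnMut(usize, usize) -> Vec<T>> PagedItems<T, F> {
    fn new(fetch_page: F, page_size: usize) -> Self {
        PagedItems {
            fetch_page,
            buffer: Vec::new().into_iter(),
            offset: 0,
            page_size,
            done: false,
        }
    }
}

impl<T, F: FnMut(usize, usize) -> Vec<T>> Iterator for PagedItems<T, F> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        // Serve from the buffered page first.
        if let Some(item) = self.buffer.next() {
            return Some(item);
        }
        if self.done {
            return None;
        }
        // Buffer exhausted: fetch the next page and replace the old one.
        let page = (self.fetch_page)(self.offset, self.page_size);
        self.offset += page.len();
        // A short or empty page means the dataset is exhausted.
        self.done = page.len() < self.page_size || page.is_empty();
        self.buffer = page.into_iter();
        self.buffer.next()
    }
}
```

Callers can then bound work themselves, for example with take(n), and no further pages are requested once they stop pulling items.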
Automated finding by repo-monitor