Merged
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -30,7 +30,7 @@ futures-util = "0.3"
blake3 = "1"
lol_html = "2"
reqwest = { version = "0.12", features = ["cookies", "gzip", "brotli", "deflate", "json"] }
rquest = { version = "5", features = ["cookies", "gzip", "brotli", "deflate", "json", "stream", "socks"] }
rquest = { version = "5", features = ["cookies", "gzip", "brotli", "deflate", "json", "stream", "socks", "multipart"] }
rquest-util = "2"
parking_lot = "0.12"
base64 = "0.22"
23 changes: 13 additions & 10 deletions ROADMAP.md
@@ -1,6 +1,6 @@
# Pardus Browser Roadmap

**Version:** 0.4.0-dev | **Branch:** dev/roadmap | **Updated:** April 3, 2026
**Version:** 0.4.0-dev | **Branch:** dev/roadmap | **Updated:** April 4, 2026

---

@@ -12,7 +12,7 @@ Core engine, CLI, and all major subsystems are stable. Summary of shipped featur
|------|-----------|
| **Semantic Engine** | ARIA role tree, navigation graph, element IDs (`[#N]`), action annotations (navigate/click/fill/toggle/select), interactive-only mode, 4 output formats (md, tree, json, llm) |
| **Page Interaction** | Click, type, submit, wait-for-selector, scroll pagination, JS-level interaction (deno_core DOM), inline event handler registration, DOM mutation serialization |
| **JavaScript** | V8 via deno_core, 35+ Rust DOM ops, thread-based timeouts, inline script execution, analytics/problematic script filtering |
| **JavaScript** | V8 via deno_core, 42+ Rust DOM ops, thread-based timeouts, inline script execution, analytics/problematic script filtering |
| **Security** | SSRF protection (private IPs, metadata endpoints, scheme blocking), Basic/Bearer auth, CSP parsing & enforcement, certificate pinning (SPKI hash + CA), sandbox mode (off/strict/moderate/minimal) |
| **Session & Cache** | Cookie/localStorage/auth persistence, HTTP cache (RFC 7234: ETag, Last-Modified, 304), disk cache, shared HTTP client factory |
| **Proxy** | HTTP/HTTPS/SOCKS5, per-command flags, env var support (HTTP_PROXY etc.), no-proxy exclusions |
@@ -27,6 +27,9 @@ Core engine, CLI, and all major subsystems are stable. Summary of shipped featur
| **Adapters** | Playwright (Python + Node.js), Puppeteer (Node.js), Docker image with health check |
| **CLI** | 8 subcommands (navigate, interact, serve, repl, tab, map, clean), rustyline REPL, verbose logging |
| **Perf** | Connection pooling, HTTP/2 push simulation, configurable memory limits, ~200ms page parse |
| **AI Agent Intelligence** | Action planning (page-type classification, suggested next actions), auto-form filling with validation, smart wait conditions (network idle, DOM stability, content mutations), session recording & replay (JSON serialization, deterministic replay) |
| **Anti-bot Detection** | Challenge detection (reCAPTCHA, hCaptcha, Turnstile, JS challenges), risk scoring, human-in-the-loop resolution |
| **Meta Refresh** | `<meta http-equiv="refresh">` parsing with delay, relative URLs, query params, fragments, base tag support, redirect depth limiting |
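
The Meta Refresh row above mentions pulling a delay and a target out of the tag's `content` attribute. As a rough illustration of that one step only, here is a sketch in plain Rust. `MetaRefresh` and `parse_meta_refresh` are hypothetical names chosen for this example, not pardus-core's API. Resolving the target against the page URL or a `<base>` tag and enforcing a redirect depth limit are deliberately left out.

```rust
use std::time::Duration;

/// Hypothetical parsed form of a `<meta http-equiv="refresh">` content value.
#[derive(Debug, PartialEq)]
pub struct MetaRefresh {
    pub delay: Duration,
    /// Target exactly as written in the page (possibly relative).
    /// `None` means "reload the current URL".
    pub target: Option<String>,
}

/// Strip an optional, case-insensitive `url=` prefix.
fn strip_url_prefix(s: &str) -> &str {
    if let Some(head) = s.get(..3) {
        if head.eq_ignore_ascii_case("url") {
            if let Some(after_eq) = s[3..].trim_start().strip_prefix('=') {
                return after_eq.trim_start();
            }
        }
    }
    s
}

/// Strip one pair of matching single or double quotes, if present.
fn strip_quotes(s: &str) -> &str {
    for &q in &['"', '\''] {
        if let Some(inner) = s.strip_prefix(q).and_then(|t| t.strip_suffix(q)) {
            return inner;
        }
    }
    s
}

/// Parse values such as `5; url=/next?page=2#top` or `0;URL='https://example.com/'`.
/// Only whole-second delays are handled in this sketch.
pub fn parse_meta_refresh(content: &str) -> Option<MetaRefresh> {
    let content = content.trim();
    // The delay is separated from the optional target by the first ';' or ','.
    let (delay_part, rest) = match content.find(|c: char| c == ';' || c == ',') {
        Some(i) => (&content[..i], Some(&content[i + 1..])),
        None => (content, None),
    };
    let secs: u64 = delay_part.trim().parse().ok()?;
    let target = rest
        .map(|r| strip_quotes(strip_url_prefix(r.trim())).trim())
        .filter(|t| !t.is_empty())
        .map(str::to_string);
    Some(MetaRefresh { delay: Duration::from_secs(secs), target })
}
```

Query strings and fragments pass through untouched because the value is split only at the first separator. A caller would still need to join a relative target with the document's base URL before navigating.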

---

@@ -50,12 +53,12 @@ _(Currently empty)_

### AI Agent Intelligence

- [ ] **Action planning** — Suggested next actions based on page state
- [ ] **Auto-form filling** — AI-guided form completion with validation
- [ ] **Smart wait conditions** — Wait for network idle, DOM stability, or content mutations instead of fixed timers
- [ ] **Session recording & replay** — Serialize action sequences to JSON, replay deterministically
- [x] **Action planning** — Suggested next actions based on page state
- [x] **Auto-form filling** — AI-guided form completion with validation
- [x] **Smart wait conditions** — Wait for network idle, DOM stability, or content mutations instead of fixed timers
- [x] **Session recording & replay** — Serialize action sequences to JSON, replay deterministically
- [ ] **Page diff** — Compare semantic trees between navigations; detect what changed (new elements, removed content, state transitions)
- [ ] **Anti-bot detection hints** — Report Cloudflare/PerimeterX/DataDome challenges in semantic output so agents know they're blocked
- [x] **Anti-bot detection hints** — Report Cloudflare/PerimeterX/DataDome challenges in semantic output so agents know they're blocked
- [ ] **Login flow templates** — Declarative YAML/JSON descriptors for common auth patterns (email+password, SSO click-through, MFA TOTP)
- [ ] **Content extraction** — Article/main-content extraction (Readability-style) stripping nav, ads, footers; output clean text for LLM ingestion
- [ ] **Structured data extraction** — Detect and expose JSON-LD, Open Graph, microdata, RDFa from pages as typed Rust structs
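
The "smart wait conditions" item contrasts waiting for DOM stability with fixed timers. One way such a wait could look is sketched below, assuming the caller can supply a cheap fingerprint of the current page state. `wait_until_stable`, `WaitOutcome`, and `fingerprint` are hypothetical names for this example and are not taken from the Pardus codebase.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::thread::sleep;
use std::time::Duration;

/// Result of a stability wait.
#[derive(Debug, PartialEq)]
pub enum WaitOutcome {
    /// The fingerprint stopped changing; `polls` is how many probes were used.
    Stable { polls: usize },
    /// The poll budget ran out before the page settled.
    TimedOut,
}

/// Hash serialized HTML so snapshots can be compared cheaply.
pub fn fingerprint(html: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    html.hash(&mut hasher);
    hasher.finish()
}

/// Poll `probe` until it returns the same value as the previous poll
/// `quiet_polls` times in a row, or until `max_polls` probes were made.
pub fn wait_until_stable<F>(
    mut probe: F,
    quiet_polls: usize,
    max_polls: usize,
    interval: Duration,
) -> WaitOutcome
where
    F: FnMut() -> u64,
{
    let mut last: Option<u64> = None;
    let mut unchanged = 0usize;
    for poll in 1..=max_polls {
        let current = probe();
        if last == Some(current) {
            unchanged += 1;
            if unchanged >= quiet_polls {
                return WaitOutcome::Stable { polls: poll };
            }
        } else {
            // Any change resets the quiet window.
            unchanged = 0;
            last = Some(current);
        }
        sleep(interval);
    }
    WaitOutcome::TimedOut
}
```

Compared with a fixed timer, this returns as soon as the page settles and reports a timeout explicitly instead of silently proceeding. A network-idle condition could reuse the same loop with a probe that returns the number of in-flight requests.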
@@ -75,7 +78,7 @@ _(Currently empty)_
- [ ] **Cookie API in JS** — `document.cookie` getter/setter wired to the session cookie store
- [ ] **localStorage/sessionStorage in JS** — Persistent and per-session storage backed by pardus-core session store
- [ ] **MutationObserver shim** — Allow JS to observe DOM changes for SPA reactivity detection
- [ ] **Event dispatch** — Allow agents to fire arbitrary DOM events (change, input, submit, custom) for frameworks that listen on native events
- [x] **Event dispatch** — Allow agents to fire arbitrary DOM events (change, input, submit, custom) for frameworks that listen on native events

### Network & Protocol

@@ -91,12 +94,12 @@ _(Currently empty)_
- [x] **PDF text extraction** — Parse PDF bytes to semantic tree with table, form-field (AcroForm), and image metadata extraction
- [x] **RSS/Atom feed parsing** — Detect and parse RSS/Atom feed content into structured items (title, link, date, summary)
- [ ] **Robots.txt parser** — Respect crawl directives; expose `is_allowed(url)` for the knowledge graph crawler
- [ ] **Meta refresh & redirects** — Parse `<meta http-equiv="refresh">` and JS `location.href` assignments as navigations
- [x] **Meta refresh & redirects** — Parse `<meta http-equiv="refresh">` and JS `location.href` assignments as navigations
- [ ] **Content encoding** — Handle gzip/brotli/zstd transfer encodings beyond what reqwest provides automatically

### CDP Completeness

- [ ] **DOM manipulation** — Implement stubbed methods: setNodeValue, setNodeName, removeAttribute, copyTo, moveTo, undo/redo
- [x] **DOM manipulation** — Implement stubbed methods: setNodeValue, setNodeName, removeAttribute, copyTo, moveTo, undo/redo
- [ ] **Input event dispatch** — Wire mouse/keyboard events through pardus-core interaction system (currently stubbed)
- [ ] **File upload** — Implement DOM.setFileInputFiles for `<input type="file">` handling
- [ ] **Network interception in CDP** — Fetch.enable / Fetch.requestPaused for request/response modification over CDP
2 changes: 1 addition & 1 deletion crates/pardus-cdp/Cargo.toml
@@ -4,7 +4,7 @@ version.workspace = true
edition.workspace = true

[dependencies]
pardus-core = { path = "../pardus-core", features = ["tls-pinning"] }
pardus-core = { path = "../pardus-core", features = ["tls-pinning", "js"] }
pardus-debug = { path = "../pardus-debug" }
scraper = "0.22"
tokio = { workspace = true }
223 changes: 201 additions & 22 deletions crates/pardus-cdp/src/domain/dom.rs
@@ -13,6 +13,34 @@ fn resolve_target_id(session: &CdpSession) -> &str {
session.target_id.as_deref().unwrap_or("default")
}

/// Parse target HTML into DomDocument, apply a mutation, serialize back.
async fn mutate_dom<F>(
ctx: &DomainContext,
target_id: &str,
f: F,
) -> HandleResult
where
F: FnOnce(&mut pardus_core::js::dom::DomDocument, &NodeMap),
{
let html_str = match ctx.get_html(target_id).await {
Some(h) => h,
None => return HandleResult::Ack,
};
let url = ctx.get_url(target_id).await.unwrap_or_default();

let mut doc = pardus_core::js::dom::DomDocument::from_html(&html_str);
let nm = ctx.node_map.lock().await;

f(&mut doc, &nm);

let new_html = doc.to_html();
let title = doc.get_title();
drop(nm);
ctx.update_target_with_data(target_id, url, new_html, Some(title));

HandleResult::Ack
}
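
The `mutate_dom` helper above follows a parse, mutate, serialize cycle driven by a caller-supplied closure. The sketch below reproduces that shape with toy types so the control flow can be read in isolation. `ToyDoc`, `Store`, and `mutate` are hypothetical stand-ins, not the pardus-core `DomDocument` or `DomainContext`, and unlike the handler this version reports a missing target with `false` instead of acknowledging silently.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a parsed document: a flat attribute map.
#[derive(Debug, Default)]
pub struct ToyDoc {
    attrs: BTreeMap<String, String>,
}

impl ToyDoc {
    /// Parse `key=value` pairs separated by ';'. Malformed pairs are skipped.
    pub fn parse(src: &str) -> Self {
        let attrs = src
            .split(';')
            .filter_map(|pair| pair.split_once('='))
            .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
            .collect();
        ToyDoc { attrs }
    }

    /// Serialize back to the `key=value;key=value` form, sorted by key.
    pub fn serialize(&self) -> String {
        self.attrs
            .iter()
            .map(|(k, v)| format!("{}={}", k, v))
            .collect::<Vec<_>>()
            .join(";")
    }

    pub fn set_attribute(&mut self, name: &str, value: &str) {
        self.attrs.insert(name.to_string(), value.to_string());
    }

    pub fn remove_attribute(&mut self, name: &str) {
        self.attrs.remove(name);
    }
}

/// Toy stand-in for per-target storage of serialized documents.
#[derive(Default)]
pub struct Store {
    pub targets: BTreeMap<String, String>,
}

/// Parse the stored source, apply `f`, and write the result back.
/// Returns `false` when the target does not exist.
pub fn mutate<F>(store: &mut Store, target_id: &str, f: F) -> bool
where
    F: FnOnce(&mut ToyDoc),
{
    let src = match store.targets.get(target_id) {
        Some(s) => s.clone(),
        None => return false,
    };
    let mut doc = ToyDoc::parse(&src);
    f(&mut doc);
    store.targets.insert(target_id.to_string(), doc.serialize());
    true
}
```

In this toy version the document is rebuilt from the stored string on every call, so only what `serialize` writes out survives between mutations. Any state kept solely inside `ToyDoc` would be lost when the call returns.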

#[async_trait(?Send)]
impl CdpDomainHandler for DomDomain {
fn domain_name(&self) -> &'static str {
@@ -201,23 +229,68 @@ impl CdpDomainHandler for DomDomain {
}
"setAttributeValue" => {
let node_id = params["nodeId"].as_i64().unwrap_or(-1);
let attr_name = params["name"].as_str().unwrap_or("");
let attr_value = params["value"].as_str().unwrap_or("");
let selector = {
let nm = ctx.node_map.lock().await;
nm.get_selector(node_id).map(|s| s.to_string())
};

if let Some(_sel) = selector {
let _ = (attr_name, attr_value);
}

HandleResult::Ack
let attr_name = params["name"].as_str().unwrap_or("").to_string();
let attr_value = params["value"].as_str().unwrap_or("").to_string();
mutate_dom(ctx, target_id, |doc, nm| {
if let Some(selector) = nm.get_selector(node_id) {
if let Some(elem_id) = doc.query_selector(0, selector) {
doc.set_attribute(elem_id, &attr_name, &attr_value);
}
}
}).await
}
"removeAttribute" => {
let node_id = params["nodeId"].as_i64().unwrap_or(-1);
let attr_name = params["name"].as_str().unwrap_or("").to_string();
mutate_dom(ctx, target_id, |doc, nm| {
if let Some(selector) = nm.get_selector(node_id) {
if let Some(elem_id) = doc.query_selector(0, selector) {
doc.remove_attribute(elem_id, &attr_name);
}
}
}).await
}
"removeNode" => {
let node_id = params["nodeId"].as_i64().unwrap_or(-1);
mutate_dom(ctx, target_id, |doc, nm| {
if let Some(selector) = nm.get_selector(node_id) {
if let Some(elem_id) = doc.query_selector(0, selector) {
if let Some(parent_id) = doc.get_parent(elem_id) {
doc.remove_child(parent_id, elem_id);
}
}
}
}).await
}
"setNodeValue" => {
let node_id = params["nodeId"].as_i64().unwrap_or(-1);
let value = params["value"].as_str().unwrap_or("").to_string();
mutate_dom(ctx, target_id, |doc, nm| {
if let Some(selector) = nm.get_selector(node_id) {
if let Some(elem_id) = doc.query_selector(0, selector) {
// For text nodes discovered as children of elements
let children = doc.get_children(elem_id);
for &child_id in &children {
if doc.get_node_type(child_id) == 3 {
doc.set_node_value(child_id, &value);
return;
}
}
}
}
}).await
}
"setNodeName" => {
let node_id = params["nodeId"].as_i64().unwrap_or(-1);
let new_name = params["name"].as_str().unwrap_or("").to_string();
mutate_dom(ctx, target_id, |doc, nm| {
if let Some(selector) = nm.get_selector(node_id) {
if let Some(elem_id) = doc.query_selector(0, selector) {
doc.set_node_name(elem_id, &new_name);
}
}
}).await
}
"removeAttribute" => HandleResult::Ack,
"removeNode" => HandleResult::Ack,
"setNodeValue" => HandleResult::Ack,
"setNodeName" => HandleResult::Ack,
"getBoxModel" => {
HandleResult::Success(serde_json::json!({
"model": {
@@ -273,7 +346,73 @@ impl CdpDomainHandler for DomDomain {
let body_id = nm.get_or_assign("body");
HandleResult::Success(serde_json::json!({ "nodeId": body_id }))
}
"setFileInputFiles" => HandleResult::Ack,
"setFileInputFiles" => {
let node_id = params["backendNodeId"].as_i64()
.or(params["nodeId"].as_i64())
.unwrap_or(-1);

let selector = {
let nm = ctx.node_map.lock().await;
nm.get_selector(node_id).map(|s| s.to_string())
};

if let Some(selector) = selector {
let (html_str, url) = (ctx.get_html(target_id).await, ctx.get_url(target_id).await);
if let (Some(html_str), Some(url)) = (html_str, url) {
let page = pardus_core::Page::from_html(&html_str, &url);
if let Some(handle) = page.query(&selector) {
if handle.input_type.as_deref() == Some("file") || handle.action.as_deref() == Some("upload") {
let file_paths: Vec<std::path::PathBuf> = params["files"]
.as_array()
.map(|arr| arr.iter().filter_map(|v| v.as_str().map(|s| std::path::PathBuf::from(s))).collect())
.unwrap_or_default();

if file_paths.is_empty() {
return HandleResult::Error(CdpErrorResponse {
id: 0,
error: crate::error::CdpErrorBody {
code: INVALID_PARAMS,
message: "No files specified".to_string(),
},
session_id: None,
});
}

let max_size = 50 * 1024 * 1024;
match pardus_core::interact::upload::upload_files(&page, &handle, &file_paths, max_size) {
Ok(files) => {
let file_names: Vec<&str> = files.iter().map(|f| f.file_name.as_str()).collect();
let count = file_names.len();
return HandleResult::Success(serde_json::json!({
"files": file_names,
"count": count,
}));
}
Err(e) => {
return HandleResult::Error(CdpErrorResponse {
id: 0,
error: crate::error::CdpErrorBody {
code: INVALID_PARAMS,
message: e.to_string(),
},
session_id: None,
});
}
}
}
}
}
}

HandleResult::Error(CdpErrorResponse {
id: 0,
error: crate::error::CdpErrorBody {
code: INVALID_PARAMS,
message: "Node is not a file input".to_string(),
},
session_id: None,
})
}
"getFileInfo" => {
HandleResult::Error(CdpErrorResponse {
id: 0,
@@ -307,11 +446,51 @@ impl CdpDomainHandler for DomDomain {
"classNames": []
}))
}
"copyTo" => HandleResult::Ack,
"moveTo" => HandleResult::Ack,
"undo" => HandleResult::Ack,
"redo" => HandleResult::Ack,
"markUndoableState" => HandleResult::Ack,
"copyTo" => {
let node_id = params["nodeId"].as_i64().unwrap_or(-1);
let target_parent_id = params["targetNodeId"].as_i64().unwrap_or(-1);
mutate_dom(ctx, target_id, |doc, nm| {
let source = nm.get_selector(node_id)
.and_then(|s| doc.query_selector(0, s));
let parent = nm.get_selector(target_parent_id)
.and_then(|s| doc.query_selector(0, s));
if let (Some(src_id), Some(par_id)) = (source, parent) {
doc.copy_to(src_id, par_id);
}
}).await
}
"moveTo" => {
let node_id = params["nodeId"].as_i64().unwrap_or(-1);
let target_parent_id = params["targetNodeId"].as_i64().unwrap_or(-1);
let before_id = params["insertBeforeNodeId"].as_i64();
mutate_dom(ctx, target_id, |doc, nm| {
let source = nm.get_selector(node_id)
.and_then(|s| doc.query_selector(0, s));
let parent = nm.get_selector(target_parent_id)
.and_then(|s| doc.query_selector(0, s));
let before = before_id
.and_then(|id| nm.get_selector(id))
.and_then(|s| doc.query_selector(0, s));
if let (Some(src_id), Some(par_id)) = (source, parent) {
doc.move_to(src_id, par_id, before);
}
}).await
}
"undo" => {
mutate_dom(ctx, target_id, |doc, _nm| {
doc.undo();
}).await
}
"redo" => {
mutate_dom(ctx, target_id, |doc, _nm| {
doc.redo();
}).await
}
"markUndoableState" => {
mutate_dom(ctx, target_id, |doc, _nm| {
doc.mark_undoable_state();
}).await
}
"focus" => HandleResult::Ack,
"getFlattenedDocument" => {
let (html_str, url) = (ctx.get_html(target_id).await, ctx.get_url(target_id).await);
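
The `setFileInputFiles` handler in this diff rejects an empty file list and passes a 50 MiB cap (`50 * 1024 * 1024`) to `upload_files`. What `upload_files` checks internally is not visible here, so the following is only a guess at the kind of validation such a cap implies. `validate_uploads` and `UploadError` are hypothetical names, and whether the cap applies per file, as assumed below, or to the total is also an assumption.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical error type for the validation step.
#[derive(Debug)]
pub enum UploadError {
    /// The caller passed an empty list of paths.
    NoFiles,
    /// A file exceeds the size cap (sizes in bytes).
    TooLarge { path: PathBuf, size: u64, max: u64 },
    /// The file could not be inspected (missing, permission denied, ...).
    Io(io::Error),
}

/// Check that at least one path was given and that every file exists and is
/// no larger than `max_size` bytes. Returns the bare file names on success.
pub fn validate_uploads(paths: &[PathBuf], max_size: u64) -> Result<Vec<String>, UploadError> {
    if paths.is_empty() {
        return Err(UploadError::NoFiles);
    }
    let mut names = Vec::with_capacity(paths.len());
    for path in paths {
        let meta = fs::metadata(path).map_err(UploadError::Io)?;
        if meta.len() > max_size {
            return Err(UploadError::TooLarge {
                path: path.clone(),
                size: meta.len(),
                max: max_size,
            });
        }
        // Paths without a final component (e.g. "/") yield an empty name.
        let name = path
            .file_name()
            .map(|n| n.to_string_lossy().into_owned())
            .unwrap_or_default();
        names.push(name);
    }
    Ok(names)
}
```

Checking sizes from metadata before reading any file keeps an oversized upload from being loaded into memory first. A handler could map each `UploadError` variant to a distinct CDP error message rather than a single generic one.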