file level parallelization of the dataflow analysis #2088
Draft
EagleoutIce wants to merge 74 commits into main from
Conversation
- worker.ts: handles each task by calling the appropriate workerTasks
- task-registry.ts: contains all definitions for the workerTasks, used by worker.ts
- threadpool.ts: wrapper for tinypool, handles dispatching tasks and creating/deleting the worker threads
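The dispatch-by-name pattern described above can be sketched as follows. This is a minimal illustration, not flowr's actual code: the task names `parseFile`/`analyzeFile`, their payload shapes, and the `runTask` helper are all hypothetical.

```typescript
// Hypothetical sketch of the task-registry pattern: task names map to
// handler functions, and the worker dispatches by looking up the name.
type TaskHandler<TIn, TOut> = (payload: TIn) => TOut | Promise<TOut>;

// illustrative registry; the real workerTasks definitions live in task-registry.ts
const workerTasks = {
  parseFile: (payload: { path: string }) => `parsed:${payload.path}`,
  analyzeFile: (payload: { index: number }) => payload.index * 2,
} as const;

// worker-side dispatch: look up the handler registered for the requested task
async function runTask(name: string, payload: unknown): Promise<unknown> {
  const handler =
    (workerTasks as unknown as Record<string, TaskHandler<unknown, unknown> | undefined>)[name];
  if (handler === undefined) {
    throw new Error(`no handler registered for task '${name}'`);
  }
  return handler(payload);
}
```

Keeping the registry in one module lets both the worker (for dispatch) and the main thread (for payload typing) import the same definitions.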
basic dispatch of all files to analyze to the threadpool. Currently fails because the path to the worker file is not correctly resolved.

```diff
diff --git a/src/dataflow/extractor.ts b/src/dataflow/extractor.ts
index 34dfe00..83f6be3 100644
--- a/src/dataflow/extractor.ts
+++ b/src/dataflow/extractor.ts
@@ -26,6 +26,8 @@ import type { ControlFlowInformation } from '../control-flow/control-flow-graph'
 import { getBuiltInDefinitions } from './environments/built-in-config';
 import type { FlowrAnalyzerContext } from '../project/context/flowr-analyzer-context';
 import { FlowrFile } from '../project/context/flowr-file';
+import { Threadpool } from './parallel/threadpool';
+import { SourceFilePayload } from './parallel/task-registry';

 /**
  * The best friend of {@link produceDataFlowGraph} and {@link processDataflowFor}.
@@ -118,6 +120,20 @@ export function produceDataFlowGraph<OtherInfo>(
 	};
 	let df = processDataflowFor<OtherInfo>(files[0].root, dfData);
+	// first call with threadpool
+	const pool = new Threadpool();
+
+	// submit all files
+	const result = pool.submitTasks<SourceFilePayload<OtherInfo>, void>(
+		'testPool',
+		files.map((file, i) => ({
+			index: i,
+			file,
+			data: dfData,
+			dataflowInfo: df,
+		}))
+	)
 	for(let i = 1; i < files.length; i++) {
 		/* source requests register automatically */
 		df = standaloneSourceFile(i, files[i], dfData, df);
```
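The "path to worker file is not correctly resolved" failure mentioned above is commonly caused by resolving the worker file against the process working directory instead of the module's own location. A hedged sketch of the usual fix; the `worker.js` filename and `resolveWorkerFile` helper are illustrative, not flowr's actual code:

```typescript
import * as path from 'node:path';

// Resolve the worker file relative to the module directory (e.g. __dirname in
// CommonJS, or a directory derived from import.meta.url in ESM), not the
// process CWD -- otherwise pool creation breaks when the program is started
// from a different working directory.
function resolveWorkerFile(moduleDir: string): string {
  // 'worker.js' is a placeholder for the compiled worker entry point
  return path.resolve(moduleDir, 'worker.js');
}
```

A pool wrapper would then pass `resolveWorkerFile(__dirname)` (or the ESM equivalent) as the worker filename instead of a bare relative path.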
- threadpool.ts: contains the piscina wrapper > now handles worker creation with MessagePorts correctly; tasks can submit more tasks into the queue
- worker.ts: actual worker file for the threadpool > handles port registration to the main thread, chooses the appropriate handler for tasks, handles subtask submission and collection
- task-registry.ts: handler definitions for tasks > contains the relevant types and interfaces, specifies each handler for a given task
- extractor.ts: dataflow extractor > modified to dispatch a dummy call for all files; threadpool creation currently happens for each call -> needs to be moved out
- extractor.ts: aggregates all dataflow information and then merges it back together via reduction
- built-in-source.ts: dataflow merging is now in a separate function
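The merge-via-reduction step above can be illustrated with a toy example. The `Df` shape and `mergeDataflow` below are placeholders, not flowr's actual dataflow structures; the point is only the fold over per-file results:

```typescript
// Toy stand-in for a per-file dataflow result: defined names plus
// references that could not yet be resolved within that file.
interface Df {
  nodes: string[];
  unresolved: string[];
}

// merge two partial results; references resolvable by either side are dropped
function mergeDataflow(a: Df, b: Df): Df {
  return {
    nodes: [...new Set([...a.nodes, ...b.nodes])],
    unresolved: [...a.unresolved, ...b.unresolved]
      .filter(r => !a.nodes.includes(r) && !b.nodes.includes(r)),
  };
}

// per-file results are folded into one aggregate via reduction
const perFile: Df[] = [
  { nodes: ['f'], unresolved: ['g'] }, // file 1 defines f, references g
  { nodes: ['g'], unresolved: [] },    // file 2 defines g
];
const merged = perFile.reduce(mergeDataflow);
```

Because the merge is associative in this sketch, the reduction order does not change the aggregate, which is what makes a parallel per-file analysis followed by a merge viable.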
- feature-def.ts: contains all features and their default values
- feature-manager.ts: exposes functionality for setting and checking the feature flags
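A minimal sketch of the defaults-plus-manager split described above (and of the later commit turning the manager into a class rather than a global object). The flag names and method names here are illustrative, not flowr's actual API:

```typescript
// feature-def.ts analogue: all features and their default values in one place
const featureDefaults = {
  parallelDataflow: false, // hypothetical flag name
  deferredMerge: false,    // hypothetical flag name
} as const;

type Feature = keyof typeof featureDefaults;

// feature-manager.ts analogue: a class (not a global), so each analyzer
// instance can carry its own flag state
class FeatureManager {
  private readonly flags: Record<Feature, boolean>;

  constructor() {
    this.flags = { ...featureDefaults }; // start from the declared defaults
  }

  set(feature: Feature, value: boolean): void {
    this.flags[feature] = value;
  }

  isEnabled(feature: Feature): boolean {
    return this.flags[feature];
  }
}

const features = new FeatureManager();
features.set('parallelDataflow', true);
```

Making the manager instance-scoped (as a later commit in this PR does) avoids cross-test leakage of flag state, which matters once tests run the analyzer repeatedly.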
…s' of github.com:flowr-analysis/flowr into 2042-file-level-parallelization-of-the-dataflow-analysis
now uses the threadId provided by piscina; more logging statements <- remove later

```diff
diff --git a/src/dataflow/parallel/worker.ts b/src/dataflow/parallel/worker.ts
index 5fd490b..bb0bba2 100644
--- a/src/dataflow/parallel/worker.ts
+++ b/src/dataflow/parallel/worker.ts
@@ -1,4 +1,4 @@
-import { parentPort, MessageChannel, workerData } from 'node:worker_threads';
+import { parentPort, MessageChannel, workerData, threadId } from 'node:worker_threads';
 import type { TaskName } from './task-registry';
 import { workerTasks } from './task-registry';
 import type { SubtaskReceivedMessage } from './threadpool';
@@ -15,27 +15,27 @@ const pending = new Map<
 	PendingEntry<unknown>
 >();
+
 const { port1: workerPort, port2: mainPort } = new MessageChannel();

 if(!parentPort){
 	dataflowLogger.error('Worker started without parentPort present, Aborting worker');
 } else {
+	//console.log(`Worker ${workerData.workerId} registering port to main thread.`);
+	console.log(threadId);
 	parentPort.postMessage({
 		type: 'register-port',
-		workerId: typeof workerData === 'object' &&
-			workerData !== null &&
-			typeof (workerData as { id?: number }).id === 'number'
-			? (workerData as { id: number }).id : Math.floor(Math.random() * 1e9),
+		workerId: threadId,
 		port: mainPort,
 	},
 	[mainPort] // transfer port to main thread
 	);
 }
-
 workerPort.on('message', (msg: unknown) => {
 	if(isSubtaskResponseMessage(msg)){
 		const { id, result, error } = msg;
+		console.log(`got response for ${id}`);
 		const entry = pending.get(id);
 		if(!entry) {
 			return;
@@ -59,7 +59,7 @@ async function runSubtask<TInput, TOutput>(taskName: TaskName, taskPayload: TInp
 	//return undefined as unknown as TOutput;
 	return new Promise((resolve, reject) => {
 		pending.set(id, { resolve: resolve as (value: unknown) => void, reject });
-
+		console.log(`submitting subtask with ${id} from ${threadId}`);
 		// submit the subtask to main thread
 		workerPort.postMessage({
 			type: 'subtask',
```
- fix for broken worker path
- fixed subtask resolution
- allowed deferred dataflow merge
- definitions for clonable dataflow data
- example usage of clonable data
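The need for "clonable dataflow data" comes from how worker threads exchange values: everything crossing a `MessagePort` goes through the structured clone algorithm, so functions, and objects carrying them, cannot be sent. A quick demonstration (assumes Node >= 17 for the global `structuredClone`):

```typescript
// Plain data and built-in containers like Map survive structured cloning...
const clonable = {
  ids: [1, 2, 3],
  env: new Map([['x', 'def-1']]),
};
const copy = structuredClone(clonable);

// ...but functions do not: cloning them throws a DataCloneError
let functionsClone = true;
try {
  structuredClone({ fn: () => 1 });
} catch {
  functionsClone = false;
}
```

This is why dataflow structures destined for workers have to be expressed as plain serializable data (or reconstructed on the worker side) rather than as objects with attached behavior.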
- use workerWrapper to load and register ts-node for the worker file
threadpool now waits for worker initialization to conclude
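The initialization barrier described above can be sketched as a promise that resolves once every worker has signaled readiness. This is an illustrative stand-in; the real pool would resolve on each worker's `register-port` message rather than the simulated `setTimeout` used here:

```typescript
// Resolve once `workerCount` workers have each invoked their ready-signal.
// `onReady` hands every (simulated) worker a callback to call after init.
function waitForWorkers(
  workerCount: number,
  onReady: (signal: () => void) => void
): Promise<void> {
  return new Promise(resolve => {
    let ready = 0;
    for (let i = 0; i < workerCount; i++) {
      onReady(() => {
        ready++;
        if (ready === workerCount) {
          resolve(); // all workers checked in: the pool may accept tasks
        }
      });
    }
  });
}

// usage: simulate 4 workers finishing initialization asynchronously
const initDone = waitForWorkers(4, signal => setTimeout(signal, 1));
let poolReady = false;
initDone.then(() => { poolReady = true; });
```

Awaiting this barrier before submitting tasks avoids the race where work is dispatched to a worker whose message port has not yet been registered with the main thread.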
the feature manager is now a class and not a global object
integrated the feature manager into the flowrAnalyzerBuilder and flowrAnalyzer.
close the analyzer after each test, but without shutting down the parser backend
worker tracks internal state and returns the data with each successful task -> best effort; workerpool tracks its own internal state and the worker state with a best-effort approach
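The best-effort bookkeeping above can be sketched as follows: each completed task carries the worker's internal state alongside its result, and the pool simply records the last state seen per worker. All shapes here are illustrative, not flowr's actual types:

```typescript
// hypothetical result envelope: task output plus a snapshot of worker state
interface TaskResult<T> {
  workerId: number;
  result: T;
  workerState: { processedFiles: number };
}

class PoolStateTracker {
  // last state snapshot seen per worker; "best effort" because a snapshot
  // may already be stale by the time it arrives -- later ones overwrite it
  private readonly lastKnown = new Map<number, { processedFiles: number }>();

  record<T>(res: TaskResult<T>): T {
    this.lastKnown.set(res.workerId, res.workerState);
    return res.result;
  }

  stateOf(workerId: number): { processedFiles: number } | undefined {
    return this.lastKnown.get(workerId);
  }
}

const tracker = new PoolStateTracker();
tracker.record({ workerId: 1, result: 'df-a', workerState: { processedFiles: 1 } });
tracker.record({ workerId: 1, result: 'df-b', workerState: { processedFiles: 2 } });
```

Piggybacking state on task results avoids extra round-trip messages, at the cost of the pool's view lagging behind the worker's true state between completions.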
- workerpool: tests for any kind of memory or message port leak -> will certainly break if run in parallel
- parallel-dataflow: tests whether the output is equivalent to the sequential analysis
all pool messages and type-checking helpers were moved to their own file for clarity
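The extracted type-checking helpers follow the standard tagged-union pattern: pool messages carry a `type` discriminant, and guard functions narrow `unknown` payloads arriving over a port. The message shapes below are illustrative, not flowr's actual definitions:

```typescript
// hypothetical pool message types, discriminated by `type`
interface SubtaskMessage {
  type: 'subtask';
  id: number;
  taskName: string;
}

interface RegisterPortMessage {
  type: 'register-port';
  workerId: number;
}

type PoolMessage = SubtaskMessage | RegisterPortMessage;

// guards narrow the `unknown` values delivered by port.on('message', ...)
function isPoolMessage(msg: unknown): msg is PoolMessage {
  return typeof msg === 'object' && msg !== null && 'type' in msg;
}

function isSubtaskMessage(msg: unknown): msg is SubtaskMessage {
  return isPoolMessage(msg) && msg.type === 'subtask';
}
```

Centralizing these in one module keeps the worker and the pool agreeing on the wire format, since both sides import the same guards.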
tests are now grouped into simple test suites of files; the test runner marks the execution of each test
graphs had incorrect labels in tests
post-merge now correctly discards built-in references that can be resolved by the merged environment and hold no other user-defined vars -> such a reference is only a placeholder
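The cleanup rule above can be illustrated with a toy filter: a reference is dropped only when the merged environment resolves it *and* it carries no user-defined definitions (i.e. it was a pure placeholder). The `Ref` shape and function below are illustrative, not flowr's actual structures:

```typescript
// hypothetical reference record: a name plus any user-defined definitions
// attached to it during per-file analysis
interface Ref {
  name: string;
  userDefs: string[];
}

function discardResolvedBuiltins(refs: Ref[], mergedEnv: Set<string>): Ref[] {
  return refs.filter(ref =>
    // keep the reference unless the merged env resolves it AND it holds
    // no user-defined definitions (then it was only a placeholder)
    !(mergedEnv.has(ref.name) && ref.userDefs.length === 0)
  );
}

const refs: Ref[] = [
  { name: 'print', userDefs: [] },        // placeholder, resolvable -> drop
  { name: 'print', userDefs: ['def-3'] }, // user redefinition -> keep
];
const kept = discardResolvedBuiltins(refs, new Set(['print']));
```

Keeping references with user definitions is what makes this safe in the presence of built-in redefinitions, which the same commit adds monitoring for.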
- new built-in redefinition monitoring
- fallback to sequential analysis
- slightly improved linking stage
- new test cases
- split tests into new files
- new test suites
- restructured tests into separate suites
- added new test cases
- updated tests to pass
- currently failing tests are marked
No description provided.