Simplify query for runs to export for samples and data during folder export by XingY · Pull Request #7448 · LabKey/platform

XingY · 2026-02-23T22:18:01Z

Rationale

DataClassFolderWriter.write() and SampleTypeFolderWriter.write() both follow a problematic pattern that can use a large amount of memory:

Pull all ExpData or ExpMaterial into memory
Pull all ExpRun that reference the data or materials into memory
Filter them based on run protocol and other criteria in Java
Add the RowIds for the filtered runs to the export

This PR simplifies the process by removing the intermediate steps that query for all materials and runs, and instead queries for run.rowId directly based on the specified criteria.
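The difference between the two patterns can be illustrated with a small self-contained sketch. Everything in it is a hypothetical stand-in rather than LabKey code: `RunRow` and `RunStoreSketch` are invented for illustration, the in-memory list plays the role of the database table, and the predicate plays the role of the SQL WHERE clause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for an experiment run row; not a LabKey class.
final class RunRow
{
    final long rowId;
    final String protocol;

    RunRow(long rowId, String protocol)
    {
        this.rowId = rowId;
        this.protocol = protocol;
    }
}

// Hypothetical stand-in for the database layer.
final class RunStoreSketch
{
    private final List<RunRow> table = new ArrayList<>();

    void insert(long rowId, String protocol)
    {
        table.add(new RunRow(rowId, protocol));
    }

    // Old pattern, step 1: materialize every row as an object.
    List<RunRow> loadAllRuns()
    {
        return new ArrayList<>(table);
    }

    // Old pattern, steps 2-3: filter the loaded objects in Java, then extract the ids.
    static List<Long> filterInMemory(List<RunRow> loaded, String protocol)
    {
        List<Long> ids = new ArrayList<>();
        for (RunRow run : loaded)
        {
            if (protocol.equals(run.protocol))
                ids.add(run.rowId);
        }
        return ids;
    }

    // New pattern: the filter is applied where the data lives and only ids are returned.
    // In the PR this is a SQL WHERE clause; here a predicate plays that role.
    List<Long> selectRunIds(Predicate<RunRow> where)
    {
        return table.stream()
            .filter(where)
            .map(run -> run.rowId)
            .sorted()
            .collect(Collectors.toList());
    }
}
```

Both paths produce the same ids; the second never hands the caller a collection of full row objects, which is the memory saving the rationale describes.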

Related Pull Requests

Changes

Copilot

Pull request overview

This pull request refactors the folder export logic for sample types and data classes to reduce memory usage by replacing in-memory filtering with direct SQL queries. Instead of loading all ExpData, ExpMaterial, and ExpRun objects into memory and filtering them in Java, the new implementation queries for run IDs directly from the database with appropriate filters.

Changes:

Replaced memory-intensive object loading and Java-based filtering with SQL-based queries for determining which runs to export
Added two new methods to ExperimentService API: getDerivationRunIdsForDataClassExport and getDerivationRunIdsForSampleTypesExport
Removed intermediate data structures and helper methods (isValidRunType) that were used for in-memory filtering

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| experiment/src/org/labkey/experiment/samples/SampleTypeFolderWriter.java | Simplified run selection by replacing material loading and filtering with direct SQL query; removed `isValidRunType` method |
| experiment/src/org/labkey/experiment/samples/DataClassFolderWriter.java | Simplified run selection by replacing data loading and filtering with direct SQL query; removed `isValidRunType` method |
| experiment/src/org/labkey/experiment/api/ExperimentServiceImpl.java | Implemented two new SQL-based query methods that return filtered run IDs for folder export |
| api/src/org/labkey/api/exp/api/ExperimentService.java | Added interface declarations for the two new query methods |

Comments suppressed due to low confidence (3)

experiment/src/org/labkey/experiment/samples/SampleTypeFolderWriter.java:106

The comment states "only want the sample derivation runs" but the implementation also includes aliquot runs (SAMPLE_ALIQUOT_PROTOCOL_LSID). Consider updating the comment to clarify that both derivation and aliquot protocol runs are included.

        // only want the sample derivation runs; other runs will get included in the experiment xar.

api/src/org/labkey/api/exp/api/ExperimentService.java:811

The JavaDoc comment could be more detailed. Consider expanding it to explain: (1) what "derivation run IDs" means in this context, (2) that it includes runs where materials from the specified sample types are used as either inputs or outputs, (3) the behavior of the includeRunsWithDataIO parameter more explicitly (when true, includes all derivation and aliquot runs; when false, includes aliquot runs and only derivation runs without data inputs/outputs).

    /** Get derivation/aliquot run IDs for sample types — filtered by protocol and optionally excluding runs with data inputs/outputs */
    List<Long> getDerivationRunIdsForSampleTypesExport(Collection<String> sampleTypeLsids, Container c, boolean includeRunsWithDataIO);
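The behavior of `includeRunsWithDataIO` described in the comment above can be written out as a per-run predicate. This is a sketch of the review comment's description only, not of the actual SQL in ExperimentServiceImpl: `RunKind` and `SampleRunFilterSketch` are hypothetical names, and the enum stands in for the comparison against the derivation and aliquot protocol LSIDs.

```java
// Hypothetical enum standing in for the protocol LSID comparison.
enum RunKind { DERIVATION, ALIQUOT, OTHER }

final class SampleRunFilterSketch
{
    // Returns true when a run would be part of the sample type export,
    // following the rule stated in the review comment:
    //  - aliquot runs are always included
    //  - derivation runs are included unless they have data inputs/outputs
    //    and includeRunsWithDataIO is false
    //  - runs of any other protocol are left to the experiment xar
    static boolean includeInSampleTypeExport(RunKind kind, boolean hasDataIO, boolean includeRunsWithDataIO)
    {
        switch (kind)
        {
            case ALIQUOT:
                return true;
            case DERIVATION:
                return includeRunsWithDataIO || !hasDataIO;
            default:
                return false;
        }
    }
}
```

Spelling out the rule this way also shows what an expanded JavaDoc would need to cover: the flag only affects derivation runs that touch data, and has no effect on aliquot runs.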

api/src/org/labkey/api/exp/api/ExperimentService.java:808

The JavaDoc comment could be more detailed. Consider expanding it to explain: (1) that it returns run IDs where the specified data class is used as either input or output, (2) that it only includes runs with the SAMPLE_DERIVATION_PROTOCOL that have no material inputs or outputs (since those are handled by the sample type writer).

    /** Get derivation run IDs for a data class — runs with SAMPLE_DERIVATION_PROTOCOL that have no material inputs/outputs */
    List<Long> getDerivationRunIdsForDataClassExport(long dataClassRowId);

labkey-nicka · 2026-02-27T22:14:37Z

experiment/src/org/labkey/experiment/samples/DataClassFolderWriter.java

-                })
-                    .collect(Collectors.toSet())
-                    .stream().map(ExpObject::getRowId).toList();
+                List<Long> exportedRunIds = ExperimentService.get().getDerivationRunIdsForDataClassExport(dataClass.getRowId());


nit: I realize this is a one-for-one replacement of the current logic but we could consider doing a coalesced query here that takes a collection of data class rowIds and produces the result with a single query. Much like the getDerivationRunIdsForSampleTypesExport implementation. Something like this:

    List<Long> dataClassRowIdsForExport = new ArrayList<>();
    for (ExpDataClass dataClass : ExperimentService.get().getDataClasses(c, false))
    {
        // ignore data classes that are filtered out
        if (EXCLUDED_TYPES.contains(dataClass.getName()))
            continue;

        dataClasses.add(dataClass);
        typesSelection.addDataClass(dataClass);
        exportTypes = true;

        if (exportDataClassData)
            dataClassRowIdsForExport.add(dataClass.getRowId());
    }

    if (!dataClassRowIdsForExport.isEmpty())
    {
        // get the list of derivation runs for these data classes: only sample derivation protocol runs
        // with no material inputs/outputs (those are handled by the sample type writer)
        // Sample derivation protocols involving data classes can be either to/from another data
        // class or also to/from a sample type. If it's the latter, we will let the sample writer handle it
        // since on import, data classes run before sample types.
        List<Long> exportedRunIds = ExperimentService.get().getDerivationRunIdsForDataClassExport(dataClassRowIdsForExport);
        if (!exportedRunIds.isEmpty())
        {
            runsSelection.addRunIds(exportedRunIds);
            exportRuns = true;
        }
    }

The query could then be:

    // Note: I did not test this
    public List<Long> getDerivationRunIdsForDataClassExport(@NotNull Collection<Long> dataClassRowIds)
    {
        if (dataClassRowIds.isEmpty())
            return List.of();

        SQLFragment inClause = new SQLFragment().appendInClause(dataClassRowIds, getExpSchema().getSqlDialect());

        SQLFragment sql = new SQLFragment("SELECT DISTINCT er.RowId FROM ")
            .append(getTinfoExperimentRun(), "er")
            .append(" WHERE er.ProtocolLSID = ? ").add(SAMPLE_DERIVATION_PROTOCOL_LSID);

        sql.append("""
            AND EXISTS (
                SELECT 1 FROM exp.ProtocolApplication pa
                LEFT JOIN exp.DataInput di ON di.TargetApplicationId = pa.RowId
                LEFT JOIN exp.Data d1 ON di.DataId = d1.RowId AND d1.classId\s
            """);
        sql.append(inClause);
        sql.append(" LEFT JOIN exp.Data d2 ON d2.SourceApplicationId = pa.RowId AND d2.classId ");
        sql.append(inClause);
        sql.append("""
                WHERE pa.RunId = er.RowId
                AND (d1.RowId IS NOT NULL OR d2.RowId IS NOT NULL)
            )
            AND NOT EXISTS (
                SELECT 1 FROM exp.ProtocolApplication pa_m
                LEFT JOIN exp.MaterialInput mi ON mi.TargetApplicationId = pa_m.RowId
                LEFT JOIN exp.Material m ON m.SourceApplicationId = pa_m.RowId
                WHERE pa_m.RunId = er.RowId
                AND (mi.RowId IS NOT NULL OR m.RowId IS NOT NULL)
            )
            ORDER BY er.RowId
            """);

        return new SqlSelector(getExpSchema(), sql).getArrayList(Long.class);
    }
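For readers unfamiliar with the coalesced approach, the idea behind the `inClause` fragment can be sketched in plain Java: one placeholder per id, with the ids carried alongside as bind parameters. This is a conceptual illustration only; `InClauseSketch` is a hypothetical class, and it makes no claim about what LabKey's `appendInClause` generates for a given SQL dialect.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper illustrating a parameterized IN clause; not LabKey's SQLFragment.
final class InClauseSketch
{
    final String sql;          // e.g. " IN (?, ?, ?)"
    final List<Object> params; // the ids, in placeholder order

    private InClauseSketch(String sql, List<Object> params)
    {
        this.sql = sql;
        this.params = params;
    }

    static InClauseSketch of(Collection<Long> ids)
    {
        // "IN ()" is not valid SQL, so callers must handle the empty case
        // themselves, as the suggested method does with its early return.
        if (ids.isEmpty())
            throw new IllegalArgumentException("ids must not be empty");

        String placeholders = String.join(", ", Collections.nCopies(ids.size(), "?"));
        return new InClauseSketch(" IN (" + placeholders + ")", new ArrayList<>(ids));
    }
}
```

Because the suggested query appends the clause twice (once for data inputs, once for data outputs), a statement built this way would need the ids bound once per occurrence.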

Good point. But since this PR did not follow the code-review-then-manual-testing workflow, I'll leave the code as it currently is...

Simplify query for runs to export for samples and data

c162613

XingY requested a review from Copilot February 23, 2026 22:18

Copilot started reviewing on behalf of XingY February 23, 2026 22:18

Copilot AI reviewed Feb 23, 2026

XingY and others added 2 commits February 25, 2026 15:22

Merge branch 'develop' into fb_folderExportPerf

a857606

Merge remote-tracking branch 'origin/develop' into fb_folderExportPerf

0a463be

XingY requested review from labkey-adam and labkey-nicka February 27, 2026 20:21

labkey-nicka assigned XingY Feb 27, 2026

labkey-nicka approved these changes Feb 27, 2026

XingY merged commit f5a9fd7 into develop Feb 27, 2026
12 checks passed

XingY deleted the fb_folderExportPerf branch February 27, 2026 23:00

XingY merged 3 commits into develop from fb_folderExportPerf

4 participants
