Skip to content

Rules/gci24 avoid unlimited sql request#146

Open
fstormacq wants to merge 3 commits intogreen-code-initiative:mainfrom
snail-unamur:rules/GCI24-avoid-unlimited-SQL-request
Open

Rules/gci24 avoid unlimited sql request#146
fstormacq wants to merge 3 commits intogreen-code-initiative:mainfrom
snail-unamur:rules/GCI24-avoid-unlimited-SQL-request

Conversation

@fstormacq
Copy link
Copy Markdown

Summary

This Pull Request provides empirical energy measurements to justify the implementation of the rule GCI24 - Avoid Unlimited SQL Requests for Python. The developed rule targets SQL queries executed without a LIMIT clause, which forces the database engine to fetch and transfer entire result sets regardless of actual usage, leading to unnecessary CPU and DRAM consumption.

Motivation

As part of my UNamur master's degree, I have to open a pull request on the creedengo repository and justify the developed rule with empirical and scientific methods.

Energy Measurements

Measurements were performed on Apple Silicon (mac) using fstormacq/energyTracer, a cross-platform tool that captures fine-grained per-component energy samples (CPU, GPU, ANE/CO2 eq., DRAM) for a given process.

Code Under Test

The code under test for this experiment was the following:

def main():
    import sqlite3

    try:
        connection = sqlite3.connect('src/python/data/database.sqlite')
        cursor = connection.cursor()

        REQUEST = """
            SELECT EmployeeName, JobTitle, Benefits, TotalPayBenefits, Year
            FROM Salaries 
            WHERE BasePay >= 5000
            """

        cursor.execute(REQUEST)
        _ = cursor.fetchall()
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        if 'connection' in locals():
            connection.close()

if __name__ == "__main__":
    main()

with the following SQLite database: https://www.kaggle.com/datasets/kaggle/sf-salaries

The variant without the code smell was using the following query:

REQUEST = """
            SELECT  EmployeeName, JobTitle, Benefits, TotalPayBenefits, Year
            FROM Salaries 
            WHERE BasePay >= 5000
            LIMIT 10_000
            """

Plots

Comparisons

CPU energy consumption RAM energy consumption
cpu_energy_comparison dram_energy_comparison

Box plots

CPU box plot RAM box plot
cpu_moustache dram_moustache

Violins

CPU violins plot RAM violins plot
cpu_violin dram_violin

Analysis

  • 29,545 samples collected with the code smell (unbounded query)
  • 28,704 samples collected without the code smell (query with LIMIT)
  • Significance level: α = 0.05
Metric Δ mean p-value Cohen's d Effect Significant
cpu_mj +91.83% 0.00e+00 +5.236 large
gpu_mj N/A 0.00e+00 −0.045 negligible
ane_mj +94.18% 2.03e-113 +0.189 negligible
dram_mj +93.70% 0.00e+00 +9.383 large

Δ mean = (mean_with − mean_without) / mean_with × 100. A positive value means the smell consumes more energy.

Key Takeaways

  • CPU energy is ~92% higher with the smell - Cohen's d = 5.236 (large effect), indicating the processor works significantly harder to process unbounded rows.
  • DRAM energy is ~94% higher with the smell - Cohen's d = 9.383 (large effect), reflecting the memory pressure of loading entire result sets.
  • Both effects are statistically significant with p-values at or near machine epsilon, across nearly 30,000 samples per condition, making these results highly reliable.
  • GPU and ANE impacts are negligible, confirming the bottleneck is computation and memory bandwidth, not graphics or neural acceleration.

Dataset

The raw measurements and plots are publicly available on Zenodo:

DOI

How to Reproduce

Measurements can be reproduced using fstormacq/energyTracer on any supported platform.

⚠️ Results may vary across different platforms, hardware, and devices.

@olegoaer olegoaer added the 🗃️ rule rule improvment or rule development or bug label Apr 14, 2026
@dedece35 dedece35 requested a review from Copilot April 14, 2026 19:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces the new Python rule GCI24 – Avoid Unlimited SQL Requests to detect SELECT ... FROM ... queries missing a LIMIT, and wires it into the rule repository, quality profile, and test suites.

Changes:

  • Added a new SQL-pattern-based check for unbounded SELECT queries (GCI24) and registered it in the Python rules repository/profile.
  • Introduced a shared AbstractSQLPatternCheck base and refactored AvoidFullSQLRequest (GCI74) to use it.
  • Added unit + integration test coverage and fixtures for the new rule, plus a changelog entry.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/main/java/org/greencodeinitiative/creedengo/python/checks/AvoidUnlimitedSQLRequest.java Implements the new GCI24 rule using a regex pattern.
src/main/java/org/greencodeinitiative/creedengo/python/checks/AbstractSQLPatternCheck.java Adds a reusable base for regex-based SQL string checks.
src/main/java/org/greencodeinitiative/creedengo/python/checks/AvoidFullSQLRequest.java Refactors GCI74 to reuse the new base class.
src/main/java/org/greencodeinitiative/creedengo/python/PythonRuleRepository.java Registers the new check class so Sonar can load it.
src/main/resources/org/greencodeinitiative/creedengo/python/creedengo_way_profile.json Enables GCI24 in the “creedengo way” profile.
src/test/java/org/greencodeinitiative/creedengo/python/checks/AvoidUnlimitedSQLRequestTest.java Adds a unit test verifying GCI24 on a fixture file.
src/test/resources/checks/avoidUnlimitedSQLRequest.py Adds unit-test fixture with compliant/noncompliant SQL strings.
src/it/test-projects/creedengo-python-plugin-test-project/src/avoidUnlimitedSQLRequest.py Adds IT test-project fixture for end-to-end validation.
src/it/java/org/greencodeinitiative/creedengo/python/integration/tests/GCIRulesIT.java Adds an integration test asserting GCI24 issues and metadata.
CHANGELOG.md Documents the addition of rule GCI24.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Rule(key = "GCI24")
public class AvoidUnlimitedSQLRequest extends AbstractSQLPatternCheck {

private static final String MESSAGE_RULE = "Don't use the query SELECT _ FROM _ WHERE _ without a limit";
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rule message mentions SELECT _ FROM _ WHERE _ but the regex only checks for select followed by from without LIMIT (it doesn't require a WHERE). This can lead to misleading issue messages (e.g., it will also flag SELECT col FROM t with no WHERE). Align the message with the actual detection logic, or update the pattern to enforce the intended WHERE constraint.

Suggested change
private static final String MESSAGE_RULE = "Don't use the query SELECT _ FROM _ WHERE _ without a limit";
private static final String MESSAGE_RULE = "Don't use a SELECT _ FROM _ query without a limit";

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please consider Copilot feedback.

Comment on lines +25 to +28
@Test
public void test() {
PythonCheckVerifier.verify("src/test/resources/checks/avoidUnlimitedSQLRequest.py", new AvoidUnlimitedSQLRequest());
}
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit test only exercises single-line SQL strings; however, SQL in Python is very commonly written as a multi-line triple-quoted string. Please add at least one test case using a multi-line query (with and without a LIMIT) to ensure the rule behaves correctly in that common format (and adjust the regex if needed to match across newlines).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please apply copilot suggestion in test file for Unit Test and also for Integration Test. maybe we can also consider it for the other rule (AvoidFullSQLRequest)

Comment on lines +68 to +71
private boolean lineAlreadyHasThisIssue(StringElement stringElement, SubscriptionContext ctx) {
final String classname = ctx.pythonFile().fileName();
final int line = stringElement.firstToken() != null ? stringElement.firstToken().line() : -1;
return linesWithIssuesByFile.getOrDefault(classname, new ArrayList<>()).contains(line);
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lineAlreadyHasThisIssue allocates a new ArrayList on every call for files that don't have an entry yet (getOrDefault(classname, new ArrayList<>())). This creates unnecessary garbage during analysis. Consider using Collections.emptyList() (or Collections.emptySet() if switching to a set) and only allocating when actually adding the first line for a file.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please, take into account Copilot suggestion. Regarding previous version of code, your optimization seems to be a good idea, but finally it implies the Copilot observation (i.e. unnecessary ArrayList allocation if line is -1). PLease recheck previous code inside "report" and "lineAlreadyHasThisIssue" functions

Comment on lines +35 to +36
private final Map<String, Collection<Integer>> linesWithIssuesByFile = new HashMap<>();

Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linesWithIssuesByFile is used only for fast membership checks, but it stores line numbers in a Collection (currently an ArrayList), making contains O(n) and potentially slow if a file has many findings. Using a Set<Integer> would keep lookups O(1) and simplify the intent.

Copilot uses AI. Check for mistakes.
Comment on lines +4 to +7
display_message(' sElEcT user fRoM myTable WhErE id > 0') # Noncompliant
display_message(' sElEcT user fRoM myTable WhErE id > 0 LiMiT 10')

requestNonCompiliant = ' SeLeCt user FrOm myTable WhErE id > 0' # Noncompliant
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the IT test project, the # Noncompliant markers here don't include the expected message payload ({{...}}), while similar IT fixtures do (e.g., src/it/test-projects/.../avoidFullSQLRequest.py). For consistency/readability (and easier debugging when ITs fail), consider including the rule message in the Noncompliant markers.

Suggested change
display_message(' sElEcT user fRoM myTable WhErE id > 0') # Noncompliant
display_message(' sElEcT user fRoM myTable WhErE id > 0 LiMiT 10')
requestNonCompiliant = ' SeLeCt user FrOm myTable WhErE id > 0' # Noncompliant
display_message(' sElEcT user fRoM myTable WhErE id > 0') # Noncompliant {{Add a LIMIT clause to this SQL query.}}
display_message(' sElEcT user fRoM myTable WhErE id > 0 LiMiT 10')
requestNonCompiliant = ' SeLeCt user FrOm myTable WhErE id > 0' # Noncompliant {{Add a LIMIT clause to this SQL query.}}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please apply Copilot suggestion but with the real rule message, i.e. "Don't use the query SELECT _ FROM _ WHERE _ without a limit" as you did in test file for Unit Test


private static final String MESSAGE_RULE = "Don't use the query SELECT _ FROM _ WHERE _ without a limit";

private static final Pattern PATTERN = Pattern.compile("(?i).*select.*from(?!.*\\blimit\\b).*");
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex matches select/from as plain substrings, so it can produce false positives on identifiers containing those sequences (e.g., fromage, selection, etc.). Consider using word boundaries for the SQL keywords (e.g., \bselect\b and \bfrom\b) to reduce accidental matches.

Suggested change
private static final Pattern PATTERN = Pattern.compile("(?i).*select.*from(?!.*\\blimit\\b).*");
private static final Pattern PATTERN = Pattern.compile("(?i).*\\bselect\\b.*\\bfrom\\b(?!.*\\blimit\\b).*");

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please consider Copilot feedback. maybe we can also consider it for the other rule (AvoidFullSQLRequest)

Comment on lines +4 to +7
display_message(' sElEcT user fRoM myTable WhErE id > 0') # Noncompliant
display_message(' sElEcT user fRoM myTable WhErE id > 0 LiMiT 10')

requestNonCompiliant = ' SeLeCt user FrOm myTable WhErE id > 0' # Noncompliant
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please apply Copilot suggestion but with the real rule message, i.e. "Don't use the query SELECT _ FROM _ WHERE _ without a limit" as you did in test file for Unit Test

Comment on lines +25 to +28
@Test
public void test() {
PythonCheckVerifier.verify("src/test/resources/checks/avoidUnlimitedSQLRequest.py", new AvoidUnlimitedSQLRequest());
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please apply copilot suggestion in test file for Unit Test and also for Integration Test. maybe we can also consider it for the other rule (AvoidFullSQLRequest)

Comment on lines +68 to +71
private boolean lineAlreadyHasThisIssue(StringElement stringElement, SubscriptionContext ctx) {
final String classname = ctx.pythonFile().fileName();
final int line = stringElement.firstToken() != null ? stringElement.firstToken().line() : -1;
return linesWithIssuesByFile.getOrDefault(classname, new ArrayList<>()).contains(line);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please, take into account Copilot suggestion. Regarding previous version of code, your optimization seems to be a good idea, but finally it implies the Copilot observation (i.e. unnecessary ArrayList allocation if line is -1). PLease recheck previous code inside "report" and "lineAlreadyHasThisIssue" functions

@Rule(key = "GCI24")
public class AvoidUnlimitedSQLRequest extends AbstractSQLPatternCheck {

private static final String MESSAGE_RULE = "Don't use the query SELECT _ FROM _ WHERE _ without a limit";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please consider Copilot feedback.


private static final String MESSAGE_RULE = "Don't use the query SELECT _ FROM _ WHERE _ without a limit";

private static final Pattern PATTERN = Pattern.compile("(?i).*select.*from(?!.*\\blimit\\b).*");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please consider Copilot feedback. maybe we can also consider it for the other rule (AvoidFullSQLRequest)

@dedece35
Copy link
Copy Markdown
Member

Hi @fstormacq,
thank you for your PR ... very good job !!!
Please take into account my review feedback.

@fstormacq fstormacq force-pushed the rules/GCI24-avoid-unlimited-SQL-request branch from c4e6849 to a0a1eb2 Compare May 5, 2026 09:03
…s tests and checks

Co-authored-by: Copilot <copilot@github.com>
@fstormacq
Copy link
Copy Markdown
Author

fstormacq commented May 5, 2026

Hi @dedece35 ,

Thank you for the review! Here is a summary of the changes I have applied:


AvoidUnlimitedSQLRequest.java

  • Update MESSAGE_RULE to align with actual detection logic (no WHERE requirement): "Don't use a SELECT _ FROM _ query without a limit"
  • Add word boundaries to the regex to avoid false positives on identifiers like "fromage" or "selection": Pattern.compile("(?i).*\\bselect\\b.*\\bfrom\\b(?!.*\\blimit\\b).*")

AbstractSQLPatternCheck.java

  • Change linesWithIssuesByFile type from Collection<Integer> to Set<Integer> for O(1) lookups instead of O(n).
  • Fix lineAlreadyHasThisIssue to avoid unnecessary ArrayList allocation on every call for files with no entry yet. Will use Collections.emptySet() and recheck the report() method accordingly.
  • Add multi-line triple-quoted SQL test cases (with and without LIMIT).

AvoidUnlimitedSQLRequestTest.java

  • Add unit test cases using multi-line SQL strings (with and without LIMIT).

avoidUnlimitedSQLRequest.py

  • Add the expected rule message to all # Noncompliant markers: # Noncompliant {{Don't use a SELECT _ FROM _ query without a limit}}

Bonus

  • Apply the same regex word boundary fix.
  • Add multi-line SQL test cases to the corresponding test file.

Let me know if anything needs further clarification!

@fstormacq fstormacq requested a review from dedece35 May 5, 2026 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Development

Successfully merging this pull request may close these issues.

4 participants