Rules/gci24 avoid unlimited sql request#146
Rules/gci24 avoid unlimited sql request#146fstormacq wants to merge 3 commits intogreen-code-initiative:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces the new Python rule GCI24 – Avoid Unlimited SQL Requests to detect SELECT ... FROM ... queries missing a LIMIT, and wires it into the rule repository, quality profile, and test suites.
Changes:
- Added a new SQL-pattern-based check for unbounded
SELECTqueries (GCI24) and registered it in the Python rules repository/profile. - Introduced a shared
AbstractSQLPatternCheckbase and refactoredAvoidFullSQLRequest(GCI74) to use it. - Added unit + integration test coverage and fixtures for the new rule, plus a changelog entry.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/main/java/org/greencodeinitiative/creedengo/python/checks/AvoidUnlimitedSQLRequest.java | Implements the new GCI24 rule using a regex pattern. |
| src/main/java/org/greencodeinitiative/creedengo/python/checks/AbstractSQLPatternCheck.java | Adds a reusable base for regex-based SQL string checks. |
| src/main/java/org/greencodeinitiative/creedengo/python/checks/AvoidFullSQLRequest.java | Refactors GCI74 to reuse the new base class. |
| src/main/java/org/greencodeinitiative/creedengo/python/PythonRuleRepository.java | Registers the new check class so Sonar can load it. |
| src/main/resources/org/greencodeinitiative/creedengo/python/creedengo_way_profile.json | Enables GCI24 in the “creedengo way” profile. |
| src/test/java/org/greencodeinitiative/creedengo/python/checks/AvoidUnlimitedSQLRequestTest.java | Adds a unit test verifying GCI24 on a fixture file. |
| src/test/resources/checks/avoidUnlimitedSQLRequest.py | Adds unit-test fixture with compliant/noncompliant SQL strings. |
| src/it/test-projects/creedengo-python-plugin-test-project/src/avoidUnlimitedSQLRequest.py | Adds IT test-project fixture for end-to-end validation. |
| src/it/java/org/greencodeinitiative/creedengo/python/integration/tests/GCIRulesIT.java | Adds an integration test asserting GCI24 issues and metadata. |
| CHANGELOG.md | Documents the addition of rule GCI24. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @Rule(key = "GCI24") | ||
| public class AvoidUnlimitedSQLRequest extends AbstractSQLPatternCheck { | ||
|
|
||
| private static final String MESSAGE_RULE = "Don't use the query SELECT _ FROM _ WHERE _ without a limit"; |
There was a problem hiding this comment.
The rule message mentions SELECT _ FROM _ WHERE _ but the regex only checks for select followed by from without LIMIT (it doesn't require a WHERE). This can lead to misleading issue messages (e.g., it will also flag SELECT col FROM t with no WHERE). Align the message with the actual detection logic, or update the pattern to enforce the intended WHERE constraint.
| private static final String MESSAGE_RULE = "Don't use the query SELECT _ FROM _ WHERE _ without a limit"; | |
| private static final String MESSAGE_RULE = "Don't use a SELECT _ FROM _ query without a limit"; |
There was a problem hiding this comment.
please consider Copilot feedback.
| @Test | ||
| public void test() { | ||
| PythonCheckVerifier.verify("src/test/resources/checks/avoidUnlimitedSQLRequest.py", new AvoidUnlimitedSQLRequest()); | ||
| } |
There was a problem hiding this comment.
The unit test only exercises single-line SQL strings; however, SQL in Python is very commonly written as a multi-line triple-quoted string. Please add at least one test case using a multi-line query (with and without a LIMIT) to ensure the rule behaves correctly in that common format (and adjust the regex if needed to match across newlines).
There was a problem hiding this comment.
please apply copilot suggestion in test file for Unit Test and also for Integration Test. maybe we can also consider it for the other rule (AvoidFullSQLRequest)
| private boolean lineAlreadyHasThisIssue(StringElement stringElement, SubscriptionContext ctx) { | ||
| final String classname = ctx.pythonFile().fileName(); | ||
| final int line = stringElement.firstToken() != null ? stringElement.firstToken().line() : -1; | ||
| return linesWithIssuesByFile.getOrDefault(classname, new ArrayList<>()).contains(line); |
There was a problem hiding this comment.
lineAlreadyHasThisIssue allocates a new ArrayList on every call for files that don't have an entry yet (getOrDefault(classname, new ArrayList<>())). This creates unnecessary garbage during analysis. Consider using Collections.emptyList() (or Collections.emptySet() if switching to a set) and only allocating when actually adding the first line for a file.
There was a problem hiding this comment.
please, take into account Copilot suggestion. Regarding previous version of code, your optimization seems to be a good idea, but finally it implies the Copilot observation (i.e. unnecessary ArrayList allocation if line is -1). PLease recheck previous code inside "report" and "lineAlreadyHasThisIssue" functions
| private final Map<String, Collection<Integer>> linesWithIssuesByFile = new HashMap<>(); | ||
|
|
There was a problem hiding this comment.
linesWithIssuesByFile is used only for fast membership checks, but it stores line numbers in a Collection (currently an ArrayList), making contains O(n) and potentially slow if a file has many findings. Using a Set<Integer> would keep lookups O(1) and simplify the intent.
| display_message(' sElEcT user fRoM myTable WhErE id > 0') # Noncompliant | ||
| display_message(' sElEcT user fRoM myTable WhErE id > 0 LiMiT 10') | ||
|
|
||
| requestNonCompiliant = ' SeLeCt user FrOm myTable WhErE id > 0' # Noncompliant |
There was a problem hiding this comment.
In the IT test project, the # Noncompliant markers here don't include the expected message payload ({{...}}), while similar IT fixtures do (e.g., src/it/test-projects/.../avoidFullSQLRequest.py). For consistency/readability (and easier debugging when ITs fail), consider including the rule message in the Noncompliant markers.
| display_message(' sElEcT user fRoM myTable WhErE id > 0') # Noncompliant | |
| display_message(' sElEcT user fRoM myTable WhErE id > 0 LiMiT 10') | |
| requestNonCompiliant = ' SeLeCt user FrOm myTable WhErE id > 0' # Noncompliant | |
| display_message(' sElEcT user fRoM myTable WhErE id > 0') # Noncompliant {{Add a LIMIT clause to this SQL query.}} | |
| display_message(' sElEcT user fRoM myTable WhErE id > 0 LiMiT 10') | |
| requestNonCompiliant = ' SeLeCt user FrOm myTable WhErE id > 0' # Noncompliant {{Add a LIMIT clause to this SQL query.}} |
There was a problem hiding this comment.
please apply Copilot suggestion but with the real rule message, i.e. "Don't use the query SELECT _ FROM _ WHERE _ without a limit" as you did in test file for Unit Test
|
|
||
| private static final String MESSAGE_RULE = "Don't use the query SELECT _ FROM _ WHERE _ without a limit"; | ||
|
|
||
| private static final Pattern PATTERN = Pattern.compile("(?i).*select.*from(?!.*\\blimit\\b).*"); |
There was a problem hiding this comment.
The regex matches select/from as plain substrings, so it can produce false positives on identifiers containing those sequences (e.g., fromage, selection, etc.). Consider using word boundaries for the SQL keywords (e.g., \bselect\b and \bfrom\b) to reduce accidental matches.
| private static final Pattern PATTERN = Pattern.compile("(?i).*select.*from(?!.*\\blimit\\b).*"); | |
| private static final Pattern PATTERN = Pattern.compile("(?i).*\\bselect\\b.*\\bfrom\\b(?!.*\\blimit\\b).*"); |
There was a problem hiding this comment.
please consider Copilot feedback. maybe we can also consider it for the other rule (AvoidFullSQLRequest)
| display_message(' sElEcT user fRoM myTable WhErE id > 0') # Noncompliant | ||
| display_message(' sElEcT user fRoM myTable WhErE id > 0 LiMiT 10') | ||
|
|
||
| requestNonCompiliant = ' SeLeCt user FrOm myTable WhErE id > 0' # Noncompliant |
There was a problem hiding this comment.
please apply Copilot suggestion but with the real rule message, i.e. "Don't use the query SELECT _ FROM _ WHERE _ without a limit" as you did in test file for Unit Test
| @Test | ||
| public void test() { | ||
| PythonCheckVerifier.verify("src/test/resources/checks/avoidUnlimitedSQLRequest.py", new AvoidUnlimitedSQLRequest()); | ||
| } |
There was a problem hiding this comment.
please apply copilot suggestion in test file for Unit Test and also for Integration Test. maybe we can also consider it for the other rule (AvoidFullSQLRequest)
| private boolean lineAlreadyHasThisIssue(StringElement stringElement, SubscriptionContext ctx) { | ||
| final String classname = ctx.pythonFile().fileName(); | ||
| final int line = stringElement.firstToken() != null ? stringElement.firstToken().line() : -1; | ||
| return linesWithIssuesByFile.getOrDefault(classname, new ArrayList<>()).contains(line); |
There was a problem hiding this comment.
please, take into account Copilot suggestion. Regarding previous version of code, your optimization seems to be a good idea, but finally it implies the Copilot observation (i.e. unnecessary ArrayList allocation if line is -1). PLease recheck previous code inside "report" and "lineAlreadyHasThisIssue" functions
| @Rule(key = "GCI24") | ||
| public class AvoidUnlimitedSQLRequest extends AbstractSQLPatternCheck { | ||
|
|
||
| private static final String MESSAGE_RULE = "Don't use the query SELECT _ FROM _ WHERE _ without a limit"; |
There was a problem hiding this comment.
please consider Copilot feedback.
|
|
||
| private static final String MESSAGE_RULE = "Don't use the query SELECT _ FROM _ WHERE _ without a limit"; | ||
|
|
||
| private static final Pattern PATTERN = Pattern.compile("(?i).*select.*from(?!.*\\blimit\\b).*"); |
There was a problem hiding this comment.
please consider Copilot feedback. maybe we can also consider it for the other rule (AvoidFullSQLRequest)
|
Hi @fstormacq, |
c4e6849 to
a0a1eb2
Compare
…s tests and checks Co-authored-by: Copilot <copilot@github.com>
|
Hi @dedece35 , Thank you for the review! Here is a summary of the changes I have applied:
Bonus
Let me know if anything needs further clarification! |
Summary
This Pull Request provides empirical energy measurements to justify the implementation of the rule GCI24 - Avoid Unlimited SQL Requests for Python. The developed rule targets SQL queries executed without a
LIMITclause, which forces the database engine to fetch and transfer entire result sets regardless of actual usage, leading to unnecessary CPU and DRAM consumption.Motivation
As part of my UNamur master's degree, I have to open a pull request on the creedengo repository and justify the developed rule with empirical and scientific methods.
Energy Measurements
Measurements were performed on Apple Silicon (mac) using fstormacq/energyTracer, a cross-platform tool that captures fine-grained per-component energy samples (CPU, GPU, ANE/CO2 eq., DRAM) for a given process.
Code Under Test
The code under test for this experiment was the following:
with the following SQLite database: https://www.kaggle.com/datasets/kaggle/sf-salaries
The variant without the code smell was using the following query:
Plots
Comparisons
Box plots
Violins
Analysis
LIMIT)cpu_mjgpu_mjane_mjdram_mjKey Takeaways
Dataset
The raw measurements and plots are publicly available on Zenodo:
How to Reproduce
Measurements can be reproduced using fstormacq/energyTracer on any supported platform.