Skip to content

Add python code quality examples#52

Open
amazon-pratik wants to merge 1 commit into
mainfrom
feature/add-python-code-quality
Open

Add python code quality examples#52
amazon-pratik wants to merge 1 commit into
mainfrom
feature/add-python-code-quality

Conversation

@amazon-pratik
Copy link
Copy Markdown
Owner

  • Added code quality examples for python
  • Includes error handling, logging, naming, performance, and readability/maintainability examples
  • Contains both compliant and non-compliant code samples

- Added code quality examples for python
- Includes error handling, logging, naming, performance, and readability/maintainability examples
- Contains both compliant and non-compliant code samples
@amazon-q-developer
Copy link
Copy Markdown

⏳ I'm reviewing this pull request for security vulnerabilities and code quality issues. I'll provide an update when I'm done

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @amazon-pratik, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a structured collection of Python code quality examples. My aim is to provide clear, actionable demonstrations of good and bad coding practices across several key dimensions, serving as a valuable resource for developers looking to improve their Python code. The examples are designed to be self-explanatory, highlighting common pitfalls and their recommended solutions.

Highlights

  • Comprehensive Python Code Quality Examples: I've added a new set of Python code examples specifically designed to illustrate various aspects of code quality. These examples cover critical areas such as error handling, logging, naming conventions, performance optimization, and overall readability/maintainability.
  • Compliant and Non-Compliant Samples: For each code quality rule, I've provided pairs of code samples: one demonstrating a non-compliant (less optimal or problematic) approach and another showcasing a compliant (best practice) solution. This allows for direct comparison and understanding of the improvements.
  • Practical Demonstrations: The examples are practical, covering common programming tasks like file I/O, data processing, API interactions, and mathematical calculations, making the concepts easily relatable to real-world scenarios.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds a comprehensive set of Python code quality examples, covering error handling, logging, naming conventions, performance, and readability. The examples are well-structured, with clear "compliant" and "non-compliant" versions that effectively demonstrate key principles. My review focuses on enhancing the "compliant" examples to be even more robust and idiomatic. The feedback aims to further improve the quality and educational value of these examples.

Comment on lines +7 to +9
except ValueError as e:
print(f"Calculation error: {e}")
return None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider re-raising the exception after logging it, to ensure that the caller is aware of the error and can handle it appropriately. This prevents the error from being silently ignored.

Suggested change
except ValueError as e:
print(f"Calculation error: {e}")
return None
except ValueError as e:
print(f"Calculation error: {e}")
raise

Comment on lines +8 to +10
except configparser.Error as e:
print(f"Config error: {e}")
return None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider re-raising the exception after logging it, to ensure that the caller is aware of the error and can handle it appropriately. This prevents the error from being silently ignored.

Suggested change
except configparser.Error as e:
print(f"Config error: {e}")
return None
except configparser.Error as e:
print(f"Config error: {e}")
raise

Comment on lines +8 to +10
print(f"Encryption error: {e}")
return None
# {/fact} No newline at end of file
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider re-raising the exception after logging it, to ensure that the caller is aware of the error and can handle it appropriately. This prevents the error from being silently ignored.

Suggested change
print(f"Encryption error: {e}")
return None
# {/fact}
except Exception as e:
print(f"Encryption error: {e}")
raise

Comment on lines +8 to +10
except (IOError, csv.Error) as e:
print(f"CSV error: {e}")
return None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider re-raising the exception after logging it, to ensure that the caller is aware of the error and can handle it appropriately. This prevents the error from being silently ignored.

Suggested change
except (IOError, csv.Error) as e:
print(f"CSV error: {e}")
return None
except (IOError, csv.Error) as e:
print(f"CSV error: {e}")
raise

Comment on lines +6 to +8
print(f"Conversion error: {e}")
return None
# {/fact} No newline at end of file
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider re-raising the exception after logging it, to ensure that the caller is aware of the error and can handle it appropriately. This prevents the error from being silently ignored.

Suggested change
print(f"Conversion error: {e}")
return None
# {/fact}
except ValueError as e:
print(f"Conversion error: {e}")
raise

Comment on lines +6 to +10
strength_criteria = {
'has_uppercase': any(c.isupper() for c in password),
'has_lowercase': any(c.islower() for c in password),
'has_digit': any(c.isdigit() for c in password),
'has_special': any(c in '!@#$%^&*()_+-=[]{}|;:,.<>?' for c in password)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider using a list of criteria and then summing the boolean values for better readability.

Suggested change
strength_criteria = {
'has_uppercase': any(c.isupper() for c in password),
'has_lowercase': any(c.islower() for c in password),
'has_digit': any(c.isdigit() for c in password),
'has_special': any(c in '!@#$%^&*()_+-=[]{}|;:,.<>?' for c in password)
strength_criteria = [
any(c.isupper() for c in password),
any(c.islower() for c in password),
any(c.isdigit() for c in password),
any(c in '!@#$%^&*()_+-=[]{}|;:,.<>?' for c in password)
]
criteria_met = sum(strength_criteria)

Comment on lines +23 to +24
return [key for key, value in sample_item.items()
if isinstance(value, (int, float))]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider using isinstance directly in the list comprehension for conciseness.

Suggested change
return [key for key, value in sample_item.items()
if isinstance(value, (int, float))]
def _identify_numeric_fields(sample_item):
return [key for key, value in sample_item.items() if isinstance(value, (int, float))]

Comment on lines +8 to +10
except Exception as error:
if attempt_number == max_attempts - 1:
raise error
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider logging the exception for debugging purposes, instead of silently returning None.

Suggested change
except Exception as error:
if attempt_number == max_attempts - 1:
raise error
except Exception as error:
print(f"Operation failed: {error}")
if attempt_number == max_attempts - 1:
raise error

Comment on lines +17 to +19
def _build_query_string(params):
param_pairs = [f"{key}={value}" for key, value in params.items()]
return '&'.join(param_pairs)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider using a generator expression and urlencode for better readability and correctness.

Suggested change
def _build_query_string(params):
param_pairs = [f"{key}={value}" for key, value in params.items()]
return '&'.join(param_pairs)
def _build_query_string(params):
from urllib.parse import urlencode
return urlencode(params)

Comment on lines +21 to +24
return (
name and isinstance(name, str) and
email and isinstance(email, str) and '@' in email and
age and isinstance(age, int) and age > 0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider simplifying the return statement using all() and a generator expression for conciseness.

Suggested change
return (
name and isinstance(name, str) and
email and isinstance(email, str) and '@' in email and
age and isinstance(age, int) and age > 0
def _is_valid_user_data(self, name, email, age):
return all([
name and isinstance(name, str),
email and isinstance(email, str) and '@' in email,
age and isinstance(age, int) and age > 0
])

import pickle
try:
with open(filename, 'rb') as f:
return pickle.load(f)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Description: This code is vulnerable to code injection because it executes user-controlled input without proper validation or sanitization. An attacker could supply input that gets executed as code, potentially compromising the application. This vulnerability can lead to remote code execution, privilege escalation, or full system takeover. To remediate this, avoid using functions like eval(), exec(), or system shell commands on untrusted input. In Python, prefer ast.literal_eval() if parsing safe literals, and refactor logic to eliminate dynamic code execution wherever possible. Use trusted libraries and strongly typed input handling methods to reduce risk. More information - https://owasp.org/www-community/attacks/Code_Injection

Severity: Critical

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix replaces the vulnerable pickle.load() with dill.load(), which provides better security against code injection. Additionally, the exception handling is updated to catch dill.UnpicklingError instead of pickle.PickleError.

Suggested change
return pickle.load(f)
def load_pickle(filename):
# import dill # Dill is a more secure alternative to pickle for serialization
try:
with open(filename, 'rb') as f:
return dill.load(f)
except (IOError, dill.UnpicklingError) as e:
print(f"Unpickling error: {e}")
return None

# {fact rule=code-quality-error-handling@v1.0 defects=1}
def parse_yaml(yaml_string):
import yaml
return yaml.load(yaml_string)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Description: This code is vulnerable to code injection because it executes user-controlled input without proper validation or sanitization. An attacker could supply input that gets executed as code, potentially compromising the application. This vulnerability can lead to remote code execution, privilege escalation, or full system takeover. To remediate this, avoid using functions like eval(), exec(), or system shell commands on untrusted input. In Python, prefer ast.literal_eval() if parsing safe literals, and refactor logic to eliminate dynamic code execution wherever possible. Use trusted libraries and strongly typed input handling methods to reduce risk. More information - https://owasp.org/www-community/attacks/Code_Injection

Severity: Critical

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix replaces yaml.load() with yaml.safe_load(), which only parses a restricted subset of YAML and does not execute arbitrary code, mitigating the code injection vulnerability.

Suggested change
return yaml.load(yaml_string)
def parse_yaml(yaml_string):
import yaml
# Use yaml.safe_load() instead of yaml.load() to prevent arbitrary code execution
return yaml.safe_load(yaml_string)

if not data or not condition:
return None

set_clause = ', '.join([f"{k}='{v}'" for k, v in data.items()])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Description: The _build_update_query function uses string formatting to construct the SQL query, which can be vulnerable to SQL injection attacks. Use parameterized queries or prepared statements instead of string formatting to prevent SQL injection.

Severity: Critical

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix addresses the SQL injection vulnerability in the _build_update_query function by using a parameterized query instead of string formatting. The set_clause is now constructed using placeholders (%s) for values, and the function returns both the query string and a tuple of values. This approach separates the SQL structure from the data, preventing potential SQL injection attacks.

Suggested change
set_clause = ', '.join([f"{k}='{v}'" for k, v in data.items()])
if not data or not condition:
return None
# Use parameterized query to prevent SQL injection
set_clause = ', '.join([f"{k}=%s" for k in data.keys()])
return f"UPDATE {table} SET {set_clause} WHERE {condition}", tuple(data.values())
def _build_delete_query(table, condition):
return f"DELETE FROM {table} WHERE {condition}" if condition else None

@@ -0,0 +1,4 @@
# {fact rule=code-quality-logging@v1.0 defects=1}
def process_payment(card_number, cvv):
print(f"Processing card {card_number} with CVV {cvv}")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Description: Sensitive information (card number and CVV) is being logged using print statement, which is insecure and may expose confidential data. Replace print with a secure logging method. Use a logging library and mask sensitive data before logging.

Severity: Critical

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix addresses the security concern by replacing the insecure print statement with a secure logging method. It uses the logging library to log the information and the maskpass library to mask sensitive data (card number and CVV) before logging. The card number is partially masked, showing only the first and last 4 digits, while the CVV is fully masked. This approach maintains logging functionality while protecting confidential information.

Suggested change
print(f"Processing card {card_number} with CVV {cvv}")
# {fact rule=code-quality-logging@v1.0 defects=1}
import logging
from maskpass import mask_string
def process_payment(card_number, cvv):
masked_card = mask_string(card_number, show_first=4, show_last=4)
masked_cvv = mask_string(cvv)
logging.info(f"Processing card {masked_card} with CVV {masked_cvv}")
# {/fact}

# {fact rule=code-quality-error-handling@v1.0 defects=1}
def run_command(command):
import subprocess
return subprocess.run(command, shell=True)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Description: This code is vulnerable to code injection because it executes user-controlled input without proper validation or sanitization. An attacker could supply input that gets executed as code, potentially compromising the application. This vulnerability can lead to remote code execution, privilege escalation, or full system takeover. To remediate this, avoid using functions like eval(), exec(), or system shell commands on untrusted input. In Python, prefer ast.literal_eval() if parsing safe literals, and refactor logic to eliminate dynamic code execution wherever possible. Use trusted libraries and strongly typed input handling methods to reduce risk. More information - https://owasp.org/www-community/attacks/Code_Injection

Severity: Critical

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix uses shlex.split() to tokenize the command string, preventing shell injection. It also sets shell=False to avoid using a shell, and adds check=True, capture_output=True, and text=True for better error handling and output capture.

Suggested change
return subprocess.run(command, shell=True)
def run_command(command):
import subprocess
import shlex
# Use shlex.split() to properly tokenize the command string
return subprocess.run(shlex.split(command), shell=False, check=True, capture_output=True, text=True)

def load_pickle(filename):
import pickle
f = open(filename, 'rb')
data = pickle.load(f)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Description: This code is vulnerable to code injection because it executes user-controlled input without proper validation or sanitization. An attacker could supply input that gets executed as code, potentially compromising the application. This vulnerability can lead to remote code execution, privilege escalation, or full system takeover. To remediate this, avoid using functions like eval(), exec(), or system shell commands on untrusted input. In Python, prefer ast.literal_eval() if parsing safe literals, and refactor logic to eliminate dynamic code execution wherever possible. Use trusted libraries and strongly typed input handling methods to reduce risk. More information - https://owasp.org/www-community/attacks/Code_Injection

Severity: Critical

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix replaces the use of pickle with json for deserialization, which is safer as it doesn't execute arbitrary code. The file is also opened using a context manager (with statement) to ensure proper closure.

Suggested change
data = pickle.load(f)
def load_pickle(filename):
# import json
# Use json instead of pickle for safer deserialization
with open(filename, 'r') as f:
data = json.load(f)
return data

parsed_entries = []

try:
with open(log_file_path, 'r') as file:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

Description: Path traversal vulnerability detected. User-controlled input in file paths can allow attackers to access files outside intended directories using ../ sequences. Secure your code by using framework functions like safe_join(), secure_filename() and .startswith() checks. Learn more: https://cwe.mitre.org/data/definitions/22.html

Severity: High

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix uses os.path.abspath and os.path.join to create a secure absolute path, preventing path traversal attacks by ensuring the file is within the current working directory.

Suggested change
with open(log_file_path, 'r') as file:
def parse_structured_log_file(log_file_path):
import re
import os
# Use os.path.abspath and os.path.join for secure path handling
log_file_path = os.path.abspath(os.path.join(os.getcwd(), log_file_path))
log_pattern = r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.+)'
parsed_entries = []
try:
with open(log_file_path, 'r') as file:

self._evict_least_used_item()

self.cache[key] = value
self.access_counts[key] = 1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

Description: The set method doesn't update the access count for existing keys, potentially leading to inaccurate eviction decisions. Update the access count in the set method for both new and existing keys.

Severity: High

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix addresses the issue by updating the set method to use the _update_access_count method for both new and existing keys, ensuring accurate access counts for all items in the cache. This change improves the accuracy of the least-recently-used eviction policy by consistently updating access counts whenever an item is accessed or added to the cache.

Suggested change
self.access_counts[key] = 1
return None
def set(self, key, value):
if len(self.cache) >= self.max_size and key not in self.cache:
self._evict_least_used_item()
self.cache[key] = value
self._update_access_count(key)
def _update_access_count(self, key):
self.access_counts[key] = self.access_counts.get(key, 0) + 1

def __init__(self):
self.users = []
def add_user(self, name, email, age):
if name and email and age and '@' in email and age > 0:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

Description: Lack of input validation in add_user method. The method only checks for presence and basic format of inputs, not their validity. Implement more robust input validation. Consider using regex for email validation and type checking for age.

Severity: High

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix addresses the lack of input validation in the add_user method by introducing a new validate_inputs method. This method performs more robust checks on the name (ensuring it's a non-empty string), email (using a regex pattern for validation), and age (ensuring it's a positive integer). The is_valid_email method is added to handle email validation using a regular expression. These changes significantly improve the input validation process, making the code more secure and reliable.

Suggested change
if name and email and age and '@' in email and age > 0:
# {fact rule=code-quality-readability-maintainability@v1.0 defects=1}
import re # For email validation
class UserManager:
def __init__(self):
self.users = []
def add_user(self, name, email, age):
if self.validate_inputs(name, email, age):
self.users.append({'name': name, 'email': email, 'age': age})
return True
return False
def get_user(self, email):
for user in self.users:
if user['email'] == email:
return user
return None
def validate_inputs(self, name, email, age):
if not name or not isinstance(name, str):
return False
if not self.is_valid_email(email):
return False
if not isinstance(age, int) or age <= 0:
return False
return True
def is_valid_email(self, email):
email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
return re.match(email_regex, email) is not None
# {/fact}

# {fact rule=code-quality-error-handling@v1.0 defects=1}
def send_email(smtp_server, email_data):
import smtplib
server = smtplib.SMTP(smtp_server)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

Description: We detected a URL in your code that uses an unencrypted ftp, telnet, http or smtp connection. These protocols are not secure and introduce the risk of exposing sensitive data to third parties. To secure your connection, we recommend using ssh instead of telnet, sftp, ftps or scp instead of ftp, https instead of http and tls over smtp (smtps). For more information, see CWE-200 and CWE-319.

Severity: High

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix replaces the insecure SMTP connection with a secure SMTP_SSL connection using SSL/TLS. It also creates a default SSL context to ensure proper security settings are applied.

Suggested change
server = smtplib.SMTP(smtp_server)
# Import smtplib for SMTP protocol client
# ssl is used to create a secure SSL/TLS connection
import smtplib
import ssl
def send_email(smtp_server, email_data):
context = ssl.create_default_context()
server = smtplib.SMTP_SSL(smtp_server, context=context)
server.send_message(email_data)
server.quit()

@amazon-q-developer
Copy link
Copy Markdown

✅ I finished the code review, and left comments with the issues I found. I will now generate code fix suggestions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant