Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces evaluation configuration files for Google CI/CD deployment and pipeline design skills. The review feedback points out a critical logic error in the regex used to identify prohibited events, which would result in false positives. Additionally, the reviewer recommended improving portability by avoiding hardcoded credential paths and increasing execution efficiency by removing unnecessary sleep commands in the cleanup scripts.
| agentWorkingDir: /tmp/skill-activation/tinyjam-deploy | ||
| expectedSkill: google-cicd-deploy | ||
| prohibitedEventsBeforeActivation: | ||
| - "activate_skill.*(?!google-cicd-deploy)" |
There was a problem hiding this comment.
The regex activate_skill.*(?!google-cicd-deploy) will incorrectly match the intended skill activation. Because .* is greedy and consumes the entire string, the negative lookahead (?!google-cicd-deploy) at the end will always succeed (as the empty string at the end of the line is not google-cicd-deploy). This will cause the evaluation to fail even when the agent correctly activates the expected skill.
- "activate_skill(?!(.*google-cicd-deploy)).*"| agentWorkingDir: /tmp/skill-activation/tinyjam-pipeline-design | ||
| expectedSkill: google-cicd-pipeline-design | ||
| prohibitedEventsBeforeActivation: | ||
| - "activate_skill.*(?!google-cicd-pipeline-design)" |
There was a problem hiding this comment.
The regex activate_skill.*(?!google-cicd-pipeline-design) will incorrectly match the intended skill activation because the greedy .* consumes the string before the lookahead is evaluated. This will lead to false positives where the correct skill activation is treated as a prohibited event.
- "activate_skill(?!(.*google-cicd-pipeline-design)).*"| docker: | ||
| base: cicd-evals:latest | ||
| env: | ||
| GOOGLE_APPLICATION_CREDENTIALS: ~/.config/gcloud/application_default_credentials.json |
There was a problem hiding this comment.
Hardcoding the path to application_default_credentials.json in the user's home directory (~) makes the evaluation configuration non-portable. This configuration will fail in environments where the credentials are not stored in that specific location (e.g., CI runners or other developers' machines). Consider using a more flexible approach like an environment variable or a relative path.
| sleep 3 | ||
| rm -rf /tmp/skill-activation/tinyjam-deploy |
There was a problem hiding this comment.
The sleep 3 in the cleanup script appears unnecessary for local file system operations and adds cumulative delay to the evaluation suite (especially with 5 trials). If there is no specific asynchronous process to wait for, this should be removed to improve efficiency.
rm -rf /tmp/skill-activation/tinyjam-deploy| docker: | ||
| base: cicd-evals:latest | ||
| env: | ||
| GOOGLE_APPLICATION_CREDENTIALS: ~/.config/gcloud/application_default_credentials.json |
There was a problem hiding this comment.
| sleep 3 | ||
| rm -rf /tmp/skill-activation/tinyjam-pipeline-design |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces two new evaluation configuration files, eval.google-cicd-deploy.yaml and eval.google-cicd-pipeline-design.yaml, designed to test Google Cloud CI/CD skills. The review feedback highlights several critical improvements for the setup and cleanup scripts: specifically, using trial-specific directories (via ${_EVAL_TRIAL}) to ensure idempotency and support parallel execution, removing hardcoded credential paths to improve portability across environments, and eliminating unnecessary sleep commands to optimize performance.
| mkdir -p /tmp/skill-activation | ||
| git clone https://github.com/sdlc-graph/tinyjam.git /tmp/skill-activation/tinyjam-deploy |
There was a problem hiding this comment.
The setup script is not idempotent; git clone will fail if the directory already exists from a previous run. Furthermore, using a hardcoded path prevents parallel execution of trials. It is recommended to use a trial-specific directory using ${_EVAL_TRIAL}.
mkdir -p /tmp/skill-activation/${_EVAL_TRIAL}
git clone https://github.com/sdlc-graph/tinyjam.git /tmp/skill-activation/${_EVAL_TRIAL}/tinyjam-deploy| cleanup: | | ||
| sleep 3 | ||
| rm -rf /tmp/skill-activation/tinyjam-deploy | ||
| agentWorkingDir: /tmp/skill-activation/tinyjam-deploy |
| mkdir -p /tmp/skill-activation | ||
| git clone https://github.com/sdlc-graph/tinyjam.git /tmp/skill-activation/tinyjam-pipeline-design |
There was a problem hiding this comment.
| cleanup: | | ||
| sleep 3 | ||
| rm -rf /tmp/skill-activation/tinyjam-pipeline-design | ||
| agentWorkingDir: /tmp/skill-activation/tinyjam-pipeline-design |
| docker: | ||
| base: cicd-evals:latest | ||
| env: | ||
| GOOGLE_APPLICATION_CREDENTIALS: ~/.config/gcloud/application_default_credentials.json |
There was a problem hiding this comment.
Hardcoding the path to ~/.config/gcloud/application_default_credentials.json makes the evaluation non-portable. This path is specific to local developer environments and will likely fail in CI/CD pipelines or on systems with different configurations. Consider using the existing GOOGLE_APPLICATION_CREDENTIALS environment variable or a more generic path.
| sleep 3 | ||
| rm -rf /tmp/skill-activation/tinyjam-deploy |
There was a problem hiding this comment.
| docker: | ||
| base: cicd-evals:latest | ||
| env: | ||
| GOOGLE_APPLICATION_CREDENTIALS: ~/.config/gcloud/application_default_credentials.json |
| sleep 3 | ||
| rm -rf /tmp/skill-activation/tinyjam-pipeline-design |
Corresponding changes to skillgrade: sdlc-graph/skillgrade#21