1558 csv data reader#1640
Conversation
SFJohnson24
left a comment
There was a problem hiding this comment.
This looks good so far-- can you see messages on slack with me and Gerry re: the .env and taking that into account as well with csv runs. https://github.com/cdisc-org/cdisc-rules-engine/issues/1559 My parsing script matches your reader logic--there is a description of the .env structure there.
It would want to get the standard/version/substandard/CT/define xml path from the .env
…-dp, -d] parameters to run validation.
| raise NotImplementedError | ||
|
|
||
| def from_file(self, file_path): | ||
| with open(file_path, "r", encoding=self.encoding) as fp: |
There was a problem hiding this comment.
We will need encoding + general error handling wrapping similar to json reader here.
| first_record = {} | ||
|
|
||
| with open(self.file_path, encoding=self.encoding) as f: | ||
| dataset_length = sum(1 for _ in f) - 1 # subtract header |
There was a problem hiding this comment.
This can result in dataset_length = -1 for empty csv files. For those cases I guess it should be zero. So we should clamp it to zero.
| description = "JSON data is malformed." | ||
|
|
||
|
|
||
| class InvalidCSVFormat(EngineError): |
There was a problem hiding this comment.
I think better naming for this would be InvalidCSVFile instead of format because this error is for when csv is malformed or encoding failed which makes it invalid csv file. What do you think?
| type=click.Path(exists=True, readable=True, resolve_path=True), | ||
| help="Path to directory containing local rules.", | ||
| multiple=True, | ||
| envvar="LOCAL_RULES", |
| required=False, | ||
| help="Rule ID to get rule for.", | ||
| multiple=True, | ||
| envvar="RULEID", |
There was a problem hiding this comment.
noticed a lot of env params that dont exist:
PRODUCT=TIG
VERSION=1-0
SUBSTANDARD=SDTM
CT=sdtmct2020-12-18
DEFINE_XML=define_negative1.xml
above is a complete .env file with all possible params from the cdisc-open-rules repository. You can hop on the rules branch and see what it looks like if you want
| required=True, | ||
| default=None, | ||
| help="Standard version to validate against", | ||
| envvar="VERSION", |
There was a problem hiding this comment.
All else looks fine to me. Could you please adda test where we check the .env interoperability. You can cover all the flags in few tests.
…ut to match other readers logic
# Conflicts: # README.md
RamilCDISC
left a comment
There was a problem hiding this comment.
I ran the following command in the cli:
python core.py validate -s sdtmig -v 3.4 -dp tests/resources/CoreIssue1558/datasets/dm.csv -dep tests/resources/CoreIssue1558/test.env
The obtained repor tis attached. There are errors for rule CORE-000658 CORE-000711 CORE-000713 CORE-000714. The issue details sheet shows the error in the Value(s) column.
|
@SFJohnson24 Could you please confirm if you agree with the changes made on your request. I have done manual validation and as well as code review. All seems good to me. Once you give a heads up i will add final comment and approve. |
…disc-rules-engine into 1558-csv-data-reader
| MAX_REPORT_ROWS = 10 # integer for maximum number of issues per excel sheet (plus headers) in result report. Defaults to 10000. | ||
| MAX_ERRORS_PER_RULE = (10, True) # Tuple for maximum number of errors to report per rule during a validation run. Also has a per dataset flag described as second bool value in readme. example value | ||
| DEFINE_XML | ||
| DATA_DIR No newline at end of file |
There was a problem hiding this comment.
what is DATA_DIR? Is this an artifact?
There was a problem hiding this comment.
should this example file also have the other variables the .env can handle?
| "CDISC TIG Use Case for scoping a TIG Validation." | ||
| "Any of INDH, PROD, NONCLIN, or ANALYSIS." | ||
| ), | ||
| ) |
There was a problem hiding this comment.
I tested this on TIG and it we will need envvar="USE_CASE" for this standard. I updated the open rules to reflect USE_CASE as an .env variable
SFJohnson24
left a comment
There was a problem hiding this comment.
this looks good--just 1 small tweak that came up in my TIG testing. Looks good for other standards though
|
i pushed the 2 lines up just to expedite this getting merged--looks good @alexfurmenkov |
Added support for data in .csv format.
Implemented params reading from .env files - based on .env in data folder.