Like I'd mentioned in #184, we need more complex control over what is considered valid data. For example, we have columns where we only want to allow data formatted as percentages (we want users to use % formatting only so there's no ambiguity if e.g. 0.3 was supposed to mean 0.3% or 30%). I'm not sure it makes sense to add a new percentage DataType though. There are other more specific formats we might want to check, and I'm not sure it would make sense to add a bunch of other data types for things that end up as the same data type (e.g. float). Really these would be formatting checks more than data type checks.
@lukapeschke I wanted to ask about what you think about the best way to handle this would be. Currently we get the format string from openpyxl and check that. Maybe I could update calamine so we could get the excel format string for a given cell to use in the create_..._with_errors functions. Then maybe the dtypes parameter or a new parameter could also specify required cell formats for columns. In that case, the create_..._with_errors would check the cell format and also add an error on mismatch. I think we would have to support regexp matching against the cell format or at the very least allow checking against any number of formats as the format string for percentages, for example, is different depending on the number of decimal places.
Alternatively, there could just be some way to get the cell formats, and we could handle the rest in Python.
Do you think either of these approaches would make sense?
Like I'd mentioned in #184, we need more complex control over what is considered valid data. For example, we have columns where we only want to allow data formatted as percentages (we want users to use % formatting only so there's no ambiguity if e.g. 0.3 was supposed to mean 0.3% or 30%). I'm not sure it makes sense to add a new percentage DataType though. There are other more specific formats we might want to check, and I'm not sure it would make sense to add a bunch of other data types for things that end up as the same data type (e.g. float). Really these would be formatting checks more than data type checks.
@lukapeschke I wanted to ask about what you think about the best way to handle this would be. Currently we get the format string from openpyxl and check that. Maybe I could update calamine so we could get the excel format string for a given cell to use in the
create_..._with_errorsfunctions. Then maybe thedtypesparameter or a new parameter could also specify required cell formats for columns. In that case, thecreate_..._with_errorswould check the cell format and also add an error on mismatch. I think we would have to support regexp matching against the cell format or at the very least allow checking against any number of formats as the format string for percentages, for example, is different depending on the number of decimal places.Alternatively, there could just be some way to get the cell formats, and we could handle the rest in Python.
Do you think either of these approaches would make sense?