The input files contain a lot of garbage.
We must at least clean up:
- HTML files returned for missing pages
- lines containing extra information (usually separated by " | ")
- lines containing multiple ":" separators (not sure about this one: dropping them may lose some useful data)
Consider that this cleanup could also be done in golastipass.
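
The cleanup rules above could be sketched roughly as follows. This is a minimal illustration and not the actual implementation: the names `looks_like_html` and `clean_lines` are hypothetical, detecting HTML by its leading `<!doctype html` or `<html` marker is only a heuristic, and keeping the text before the first " | " assumes the extra information always comes after the useful part. Because the multiple-":" rule is still undecided, it is off by default.

```python
from typing import List


def looks_like_html(text: str) -> bool:
    """Heuristic (an assumption): the file is an HTML page if it opens with an HTML marker."""
    head = text.lstrip()[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


def clean_lines(text: str, drop_multi_colon: bool = False) -> List[str]:
    """Return the cleaned lines of one input file.

    - An HTML file (e.g. a page served for a missing resource) yields no lines.
    - Anything after the first " | " is treated as extra information and dropped.
    - Lines with more than one ":" are dropped only if drop_multi_colon is True,
      since that rule may discard useful data.
    """
    if looks_like_html(text):
        return []

    cleaned = []
    for line in text.splitlines():
        # Keep only the part before the first " | " separator (assumption).
        line = line.split(" | ", 1)[0].strip()
        if not line:
            continue
        if drop_multi_colon and line.count(":") > 1:
            continue
        cleaned.append(line)
    return cleaned
```

Keeping the multiple-":" rule behind a flag makes it easy to compare the output with and without it before deciding whether the rule loses too much data.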