If a run takes the same dataset for two different inputs, it writes two copies. We ran into a problem with a pipeline that checked whether two inputs came from the same file by comparing the two file names. That failed when Kive made two copies of the same file.
If two inputs use the same dataset, name the file after the first input and pass that name to both inputs on the command line.
If a run takes the same dataset for two different inputs, it writes two copies. We ran into a problem with a pipeline that checked whether two inputs came from the same file by comparing the two file names. That failed when Kive made two copies of the same file.
If two inputs use the same dataset, name the file after the first input and pass that name to both inputs on the command line.