Great work!
Are there any results showing how semantic richness and acoustic fidelity vary with n_q in XCodec? Specifically, I would like to understand how each of the two behaves as n_q is increased or decreased.
Additionally, I noticed that XCodec operates at a sampling rate of 16kHz, and the reconstructed WAV files lose many acoustic details compared to the raw 44.1kHz audio. What challenges are involved in applying XCodec's approach to a 44.1kHz codec? For instance, would it be feasible to enhance DAC by integrating HuBERT-based representations?
Any insights or experiences would be greatly appreciated.
Thank you!