This repository contains the source code, data, and models for the paper.
To set up the environment and install the required dependencies, please follow these steps:
conda create -n NILC -c conda-forge python=3.12
conda activate NILC
pip install -r requirements.txt
Before processing the data or running experiments, you can either download our pre-trained models or set up your own.
Our fine-tuned USNID and UnsupUSNID models can be downloaded from the link below:
- Fine-tuned Models: https://mega.nz/folder/Sew0kLzR#CdX4euhBMuGC_xiA3T1Hfg
Put the models in the root directory.
Our framework is designed to be flexible and extensible: you can integrate other models as encoders with minimal code modification. We provide built-in support for several strong prior methods and popular embedding models, including but not limited to:
- USNID / UnsupUSNID: Pre-train or fine-tune your own versions from the official repository.
- MTP-CLNN: From the official repository.
- LatentEM: From the official repository.
- SentenceBERT
- Instructor
- OpenAI Embeddings
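As a rough illustration of what "integrating another model as an encoder" could look like, the sketch below defines a minimal encoder interface. The names `BaseEncoder` and `encode` are assumptions for illustration; the repository's actual interface may differ.

```python
# Hypothetical encoder interface; names are illustrative, not the
# repository's actual API.
from abc import ABC, abstractmethod
from typing import List


class BaseEncoder(ABC):
    """Minimal interface a new encoder would need to satisfy."""

    @abstractmethod
    def encode(self, texts: List[str]) -> List[List[float]]:
        """Map a batch of utterances to fixed-size embedding vectors."""


class BagOfLengthsEncoder(BaseEncoder):
    """Toy stand-in encoder: embeds each text by simple surface statistics.

    A real integration would wrap USNID, SentenceBERT, etc. behind the
    same `encode` signature.
    """

    def encode(self, texts: List[str]) -> List[List[float]]:
        # Two-dimensional "embedding": character count and word count.
        return [[float(len(t)), float(t.count(" ") + 1)] for t in texts]
```

With this shape, swapping encoders only means swapping which subclass the pipeline instantiates.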
With a model selected, you can now prepare the data.
You can download the data already pre-processed with our fine-tuned USNID and UnsupUSNID models from the following link:
- Pre-processed Data: https://mega.nz/folder/PbZ3ED6R#MdBPcWcqTkgTQuBZyEi-KQ
Put the processed_data folder in the root directory.
If you are using a different model or want to generate the data embeddings yourself, follow these steps:
cd data_loaders
python preprocess_offline_data.py
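In spirit, the offline preprocessing step computes embeddings once and caches them to disk so experiments never recompute them. The sketch below is a simplified stand-in for that idea, not the repository's actual `preprocess_offline_data.py`; `embed_fn` represents whichever encoder you selected above, and the JSON format is an assumption.

```python
# Illustrative embedding-caching sketch (assumed JSON layout, not the
# repository's real preprocessing script).
import json
from pathlib import Path
from typing import Callable, List


def cache_embeddings(texts: List[str],
                     embed_fn: Callable[[List[str]], List[List[float]]],
                     out_path: str) -> None:
    """Compute embeddings once and store them alongside the raw texts."""
    records = [{"text": t, "embedding": e}
               for t, e in zip(texts, embed_fn(texts))]
    Path(out_path).write_text(json.dumps(records))


def load_embeddings(path: str) -> List[dict]:
    """Reload the cached (text, embedding) records for experiments."""
    return json.loads(Path(path).read_text())
```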
Follow these steps to run the main experiments.
Open config.py in the root directory and set the EMBEDDING_TYPE and DATASET_NAME variables.
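For orientation, config.py might look like the fragment below. The specific values shown are illustrative assumptions; consult the file itself for the options the code actually accepts.

```python
# Illustrative config.py contents; accepted values depend on the
# repository's actual implementation.
EMBEDDING_TYPE = "usnid"   # hypothetical example value
DATASET_NAME = "banking"   # hypothetical example value
```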
Execute the main experiment script from the root directory.
python run_experiments.py
Once all experiments are complete, navigate to the results directory and run the processing script to generate a consolidated summary of all results.
cd results
python process_results.py
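Conceptually, the consolidation step merges the metrics written by each run into a single summary table. The sketch below shows one plausible way to do that with the standard library; it is not the repository's actual `process_results.py`, and the assumption that each run writes a flat JSON file of metrics is illustrative.

```python
# Hypothetical results-consolidation sketch (assumed per-run JSON
# metric files, not the repository's real script).
import csv
import json
from pathlib import Path


def summarize(results_dir: str, out_csv: str) -> int:
    """Merge every per-run *.json metrics file into one CSV summary.

    Returns the number of runs summarized.
    """
    rows = []
    for path in sorted(Path(results_dir).glob("*.json")):
        metrics = json.loads(path.read_text())
        rows.append({"run": path.stem, **metrics})
    if rows:
        # Union of all metric names, so runs with differing metrics align.
        fieldnames = sorted({k for r in rows for k in r})
        with open(out_csv, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    return len(rows)
```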