Cequel

This repository contains the code for running the experiments in the paper.

Data: The raw data is included in the data folder.
Embedding: The scripts in the baselines folder generate embeddings for every encoder used in the paper, for example:
```
python generate_embedding.py --model tfidf --dataset tweet --output_dir new_embeddings
```
Alternatively, download the precomputed instructor and sentencebert embeddings from this link: https://www.dropbox.com/scl/fo/vzlvs2quhg90l5v9821bk/AJndRy9yAQMjZyou0THB7nE?rlkey=kswrq8aghbv0b2jz25vrafrxj&st=lem6dl1t&dl=0
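To give a feel for what a `tfidf` embedding is, here is a minimal pure-Python sketch that turns a few short texts into L2-normalized TF-IDF vectors. It is not the code in generate_embedding.py: the function name `tfidf_embeddings`, the whitespace tokenizer, the smoothed IDF formula, and the sample texts are all assumptions made for illustration, and the actual script may rely on a library implementation with different settings.

```python
import math
from collections import Counter


def tfidf_embeddings(texts):
    """Return (vocabulary, vectors): one L2-normalized TF-IDF vector per text."""
    # Assumption: simple lowercase whitespace tokenization.
    tokenized = [text.lower().split() for text in texts]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of texts that contain each token.
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))
    # Smoothed inverse document frequency (always positive).
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocabulary}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = [counts[t] * idf[t] for t in vocabulary]
        # Normalize to unit length so dot products are cosine similarities.
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        vectors.append([v / norm for v in row])
    return vocabulary, vectors


# Hypothetical sample texts, not taken from the repository's data folder.
texts = ["stocks fall on rate fears", "team wins cup final", "rate cut lifts stocks"]
vocabulary, vectors = tfidf_embeddings(texts)
print(len(vectors), "texts embedded in", len(vocabulary), "dimensions")
```

Texts that share words (the first and third here) end up with a positive cosine similarity, while texts with no words in common have a similarity of zero.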
Baselines: Once the embeddings are ready, run embedding_clustering.py in the baselines folder:
```
python embedding_clustering.py
```
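As a rough illustration of what an embedding-clustering baseline does, the sketch below runs plain k-means (Lloyd's algorithm) on a handful of toy vectors. It is an assumption-laden stand-in, not the contents of embedding_clustering.py: the `kmeans` helper, the farthest-point initialization, and the toy `embeddings` are hypothetical, and the actual script may use other algorithms and evaluation metrics.

```python
import math


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means on lists of floats; returns one label per point."""
    # Deterministic farthest-point initialization, starting from the first point.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))

    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Toy 2-D "embeddings" forming two well-separated groups (hypothetical data).
embeddings = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
labels = kmeans(embeddings, k=2)
print(labels)
```

The first two points and the last two points receive different cluster labels, which is the behavior a baseline would then score against ground-truth classes.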

Requirements: Create a conda environment, activate it, and install all dependencies. sentence-transformers, huggingface_hub, and openai must be installed at the pinned versions shown.

conda create --name Cequel python=3.12.3
conda activate Cequel
pip install scikit-learn
pip install datasets
pip install InstructorEmbedding
pip install nltk
pip install gensim
pip install ortools
pip install jsonlines
pip install metric-learn
pip install sentence-transformers==2.2.2
pip install huggingface_hub==0.25.0
pip install openai==0.28

Main experiments: Before running main.py, insert your API key into the EdgeLLM and TriangleLLM oracles and check the parameters set in those files. The two files are `llm_clustering\active_semi_supervised_clustering\active_semi_clustering\active\pairwise_constraints\gpt3_pc_oracle_edge.py` and `llm_clustering\active_semi_supervised_clustering\active_semi_clustering\active\pairwise_constraints\gpt3_pc_oracle_triangle.py`.
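If you would rather not paste a key into the source files, one option is to read it from an environment variable. The sketch below is a suggestion, not how the repository is wired: the helper `load_api_key` and the variable name `OPENAI_API_KEY` are assumptions, and you would still need to edit the two oracle files to call it.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running main.py")
    return key


# Inside an oracle file, with openai==0.28 installed, the key could then be set as:
#   import openai
#   openai.api_key = load_api_key()
```

This keeps the key out of version control; the explicit error makes a missing key fail early instead of partway through a run.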

For example:
```
python main.py --corpus_name BBC_News --selection triangle --encoder_name instructor --clustering WeightedPCKMeans --mode max-sum --weight_method log_degree_ratio --eigen 0.1
```
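When scripting several runs, it can help to assemble the command from a dictionary of options. The following sketch rebuilds the example command above; `build_command` is a hypothetical helper, and only the flags and values already shown in that example are used.

```python
import shlex


def build_command(script="main.py", **options):
    """Turn keyword options into a `python main.py --flag value ...` argument list."""
    args = ["python", script]
    for name, value in options.items():
        args += [f"--{name}", str(value)]
    return args


command = build_command(
    corpus_name="BBC_News",
    selection="triangle",
    encoder_name="instructor",
    clustering="WeightedPCKMeans",
    mode="max-sum",
    weight_method="log_degree_ratio",
    eigen=0.1,
)
print(shlex.join(command))
# To launch the run from Python: subprocess.run(command, check=True)
```

Passing the argument list to `subprocess.run` avoids shell quoting issues if an option value ever contains spaces.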
Results: Check the prompts, responses, and final results.
