This is the repository for the paper *FACE: A Fine-Grained Reference-Free Evaluator for Conversational Information Access* by Hideaki Joko and Faegheh Hasibi, SIGIR 2026.
Specifically, the repository contains:
- The **CRSArena-Eval** dataset with human-annotated conversations and meta-evaluation scripts.
- The **CRSArena-Eval** interface for interactive meta-evaluation of your evaluator vs. baselines.
- The **FACE** implementation with particle generation and scoring tools.
- CRSArena-Eval is a meta-evaluation dataset of human-annotated conversations between users and 9 Conversational Recommender Systems (CRSs), designed for evaluating CRS evaluators.
- FACE is a Fine-grained, Aspect-based Conversation Evaluation method that provides evaluation scores for diverse turn- and dialogue-level qualities of recommendation conversations.
The directory dataset/ contains the CRSArena-Eval dataset.
This dataset is designed for meta-evaluation of CRS evaluators and is built on the CRSArena-Dial dataset.
crs_arena_eval.json: The main dataset file containing 467 conversations with 4,473 utterances, annotated with both turn-level and dialogue-level quality scores by human evaluators.
Turn-level aspects:
- Relevance (0-3): Does the assistant's response make sense and meet the user's interests?
- Interestingness (0-2): Does the response make the user want to continue the conversation?
Dialogue-level aspects:
- Understanding (0-2): Does the assistant understand the user's request and try to fulfill it?
- Task Completion (0-2): Does the assistant make recommendations that the user finally accepts?
- Interest Arousal (0-2): Does the assistant try to spark the user's interest in something new?
- Efficiency (0-1): Does the assistant suggest items matching the user's interests within the first three interactions?
- Overall Impression (0-4): What is the overall impression of the assistant's performance?
Table: General statistics of the CRSArena-Eval dataset.
| Statistic | Value |
|---|---|
| # Conversations | 467 |
| # Utterances | 4,473 |
| Avg. utterances per conversation | 9.58 |
| Avg. words per user utterance | 7.53 |
| Avg. words per system utterance | 15.18 |
| # Final labels (after aggregation) | 6,805 |
👉 For detailed dataset schema and structure, see dataset/README.md.
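To give a feel for how the annotated conversations can be consumed, here is a minimal sketch of iterating over one record. The field names (`utterances`, `turn_labels`, `dialogue_labels`) and values are illustrative assumptions, not the actual schema — see dataset/README.md for the real structure.

```python
# Sketch: walking one annotated conversation under an ASSUMED schema.
# Field names below are hypothetical; consult dataset/README.md for
# the actual keys used in crs_arena_eval.json.
conversation = {
    "utterances": [
        {"speaker": "USER", "text": "Any good sci-fi movies?"},
        {"speaker": "ASST", "text": "You might enjoy Dune (2021)."},
    ],
    # One turn-level annotation per system turn (Relevance 0-3, Interestingness 0-2).
    "turn_labels": [{"relevance": 3, "interestingness": 2}],
    # Dialogue-level annotations (e.g. Overall Impression 0-4).
    "dialogue_labels": {"overall_impression": 3},
}

for utt in conversation["utterances"]:
    print(f'{utt["speaker"]}: {utt["text"]}')
```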
The dataset/run/ directory contains scripts and data for reproducing the evaluation results reported in the paper.
- `eval.py`: Evaluation script that computes Pearson and Spearman correlations between predictions and CRSArena-Eval human annotations.
- `face_run.json`: FACE predictions for the CRSArena-Eval dataset in the standard run file format.
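The correlations that `eval.py` reports can be sketched in pure Python as follows. This is a minimal illustration of the two metrics, not the script's actual implementation, and the prediction/label values are made up:

```python
import math

def pearson(xs, ys):
    # Pearson correlation: covariance normalized by the standard deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman(xs, ys):
    # Spearman correlation: Pearson computed on ranks (ties get average ranks).
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0.0] * len(vs)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and vs[order[j + 1]] == vs[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1
            for k in range(i, j + 1):
                r[order[k]] = avg_rank
            i = j + 1
        return r
    return pearson(ranks(xs), ranks(ys))

predictions = [2.5, 1.0, 3.0, 0.5]  # hypothetical evaluator scores
human = [3, 1, 2, 0]                # hypothetical human labels

print(round(pearson(predictions, human), 3))   # prints 0.868
print(round(spearman(predictions, human), 3))  # prints 0.8
```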
The face/ directory contains the implementation of the FACE evaluation method.
- `particle_generation/`: Converts dialogue turns into atomic conversation particles -- self-contained information units consisting of dialogue acts, text mentions, and user feedback.
- `face_scoring/`: Scores particle-based dialogues using 16 optimized prompts per aspect, aggregating results to turn/dialogue-level scores.
- `reproduce_result_table/`: Scripts for reconstructing the main result table from the paper.
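The aggregation step can be pictured with a small sketch. Both the per-prompt scores and the use of a plain mean are assumptions made here for illustration; the actual aggregation used by `face_scoring/` is described in face/README.md and the paper:

```python
from statistics import mean

# Hypothetical scores returned by the 16 prompts for one aspect of one turn
# (e.g. Relevance on a 0-3 scale). Averaging them is an ASSUMPTION for this
# sketch; the real aggregation in face_scoring/ may differ.
prompt_scores = [3, 2, 3, 3, 2, 3, 3, 2, 3, 3, 2, 3, 3, 3, 2, 3]
assert len(prompt_scores) == 16

turn_score = mean(prompt_scores)
print(turn_score)  # the aggregated turn-level score
```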
- Install dependencies (requires uv):

  ```shell
  cd face && uv sync
  ```

- Generate particles from a conversation:

  ```shell
  uv run particle_generation/particle_generator.py examples/example_conv.json \
      --turn-index 1 --speaker ASST --samples 10
  ```

- Score a conversation with FACE:

  ```shell
  uv run face_scoring/face.py --conversation examples/example_particles.json \
      --aspect dialogue_overall
  ```
👉 For detailed usage, LLM setup, and available aspects, see face/README.md.
We provide an easy-to-use meta-evaluation interface to evaluate your evaluator against the CRSArena-Eval dataset.
The public interface is hosted at https://informagi.github.io/face/interface/.
See interface/README.md for detailed instructions on how to run the interface locally.
We also provide a Python script to evaluate your evaluator on the CRSArena-Eval dataset.
👉 For detailed run file format and evaluation instructions, see dataset/run/README.md.
```bibtex
@inproceedings{Joko:2026:FACE,
  title     = {FACE: A Fine-Grained Reference-Free Evaluator for Conversational Information Access},
  author    = {Joko, Hideaki and Hasibi, Faegheh},
  booktitle = {Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year      = {2026}
}
```

If you have any questions, please contact Hideaki Joko (hideaki.joko@ru.nl).
