Code for the NAACL paper "When Quantization Affects Confidence of Large Language Models?" (Jupyter Notebook; updated Dec 30, 2024)
Enterprise-grade LLM evaluation framework | Multi-model benchmarking, honest dashboards, system profiling | Academic metrics: MMLU, TruthfulQA, HellaSwag | Zero fake data | PyPI: llm-benchmark-toolkit | Blog: https://dev.to/nahuelgiudizi/building-an-honest-llm-evaluation-framework-from-fake-metrics-to-real-benchmarks-2b90
Evaluation of Llama-3.1-8B Base vs Instruct on TruthfulQA using few-shot prompting and automatic judge models
A tool to evaluate and compare local LLMs running on Ollama or LM Studio under identical conditions, using deepeval's public benchmarks (MMLU, TruthfulQA, GSM8K); a minimal usage sketch follows this list.
Official code for "From Fact to Judgment: Investigating the Impact of Task Framing on LLM Conviction in Dialogue Systems" (IWSDS 2026)
Multi-agent framework for hallucination detection and correction in LLM outputs using retrieval-grounded verification. MSc AI/ML dissertation (LJMU).
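The deepeval-based comparison described above can be sketched in a few lines. The snippet below is not code from any of the listed repositories; it is a minimal illustration that assumes deepeval's public benchmark classes (`TruthfulQA`, `DeepEvalBaseLLM`) and an Ollama server on its default `localhost:11434` endpoint. The model name `llama3.1` is only an example.

```python
# Minimal sketch: plugging a local Ollama-served model into a deepeval benchmark.
# Assumes `pip install deepeval requests` and a running Ollama server.
import requests
from deepeval.models.base_model import DeepEvalBaseLLM
from deepeval.benchmarks import TruthfulQA


class OllamaModel(DeepEvalBaseLLM):
    """Thin wrapper exposing an Ollama-served model to deepeval benchmarks."""

    def __init__(self, model_name: str = "llama3.1"):
        self.model_name = model_name

    def load_model(self):
        # Nothing to load locally; the Ollama server holds the weights.
        return self.model_name

    def generate(self, prompt: str) -> str:
        # Ollama's REST generate endpoint; stream=False returns a single JSON object.
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": self.model_name, "prompt": prompt, "stream": False},
            timeout=300,
        )
        resp.raise_for_status()
        return resp.json()["response"]

    async def a_generate(self, prompt: str) -> str:
        return self.generate(prompt)

    def get_model_name(self) -> str:
        return self.model_name


if __name__ == "__main__":
    benchmark = TruthfulQA()  # deepeval's TruthfulQA benchmark with default settings
    benchmark.evaluate(model=OllamaModel("llama3.1"))
    print("TruthfulQA overall score:", benchmark.overall_score)
```

Running the same wrapper through deepeval's MMLU and GSM8K benchmarks would give the "identical conditions" comparison the tool describes, since every model is queried through the same local endpoint and prompt path.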