
Commit 00c4a65

chore: sync papers from Feishu [skip ci]
1 parent fc6849c commit 00c4a65

1 file changed

Lines changed: 6 additions & 2 deletions

File tree

data/papers.json

@@ -208,12 +208,16 @@
     },
     "作者信息(每人一行,分号换行,数字表示单位信息,*表示Equal Contribution, ^表示通讯作者)": "Yufei Gao, Jiaying Fei, Nuo Chen, Ruirui Chen, Guohang Yan, Yunshi Lan, Botian Shi",
     "单位信息(每个单位一行,分号换行)": "1 Shanghai AI Laboratory\n2 East China Normal University\n3 The Chinese University of Hong Kong, Shenzhen\n4\nInstitute of High Performance Computing, A*STAR",
+    "录用类型": [
+      "Poster"
+    ],
     "摘要": "Multimodal Large Language Models (MLLMs) perform strongly in high-resource languages, yet their effectiveness drops sharply in low-resource settings, largely due to the scarcity of aligned and culturally informative multimodal data. Existing multilingual enhancement approaches predominantly rely on text-only resources or translation-based pipelines, which improve surface-level fluency but often fail to capture culturally specific visual knowledge.\nIn this work, we present MELLA, a large-scale multimodal multilingual dataset designed to support both linguistic fluency and culturally grounded visual understanding in low-resource languages. MELLA is constructed using a dual-source data curation strategy that combines (i) native web image-alt-text pairs, which provide in-context, culture-specific visual-textual alignments, and (ii) high-quality image descriptions generated in a high-resource language and translated into target languages to ensure linguistic richness and structural completeness. Rather than expanding multilingual coverage alone, this design explicitly disentangles two complementary learning signals that are conflated in existing multilingual multimodal datasets.\nMELLA covers eight low-resource languages and contains 6.8M image-text pairs spanning diverse domains and visual categories. Through controlled diagnostic fine-tuning experiments on multiple MLLM backbones, we show that training on MELLA mitigates the cultural hallucination gap, often manifested as culturally “thin” descriptions, by enabling models to recognize and articulate culturally specific entities that are systematically overlooked by translation-centric pipelines. Our findings underscore the central role of data alignment, rather than model modification, in achieving culturally grounded multimodal understanding for low-resource languages.",
     "是否为团队主导工作": true,
-    "期刊/会议": "Under Submission",
+    "期刊/会议": "IJCAI-2026",
     "记录创建日期": 1776268800000,
-    "论文发表日期": 1754496000000,
+    "论文发表日期": 1777996800000,
     "论文标题": "MELLA: Bridging Linguistic Capability and Cultural Groundedness for Low-Resource Language MLLMs",
+    "论文状态": "已录用",
     "责任人": [
       {
         "email": "yanguohang@pjlab.org.cn",
