Build BOW model for Semantic Domain Identification #30

@janetzki

Goal

We want to build a GNN-based edge-prediction BOW (bag-of-words) model for semantic domain identification (SDI). We hypothesize that it outperforms the simple baseline model.
Motivation: SDI with F1 > 0.30 for 1 tpi/meu

Tasks

  • Acquire refined mappings from verses to semantic domains (see Acquire MARBLE Data #1)
  • Use the refined mappings from words in verses to SDs to assign SDs to words in verses from the LRL
    • Simply assign the SDs of each eng word to its aligned word in the LRL
    • If this yields many false positive mappings (i.e., low precision): refine the assignments with the generated SD dicts for the LRL (set intersection)
  • Collect a BOW for every word with an assigned SD (2 words before and after the word in the middle)
  • Aggregate the BOWs by SD
  • Perform SDI by extracting the BOW for every candidate word in the input sentence and computing the cosine distance to each aggregated BOW
  • Try out the baseline: look up each word in a dictionary
  • Consider the usefulness of WSD (word sense disambiguation) with pywsd or a different tool: Eng verse → WordNet → SD (see Jonathan’s 2nd mail)
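
The projection, BOW collection, aggregation, and scoring tasks above could be sketched roughly as follows. This is only a minimal illustration and not the project's implementation. All function names, the data layout (token lists, `(eng_index, lrl_index)` alignment pairs, dicts of SD sets), and the `threshold` parameter are assumptions made for this example. It also assumes that the context window is 2 words on each side, that the middle word itself is excluded from its BOW, and it scores with cosine similarity (cosine distance is 1 minus this value).

```python
from collections import Counter, defaultdict
from math import sqrt


def project_sds(eng_sds, alignment, lrl_tokens, lrl_sd_dict=None):
    """Copy SDs from English tokens to their aligned LRL tokens.

    eng_sds:     {eng token index: set of SD labels}
    alignment:   iterable of (eng_index, lrl_index) pairs
    lrl_sd_dict: optional {LRL word: set of SDs}; if given, only the
                 set intersection is kept (the precision refinement step)
    """
    assigned = defaultdict(set)
    for eng_idx, lrl_idx in alignment:
        sds = eng_sds.get(eng_idx, set())
        if lrl_sd_dict is not None:
            sds = sds & lrl_sd_dict.get(lrl_tokens[lrl_idx], set())
        if sds:
            assigned[lrl_idx] |= sds
    return dict(assigned)


def context_bow(tokens, center, window=2):
    """Bag of words around tokens[center], excluding the center word (assumption)."""
    left = tokens[max(0, center - window):center]
    right = tokens[center + 1:center + 1 + window]
    return Counter(left + right)


def aggregate_bows(verses):
    """Sum the context BOWs per SD.

    verses: iterable of (tokens, {token index: set of SDs}) pairs
    """
    by_sd = defaultdict(Counter)
    for tokens, assigned in verses:
        for idx, sds in assigned.items():
            bow = context_bow(tokens, idx)
            for sd in sds:
                by_sd[sd].update(bow)
    return by_sd


def cosine_similarity(a, b):
    """Cosine similarity of two Counters; 0.0 if either one is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_sds(tokens, sd_bows, threshold=0.5):
    """Predict SDs per token index; threshold is a hypothetical cut-off."""
    predictions = {}
    for idx in range(len(tokens)):
        bow = context_bow(tokens, idx)
        sds = {sd for sd, agg in sd_bows.items()
               if cosine_similarity(bow, agg) >= threshold}
        if sds:
            predictions[idx] = sds
    return predictions
```

Raw counts are used here for simplicity. Frequent function words will dominate the aggregated BOWs, so a weighting scheme such as TF-IDF may be worth trying, and the threshold would need tuning on held-out data.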

Metadata

Labels

GNN, RQ 2 (Research Question #2: GNN for semantic domain identification), model, nice-to-have
