Releases: FACTSlab/glazing

Glazing v0.2.2

06 Feb 21:05
8d3786b

This release rewrites and fixes converters/loaders for all four resources, migrates the repository to the FACTSlab organization, and adds CI/CD infrastructure.

✨ What's New

PyPI Publish Workflow

Added a GitHub Actions workflow that automatically publishes to PyPI when a new tag is created, using trusted publishers (OIDC) for secure, tokenless authentication.

Converter-to-Loader Round-Trip Integration Tests

Added comprehensive integration tests that verify the full converter → JSONL → loader round-trip for all four resources (FrameNet, PropBank, VerbNet, WordNet).
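The invariant these tests enforce can be sketched in miniature with plain JSON Lines (the actual test suite uses glazing's converters and loaders; the record shapes below are made up for illustration):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical records standing in for converted resource entries.
records = [
    {"id": "abandon.v", "frame": "Abandonment", "pos": "V"},
    {"id": "give.01", "roles": ["ARG0", "ARG1", "ARG2"]},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "resource.jsonl"

    # Converter side: write one JSON object per line.
    with path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

    # Loader side: parse each line back into a record.
    with path.open(encoding="utf-8") as f:
        loaded = [json.loads(line) for line in f]

assert loaded == records  # the round trip must be lossless
```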

FrameNet Enrichments

Added frame relation, lexical unit enrichment, semantic type, and fulltext annotation parsing to the FrameNet converter and loader. Supplementary data files (framenet_semtypes.jsonl, framenet_fulltext.jsonl) are now converted during initialization.

Supplementary Data Conversion

glazing init now converts supplementary WordNet data files (wordnet_senses.jsonl, wordnet_exceptions.jsonl) and FrameNet data files (framenet_semtypes.jsonl, framenet_fulltext.jsonl).

🔄 What's Changed

WordNet Converter and Loader Rewrite

Rewrote the WordNet converter and loader to use enriched single-file JSONL output with supplementary sense and exception files, improving data completeness and load performance.

Relaxed Lemma Validation

Lemma validation now allows uppercase letters, digits at the start, and dots — supporting proper nouns (e.g., "Dog"), abbreviations (e.g., "Dr."), and numeric prefixes (e.g., "123abandon").
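A pattern along these lines captures the relaxed rule (the regex is illustrative; glazing's actual pattern may differ in detail):

```python
import re

# Illustrative permissive lemma pattern: starts with a letter or digit,
# then allows letters of either case, digits, dots, spaces, and common
# intra-word punctuation.
LEMMA_PATTERN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9 .'_-]*$")

# Proper nouns, abbreviations, and numeric prefixes all pass.
for lemma in ["Dog", "Dr.", "123abandon", "give_up"]:
    assert LEMMA_PATTERN.fullmatch(lemma), lemma
```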

Repository Migration

Moved the repository from aaronstevenwhite/glazing to factslab/glazing. All URLs across documentation, CI, and configuration have been updated.

🐛 What's Fixed

VerbNet Converter Mappings

Fixed the VerbNet converter to populate framenet_mappings and propbank_mappings from member attributes, which were previously left empty.

PropBank AMR-UMR-91 Support

Added AMR-UMR-91 roleset conversion and fixed XML edge cases in the PropBank converter.

Ruff Lint Compliance

Fixed PLW0108 (unnecessary lambdas) and PLC0207 (unnecessary string splits) flagged by newer ruff versions.

📦 Updating

To get the latest version:

pip install --upgrade glazing

Important: You must reconvert datasets to benefit from the enriched output:

glazing init --force

Requirements remain unchanged:

  • Python 3.13 or higher

📜 Citation

If you use this version of Glazing in your research, please cite:

@software{glazing2025,
  author = {White, Aaron Steven},
  title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
  year = {2025},
  url = {https://github.com/factslab/glazing},
  version = {0.2.2},
  doi = {10.5281/zenodo.17467082}
}

🙏 Acknowledgments

This project is funded by the National Science Foundation (BCS-2040831).

Full Changelog: v0.2.1...v0.2.2

Glazing v0.2.1

28 Oct 16:00
25044e1

This patch release fixes a critical data completeness issue where FrameNet lexical units were not being loaded during dataset conversion.

Fixes: #4

🐛 What's Fixed

FrameNet Lexical Units Now Properly Loaded

Lexical units are now correctly parsed from luIndex.xml and associated with their frames during conversion. This fixes a critical issue where all frames had empty lexical_units fields despite the raw FrameNet data containing 13,575 lexical units.

Before v0.2.1:

>>> from glazing.framenet.loader import FrameNetLoader
>>> loader = FrameNetLoader()
>>> index = loader.build_frame_index(loader.frames)
>>> frame = index.get_frame_by_name("Abandonment")
>>> len(frame.lexical_units)
0  # All frames had empty lexical units!

After v0.2.1:

>>> from glazing.framenet.loader import FrameNetLoader
>>> loader = FrameNetLoader()
>>> index = loader.build_frame_index(loader.frames)
>>> frame = index.get_frame_by_name("Abandonment")
>>> len(frame.lexical_units)
5  # Lexical units now properly loaded
>>> frame.lexical_units[0].name
'abandon.v'
>>> frame.lexical_units[0].pos
'V'

Real-World Data Support

Validation patterns have been updated to handle all actual FrameNet data, including:

  • Proper nouns (e.g., "April.n", "Monday.n")
  • Multi-word expressions (e.g., "a bit.n", "give up.v")
  • Special characters (e.g., "(can't) help.v", "American [N and S Am].n")
  • Complex lexeme names (e.g., "Boxing Day", "Scud-B missile")

Conversion Success Rate:

  • Before: ~12,400 out of 13,575 LUs successfully parsed (91.3%)
  • After: 13,572 out of 13,575 LUs successfully parsed (99.98%)

📦 Updating

To get the latest version:

pip install --upgrade glazing

Important: You must reconvert FrameNet data to benefit from this fix:

glazing init --force

Requirements remain unchanged:

  • Python 3.13 or higher

🔧 Technical Changes

Converter Improvements

  • Added _parse_lu_from_index() method to parse lexical units from luIndex.xml
  • Added convert_lu_index_file() method to convert entire LU index
  • Modified convert_frames_directory() to load LUs and associate with frames by frame_id
  • Improved CLI output to show "Converting FrameNet frames and lexical units..."

Validation Updates

  • Relaxed LU_NAME_PATTERN from strict format to permissive ^.+\.[a-z]+$
  • Relaxed LEXEME_NAME_PATTERN from strict format to permissive ^.+$
  • Updated tests to match new validation behavior
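The two relaxed patterns quoted above accept all of the real-world lexical unit names listed earlier:

```python
import re

LU_NAME_PATTERN = re.compile(r"^.+\.[a-z]+$")  # permissive: anything + ".pos"
LEXEME_NAME_PATTERN = re.compile(r"^.+$")      # permissive: any non-empty name

for name in ["abandon.v", "a bit.n", "give up.v",
             "(can't) help.v", "American [N and S Am].n"]:
    assert LU_NAME_PATTERN.fullmatch(name), name

assert LEXEME_NAME_PATTERN.fullmatch("Scud-B missile")
assert not LU_NAME_PATTERN.fullmatch("abandon")  # rejected: no ".pos" suffix
```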

Data Completeness

  • Approximately 13,500 lexical units now correctly associated with their frames
  • Enables querying frames by lexical unit name via the frame index
  • Full metadata preserved: POS tags, annotation status, sentence counts, lexeme structures

🔧 Compatibility

  • Fully backwards compatible with v0.2.0
  • No API changes
  • Existing code continues to work, but will now have access to lexical unit data
  • Dataset reconversion required to populate lexical units

📜 Citation

If you use this version of Glazing in your research, please cite:

@software{glazing2025,
  author = {White, Aaron Steven},
  title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
  year = {2025},
  url = {https://github.com/aaronstevenwhite/glazing},
  version = {0.2.1},
  doi = {10.5281/zenodo.17185626}
}

🙏 Acknowledgments

This project is funded by the National Science Foundation (BCS-2040831).

Full Changelog: v0.2.0...v0.2.1

Glazing v0.2.0

30 Sep 20:17
f708c7d

This minor release adds symbol parsing, fuzzy search, and syntax-based search capabilities to Glazing.

✨ What's New

Symbol Parsing

Parse linguistic identifiers from all four datasets to extract structured information:

>>> from glazing.propbank.symbol_parser import parse_roleset_id
>>> result = parse_roleset_id("give.01")
>>> result.lemma
'give'
>>> result.sense_number
1

Fuzzy String Matching

Find approximate matches using Levenshtein distance:

>>> from glazing.search import UnifiedSearch
>>> searcher = UnifiedSearch()
>>> results = searcher.search_with_fuzzy("comunication", fuzzy_threshold=0.8)
>>> # Returns results including "Communication" frames

Syntax-Based Search

Search across datasets using syntactic patterns:

>>> from glazing.search import UnifiedSearch
>>> searcher = UnifiedSearch()
>>> results = searcher.search_by_syntax(
...     pattern="NP V NP PP[to]",  # Ditransitive pattern
...     allow_wildcards=True
... )
>>> # Returns 52 matching patterns

Enhanced Cross-References

New MappingIndex class provides transitive reference resolution with caching:

  • Discover indirect mappings between datasets
  • File-based caching for performance
  • Configurable hop limits for transitive searches
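Transitive resolution with a hop limit amounts to a bounded graph walk over the direct mappings. A minimal sketch of the idea (the MappingIndex API itself differs; the mapping identifiers below are invented):

```python
from collections import deque

# Direct mappings between dataset entries, as an adjacency list.
mappings = {
    "pb:give.01": ["vn:give-13.1"],
    "vn:give-13.1": ["fn:Giving"],
    "fn:Giving": ["wn:give%2:40:00"],
}

def resolve_transitive(start: str, max_hops: int = 2) -> set[str]:
    """Collect every entry reachable from `start` within `max_hops` steps."""
    seen: set[str] = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # hop limit reached; do not expand further
        for target in mappings.get(node, []):
            if target not in seen:
                seen.add(target)
                frontier.append((target, hops + 1))
    return seen

# Two hops from PropBank give.01 reach VerbNet and FrameNet, not WordNet.
assert resolve_transitive("pb:give.01", max_hops=2) == {"vn:give-13.1", "fn:Giving"}
```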

🔧 CLI Enhancements

New Commands

  • glazing search query --fuzzy - Enable fuzzy matching for typo correction
  • glazing search fuzzy - Search with fuzzy matching and typo correction
  • glazing search syntax - Search for syntactic patterns with morphological features
  • glazing search args - Search for arguments with specific properties
  • glazing search role - Search for semantic roles across datasets
  • glazing search roles - Search for semantic roles with specific properties
  • glazing search elements - Search for frame elements with specific properties
  • glazing search relations - Search for synsets with specific relations
  • glazing xref resolve - Resolve cross-references for an entity
  • glazing xref extract - Extract cross-references from all datasets
  • glazing xref clear-cache - Clear cached cross-references

Improvements

  • Progress indicators for long-running operations
  • Better error messages and help text

🐳 Docker Support

New Dockerfile for containerized deployment:

# Use official Python 3.13 slim image as base
FROM python:3.13-slim

# Set working directory
WORKDIR /app

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

# Install system dependencies required for building packages
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

# Copy only requirements first to leverage Docker cache
COPY pyproject.toml README.md ./
COPY src/glazing/__version__.py src/glazing/

# Install package dependencies
RUN pip install --upgrade pip && \
    pip install -e .

# Copy the rest of the application code
COPY src/ src/
COPY tests/ tests/

# Create data directory for datasets
RUN mkdir -p /data

# Set environment variable for data directory
ENV GLAZING_DATA_DIR=/data

# Initialize datasets during build
RUN glazing init --data-dir /data

# Expose data directory as volume
VOLUME ["/data"]

# Set the entrypoint to the glazing CLI
ENTRYPOINT ["glazing"]

# Default command shows help
CMD ["--help"]

📚 New Documentation

  • User Guides:
    • Fuzzy search guide (docs/user-guide/fuzzy-search.md)
    • Syntax search guide (docs/user-guide/syntax-search.md)
  • API Documentation:
    • Symbol parsers for all datasets (4 files)
    • Fuzzy matching utilities
    • Syntax pattern models
  • Data Formats: JSON schemas added to data-formats.md

📦 Technical Changes

New Modules

  • glazing.syntax - Syntactic pattern models and parser
  • glazing.utils.fuzzy_match - String similarity functions
  • glazing.utils.ranking - Result ranking utilities
  • glazing.references.index - Mapping index implementation
  • glazing.symbols - Symbol type definitions
  • glazing.{dataset}.symbol_parser - Parse dataset identifiers (4 modules)

Dependencies

  • Added python-Levenshtein>=0.20.0 for fuzzy matching
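python-Levenshtein supplies fast C implementations; the underlying edit distance can be sketched in pure Python (the library's ratio() uses a slightly different normalization than the simple one shown here):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; higher means closer strings."""
    return 1 - levenshtein(a, b) / max(len(a), len(b), 1)

# "comunication" is a single edit away from "communication",
# which is why it clears a 0.8 fuzzy threshold.
assert levenshtein("comunication", "communication") == 1
assert similarity("comunication", "communication") > 0.8
```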

📦 Updating

pip install --upgrade glazing==0.2.0

Requirements:

  • Python 3.13 or higher
  • New dependency: python-Levenshtein>=0.20.0

🔧 Compatibility

  • Fully backwards compatible with v0.1.x
  • No breaking changes
  • All existing APIs continue to work

📜 Citation

If you use this version of Glazing in your research, please cite:

@software{glazing2025,
  author = {White, Aaron Steven},
  title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
  year = {2025},
  url = {https://github.com/aaronstevenwhite/glazing},
  version = {0.2.0},
  doi = {10.5281/zenodo.17185626}
}

🙏 Acknowledgments

This project is funded by the National Science Foundation (BCS-2040831).

Full Changelog: v0.1.1...v0.2.0

Glazing v0.1.1

27 Sep 23:57
bad03af

This patch release fixes a critical usability issue with CLI search commands and includes documentation improvements.

🐛 What's Fixed

CLI Commands Now Work As Documented

The search commands now work without requiring the --data-dir argument, matching the examples in our documentation.

Before v0.1.1:

$ glazing search query "abandon"
Error: Missing option '--data-dir'.

After v0.1.1:

$ glazing search query "abandon"
                          Search Results for 'abandon'                          
┏━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Dataset  ┃ Type     ┃ ID/Name    ┃ Description                       ┃ Score ┃
┡━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ VERBNET  │ class    │ leave-51.2 │ VerbNet class with 1 members      │ 1.00  │
│ PROPBANK │ frameset │ abandon    │ PropBank frameset with 3 rolesets │ 1.00  │
└──────────┴──────────┴────────────┴───────────────────────────────────┴───────┘

Affected Commands

All search commands now automatically use the default data directory (~/.local/share/glazing/converted/):

  • glazing search query - Search across datasets
  • glazing search entity - Get entity details
  • glazing search role - Search for semantic roles
  • glazing search cross-ref - Find cross-references

📝 Documentation Improvements

This release incorporates the documentation improvements that were merged after v0.1.0:

  • More concise and natural language throughout documentation
  • Removed unnecessary promotional language
  • Improved docstring clarity
  • 50% reduction in documentation size while maintaining all essential information

📦 Updating

To get the latest version:

pip install --upgrade glazing

Requirements remain unchanged:

  • Python 3.13 or higher
  • ~5MB for the package
  • ~54MB for raw downloaded datasets
  • ~130MB total disk space after conversion

🔧 Technical Notes

  • Search commands now use get_default_data_path() from the glazing.initialize module
  • Default path respects XDG Base Directory specification ($XDG_DATA_HOME if set)
  • Commands with explicit --data-dir arguments continue to work as before
  • No breaking changes - fully backward compatible
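The lookup order described above amounts to something like the following (a simplified sketch; the real get_default_data_path() may differ in detail):

```python
import os
from pathlib import Path

def default_data_path() -> Path:
    """Resolve the converted-data directory per the XDG Base Directory spec."""
    xdg = os.environ.get("XDG_DATA_HOME")
    base = Path(xdg) if xdg else Path.home() / ".local" / "share"
    return base / "glazing" / "converted"

# With $XDG_DATA_HOME set, it takes precedence over ~/.local/share.
os.environ["XDG_DATA_HOME"] = "/tmp/xdg"
print(default_data_path())  # /tmp/xdg/glazing/converted
```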

📜 Citation

If you use Glazing in your research, please cite:

@software{glazing2025,
  author = {White, Aaron Steven},
  title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
  year = {2025},
  url = {https://github.com/aaronstevenwhite/glazing},
  version = {0.1.1}
}

🙏 Acknowledgments

This project is funded by the National Science Foundation (BCS-2040831).

For the complete changelog, see CHANGELOG.md

Glazing v0.1.0

23 Sep 14:36

We're excited to announce the first release of Glazing, a package containing unified data models and interfaces for syntactic and semantic frame ontologies.

Glazing provides a modern Python interface for working with four major linguistic resources: FrameNet, PropBank, VerbNet, and WordNet.

✨ Highlights

  • 🚀 One-command setup: Get started immediately with glazing init to download and prepare all datasets
  • 📦 Type-safe models: Comprehensive Pydantic v2 models for all data structures
  • 🔍 Unified search: Query across all datasets with a consistent API
  • 🔗 Cross-references: Automatic mapping between resources
  • 💾 Efficient storage: JSON Lines format with streaming support
  • 🐍 Modern Python: Full type hints and Python 3.13+ support

📚 Supported Datasets

  • FrameNet 1.7: Semantic frames, frame elements, lexical units, and frame relations
  • PropBank 3.4: Framesets, rolesets, and semantic role labels
  • VerbNet 3.4: Verb classes, thematic roles, syntactic frames, and GL semantics
  • WordNet 3.1: Synsets, lemmas, lexical relations, and morphological processing

🚀 Quick Start

# Install the package
pip install glazing

# Initialize all datasets (one-time setup)
glazing init

# Start using the Python API
python -c "
from glazing.search import UnifiedSearch
search = UnifiedSearch()
results = search.search('give')
for r in results[:5]:
    print(f'{r.dataset}: {r.name}')
"

🛠️ Installation

pip install glazing

Requirements:

  • Python 3.13 or higher
  • ~5MB for the package
  • ~54MB for raw downloaded datasets
  • ~130MB total disk space after conversion (includes both raw and converted data)

📖 Features

Command-Line Interface

  • glazing init - Initialize all datasets with a single command
  • glazing download - Download individual or all datasets
  • glazing convert - Convert from source formats to JSON Lines
  • glazing search - Search across datasets with various filters

Python API

from glazing.framenet.loader import FrameNetLoader
from glazing.verbnet.loader import VerbNetLoader

# Loaders automatically load data after 'glazing init'
fn_loader = FrameNetLoader()
frames = fn_loader.frames

vn_loader = VerbNetLoader()  
verb_classes = list(vn_loader.classes.values())

Cross-Reference Resolution

from glazing.propbank.loader import PropBankLoader
from glazing.references.extractor import ReferenceExtractor
from glazing.verbnet.loader import VerbNetLoader

# Load the datasets
vn_loader = VerbNetLoader()
pb_loader = PropBankLoader()

# Extract references
extractor = ReferenceExtractor()
extractor.extract_verbnet_references(list(vn_loader.classes.values()))
extractor.extract_propbank_references(list(pb_loader.framesets.values()))

# Access cross-references
if "give.01" in extractor.propbank_refs:
    refs = extractor.propbank_refs["give.01"]
    vn_classes = refs.get_verbnet_classes()

📊 Technical Details

  • Performance: Streaming support for lazy loading datasets
  • Format: JSON Lines for efficient storage and processing
  • Validation: Automatic validation using Pydantic models
  • Caching: Efficient caching for repeated operations
  • Type Safety: Full type hints for better IDE support
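Streaming JSON Lines means parsing one record at a time rather than loading whole files into memory. A generator sketch of the idea (not the glazing loader's actual code):

```python
import json
import tempfile
from collections.abc import Iterator
from pathlib import Path

def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed record per line, lazily."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                yield json.loads(line)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "frames.jsonl"
    path.write_text('{"name": "Giving"}\n{"name": "Abandonment"}\n')
    names = [rec["name"] for rec in iter_jsonl(path)]

assert names == ["Giving", "Abandonment"]
```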

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines.

📜 Citation

If you use Glazing in your research, please cite:

@software{glazing2025,
  author = {White, Aaron Steven},
  title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
  year = {2025},
  url = {https://github.com/aaronstevenwhite/glazing},
  version = {0.1.0}
}

🙏 Acknowledgments

This project was funded by the National Science Foundation (BCS-2040831) and builds upon the foundational work of the FrameNet, PropBank, VerbNet, and WordNet teams.

Thank you for using Glazing! We're excited to see what you build with it. 🎉