Releases: FACTSlab/glazing
Glazing v0.2.2
This release rewrites and fixes converters/loaders for all four resources, migrates the repository to the FACTSlab organization, and adds CI/CD infrastructure.
✨ What's New
PyPI Publish Workflow
Added a GitHub Actions workflow that automatically publishes to PyPI when a new tag is created, using trusted publishers (OIDC) for secure, tokenless authentication.
Converter-to-Loader Round-Trip Integration Tests
Added comprehensive integration tests that verify the full converter → JSONL → loader round-trip for all four resources (FrameNet, PropBank, VerbNet, WordNet).
FrameNet Enrichments
Added frame relation, lexical unit enrichment, semantic type, and fulltext annotation parsing to the FrameNet converter and loader. Supplementary data files (framenet_semtypes.jsonl, framenet_fulltext.jsonl) are now converted during initialization.
Supplementary Data Conversion
glazing init now converts supplementary WordNet data files (wordnet_senses.jsonl, wordnet_exceptions.jsonl) and FrameNet data files (framenet_semtypes.jsonl, framenet_fulltext.jsonl).
🔄 What's Changed
WordNet Converter and Loader Rewrite
Rewrote the WordNet converter and loader to use enriched single-file JSONL output with supplementary sense and exception files, improving data completeness and load performance.
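The converted resources are stored as JSON Lines (one JSON object per line), so the enriched WordNet output can be inspected without glazing at all. A minimal reader sketch follows; the file path and field names in the commented usage are hypothetical placeholders, not the actual schema:

```python
import json
from pathlib import Path

def iter_jsonl(path):
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Hypothetical usage (path and field names are illustrative only):
# for record in iter_jsonl(Path.home() / ".local/share/glazing/converted/wordnet.jsonl"):
#     print(record["offset"], record["lemmas"])
```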
Relaxed Lemma Validation
Lemma validation now allows uppercase letters, digits at the start, and dots — supporting proper nouns (e.g., "Dog"), abbreviations (e.g., "Dr."), and numeric prefixes (e.g., "123abandon").
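As a rough illustration of the relaxation (the actual validation regex in glazing may differ), a pattern along these lines would accept all three kinds of lemma, while a strict lowercase-only pattern rejects them:

```python
import re

# Hypothetical pattern illustrating the relaxed rules: leading letter or
# digit, then letters, digits, spaces, dots, hyphens, underscores, apostrophes.
RELAXED_LEMMA = re.compile(r"^[A-Za-z0-9][A-Za-z0-9 .'\-_]*$")

for lemma in ["Dog", "Dr.", "123abandon"]:
    assert RELAXED_LEMMA.match(lemma)

# A strict lowercase-only pattern would have rejected all three
STRICT_LEMMA = re.compile(r"^[a-z][a-z\-_]*$")
assert not any(STRICT_LEMMA.match(x) for x in ["Dog", "Dr.", "123abandon"])
```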
Repository Migration
Moved the repository from aaronstevenwhite/glazing to factslab/glazing. All URLs across documentation, CI, and configuration have been updated.
🐛 What's Fixed
VerbNet Converter Mappings
Fixed the VerbNet converter to populate framenet_mappings and propbank_mappings from member attributes, which were previously left empty.
PropBank AMR-UMR-91 Support
Added AMR-UMR-91 roleset conversion and fixed XML edge cases in the PropBank converter.
Ruff Lint Compliance
Fixed PLW0108 (unnecessary lambdas) and PLC0207 (unnecessary string splits) flagged by newer ruff versions.
📦 Updating
To get the latest version:
```bash
pip install --upgrade glazing
```

**Important:** You must reconvert datasets to benefit from the enriched output:

```bash
glazing init --force
```

Requirements remain unchanged:
- Python 3.13 or higher
📜 Citation
If you use this version of Glazing in your research, please cite:
```bibtex
@software{glazing2025,
  author = {White, Aaron Steven},
  title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
  year = {2025},
  url = {https://github.com/factslab/glazing},
  version = {0.2.2},
  doi = {10.5281/zenodo.17467082}
}
```

🙏 Acknowledgments
This project is funded by the National Science Foundation (BCS-2040831).
📮 Contact
- Author: Aaron Steven White (aaron.white@rochester.edu)
- Repository: https://github.com/factslab/glazing
- Documentation: https://glazing.readthedocs.io
- PyPI: https://pypi.org/project/glazing/
Full Changelog: v0.2.1...v0.2.2
Glazing v0.2.1
This patch release fixes a critical data completeness issue where FrameNet lexical units were not being loaded during dataset conversion.
Fixes: #4
🐛 What's Fixed
FrameNet Lexical Units Now Properly Loaded
Lexical units are now correctly parsed from luIndex.xml and associated with their frames during conversion. This fixes a critical issue where all frames had empty lexical_units fields despite the raw FrameNet data containing 13,575 lexical units.
Before v0.2.1:

```python
>>> from glazing.framenet.loader import FrameNetLoader
>>> loader = FrameNetLoader()
>>> index = loader.build_frame_index(loader.frames)
>>> frame = index.get_frame_by_name("Abandonment")
>>> len(frame.lexical_units)
0  # All frames had empty lexical units!
```

After v0.2.1:

```python
>>> from glazing.framenet.loader import FrameNetLoader
>>> loader = FrameNetLoader()
>>> index = loader.build_frame_index(loader.frames)
>>> frame = index.get_frame_by_name("Abandonment")
>>> len(frame.lexical_units)
5  # Lexical units now properly loaded
>>> frame.lexical_units[0].name
'abandon.v'
>>> frame.lexical_units[0].pos
'V'
```

Real-World Data Support
Validation patterns have been updated to handle all actual FrameNet data, including:
- Proper nouns (e.g., "April.n", "Monday.n")
- Multi-word expressions (e.g., "a bit.n", "give up.v")
- Special characters (e.g., "(can't) help.v", "American [N and S Am].n")
- Complex lexeme names (e.g., "Boxing Day", "Scud-B missile")
Conversion Success Rate:
- Before: ~12,400 out of 13,575 LUs successfully parsed (91.3%)
- After: 13,572 out of 13,575 LUs successfully parsed (99.98%)
📦 Updating
To get the latest version:
```bash
pip install --upgrade glazing
```

**Important:** You must reconvert FrameNet data to benefit from this fix:

```bash
glazing init --force
```

Requirements remain unchanged:
- Python 3.13 or higher
🔧 Technical Changes
Converter Improvements
- Added `_parse_lu_from_index()` method to parse lexical units from `luIndex.xml`
- Added `convert_lu_index_file()` method to convert the entire LU index
- Modified `convert_frames_directory()` to load LUs and associate them with frames by `frame_id`
- Improved CLI output to show "Converting FrameNet frames and lexical units..."
Validation Updates
- Relaxed `LU_NAME_PATTERN` from a strict format to the permissive `^.+\.[a-z]+$`
- Relaxed `LEXEME_NAME_PATTERN` from a strict format to the permissive `^.+$`
- Updated tests to match the new validation behavior
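The two permissive patterns quoted above can be checked directly against the real-world names listed earlier (the constant names are taken from these release notes):

```python
import re

LU_NAME_PATTERN = re.compile(r"^.+\.[a-z]+$")
LEXEME_NAME_PATTERN = re.compile(r"^.+$")

# Real FrameNet LU names that the permissive pattern now accepts
for name in ["April.n", "a bit.n", "give up.v", "(can't) help.v"]:
    assert LU_NAME_PATTERN.match(name)

# Lexeme names only need to be non-empty
assert LEXEME_NAME_PATTERN.match("Boxing Day")
assert LEXEME_NAME_PATTERN.match("Scud-B missile")
```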
Data Completeness
- Approximately 13,500 lexical units now correctly associated with their frames
- Enables querying frames by lexical unit name via the frame index
- Full metadata preserved: POS tags, annotation status, sentence counts, lexeme structures
🔧 Compatibility
- Fully backwards compatible with v0.2.0
- No API changes
- Existing code continues to work, but will now have access to lexical unit data
- Dataset reconversion required to populate lexical units
📜 Citation
If you use this version of Glazing in your research, please cite:
```bibtex
@software{glazing2025,
  author = {White, Aaron Steven},
  title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
  year = {2025},
  url = {https://github.com/aaronstevenwhite/glazing},
  version = {0.2.1},
  doi = {10.5281/zenodo.17185626}
}
```

🙏 Acknowledgments
This project is funded by the National Science Foundation (BCS-2040831).
📮 Contact
- Author: Aaron Steven White (aaron.white@rochester.edu)
- Repository: https://github.com/aaronstevenwhite/glazing
- Documentation: https://glazing.readthedocs.io
- PyPI: https://pypi.org/project/glazing/
Full Changelog: v0.2.0...v0.2.1
Glazing v0.2.0
This minor release adds symbol parsing, fuzzy search, and syntax-based search capabilities to Glazing.
✨ What's New
Symbol Parsing
Parse linguistic identifiers from all four datasets to extract structured information:
```python
>>> from glazing.propbank.symbol_parser import parse_roleset_id
>>> result = parse_roleset_id("give.01")
>>> result.lemma
'give'
>>> result.sense_number
1
```

Fuzzy String Matching
Find approximate matches using Levenshtein distance:
```python
>>> from glazing.search import UnifiedSearch
>>> searcher = UnifiedSearch()
>>> results = searcher.search_with_fuzzy("comunication", fuzzy_threshold=0.8)
>>> # Returns results including "Communication" frames
```

Syntax-Based Search
Search across datasets using syntactic patterns:
```python
>>> from glazing.search import UnifiedSearch
>>> searcher = UnifiedSearch()
>>> results = searcher.search_by_syntax(
...     pattern="NP V NP PP[to]",  # Ditransitive pattern
...     allow_wildcards=True
... )
>>> # Returns 52 matching patterns
```

Enhanced Cross-References
New MappingIndex class provides transitive reference resolution with caching:
- Discover indirect mappings between datasets
- File-based caching for performance
- Configurable hop limits for transitive searches
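The transitive resolution above can be pictured as a bounded breadth-first search over a mapping graph. The sketch below only illustrates the idea; it is not the actual MappingIndex API, and the graph and entity IDs are a toy example:

```python
from collections import deque

def transitive_mappings(graph, start, max_hops=2):
    """Breadth-first search over a mapping graph, bounded by max_hops.

    `graph` maps an entity ID to the IDs it is directly mapped to.
    Returns every entity reachable within `max_hops` edges,
    excluding the start node itself.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    reached = set()
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # hop limit reached; do not expand further
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return reached

# Toy graph: a VerbNet class maps to a PropBank roleset, which maps
# to a FrameNet frame; two hops discover the indirect mapping.
graph = {
    "give-13.1": ["give.01"],
    "give.01": ["Giving"],
}
assert transitive_mappings(graph, "give-13.1", max_hops=1) == {"give.01"}
assert transitive_mappings(graph, "give-13.1", max_hops=2) == {"give.01", "Giving"}
```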
🔧 CLI Enhancements
New Commands
- `glazing search query --fuzzy` - Enable fuzzy matching for typo correction
- `glazing search fuzzy` - Search with fuzzy matching and typo correction
- `glazing search syntax` - Search for syntactic patterns with morphological features
- `glazing search args` - Search for arguments with specific properties
- `glazing search role` - Search for semantic roles across datasets
- `glazing search roles` - Search for semantic roles with specific properties
- `glazing search elements` - Search for frame elements with specific properties
- `glazing search relations` - Search for synsets with specific relations
- `glazing xref resolve` - Resolve cross-references for an entity
- `glazing xref extract` - Extract cross-references from all datasets
- `glazing xref clear-cache` - Clear cached cross-references
Improvements
- Progress indicators for long-running operations
- Better error messages and help text
🐳 Docker Support
New Dockerfile for containerized deployment:
```dockerfile
# Use official Python 3.13 slim image as base
FROM python:3.13-slim

# Set working directory
WORKDIR /app

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

# Install system dependencies required for building packages
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

# Copy only requirements first to leverage Docker cache
COPY pyproject.toml README.md ./
COPY src/glazing/__version__.py src/glazing/

# Install package dependencies
RUN pip install --upgrade pip && \
    pip install -e .

# Copy the rest of the application code
COPY src/ src/
COPY tests/ tests/

# Create data directory for datasets
RUN mkdir -p /data

# Set environment variable for data directory
ENV GLAZING_DATA_DIR=/data

# Initialize datasets during build
RUN glazing init --data-dir /data

# Expose data directory as volume
VOLUME ["/data"]

# Set the entrypoint to the glazing CLI
ENTRYPOINT ["glazing"]

# Default command shows help
CMD ["--help"]
```

📚 New Documentation
- User Guides:
  - Fuzzy search guide (`docs/user-guide/fuzzy-search.md`)
  - Syntax search guide (`docs/user-guide/syntax-search.md`)
- API Documentation:
  - Symbol parsers for all datasets (4 files)
  - Fuzzy matching utilities
  - Syntax pattern models
- Data Formats: JSON schemas added to `data-formats.md`
📦 Technical Changes
New Modules
- `glazing.syntax` - Syntactic pattern models and parser
- `glazing.utils.fuzzy_match` - String similarity functions
- `glazing.utils.ranking` - Result ranking utilities
- `glazing.references.index` - Mapping index implementation
- `glazing.symbols` - Symbol type definitions
- `glazing.{dataset}.symbol_parser` - Parse dataset identifiers (4 modules)
Dependencies
- Added `python-Levenshtein>=0.20.0` for fuzzy matching
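python-Levenshtein computes edit distance in C. The pure-Python sketch below shows what it computes and how a normalized similarity relates to the `fuzzy_threshold` used by `search_with_fuzzy()`; the normalization here is illustrative, as the library's `ratio()` uses a slightly different formula:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1], comparable to a fuzzy threshold."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))

# One inserted 'm' fixes the typo, so the score clears a 0.8 threshold
assert levenshtein("comunication", "communication") == 1
assert similarity("comunication", "communication") > 0.8
```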
📦 Updating
```bash
pip install --upgrade glazing==0.2.0
```

Requirements:
- Python 3.13 or higher
- New dependency: `python-Levenshtein>=0.20.0`
🔧 Compatibility
- Fully backwards compatible with v0.1.x
- No breaking changes
- All existing APIs continue to work
📜 Citation
If you use this version of Glazing in your research, please cite:
```bibtex
@software{glazing2025,
  author = {White, Aaron Steven},
  title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
  year = {2025},
  url = {https://github.com/aaronstevenwhite/glazing},
  version = {0.2.0},
  doi = {10.5281/zenodo.17185626}
}
```

🙏 Acknowledgments
This project is funded by the National Science Foundation (BCS-2040831).
📮 Contact
- Author: Aaron Steven White (aaron.white@rochester.edu)
- Repository: https://github.com/aaronstevenwhite/glazing
- Documentation: https://glazing.readthedocs.io
- PyPI: https://pypi.org/project/glazing/
Full Changelog: v0.1.1...v0.2.0
Glazing v0.1.1
This patch release fixes a critical usability issue with CLI search commands and includes documentation improvements.
🐛 What's Fixed
CLI Commands Now Work As Documented
The search commands now work without requiring the --data-dir argument, matching the examples in our documentation.
Before v0.1.1:
```
$ glazing search query "abandon"
Error: Missing option '--data-dir'.
```

After v0.1.1:

```
$ glazing search query "abandon"
Search Results for 'abandon'
┏━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Dataset  ┃ Type     ┃ ID/Name    ┃ Description                       ┃ Score ┃
┡━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ VERBNET  │ class    │ leave-51.2 │ VerbNet class with 1 members      │ 1.00  │
│ PROPBANK │ frameset │ abandon    │ PropBank frameset with 3 rolesets │ 1.00  │
└──────────┴──────────┴────────────┴───────────────────────────────────┴───────┘
```

Affected Commands
All search commands now automatically use the default data directory (`~/.local/share/glazing/converted/`):
- `glazing search query` - Search across datasets
- `glazing search entity` - Get entity details
- `glazing search role` - Search for semantic roles
- `glazing search cross-ref` - Find cross-references
📝 Documentation Improvements
This release incorporates the documentation improvements that were merged after v0.1.0:
- More concise and natural language throughout documentation
- Removed unnecessary promotional language
- Improved docstring clarity
- 50% reduction in documentation size while maintaining all essential information
📦 Updating
To get the latest version:
```bash
pip install --upgrade glazing
```

Requirements remain unchanged:
- Python 3.13 or higher
- ~5MB for the package
- ~54MB for raw downloaded datasets
- ~130MB total disk space after conversion
🔧 Technical Notes
- Search commands now use `get_default_data_path()` from the `glazing.initialize` module
- The default path respects the XDG Base Directory specification (`$XDG_DATA_HOME` if set)
- Commands with an explicit `--data-dir` argument continue to work as before
- No breaking changes - fully backward compatible
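The fallback logic described above can be sketched as follows; this is an approximation of what `get_default_data_path()` does, not its actual implementation:

```python
import os
from pathlib import Path

def default_data_path() -> Path:
    """Sketch of the XDG fallback: honor $XDG_DATA_HOME when set,
    otherwise fall back to ~/.local/share."""
    base = os.environ.get("XDG_DATA_HOME")
    root = Path(base) if base else Path.home() / ".local" / "share"
    return root / "glazing" / "converted"
```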
📜 Citation
If you use Glazing in your research, please cite:
```bibtex
@software{glazing2025,
  author = {White, Aaron Steven},
  title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
  year = {2025},
  url = {https://github.com/aaronstevenwhite/glazing},
  version = {0.1.1}
}
```

🙏 Acknowledgments
This project is funded by the National Science Foundation (BCS-2040831).
📮 Contact
- Author: Aaron Steven White (aaron.white@rochester.edu)
- Repository: https://github.com/aaronstevenwhite/glazing
- Documentation: https://glazing.readthedocs.io
- PyPI: https://pypi.org/project/glazing/
For the complete changelog, see CHANGELOG.md
Glazing v0.1.0
We're excited to announce the first release of Glazing, a package containing unified data models and interfaces for syntactic and semantic frame ontologies.
Glazing provides a modern Python interface for working with four major linguistic resources: FrameNet, PropBank, VerbNet, and WordNet.
✨ Highlights
- 🚀 One-command setup: Get started immediately with `glazing init` to download and prepare all datasets
- 📦 Type-safe models: Comprehensive Pydantic v2 models for all data structures
- 🔍 Unified search: Query across all datasets with a consistent API
- 🔗 Cross-references: Automatic mapping between resources
- 💾 Efficient storage: JSON Lines format with streaming support
- 🐍 Modern Python: Full type hints and Python 3.13+ support
📚 Supported Datasets
- FrameNet 1.7: Semantic frames, frame elements, lexical units, and frame relations
- PropBank 3.4: Framesets, rolesets, and semantic role labels
- VerbNet 3.4: Verb classes, thematic roles, syntactic frames, and GL semantics
- WordNet 3.1: Synsets, lemmas, lexical relations, and morphological processing
🚀 Quick Start
```bash
# Install the package
pip install glazing

# Initialize all datasets (one-time setup)
glazing init

# Start using the Python API
python -c "
from glazing.search import UnifiedSearch
search = UnifiedSearch()
results = search.search('give')
for r in results[:5]:
    print(f'{r.dataset}: {r.name}')
"
```

🛠️ Installation
```bash
pip install glazing
```

Requirements:
- Python 3.13 or higher
- ~5MB for the package
- ~54MB for raw downloaded datasets
- ~130MB total disk space after conversion (includes both raw and converted data)
📖 Features
Command-Line Interface
- `glazing init` - Initialize all datasets with a single command
- `glazing download` - Download individual or all datasets
- `glazing convert` - Convert from source formats to JSON Lines
- `glazing search` - Search across datasets with various filters
Python API
```python
from glazing.framenet.loader import FrameNetLoader
from glazing.verbnet.loader import VerbNetLoader

# Loaders automatically load data after 'glazing init'
fn_loader = FrameNetLoader()
frames = fn_loader.frames

vn_loader = VerbNetLoader()
verb_classes = list(vn_loader.classes.values())
```

Cross-Reference Resolution

```python
from glazing.references.extractor import ReferenceExtractor
from glazing.verbnet.loader import VerbNetLoader
from glazing.propbank.loader import PropBankLoader

# Load the datasets whose references we want to extract
vn_loader = VerbNetLoader()
pb_loader = PropBankLoader()

# Extract references
extractor = ReferenceExtractor()
extractor.extract_verbnet_references(list(vn_loader.classes.values()))
extractor.extract_propbank_references(list(pb_loader.framesets.values()))

# Access cross-references
if "give.01" in extractor.propbank_refs:
    refs = extractor.propbank_refs["give.01"]
    vn_classes = refs.get_verbnet_classes()
```

📝 Documentation
📊 Technical Details
- Performance: Streaming support for lazy loading datasets
- Format: JSON Lines for efficient storage and processing
- Validation: Automatic validation using Pydantic models
- Caching: Efficient caching for repeated operations
- Type Safety: Full type hints for better IDE support
🤝 Contributing
We welcome contributions! Please see our Contributing Guidelines.
📜 Citation
If you use Glazing in your research, please cite:
```bibtex
@software{glazing2025,
  author = {White, Aaron Steven},
  title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
  year = {2025},
  url = {https://github.com/aaronstevenwhite/glazing},
  version = {0.1.0}
}
```

🙏 Acknowledgments
This project was funded by the National Science Foundation (BCS-2040831) and builds upon the foundational work of the FrameNet, PropBank, VerbNet, and WordNet teams.
📮 Contact
- Author: Aaron Steven White (aaron.white@rochester.edu)
- Repository: https://github.com/aaronstevenwhite/glazing
- Documentation: https://glazing.readthedocs.io
- PyPI: https://pypi.org/project/glazing/
Thank you for using Glazing! We're excited to see what you build with it. 🎉