Releases: FACTSlab/glazing
Glazing v0.2.2
This release rewrites and fixes converters/loaders for all four resources, migrates the repository to the FACTSlab organization, and adds CI/CD infrastructure.
✨ What's New
PyPI Publish Workflow
Added a GitHub Actions workflow that automatically publishes to PyPI when a new tag is created, using trusted publishers (OIDC) for secure, tokenless authentication.
Converter-to-Loader Round-Trip Integration Tests
Added comprehensive integration tests that verify the full converter → JSONL → loader round-trip for all four resources (FrameNet, PropBank, VerbNet, WordNet).
FrameNet Enrichments
Added frame relation, lexical unit enrichment, semantic type, and fulltext annotation parsing to the FrameNet converter and loader. Supplementary data files (framenet_semtypes.jsonl, framenet_fulltext.jsonl) are now converted during initialization.
Supplementary Data Conversion
glazing init now converts supplementary WordNet data files (wordnet_senses.jsonl, wordnet_exceptions.jsonl) and FrameNet data files (framenet_semtypes.jsonl, framenet_fulltext.jsonl).
🔄 What's Changed
WordNet Converter and Loader Rewrite
Rewrote the WordNet converter and loader to use enriched single-file JSONL output with supplementary sense and exception files, improving data completeness and load performance.
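The converted resources are stored as JSON Lines (one JSON object per line), so the enriched WordNet output can be inspected without glazing at all. A minimal reader sketch follows; the file path and field names in the commented usage are hypothetical placeholders, not the actual schema:

```python
import json
from pathlib import Path

def iter_jsonl(path):
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Hypothetical usage (path and field names are illustrative only):
# for record in iter_jsonl(Path.home() / ".local/share/glazing/converted/wordnet.jsonl"):
#     print(record["offset"], record["lemmas"])
```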
Relaxed Lemma Validation
Lemma validation now allows uppercase letters, digits at the start, and dots — supporting proper nouns (e.g., "Dog"), abbreviations (e.g., "Dr."), and numeric prefixes (e.g., "123abandon").
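As a rough illustration of the relaxation (the actual validation regex in glazing may differ), a pattern along these lines would accept all three kinds of lemma, while a strict lowercase-only pattern rejects them:

```python
import re

# Hypothetical pattern illustrating the relaxed rules: leading letter or
# digit, then letters, digits, spaces, dots, hyphens, underscores, apostrophes.
RELAXED_LEMMA = re.compile(r"^[A-Za-z0-9][A-Za-z0-9 .'\-_]*$")

for lemma in ["Dog", "Dr.", "123abandon"]:
    assert RELAXED_LEMMA.match(lemma)

# A strict lowercase-only pattern would have rejected all three
STRICT_LEMMA = re.compile(r"^[a-z][a-z\-_]*$")
assert not any(STRICT_LEMMA.match(x) for x in ["Dog", "Dr.", "123abandon"])
```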
Repository Migration
Moved the repository from aaronstevenwhite/glazing to factslab/glazing. All URLs across documentation, CI, and configuration have been updated.
🐛 What's Fixed
VerbNet Converter Mappings
Fixed the VerbNet converter to populate framenet_mappings and propbank_mappings from member attributes, which were previously left empty.
PropBank AMR-UMR-91 Support
Added AMR-UMR-91 roleset conversion and fixed XML edge cases in the PropBank converter.
Ruff Lint Compliance
Fixed PLW0108 (unnecessary lambdas) and PLC0207 (unnecessary string splits) flagged by newer ruff versions.
📦 Updating
To get the latest version:
```bash
pip install --upgrade glazing
```

**Important:** You must reconvert datasets to benefit from the enriched output:

```bash
glazing init --force
```

Requirements remain unchanged:
- Python 3.13 or higher
📜 Citation
If you use this version of Glazing in your research, please cite:
```bibtex
@software{glazing2025,
  author = {White, Aaron Steven},
  title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
  year = {2025},
  url = {https://github.com/factslab/glazing},
  version = {0.2.2},
  doi = {10.5281/zenodo.17467082}
}
```

🙏 Acknowledgments
This project is funded by the National Science Foundation (BCS-2040831).
📮 Contact
- Author: Aaron Steven White (aaron.white@rochester.edu)
- Repository: https://github.com/factslab/glazing
- Documentation: https://glazing.readthedocs.io
- PyPI: https://pypi.org/project/glazing/
Full Changelog: v0.2.1...v0.2.2
Glazing v0.2.1
This patch release fixes a critical data completeness issue where FrameNet lexical units were not being loaded during dataset conversion.
Fixes: #4
🐛 What's Fixed
FrameNet Lexical Units Now Properly Loaded
Lexical units are now correctly parsed from luIndex.xml and associated with their frames during conversion. This fixes a critical issue where all frames had empty lexical_units fields despite the raw FrameNet data containing 13,575 lexical units.
Before v0.2.1:

```python
>>> from glazing.framenet.loader import FrameNetLoader
>>> loader = FrameNetLoader()
>>> index = loader.build_frame_index(loader.frames)
>>> frame = index.get_frame_by_name("Abandonment")
>>> len(frame.lexical_units)
0  # All frames had empty lexical units!
```

After v0.2.1:

```python
>>> from glazing.framenet.loader import FrameNetLoader
>>> loader = FrameNetLoader()
>>> index = loader.build_frame_index(loader.frames)
>>> frame = index.get_frame_by_name("Abandonment")
>>> len(frame.lexical_units)
5  # Lexical units now properly loaded
>>> frame.lexical_units[0].name
'abandon.v'
>>> frame.lexical_units[0].pos
'V'
```

Real-World Data Support
Validation patterns have been updated to handle all actual FrameNet data, including:
- Proper nouns (e.g., "April.n", "Monday.n")
- Multi-word expressions (e.g., "a bit.n", "give up.v")
- Special characters (e.g., "(can't) help.v", "American [N and S Am].n")
- Complex lexeme names (e.g., "Boxing Day", "Scud-B missile")
Conversion Success Rate:
- Before: ~12,400 out of 13,575 LUs successfully parsed (91.3%)
- After: 13,572 out of 13,575 LUs successfully parsed (99.98%)
📦 Updating
To get the latest version:
```bash
pip install --upgrade glazing
```

**Important:** You must reconvert FrameNet data to benefit from this fix:

```bash
glazing init --force
```

Requirements remain unchanged:
- Python 3.13 or higher
🔧 Technical Changes
Converter Improvements
- Added `_parse_lu_from_index()` method to parse lexical units from `luIndex.xml`
- Added `convert_lu_index_file()` method to convert the entire LU index
- Modified `convert_frames_directory()` to load LUs and associate them with frames by `frame_id`
- Improved CLI output to show "Converting FrameNet frames and lexical units..."
Validation Updates
- Relaxed `LU_NAME_PATTERN` from a strict format to the permissive `^.+\.[a-z]+$`
- Relaxed `LEXEME_NAME_PATTERN` from a strict format to the permissive `^.+$`
- Updated tests to match the new validation behavior
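The two permissive patterns quoted above can be checked directly against the real-world names listed earlier (the constant names are taken from these release notes):

```python
import re

LU_NAME_PATTERN = re.compile(r"^.+\.[a-z]+$")
LEXEME_NAME_PATTERN = re.compile(r"^.+$")

# Real FrameNet LU names that the permissive pattern now accepts
for name in ["April.n", "a bit.n", "give up.v", "(can't) help.v"]:
    assert LU_NAME_PATTERN.match(name)

# Lexeme names only need to be non-empty
assert LEXEME_NAME_PATTERN.match("Boxing Day")
assert LEXEME_NAME_PATTERN.match("Scud-B missile")
```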
Data Completeness
- Approximately 13,500 lexical units now correctly associated with their frames
- Enables querying frames by lexical unit name via the frame index
- Full metadata preserved: POS tags, annotation status, sentence counts, lexeme structures
🔧 Compatibility
- Fully backwards compatible with v0.2.0
- No API changes
- Existing code continues to work, but will now have access to lexical unit data
- Dataset reconversion required to populate lexical units
📜 Citation
If you use this version of Glazing in your research, please cite:
```bibtex
@software{glazing2025,
  author = {White, Aaron Steven},
  title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
  year = {2025},
  url = {https://github.com/aaronstevenwhite/glazing},
  version = {0.2.1},
  doi = {10.5281/zenodo.17185626}
}
```

🙏 Acknowledgments
This project is funded by the National Science Foundation (BCS-2040831).
📮 Contact
- Author: Aaron Steven White (aaron.white@rochester.edu)
- Repository: https://github.com/aaronstevenwhite/glazing
- Documentation: https://glazing.readthedocs.io
- PyPI: https://pypi.org/project/glazing/
Full Changelog: v0.2.0...v0.2.1
Glazing v0.2.0
This minor release adds symbol parsing, fuzzy search, and syntax-based search capabilities to Glazing.
✨ What's New
Symbol Parsing
Parse linguistic identifiers from all four datasets to extract structured information:
```python
>>> from glazing.propbank.symbol_parser import parse_roleset_id
>>> result = parse_roleset_id("give.01")
>>> result.lemma
'give'
>>> result.sense_number
1
```

Fuzzy String Matching
Find approximate matches using Levenshtein distance:
```python
>>> from glazing.search import UnifiedSearch
>>> searcher = UnifiedSearch()
>>> results = searcher.search_with_fuzzy("comunication", fuzzy_threshold=0.8)
>>> # Returns results including "Communication" frames
```

Syntax-Based Search
Search across datasets using syntactic patterns:
```python
>>> from glazing.search import UnifiedSearch
>>> searcher = UnifiedSearch()
>>> results = searcher.search_by_syntax(
...     pattern="NP V NP PP[to]",  # Ditransitive pattern
...     allow_wildcards=True
... )
>>> # Returns 52 matching patterns
```

Enhanced Cross-References
New MappingIndex class provides transitive reference resolution with caching:
- Discover indirect mappings between datasets
- File-based caching for performance
- Configurable hop limits for transitive searches
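The transitive resolution above can be pictured as a bounded breadth-first search over a mapping graph. The sketch below only illustrates the idea; it is not the actual MappingIndex API, and the graph and entity IDs are a toy example:

```python
from collections import deque

def transitive_mappings(graph, start, max_hops=2):
    """Breadth-first search over a mapping graph, bounded by max_hops.

    `graph` maps an entity ID to the IDs it is directly mapped to.
    Returns every entity reachable within `max_hops` edges,
    excluding the start node itself.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    reached = set()
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # hop limit reached; do not expand further
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return reached

# Toy graph: a VerbNet class maps to a PropBank roleset, which maps
# to a FrameNet frame; two hops discover the indirect mapping.
graph = {
    "give-13.1": ["give.01"],
    "give.01": ["Giving"],
}
assert transitive_mappings(graph, "give-13.1", max_hops=1) == {"give.01"}
assert transitive_mappings(graph, "give-13.1", max_hops=2) == {"give.01", "Giving"}
```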
🔧 CLI Enhancements
New Commands
- `glazing search query --fuzzy` - Enable fuzzy matching for typo correction
- `glazing search fuzzy` - Search with fuzzy matching and typo correction
- `glazing search syntax` - Search for syntactic patterns with morphological features
- `glazing search args` - Search for arguments with specific properties
- `glazing search role` - Search for semantic roles across datasets
- `glazing search roles` - Search for semantic roles with specific properties
- `glazing search elements` - Search for frame elements with specific properties
- `glazing search relations` - Search for synsets with specific relations
- `glazing xref resolve` - Resolve cross-references for an entity
- `glazing xref extract` - Extract cross-references from all datasets
- `glazing xref clear-cache` - Clear cached cross-references
Improvements
- Progress indicators for long-running operations
- Better error messages and help text
🐳 Docker Support
New Dockerfile for containerized deployment:
```dockerfile
# Use official Python 3.13 slim image as base
FROM python:3.13-slim

# Set working directory
WORKDIR /app

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

# Install system dependencies required for building packages
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

# Copy only requirements first to leverage Docker cache
COPY pyproject.toml README.md ./
COPY src/glazing/__version__.py src/glazing/

# Install package dependencies
RUN pip install --upgrade pip && \
    pip install -e .

# Copy the rest of the application code
COPY src/ src/
COPY tests/ tests/

# Create data directory for datasets
RUN mkdir -p /data

# Set environment variable for data directory
ENV GLAZING_DATA_DIR=/data

# Initialize datasets during build
RUN glazing init --data-dir /data

# Expose data directory as volume
VOLUME ["/data"]

# Set the entrypoint to the glazing CLI
ENTRYPOINT ["glazing"]

# Default command shows help
CMD ["--help"]
```

📚 New Documentation
- User Guides:
  - Fuzzy search guide (`docs/user-guide/fuzzy-search.md`)
  - Syntax search guide (`docs/user-guide/syntax-search.md`)
- API Documentation:
  - Symbol parsers for all datasets (4 files)
  - Fuzzy matching utilities
  - Syntax pattern models
- Data Formats: JSON schemas added to `data-formats.md`
📦 Technical Changes
New Modules
- `glazing.syntax` - Syntactic pattern models and parser
- `glazing.utils.fuzzy_match` - String similarity functions
- `glazing.utils.ranking` - Result ranking utilities
- `glazing.references.index` - Mapping index implementation
- `glazing.symbols` - Symbol type definitions
- `glazing.{dataset}.symbol_parser` - Parse dataset identifiers (4 modules)
Dependencies
- Added `python-Levenshtein>=0.20.0` for fuzzy matching
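python-Levenshtein computes edit distance in C. The pure-Python sketch below shows what it computes and how a normalized similarity relates to the `fuzzy_threshold` used by `search_with_fuzzy()`; the normalization here is illustrative, as the library's `ratio()` uses a slightly different formula:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1], comparable to a fuzzy threshold."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))

# One inserted 'm' fixes the typo, so the score clears a 0.8 threshold
assert levenshtein("comunication", "communication") == 1
assert similarity("comunication", "communication") > 0.8
```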
📦 Updating
```bash
pip install --upgrade glazing==0.2.0
```

Requirements:
- Python 3.13 or higher
- New dependency: `python-Levenshtein>=0.20.0`
🔧 Compatibility
- Fully backwards compatible with v0.1.x
- No breaking changes
- All existing APIs continue to work
📜 Citation
If you use this version of Glazing in your research, please cite:
```bibtex
@software{glazing2025,
  author = {White, Aaron Steven},
  title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
  year = {2025},
  url = {https://github.com/aaronstevenwhite/glazing},
  version = {0.2.0},
  doi = {10.5281/zenodo.17185626}
}
```

🙏 Acknowledgments
This project is funded by the National Science Foundation (BCS-2040831).
📮 Contact
- Author: Aaron Steven White (aaron.white@rochester.edu)
- Repository: https://github.com/aaronstevenwhite/glazing
- Documentation: https://glazing.readthedocs.io
- PyPI: https://pypi.org/project/glazing/
Full Changelog: v0.1.1...v0.2.0
Glazing v0.1.1
This patch release fixes a critical usability issue with CLI search commands and includes documentation improvements.
🐛 What's Fixed
CLI Commands Now Work As Documented
The search commands now work without requiring the --data-dir argument, matching the examples in our documentation.
Before v0.1.1:
```
$ glazing search query "abandon"
Error: Missing option '--data-dir'.
```

After v0.1.1:

```
$ glazing search query "abandon"
Search Results for 'abandon'
┏━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Dataset  ┃ Type     ┃ ID/Name    ┃ Description                       ┃ Score ┃
┡━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ VERBNET  │ class    │ leave-51.2 │ VerbNet class with 1 members      │ 1.00  │
│ PROPBANK │ frameset │ abandon    │ PropBank frameset with 3 rolesets │ 1.00  │
└──────────┴──────────┴────────────┴───────────────────────────────────┴───────┘
```

Affected Commands
All search commands now automatically use the default data directory (`~/.local/share/glazing/converted/`):
- `glazing search query` - Search across datasets
- `glazing search entity` - Get entity details
- `glazing search role` - Search for semantic roles
- `glazing search cross-ref` - Find cross-references
📝 Documentation Improvements
This release incorporates the documentation improvements that were merged after v0.1.0:
- More concise and natural language throughout documentation
- Removed unnecessary promotional language
- Improved docstring clarity
- 50% reduction in documentation size while maintaining all essential information
📦 Updating
To get the latest version:
```bash
pip install --upgrade glazing
```

Requirements remain unchanged:
- Python 3.13 or higher
- ~5MB for the package
- ~54MB for raw downloaded datasets
- ~130MB total disk space after conversion
🔧 Technical Notes
- Search commands now use `get_default_data_path()` from the `glazing.initialize` module
- The default path respects the XDG Base Directory specification (`$XDG_DATA_HOME` if set)
- Commands with an explicit `--data-dir` argument continue to work as before
- No breaking changes - fully backward compatible
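The fallback logic described above can be sketched as follows; this is an approximation of what `get_default_data_path()` does, not its actual implementation:

```python
import os
from pathlib import Path

def default_data_path() -> Path:
    """Sketch of the XDG fallback: honor $XDG_DATA_HOME when set,
    otherwise fall back to ~/.local/share."""
    base = os.environ.get("XDG_DATA_HOME")
    root = Path(base) if base else Path.home() / ".local" / "share"
    return root / "glazing" / "converted"
```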
📜 Citation
If you use Glazing in your research, please cite:
```bibtex
@software{glazing2025,
  author = {White, Aaron Steven},
  title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
  year = {2025},
  url = {https://github.com/aaronstevenwhite/glazing},
  version = {0.1.1}
}
```

🙏 Acknowledgments
This project is funded by the National Science Foundation (BCS-2040831).
📮 Contact
- Author: Aaron Steven White (aaron.white@rochester.edu)
- Repository: https://github.com/aaronstevenwhite/glazing
- Documentation: https://glazing.readthedocs.io
- PyPI: https://pypi.org/project/glazing/
For the complete changelog, see CHANGELOG.md
Glazing v0.1.0
We're excited to announce the first release of Glazing, a package containing unified data models and interfaces for syntactic and semantic frame ontologies.
Glazing provides a modern Python interface for working with four major linguistic resources: FrameNet, PropBank, VerbNet, and WordNet.
✨ Highlights
- 🚀 One-command setup: Get started immediately with `glazing init` to download and prepare all datasets
- 📦 Type-safe models: Comprehensive Pydantic v2 models for all data structures
- 🔍 Unified search: Query across all datasets with a consistent API
- 🔗 Cross-references: Automatic mapping between resources
- 💾 Efficient storage: JSON Lines format with streaming support
- 🐍 Modern Python: Full type hints and Python 3.13+ support
📚 Supported Datasets
- FrameNet 1.7: Semantic frames, frame elements, lexical units, and frame relations
- PropBank 3.4: Framesets, rolesets, and semantic role labels
- VerbNet 3.4: Verb classes, thematic roles, syntactic frames, and GL semantics
- WordNet 3.1: Synsets, lemmas, lexical relations, and morphological processing
🚀 Quick Start
```bash
# Install the package
pip install glazing

# Initialize all datasets (one-time setup)
glazing init

# Start using the Python API
python -c "
from glazing.search import UnifiedSearch
search = UnifiedSearch()
results = search.search('give')
for r in results[:5]:
    print(f'{r.dataset}: {r.name}')
"
```

🛠️ Installation
```bash
pip install glazing
```

Requirements:
- Python 3.13 or higher
- ~5MB for the package
- ~54MB for raw downloaded datasets
- ~130MB total disk space after conversion (includes both raw and converted data)
📖 Features
Command-Line Interface
- `glazing init` - Initialize all datasets with a single command
- `glazing download` - Download individual or all datasets
- `glazing convert` - Convert from source formats to JSON Lines
- `glazing search` - Search across datasets with various filters
Python API
```python
from glazing.framenet.loader import FrameNetLoader
from glazing.verbnet.loader import VerbNetLoader

# Loaders automatically load data after 'glazing init'
fn_loader = FrameNetLoader()
frames = fn_loader.frames

vn_loader = VerbNetLoader()
verb_classes = list(vn_loader.classes.values())
```

Cross-Reference Resolution

```python
from glazing.references.extractor import ReferenceExtractor
from glazing.verbnet.loader import VerbNetLoader
from glazing.propbank.loader import PropBankLoader

# Load the datasets whose references we want to extract
vn_loader = VerbNetLoader()
pb_loader = PropBankLoader()

# Extract references
extractor = ReferenceExtractor()
extractor.extract_verbnet_references(list(vn_loader.classes.values()))
extractor.extract_propbank_references(list(pb_loader.framesets.values()))

# Access cross-references
if "give.01" in extractor.propbank_refs:
    refs = extractor.propbank_refs["give.01"]
    vn_classes = refs.get_verbnet_classes()
```

📝 Documentation
📊 Technical Details
- Performance: Streaming support for lazy loading datasets
- Format: JSON Lines for efficient storage and processing
- Validation: Automatic validation using Pydantic models
- Caching: Efficient caching for repeated operations
- Type Safety: Full type hints for better IDE support
🤝 Contributing
We welcome contributions! Please see our Contributing Guidelines.
📜 Citation
If you use Glazing in your research, please cite:
```bibtex
@software{glazing2025,
  author = {White, Aaron Steven},
  title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
  year = {2025},
  url = {https://github.com/aaronstevenwhite/glazing},
  version = {0.1.0}
}
```

🙏 Acknowledgments
This project was funded by the National Science Foundation (BCS-2040831) and builds upon the foundational work of the FrameNet, PropBank, VerbNet, and WordNet teams.
📮 Contact
- Author: Aaron Steven White (aaron.white@rochester.edu)
- Repository: https://github.com/aaronstevenwhite/glazing
- Documentation: https://glazing.readthedocs.io
- PyPI: https://pypi.org/project/glazing/
Thank you for using Glazing! We're excited to see what you build with it. 🎉