Getting Started & Usage Guide

Quickstart

I. Create Pecha

To create a new Pecha (an annotated text corpus), you can use the Pecha.create method directly, or use a parser (e.g., for DOCX files):

from pathlib import Path
from openpecha.pecha import Pecha

# Create an empty Pecha in a given output directory
output_path = Path("./output")
pecha = Pecha.create(output_path)

Or, to create a Pecha after parsing:

from pathlib import Path

from openpecha.pecha.parsers.docx.root import DocxRootParser
from openpecha.pecha.layer import AnnotationType

parser = DocxRootParser()
pecha, annotation_path = parser.parse(
    input="/path/to/file.docx",
    annotation_type=AnnotationType.SEGMENTATION,
    metadata={"title": {"en": "Sample Title"}, "language": "bo"},
    output_path=Path("/output_path/")
)

II. Load Pecha

You can load an existing Pecha from a local path, for example after downloading it from the OpenPecha backend:

from openpecha.pecha import Pecha
from pathlib import Path

# Load from local path
pecha = Pecha.from_path(Path("/path/to/pecha"))

III. Pecha Attributes

A Pecha object exposes several useful attributes:

pecha.id: The Pecha's unique ID, an 8-character string derived from a UUID
pecha.pecha_path: Filesystem path to the Pecha
pecha.metadata: Metadata object (see below)
pecha.bases: Dictionary of base file names to text
pecha.layers: Dictionary of annotation layers
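
As a rough illustration of what an ID derived from a UUID can look like, the sketch below truncates a random UUID to 8 hexadecimal characters. The helper name `short_id` and the uppercase formatting are assumptions made for illustration; the library's own ID routine may add a prefix or format the value differently.

```python
import uuid


def short_id(length: int = 8) -> str:
    """Return the first `length` hex characters of a random UUID, uppercased.

    Hypothetical helper, not part of openpecha.
    """
    return uuid.uuid4().hex[:length].upper()


example_id = short_id()
print(example_id)  # a random 8-character string, different on every run
```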

IV. Metadata

Each Pecha has a metadata attribute, which is a PechaMetaData object. Example fields include:

id: Pecha ID
title: Title (can be a dict with language keys)
author: Author(s)
language: Language code (e.g., 'bo', 'en')
parser: Name of the parser used
initial_creation_type: How the Pecha was created (e.g., 'google_docx', 'ocr')
source_metadata: Additional source info
copyright, licence, etc.

You can update metadata by passing a dictionary:

pecha.set_metadata({
    "title": {"en": "New Title"},
    "author": "Author Name",
    # ... other fields ...
})
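
Because `title` can be a dict keyed by language code, code that displays a title usually needs a fallback rule. The following is a minimal sketch over a plain dictionary shaped like the one passed to `set_metadata` above; `pick_title` is a hypothetical helper and does not exist in openpecha.

```python
from typing import Mapping, Union

Title = Union[str, Mapping[str, str]]


def pick_title(title: Title, preferred: str = "en") -> str:
    """Return the title in the preferred language, falling back to any available one."""
    if isinstance(title, str):
        return title
    if preferred in title:
        return title[preferred]
    # Fall back to the first available language, or an empty string if there is none.
    return next(iter(title.values()), "")


metadata = {"title": {"en": "New Title"}, "author": "Author Name", "language": "bo"}
print(pick_title(metadata["title"]))        # New Title
print(pick_title(metadata["title"], "bo"))  # New Title (falls back to the only entry)
```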

V. Base File

The base file is the plain text of the work. You can access and set base files:

# Get base text by name
base_text = pecha.get_base("base1")

# Set a new base text
pecha.set_base("This is the text.", base_name="base1")

VI. Annotations

Annotations are stored in layers, each corresponding to a type (segmentation, alignment, etc.).

To access all layers for a base:

for layer_name, layer_store in pecha.get_layers("base1"):
    print(layer_name, layer_store)

To add a new annotation layer:

from openpecha.pecha.layer import AnnotationType
layer, layer_path = pecha.add_layer("base1", AnnotationType.SEGMENTATION)

To add an annotation to a layer:

from openpecha.pecha.annotations import Span, SegmentationAnnotation
ann = SegmentationAnnotation(span=Span(start=0, end=10), index=1)
pecha.add_annotation(layer, ann, AnnotationType.SEGMENTATION)
layer.save()
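
To make the `Span(start=0, end=10)` above concrete, the sketch below slices a base text with start and end offsets. It uses a stand-in dataclass rather than the library's `Span`, and it assumes offsets are character positions with an exclusive end; check the library's behavior before relying on either assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharSpan:
    """Stand-in for a span: character offsets into the base text (end exclusive, by assumption)."""
    start: int
    end: int


def text_of(base_text: str, span: CharSpan) -> str:
    """Return the slice of the base text covered by the span."""
    if not 0 <= span.start <= span.end <= len(base_text):
        raise ValueError(f"span {span} is outside the base text")
    return base_text[span.start:span.end]


base_text = "This is the text."
print(text_of(base_text, CharSpan(start=0, end=10)))  # This is th
```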

To get annotation data:

from openpecha.pecha import get_anns
anns = get_anns(layer)
for ann in anns:
    print(ann)

VII. Alignment Transfer

Alignment transfer allows you to map and serialize aligned segments between a root text and a commentary or translation Pecha. This is useful for exporting how commentary or translation segments correspond to the root text.
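
The idea of mapping commentary segments onto root segments can be pictured with plain data. The toy sketch below is not the library's algorithm: it assumes each commentary segment simply records the 1-based index of the root segment it comments on, and all names in it are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_root(
    root_segments: List[str],
    commentary: List[Tuple[int, str]],
) -> Dict[int, List[str]]:
    """Group commentary texts under the 1-based index of the root segment they align to."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for root_index, text in commentary:
        if not 1 <= root_index <= len(root_segments):
            raise IndexError(f"no root segment with index {root_index}")
        grouped[root_index].append(text)
    return dict(grouped)


root_segments = ["root verse one", "root verse two"]
commentary = [
    (1, "note on verse one"),
    (2, "note on verse two"),
    (2, "second note on verse two"),
]

for index, notes in sorted(group_by_root(root_segments, commentary).items()):
    print(index, root_segments[index - 1], "->", notes)
```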

Commentary Alignment Transfer

To transfer alignment from a root Pecha to a commentary Pecha:

from pathlib import Path

from openpecha.pecha import Pecha
from openpecha.alignment.commentary_transfer import CommentaryAlignmentTransfer

# Load the root and commentary Pechas
root_pecha = Pecha.from_path(Path("/path/to/root_pecha"))
commentary_pecha = Pecha.from_path(Path("/path/to/commentary_pecha"))

# Specify the alignment layer IDs (relative to the layers directory)
root_alignment_id = "B5FE/alignment-6707.json"
commentary_alignment_id = "B014/alignment-2127.json"

# Get the transferred commentary segments as a list of strings
transfer = CommentaryAlignmentTransfer()
aligned_commentary = transfer.get_serialized_commentary(
    root_pecha,
    root_alignment_id,
    commentary_pecha,
    commentary_alignment_id,
)

for segment in aligned_commentary:
    print(segment)

If your commentary Pecha also has a segmentation layer, you can use:

commentary_segmentation_id = "B014/segmentation-33FC.json"
aligned_commentary = transfer.get_serialized_commentary_segmentation(
    root_pecha,
    root_alignment_id,
    commentary_pecha,
    commentary_alignment_id,
    commentary_segmentation_id,
)

Translation Alignment Transfer

For translation alignment transfer, use the TranslationAlignmentTransfer class:

from pathlib import Path

from openpecha.pecha import Pecha
from openpecha.alignment.translation_transfer import TranslationAlignmentTransfer

# Load the root and translation Pechas
root_pecha = Pecha.from_path(Path("/path/to/root_pecha"))
translation_pecha = Pecha.from_path(Path("/path/to/translation_pecha"))

root_alignment_id = "B5FE/alignment-6707.json"
translation_alignment_id = "B014/alignment-2127.json"

transfer = TranslationAlignmentTransfer()
aligned_translation = transfer.get_serialized_translation_alignment(
    root_pecha,
    root_alignment_id,
    translation_pecha,
    translation_alignment_id,
)

for segment in aligned_translation:
    print(segment)

If your translation Pecha also has a segmentation layer, use:

translation_segmentation_id = "B014/segmentation-33FC.json"
aligned_translation = transfer.get_serialized_translation_segmentation(
    root_pecha,
    root_alignment_id,
    translation_pecha,
    translation_alignment_id,
    translation_segmentation_id,
)

Notes

The alignment and segmentation layer IDs are typically found in the layers directory of each Pecha.
The output is a list of strings, each representing a segment in the commentary or translation, aligned to the root text.
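
Layer IDs such as `B5FE/alignment-6707.json` can be collected by walking a Pecha's layers directory. This is a minimal sketch using only `pathlib`, assuming the directory is named `layers` and holds one subdirectory per base with JSON layer files inside; the function name `list_layer_ids` is hypothetical.

```python
from pathlib import Path
from typing import List


def list_layer_ids(pecha_path: Path, kind: str = "") -> List[str]:
    """Return layer IDs relative to <pecha_path>/layers, optionally filtered by filename prefix."""
    layers_dir = pecha_path / "layers"  # assumed directory name
    if not layers_dir.is_dir():
        return []
    return sorted(
        path.relative_to(layers_dir).as_posix()
        for path in layers_dir.glob("*/*.json")
        if path.name.startswith(kind)
    )


# List only alignment layers; returns an empty list if the directory is missing.
print(list_layer_ids(Path("/path/to/root_pecha"), kind="alignment"))
```

The returned strings have the same `<base>/<layer file>` shape as the IDs passed to the transfer methods above.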
