plasmate-labs/som-action

Plasmate SOM Action

A GitHub Action that fetches a web page with Plasmate and outputs the Semantic Object Model (SOM). SOM uses 4x fewer tokens than raw HTML while preserving page structure, element roles, and interactive affordances.

Usage

- uses: plasmate-labs/som-action@v1
  with:
    url: https://example.com
  id: som

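- run: echo "Page title: $TITLE"
  env:
    # Pass the output through the environment so quotes in the title cannot break the command
    TITLE: ${{ steps.som.outputs.title }}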

Inputs

Input    Required  Default  Description
url      yes       (none)   URL to fetch
format   no        json     Output format (json or text)
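
As a minimal sketch of the `format` input, the steps below request text output instead of the default JSON. What the `som` output contains in text mode is not specified here, so the second step only reports its size. The step id `som_text` is an arbitrary name chosen for this example.

```yaml
- uses: plasmate-labs/som-action@v1
  id: som_text            # arbitrary id chosen for this example
  with:
    url: https://example.com
    format: text          # documented values: json (default) or text

- name: Report size of the text output
  env:
    # Pass the output via the environment rather than inlining it in the script
    SOM_TEXT: ${{ steps.som_text.outputs.som }}
  run: printf '%s' "$SOM_TEXT" | wc -c
```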

Outputs

Output   Description
som      The full SOM JSON output
title    Page title extracted from the SOM
tokens   Approximate token count of the SOM
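
The `tokens` output can gate later steps. The sketch below fails the job when a page exceeds a token budget. It assumes `tokens` is emitted as a plain integer, and `MAX_TOKENS` is a hypothetical budget chosen for illustration.

```yaml
- uses: plasmate-labs/som-action@v1
  id: som
  with:
    url: https://example.com

- name: Enforce a token budget
  env:
    TOKENS: ${{ steps.som.outputs.tokens }}
    MAX_TOKENS: 8000      # hypothetical budget; adjust for your model
  run: |
    # Assumes TOKENS is a plain integer; a non-numeric value makes the test error out
    if [ "$TOKENS" -gt "$MAX_TOKENS" ]; then
      echo "::error::SOM is $TOKENS tokens, over the budget of $MAX_TOKENS"
      exit 1
    fi
```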

Examples

Fetch and use in subsequent steps

name: Scrape and Process

on:
  workflow_dispatch:
    inputs:
      url:
        description: 'URL to scrape'
        required: true

jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: plasmate-labs/som-action@v1
        with:
          url: ${{ github.event.inputs.url }}
        id: som

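      - name: Display results
        env:
          TITLE: ${{ steps.som.outputs.title }}
          TOKENS: ${{ steps.som.outputs.tokens }}
        run: |
          echo "Title: $TITLE"
          echo "Tokens: $TOKENS"

      - name: Save SOM
        env:
          # Inlining the JSON in single quotes breaks if the page contains a quote,
          # so pass it through the environment instead
          SOM: ${{ steps.som.outputs.som }}
        run: printf '%s\n' "$SOM" > som.json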

      - uses: actions/upload-artifact@v4
        with:
          name: som-output
          path: som.json

Scheduled monitoring

name: Monitor Page Changes

on:
  schedule:
    - cron: '0 */6 * * *'  # Every 6 hours

jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      - uses: plasmate-labs/som-action@v1
        with:
          url: https://example.com/status
        id: som

      - name: Check for changes
        env:
          TITLE: ${{ steps.som.outputs.title }}
        run: |
          echo "Current title: $TITLE"
          # Add your change detection logic here
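
One way to fill in the change-detection placeholder above is to hash the SOM and compare it with the hash from the previous run. The steps below are a sketch and not part of this action: they assume `actions/cache@v4` is acceptable for persisting state between scheduled runs, and the `.som-state` directory and `som-state-` key prefix are names invented for this example. Pages with dynamic content such as timestamps will report a change on every run.

```yaml
      - name: Restore previous fingerprint
        uses: actions/cache@v4
        with:
          path: .som-state
          # Cache entries are immutable, so save under a per-run key
          # and restore the most recent entry through the prefix.
          key: som-state-${{ github.run_id }}
          restore-keys: |
            som-state-

      - name: Compare with previous run
        env:
          SOM: ${{ steps.som.outputs.som }}
        run: |
          mkdir -p .som-state
          new=$(printf '%s' "$SOM" | sha256sum | cut -d' ' -f1)
          # Empty on the first run, when no earlier fingerprint exists
          old=$(cat .som-state/sha256 2>/dev/null || true)
          if [ -n "$old" ] && [ "$new" != "$old" ]; then
            echo "::warning::Page content changed since the last run"
          fi
          echo "$new" > .som-state/sha256
```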

How It Works

  1. Builds a Docker container with Plasmate installed
  2. Fetches the specified URL using plasmate fetch
  3. Parses the SOM JSON to extract metadata
  4. Outputs the full SOM and extracted fields as step outputs
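
Steps 3 and 4 can be pictured roughly as follows. This is an illustrative sketch, not the action's actual implementation. It assumes the fetched SOM has already been written to `som.json`, that the SOM has a top-level `title` field, and that a token count can be estimated at about four characters per token. The real schema and counting method may differ.

```yaml
- name: Publish step outputs (illustrative)
  run: |
    # Assumption: som.json exists and has a top-level "title" field
    title=$(jq -r '.title // ""' som.json)
    # Assumption: rough estimate of 4 characters per token
    tokens=$(( $(wc -c < som.json) / 4 ))
    {
      echo "title=$title"
      echo "tokens=$tokens"
      # Multiline values must use the delimiter form
      echo "som<<SOM_EOF"
      cat som.json
      echo
      echo "SOM_EOF"
    } >> "$GITHUB_OUTPUT"
```

Writing `name=value` lines to the file named by `$GITHUB_OUTPUT` is the standard way for a step to expose outputs such as `steps.som.outputs.title` to later steps.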

License

MIT


Part of the Plasmate Ecosystem

Engine      plasmate - The browser engine for agents
MCP         plasmate-mcp - Claude Code, Cursor, Windsurf
Extension   plasmate-extension - Chrome cookie export
SDKs        Python / Node.js / Go / Rust
Frameworks  LangChain / CrewAI / AutoGen / Smolagents
Tools       Scrapy / Audit / A11y / GitHub Action
Resources   Awesome Plasmate / Notebooks / Benchmarks
Docs        docs.plasmate.app
W3C         Web Content Browser for AI Agents

About

GitHub Action: fetch a web page with Plasmate and output SOM JSON. Use in CI for content monitoring and testing.
