
display dictionary statistics for current media#943

Merged
ShanaryS merged 20 commits into main from dictionary-statistics
Apr 19, 2026
Collaborator

@ShanaryS ShanaryS commented Mar 18, 2026

This adds statistics about the current media to aid language learning and sentence mining. I also refactored SubtitleColoring to SubtitleAnnotations and added support for batching /termEntries since we rely on frequency so much now (disabled until the Yomitan PR is accepted).

I added AsyncSemaphore to all usages of /termEntries since generating the statistics gets pretty heavy (without the batching PR). This means that lemmatize() can now return undefined, but only if we reset the Yomitan instance while there is a queue (so in practice it will still always be an array). I updated it to support different priority levels: the /termEntries batch is the highest, then lemmatize(), then frequency(). If batching is available, there should never be a single /termEntries call, similar to how tokenizeBulk() prevents it.
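As a rough illustration of the priority ordering described above, here is a minimal sketch of a priority-aware semaphore. It is not the AsyncSemaphore from this PR: `PrioritySemaphore`, `Priority`, `run()`, and `reset()` are hypothetical names, and resolving queued callers with `undefined` on reset is an assumption inferred from the note about lemmatize().

```typescript
// Hypothetical priority levels (lower value is served first).
const Priority = { Batch: 0, Lemmatize: 1, Frequency: 2 } as const;
type Priority = (typeof Priority)[keyof typeof Priority];

interface Waiter {
    priority: Priority;
    grant: (ok: boolean) => void; // true: permit handed over, false: dropped by reset()
}

class PrioritySemaphore {
    private readonly permits: number;
    private active = 0;
    private waiters: Waiter[] = []; // kept in arrival order

    constructor(permits: number) {
        this.permits = permits;
    }

    // Runs `task` once a permit is free. Resolves to undefined if the
    // request was still queued when reset() was called.
    async run<T>(priority: Priority, task: () => Promise<T>): Promise<T | undefined> {
        if (this.active < this.permits) {
            this.active++;
        } else {
            const granted = await new Promise<boolean>((grant) => {
                this.waiters.push({ priority, grant });
            });
            if (!granted) return undefined;
        }
        try {
            return await task();
        } finally {
            this.release();
        }
    }

    // Drops everything that is still queued; running tasks are unaffected.
    reset(): void {
        for (const waiter of this.waiters.splice(0)) waiter.grant(false);
    }

    private release(): void {
        const next = this.takeNext();
        if (next) next.grant(true); // the permit moves directly to the next waiter
        else this.active--;
    }

    // Picks the highest-priority waiter; ties keep arrival order because
    // only a strictly better priority replaces the current candidate.
    private takeNext(): Waiter | undefined {
        let best: Waiter | undefined;
        for (const w of this.waiters) {
            if (!best || w.priority < best.priority) best = w;
        }
        if (best) this.waiters.splice(this.waiters.indexOf(best), 1);
        return best;
    }
}
```

With a single permit, a queued `Priority.Batch` request runs before queued `Priority.Lemmatize` and `Priority.Frequency` requests regardless of the order in which they were submitted, while a request that is already running is never preempted.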

I also fully understand what TabRegistry is doing now, so the communication should be robust. I also updated the build Anki cache state messages to push to all asbplayer instances, instead of only the extension and the requesting tab (I didn't know how to do it before).

The current UI for the statistics is entirely a placeholder, including the fact that it is its own settings tab. Feel free to remove it completely and design what you think is best.


  • The statistics aren't tied to subtitles specifically, so I used "sentences" everywhere. A subtitle event is considered a sentence (since that's how asbplayer mines), but this leaves it flexible for the future.
  • Uses DictionaryProvider to handle passing statistics around, though only counting the user's total words requires the dictionary.
  • The statistics are owned by SubtitleAnnotations, which ingests the subtitles into the DictionaryStatistics class. It also relays snapshots when requested through the UI.
    • Work was done to ensure that statistics stay isolated if there are multiple instances available (through the use of mediaId). It should not be possible for the UI to show statistics for the wrong element (see dictionary-handler.ts; other contexts are trivially safe). The only exception is viewing through the extension popup or settings page, since these don't have a clear preference, but it's also not really misleading.
    • mediaId is Asbplayer.id when App owned and video.src when extension owned. This is good enough for our use cases, but there is a small chance of collisions (generally not a big deal).
  • When the settings are opened while subtitles are loaded, it will jump to the Statistics tab since it's likely to be the most relevant.
  • Similarly, when seeking or mining from the extension options page, it will focus the relevant tab. Mining from the popup will close the popup/settings.
  • Statistics can be auto-generated or triggered from the Statistics tab. A keybind also exists to open straight to the tab.
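The mediaId isolation described in the list above can be sketched roughly as follows. This is only an illustration under assumptions, not the code in dictionary-handler.ts: `StatisticsSnapshot`, `MediaOwner`, `mediaIdFor`, and `SnapshotRouter` are hypothetical names.

```typescript
// Hypothetical snapshot shape; only mediaId matters for this sketch.
interface StatisticsSnapshot {
    mediaId: string;
    totalSentences: number;
}

// Hypothetical description of who owns the media element.
type MediaOwner = { kind: 'app'; asbplayerId: string } | { kind: 'extension'; videoSrc: string };

// Mirrors the rule stated above: Asbplayer.id for the app, video.src for the extension.
function mediaIdFor(owner: MediaOwner): string {
    return owner.kind === 'app' ? owner.asbplayerId : owner.videoSrc;
}

type SnapshotListener = (snapshot: StatisticsSnapshot) => void;

class SnapshotRouter {
    private readonly listeners = new Map<string, SnapshotListener[]>();

    // Registers a listener for one mediaId and returns an unsubscribe function.
    subscribe(mediaId: string, listener: SnapshotListener): () => void {
        const current = this.listeners.get(mediaId) ?? [];
        this.listeners.set(mediaId, [...current, listener]);
        return () => {
            const remaining = (this.listeners.get(mediaId) ?? []).filter((l) => l !== listener);
            if (remaining.length > 0) this.listeners.set(mediaId, remaining);
            else this.listeners.delete(mediaId);
        };
    }

    // Delivers a snapshot only to listeners registered for the same mediaId.
    // Returns the number of listeners that received it.
    publish(snapshot: StatisticsSnapshot): number {
        const targets = this.listeners.get(snapshot.mediaId) ?? [];
        for (const listener of targets) listener(snapshot);
        return targets.length;
    }
}
```

Under this scheme, two extension-owned videos that share the same `video.src` would also share a mediaId, which is the kind of collision mentioned above.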

I won't go into detail about what's tracked; it should be clear from the demo. Again, the UI/UX is simply a placeholder, and we could move to a toolbar or some other design after this PR is reviewed. I would also like more feedback on anything else worth tracking.

asbplayer-statistics.mp4

One kind of statistic I've considered but decided against is how this session will affect your upcoming Anki reviews. This would require a db migration to handle efficiently (though real-time queries wouldn't be too bad since the scope is limited). More importantly, the FSRS algorithm should already account for a user's immersion (assuming it's consistent enough), since it will just appear as if the user has a good memory (even though it's outside Anki "reviews"). So the takeaway would be the same every session and would not offer any new insight to guide a user's journey, outside of maybe extra motivation. These statistics can also be loosely approximated by comparing the known words already displayed for this content with the user's total known words to see roughly how much reviewing they will be getting.


Did a big refactor that should make forward compatibility easier:

  • dictionary-statistics.ts now only ingests sentences and generates snapshots. It does no processing and sends the data raw
  • dictionary-statistics-view.ts processes this raw data into whatever the UI desires.
  • This allows easy changing of the UI without needing to worry about app and extension version mismatches. We only need to consider that when we need to change the snapshot data.
  • It also allows an easy way to create multiple different UIs for the statistics (e.g. side panel, toolbar, overlay, etc.)
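The split between raw snapshots and view-side processing might look roughly like the sketch below. The shapes and function names (`RawSnapshot`, `overlayView`, `comprehensionView`) are hypothetical and are not taken from dictionary-statistics.ts or dictionary-statistics-view.ts; excluding ignored tokens from the percentage is also an assumption.

```typescript
// Hypothetical token statuses and raw snapshot shape.
type TokenStatus = 'known' | 'learning' | 'uncollected' | 'ignored';

interface RawSentence {
    start: number; // subtitle start time in milliseconds
    statuses: TokenStatus[]; // one entry per token, unprocessed
}

interface RawSnapshot {
    mediaId: string;
    sentences: RawSentence[];
}

// Shared helper: share of known tokens, ignoring 'ignored' tokens (assumption).
function knownPercent(statuses: TokenStatus[]): number {
    const counted = statuses.filter((s) => s !== 'ignored');
    if (counted.length === 0) return 0;
    const known = counted.filter((s) => s === 'known').length;
    return Math.round((known / counted.length) * 100);
}

// View 1: a compact summary, e.g. for an overlay.
function overlayView(snapshot: RawSnapshot): { sentences: number; knownPercent: number } {
    const all: TokenStatus[] = [];
    for (const sentence of snapshot.sentences) all.push(...sentence.statuses);
    return { sentences: snapshot.sentences.length, knownPercent: knownPercent(all) };
}

// View 2: per-sentence comprehension, e.g. for a seekable chart.
function comprehensionView(snapshot: RawSnapshot): { start: number; knownPercent: number }[] {
    return snapshot.sentences.map((s) => ({ start: s.start, knownPercent: knownPercent(s.statuses) }));
}
```

Both views read the same `RawSnapshot`, so a UI change only touches the view functions, and the snapshot shape is the only thing that has to stay compatible between app and extension versions.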

Added some more stats:

  • Displaying global ignored count under global known
  • Use configured status colors for status distribution
  • Frequency distributions are banded by status distribution (hover for detailed stats)
  • Display unique words and known words per sentence
  • Display the comprehension distribution for the sentences (can click to seek)
  • Statistics now consider unique tokens from their surface form, rather than de-duplicating by lemma. richText only deals with surface forms so using lemmas for statistics will always cause an inconsistency somewhere.
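To make the surface-form point concrete, here is a small sketch comparing the two ways of counting unique tokens. The `Token` shape and `countUnique` helper are hypothetical and only illustrate the difference; they are not the PR's implementation.

```typescript
// Hypothetical token shape: the text as it appears, plus its dictionary form.
interface Token {
    surface: string;
    lemma: string;
}

// Counts distinct tokens according to the given key.
function countUnique(tokens: Token[], key: (token: Token) => string): number {
    return new Set(tokens.map(key)).size;
}

// Three inflections of the same verb, one of them repeated.
const sampleTokens: Token[] = [
    { surface: '食べる', lemma: '食べる' },
    { surface: '食べた', lemma: '食べる' },
    { surface: '食べた', lemma: '食べる' },
    { surface: '食べて', lemma: '食べる' },
];

const uniqueBySurface = countUnique(sampleTokens, (t) => t.surface); // 3
const uniqueByLemma = countUnique(sampleTokens, (t) => t.lemma); // 1
```

Counting by surface form matches what is actually rendered, so the totals line up with what richText displays even when several inflections share one lemma.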
image

@ShanaryS ShanaryS self-assigned this Mar 18, 2026
@ShanaryS ShanaryS added the enhancement New feature or request label Mar 18, 2026
@ShanaryS ShanaryS force-pushed the dictionary-statistics branch from 0499474 to fb37557 Compare March 18, 2026 23:44
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Mar 18, 2026

Deploying asbplayer with Cloudflare Pages

Latest commit: a362f7a
Status: ✅  Deploy successful!
Preview URL: https://5517d2dd.asbplayer.pages.dev
Branch Preview URL: https://dictionary-statistics.asbplayer.pages.dev


@ShanaryS ShanaryS force-pushed the dictionary-statistics branch from fb37557 to 1f604e2 Compare March 18, 2026 23:49
@ShanaryS
Collaborator Author

@NovaKing001 Do you have any suggestions for any other statistics?

@NovaKing001

@ShanaryS Looks great so far! You pretty much added everything I thought of. Here are some extra ideas. I probably forgot a few obvious ones.

Track average words collected per day

Would be helpful to see progress

Recommended words

Would be based on the number of occurrences across different media.

Recommended sentences

Based on frequency. Would be through recommended words, frequency dictionaries, or occurrences on that video.

Future ideas

I have some more ideas on the statistics tab but they aren't tied to tokenization so I'll wait until that gets sorted out.

I know that you are not currently focusing on UI/UX in this PR, but I'm going to throw out some ideas on how the statistics info could be shown.

I think adding a button on top of the already existing video overlay that shows a simple overview of your current video statistics would be nice, similar to Migaku.
image
something like:
image

With a complete overview of the data being shown in the side menu.

@ShanaryS
Collaborator Author

Thanks for the suggestions. Right now I'm focusing on the current media rather than on tracking between sessions. Tracking between sessions will require a db migration, so I'd like to get this feature out first, gather more information on what stats people want to track, and handle it then. At that point I might just add the changes to the db so we can see how it affects Anki reviews.

> Recommended words
> Would be based on the number of occurrences across different media.
> Recommended sentences
> Based on frequency. Would be through recommended words, frequency dictionaries, or occurrences on that video.

I tried to avoid telling users what they should mine and instead just give them information to make the decision themselves. If I were to implement a recommended sentence, it would be 1 Uncollected, high frequency (<10,000, or higher as their global known count increases), and maybe a minimum and maximum sentence length for ideal context. But all of this can be achieved through the 1 Uncollected filter view, where you can sort by frequency (from the frequency dict) or occurrences (total count in this media).
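The filtering and sorting described here can be sketched as follows. This is an assumption-laden illustration, not the filter view's actual code: the `Sentence` and `Token` shapes, the `oneUncollected` function, and the treatment of a missing frequency rank are all hypothetical.

```typescript
// Hypothetical shapes for this sketch.
interface Token {
    surface: string;
    status: 'known' | 'uncollected';
    frequencyRank?: number; // rank from a frequency dictionary, lower is more common
    occurrences: number; // total count of this token in the current media
}

interface Sentence {
    text: string;
    tokens: Token[];
}

// Keeps sentences with exactly one uncollected token, then sorts them by that
// token's frequency rank (most common first) or by its occurrences (most first).
function oneUncollected(sentences: Sentence[], sortBy: 'frequency' | 'occurrences'): Sentence[] {
    const candidates: { sentence: Sentence; token: Token }[] = [];
    for (const sentence of sentences) {
        const uncollected = sentence.tokens.filter((t) => t.status === 'uncollected');
        if (uncollected.length === 1) candidates.push({ sentence, token: uncollected[0] });
    }
    // Tokens without a rank sort last (assumption).
    const rank = (t: Token) => t.frequencyRank ?? Number.MAX_SAFE_INTEGER;
    candidates.sort((a, b) =>
        sortBy === 'frequency' ? rank(a.token) - rank(b.token) : b.token.occurrences - a.token.occurrences
    );
    return candidates.map((c) => c.sentence);
}
```

A rank cutoff such as the <10,000 threshold, or sentence length bounds, could be added as an extra filter step before sorting.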

> I have some more ideas on the statistics tab but they aren't tied to tokenization so I'll wait until that gets sorted out.

I assume these are general counting stats for asbplayer, like time spent mining, etc.? I would add these when adding stats that persist between sessions. I'd still like to hear them now so I can add them to my notes for later.

> I think adding a button on top of the already existing video overlay that shows a simple overview of your current video statistics would be nice, similar to Migaku.

Yeah that's part of the next steps.

Comment thread common/app/services/app-extension-dictionary-storage.ts
Comment thread common/app/services/chrome-extension.ts Outdated
Comment thread extension/src/entrypoints/asbplayer.content.ts
Comment thread extension/src/entrypoints/asbplayer.content.ts
Comment thread extension/src/services/binding.ts
Comment thread extension/src/ui/components/SettingsPage.tsx Outdated
@ShanaryS ShanaryS force-pushed the dictionary-statistics branch from 76b6fe5 to 33c9ad0 Compare March 29, 2026 23:57
@ShanaryS
Collaborator Author

ShanaryS commented Mar 29, 2026

Added some more stats, updated PR description.

@ShanaryS ShanaryS force-pushed the dictionary-statistics branch from 33c9ad0 to cb70446 Compare March 29, 2026 23:59
@ShanaryS ShanaryS force-pushed the dictionary-statistics branch 2 times, most recently from 0db1fc5 to 3a3c322 Compare March 31, 2026 21:48
@ShanaryS ShanaryS force-pushed the dictionary-statistics branch from 3a3c322 to 7f7f85c Compare April 1, 2026 00:42
ShanaryS and others added 8 commits April 1, 2026 22:19
* First pass on statistics UI - restructure UI flows

* Statistics UI state when annotations are disabled

* Statistics open in-app without extension

* Extract drawer into common component

* Show media source info in popup statistics

* Spacing/layout adjustments

* Side panel responds to in-app subtitles

* Fix layout in popup

* Iterate on in-app side panel logic

* Keyboard shortcut works for side panel (partial on Firefox)

* Sidepanel disappears more quickly from tab registry when closed

* Support statistics popup, so that keyboard shortcut completely works

* Some cleanup

* Fix source string overflowing

* Annotation settings link in popup is bound

* Fix NPE

* Iteration on StatisticsSentenceDetailsDialog

* Mining from StatisticsSentenceDetailsDialog mostly works - fix mining double firing

* Fix warnings/compile errors

* Catch up loc changes

* Missing import in StatisticsSentenceDetailsDialog.tsx

* Memoize sentence entries in StatisticsSentenceDetailsDialog

* Statistics UI tweaks

- Track-specific snapshots are selected rather than rendered at the same
  time
- Bigger font sizes for statistics sections
- Limit number of x-axis labels for comprehension graph

* More statistics UI tweaks

* Stats overlay initial draft in-app

* Popout button in statistics side panel

* Stats overlay in extension

* FTUE supports light theme

* Fix fullscreen overlay not populating

* Fit iframe to content

* More settings UI tweaks

- settings link in "getting started" text doesn't include the period
- Progress bar in overlay does not go off the corners
- Mobile overlay doesn't go on top of the i+1 sentence dialog

* Stats popout button added to popup + layouting fixes

* Prevent popup nav buttons from expanding vertically

* Stats overlay can be triggered from stats panel
Owner

@killergerbah killergerbah left a comment

@ShanaryS Feel free to merge when you're ready

@ShanaryS
Collaborator Author

  • Fixed some icon formatting and bugs with the overlay appearing/disappearing
  • Overlay is now movable and can be toggled on and off through the button (extension and app)
  • Colored the comprehension % for easier readability.
  • Fixed an issue scrolling through the filtered sentences view when seeking to a specific sentence
  • Fixed some issues with resetting state when no more statistics are available
  • Handle video.src preventing video-disappeared from working correctly

@ShanaryS ShanaryS force-pushed the dictionary-statistics branch from 15383ee to a702702 Compare April 18, 2026 22:29
@ShanaryS ShanaryS merged commit ee4ab6a into main Apr 19, 2026
2 checks passed
@ShanaryS ShanaryS deleted the dictionary-statistics branch April 19, 2026 00:23
@killergerbah killergerbah added this to the Extension v1.16.0 milestone Apr 19, 2026
3 participants