Example:
- Searching the document text for "meme": http://127.0.0.1:8000/archives/doc/3_19_pmm_memo_re_709_1960_04_29_1_19 is the first result.
- Looking at the PDF preview online, there is no "meme" in the text, only "memo". Highlighting the sentence "Status of programming memo and revision of machine shut-down date to late July." and copy-pasting it elsewhere gives the correct text.
- Checking the OCR text in the data/processed_pdfs folder, it says "Status of programming meme", probably due to an OCR error.
It seems like the PDF preview and the search are working from different OCR text?
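A rough way to check how often this happens might be a word-level diff between the text copied from the PDF preview and the matching text in data/processed_pdfs. The sketch below uses only Python's difflib; the function name `word_mismatches` is hypothetical, and how each version of the text gets loaded (from the PDF's text layer or from the processed OCR files) is left out as an assumption.

```python
import difflib


def word_mismatches(preview_text: str, ocr_text: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (preview, ocr) word runs where the two texts disagree."""
    a = preview_text.split()
    b = ocr_text.split()
    # autojunk=False so common words are not treated as junk on longer pages
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


# The sentence from the example above: PDF preview vs. processed OCR text
preview = "Status of programming memo and revision of machine shut-down date to late July."
ocr = "Status of programming meme and revision of machine shut-down date to late July."
print(word_mismatches(preview, ocr))  # [('memo', 'meme')]
```

Run over every document, any non-empty result would point at pages where the search index and the preview disagree.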