Hello,
Thank you for this useful library!
The issue
I ran into the following issue with this code:
import spacy
from spacypdfreader import pdf_reader
nlp = spacy.load("fr_core_news_sm")
doc = pdf_reader('9.PADD_SCOT RM.pdf', nlp)
doc.tensor
I get an empty tensor.
Whereas:
import spacy
from pdfminer import high_level
nlp = spacy.load("fr_dep_news_trf")
path = '9.PADD_SCOT RM.pdf'
doc = nlp(high_level.extract_text(path))
doc.tensor
returns the expected, non-empty tensor.
Reason
The issue seems to come from the fact that pdf_reader processes each page as a separate document and then merges the pages with Doc.from_docs. It turns out that Doc.from_docs does not preserve Doc.tensor (I could not find this mentioned in the documentation).
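As a possible workaround, here is a minimal sketch assuming spaCy v3 and NumPy: process each page, merge the page docs with Doc.from_docs, then rebuild Doc.tensor by stacking the per-page tensors. The helper `pages_to_doc` and the `fake_tensor` component (a stand-in for tok2vec so the example runs without downloading a model) are hypothetical and are not part of spacypdfreader. The sketch assumes Doc.from_docs keeps exactly one token per input token, so the stacked rows line up with the merged doc. Getting the per-page strings out of the PDF (for example with pdfminer) is left out.

```python
import numpy
import spacy
from spacy.language import Language
from spacy.tokens import Doc


def pages_to_doc(nlp, pages):
    """Hypothetical helper: run nlp on each page string, merge the results,
    and re-attach the per-page tensors that Doc.from_docs may drop."""
    page_docs = list(nlp.pipe(pages))
    if not page_docs:
        return nlp.make_doc("")
    merged = Doc.from_docs(page_docs)
    # Ignore pages with no tokens; they contribute no tensor rows.
    nonempty = [d for d in page_docs if len(d)]
    # Only restore when every page has a 2-D tensor with one row per token
    # and all pages share the same width (a blank pipeline leaves it empty).
    if nonempty and all(
        d.tensor.ndim == 2 and d.tensor.shape[0] == len(d) for d in nonempty
    ):
        if len({d.tensor.shape[1] for d in nonempty}) == 1:
            merged.tensor = numpy.vstack([d.tensor for d in nonempty])
    return merged


@Language.component("fake_tensor")
def fake_tensor(doc):
    # Hypothetical stand-in for tok2vec: one 4-dim row of ones per token.
    doc.tensor = numpy.ones((len(doc), 4), dtype="float32")
    return doc


nlp = spacy.blank("fr")
nlp.add_pipe("fake_tensor")
doc = pages_to_doc(nlp, ["Première page.", "Deuxième page."])
print(len(doc), doc.tensor.shape)
```

With a real pipeline such as fr_core_news_sm, the tok2vec component would fill each page's tensor instead of `fake_tensor`. If your spaCy version already concatenates tensors in Doc.from_docs, the restore step simply overwrites the tensor with the same values.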