Commit a94579f

Merge pull request #237 from AnEnglishmanInNorway/index
Add capability to generate an album index
2 parents 2401398 + 4863400 commit a94579f

24 files changed

Lines changed: 1693 additions & 4 deletions

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -65,3 +65,5 @@ nosetests.xml
 /tests/*/*.mcf.*.pdf
 /tests/*/*.mcfx.*.pdf
 /tests/temp
+/junk*.pdf
+/tests/testIndex/test_index.mcf.S.idx.png

README.md

Lines changed: 43 additions & 0 deletions
@@ -140,6 +140,49 @@ noShadows = False
 # root: ERROR[2], WARNING[4], INFO[38]
 ```
 
+#### Indexing an album
+It is possible to ask cewe2pdf to generate an index for the album, where index terms are selected using a combination of the font and font size used in a text area. The index is initially generated as a separate pdf file with black text on a white background. The index pdf is used to create an index image file, a png in which the background is transparent. That png image is then merged into the album pdf, being placed on any page containing an index marker identifier.
+
+This feature may be useful in, for example, an album which represents a day-by-day record of some period of time. The headings for each day in the album can be specified in a font/fontsize combination which is not used for any other purpose in the album, and the index will then present a short day-by-day summary with page number references.
+
+It is normal to allow cewe2pdf to delete the index pdf but to retain the index png. That allows you to manually insert the index png onto the index page in the album editor, and thus have it as part of the album which is sent for quality printing (if you do that!). If you rerun the album pdf generation, creating a new index png to be merged into the album, the merge process will remove any old index png from the index page before adding the new one (based on best-effort recognition of the image in the pdf!).
+
+The page on which the index is to be placed is recognised by the presence of a text on the page. The text is identified with a regular expression defined in the .ini file, and would often be a visible text such as "Contents". If you don't want a visible text, you can always set the colour of the text to "None". Other things on the index page (photos, clip-art, text, etc) are left undisturbed and should be visible since the background of the index image is transparent.
+
+There are a host of index configuration options which can be specified in a separate section of the .ini file. No indexing will take place unless there is an __INDEX__ section and the __indexing__ value is __True__.
+```
+[INDEX]
+indexing = False
+indexEntryFonts =
+    Arial Rounded MT Bold, 15
+indexFont = Helvetica
+indexFontSize = 12
+lineSpacing = 1.1
+pageWidth = 210
+pageHeight = 291 # A4 is 297. 291 is the size of the paper in a 30x30 album
+indexMarkerRegex = ^Contents$
+topMargin = 5
+bottomMargin = 0
+leftMargin = 7
+rightMargin = 7
+deleteIndexPdf = True
+deleteIndexPng = False
+```
+__indexEntryFonts__ specifies one or more font / font size combinations which will be used to recognise index terms in the album.
+
+__indexFont, indexFontSize, lineSpacing, pageWidth, pageHeight__ determine how the index entries are formatted on the index pdf page.
+
+__indexMarkerRegex__ specifies the regular expression against which all text items in the album are tested. Any page with a matching text will be used for insertion of the index png.
+
+__topMargin__ etc determine the placement of the index png on the index page. The image is scaled appropriately to fit.
+
+__deleteIndexPdf__ etc determine whether or not the generated files are deleted after the album pdf has been updated.
+
+There are also margin settings for the creation of the index pdf, __pdfTopMargin__ etc. These may be useful if you intend to keep and use the generated index pdf, but default to 1 so that the pdf page is filled and the image margins are the most important.
+
+#### Large index limitations
+The current code only handles a single index page. If there are more index terms than fit on a single page, the index pdf will be correct, but the index image will contain only the first page.
+
 ### additional_fonts.txt
 The code knows where to find the fonts delivered with the Cewe software, but if you use non-Cewe fonts then you must specify the location of those fonts. For historical reasons configuration of fonts is done with a separate (optional) configuration file, ``additional_fonts.txt``. The file should contain one line per font file or font directory to be added. Both `.ttf` and `.otf` files are read.
 
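Reviewer's sketch (not part of the commit): the README text above describes an `[INDEX]` section with a multi-line `indexEntryFonts` value and an `indexMarkerRegex` tested against every text item. A minimal illustration of how such a section might be read with the standard `configparser` module follows. The helper names `parse_entry_fonts` and `find_index_pages`, and the sample page texts, are hypothetical and are not taken from `index.py`, which is not shown here.

```python
import configparser
import re

# A cut-down [INDEX] section using only keys documented in the README above.
INI_TEXT = """
[INDEX]
indexing = True
indexEntryFonts =
    Arial Rounded MT Bold, 15
indexMarkerRegex = ^Contents$
"""


def parse_entry_fonts(raw):
    """Turn the multi-line 'font name, size' value into (name, size) tuples."""
    pairs = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # the value starts with an empty first line
        name, _, size = line.rpartition(",")
        pairs.append((name.strip(), float(size)))
    return pairs


def find_index_pages(page_texts, marker_regex):
    """Return the page numbers that carry at least one text matching the marker."""
    pattern = re.compile(marker_regex)
    return [pgno for pgno, texts in sorted(page_texts.items())
            if any(pattern.search(text) for text in texts)]


config = configparser.ConfigParser()
config.read_string(INI_TEXT)
section = config["INDEX"]

indexing_enabled = section.getboolean("indexing")
entry_fonts = parse_entry_fonts(section.get("indexEntryFonts", ""))

# Hypothetical album: page 1 has the marker text, page 2 only contains it as a substring.
sample_pages = {1: ["Contents"], 2: ["Table of Contents", "Day 1"]}
index_pages = find_index_pages(sample_pages, section["indexMarkerRegex"])
```

Because the marker is anchored with `^` and `$`, only a text item that is exactly "Contents" selects a page in this sketch; whether cewe2pdf uses `search` or `match` semantics is an assumption here.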

cewe2pdf.py

Lines changed: 39 additions & 3 deletions
@@ -102,7 +102,9 @@
 from pageNumbering import getPageNumberXy, PageNumberingInfo, PageNumberPosition
 from passepartout import Passepartout
 from pathutils import findFileInDirs
-from text import AppendItemTextInStyle, AppendSpanEnd, AppendSpanStart, AppendText, CollectFontInfo, CreateParagraphStyle, Dequote, noteFontSubstitution
+from text import AppendItemTextInStyle, AppendSpanEnd, AppendSpanStart, AppendText
+from text import CollectFontInfo, CollectItemFontFamily, CreateParagraphStyle, Dequote, noteFontSubstitution
+from index import Index
 from textart import handleTextArt
 
 
@@ -170,6 +172,7 @@ def __str__(self):
 # pdf_styleN = pdf_styles['Normal']
 pdf_flowableList = []
 
+albumIndex = None # set after we have got the configuration information
 clipartDict = dict[int, str]() # a dictionary for clipart element IDs to file name
 clipartPathList = tuple[str]()
 passepartoutDict = None # will be dict[int, str] for passepartout designElementIDs to file name
@@ -583,6 +586,7 @@ def processDecorationShadow(decoration, areaHeight, areaWidth, pdf):
     frm_table.wrapOn(pdf, shadowWidth, shadowHeight)
     frm_table.drawOn(pdf, shadowBottomLeft_x, shadowBottomLeft_y)
 
+
 def warnAndIgnoreEnabledDecorationShadow(decoration):
     if getConfigurationBool(defaultConfigSection, "noShadows", "False"):
         return
@@ -594,7 +598,7 @@ def warnAndIgnoreEnabledDecorationShadow(decoration):
             continue
 
 
-def processAreaTextTag(textTag, additional_fonts, area, areaHeight, areaRot, areaWidth, pdf, transx, transy): # noqa: C901 (too complex)
+def processAreaTextTag(textTag, additional_fonts, area, areaHeight, areaRot, areaWidth, pdf, transx, transy, pgno): # noqa: C901 (too complex)
     # note: it would be better to use proper html processing here
     htmlxml = etree.XML(textTag.text)
     body = htmlxml.find('.//body')
@@ -675,6 +679,10 @@ def processAreaTextTag(textTag, additional_fonts, area, areaHeight, areaRot, are
     # unset by CreateParagraphStyle
     # pdf_styleN.backColor = reportlab.lib.colors.HexColor("0xFFFF00")
 
+    # There may be multiple "index entry" paragraphs in the text area.
+    # Concatenating them to just one index entry seems to work in practice
+    indexEntryText = None
+
     htmlparas = body.findall(".//p")
     for p in htmlparas:
         maxfs = 0 # cannot use the bodyfs as a default, there may not actually be any text at body size
@@ -710,6 +718,9 @@ def processAreaTextTag(textTag, additional_fonts, area, areaHeight, areaRot, are
             usefs = maxfs if maxfs > 0 else bodyfs
             pdf_styleN.leading = usefs * finalLeadingFactor # line spacing (text + leading)
             pdf_flowableList.append(Paragraph(paragraphText, pdf_styleN))
+            originalFont = CollectItemFontFamily(p, family)
+            if albumIndex.CheckForIndexEntry(originalFont, bodyfs):
+                indexEntryText = Index.AppendIndexText(indexEntryText, p.text)
 
         else:
             paragraphText = '<para autoLeading="max">'
@@ -748,6 +759,9 @@ def processAreaTextTag(textTag, additional_fonts, area, areaHeight, areaRot, are
 
                     if span.text is not None:
                         paragraphText = AppendText(paragraphText, html.escape(span.text))
+                        originalFont = CollectItemFontFamily(span, family)
+                        if albumIndex.CheckForIndexEntry(originalFont, spanfs):
+                            indexEntryText = Index.AppendIndexText(indexEntryText, span.text)
 
                     # there might be (one or more, or only one?) line break within the span.
                     brs = span.findall(".//br")
@@ -783,6 +797,9 @@ def processAreaTextTag(textTag, additional_fonts, area, areaHeight, areaRot, are
             except Exception:
                 logging.exception('Exception')
 
+    if indexEntryText:
+        albumIndex.AddIndexEntry(pgno, indexEntryText)
+
     # Add a frame object that can contain multiple paragraphs. Margins (padding) are specified in
     # the editor in mm, arriving in the mcf in 1/10 mm, but appearing in the html with the unit "px".
     # This is a bit strange, but ignoring the "px" and using mcf2rl seems to work ok.
@@ -951,7 +968,7 @@ def processElements(additional_fonts, fotobook, imagedir,
 
         # process text
         for textTag in area.findall('text'):
-            processAreaTextTag(textTag, additional_fonts, area, areaHeight, areaRot, areaWidth, pdf, transx, transy)
+            processAreaTextTag(textTag, additional_fonts, area, areaHeight, areaRot, areaWidth, pdf, transx, transy, pageNumber)
 
         # Clip-Art
         # In the clipartarea there are two similar elements, the <designElementIDs> and the <clipart>.
@@ -1007,6 +1024,7 @@ def convertMcf(albumname, keepDoublePages: bool, pageNumbers=None, mcfxTmpDir=No
     global bg_res # pylint: disable=global-statement
     global defaultConfigSection # pylint: disable=global-statement
     global pageNumberingInfo # pylint: disable=global-statement
+    global albumIndex # pylint: disable=global-statement
 
     clipartDict = {} # a dictionary for clipart element IDs to file name
     clipartPathList = tuple()
@@ -1140,6 +1158,11 @@ def convertMcf(albumname, keepDoublePages: bool, pageNumbers=None, mcfxTmpDir=No
 
     bg_notFoundDirList = set([]) # keep a list of background folders that are not found, to prevent multiple errors for the same cause.
 
+    try:
+        albumIndex = Index(configuration['INDEX'])
+    except KeyError:
+        albumIndex = Index(None)
+
     # Load fonts
     availableFonts = findAndRegisterFonts(defaultConfigSection, appDataDir, albumBaseFolder, cewe_folder)
 
@@ -1201,6 +1224,19 @@ def convertMcf(albumname, keepDoublePages: bool, pageNumbers=None, mcfxTmpDir=No
 
     pdf = []
 
+    if albumIndex.indexing:
+        # At this point we have an index of items (selected on the basis of their font characteristics)
+        # albumIndex.ShowIndex()
+        indexPdfFileName = albumIndex.SaveIndexPdf(outputFileName, albumTitle, pagesize)
+        indexPngFileName = albumIndex.SaveIndexPng(indexPdfFileName)
+        albumIndex.MergeAlbumAndIndexPng(outputFileName, indexPngFileName)
+        # most usual is to delete the index pdf, but leave the index png which could be added
+        # to the original with the cewe editor, and then you get it in the printed edition as well
+        if albumIndex.deleteIndexPdf and os.path.exists(indexPdfFileName):
+            os.remove(indexPdfFileName)
+        if albumIndex.deleteIndexPng and os.path.exists(indexPngFileName):
+            os.remove(indexPngFileName)
+
     # force the release of objects which might be holding on to picture file references
     # so that they will not prevent the removal of the files as we clean up and exit
     objectscollected = gc.collect()
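Reviewer's sketch (not part of the commit): the `cewe2pdf.py` hunks call `CheckForIndexEntry`, `Index.AppendIndexText` and `AddIndexEntry`, but `index.py` itself is not shown in this part of the diff. The stand-in below only illustrates the call pattern, where several matching paragraphs in one text area are concatenated into a single index entry for that page. The class `SketchIndex`, its method names and its matching rule (exact font name and size) are assumptions, not the author's implementation.

```python
class SketchIndex:
    """Hypothetical stand-in for the Index class used in cewe2pdf.py."""

    def __init__(self, entry_fonts):
        # entry_fonts: iterable of (font family, font size) pairs that mark index terms
        self.entry_fonts = set(entry_fonts)
        self.entries = []  # list of (page number, entry text)

    def check_for_index_entry(self, font, size):
        """True when this font/size combination is configured as an index term."""
        return (font, size) in self.entry_fonts

    @staticmethod
    def append_index_text(existing, new_text):
        """Join a further piece of text onto the entry collected so far."""
        if new_text is None or not new_text.strip():
            return existing
        new_text = new_text.strip()
        return new_text if existing is None else existing + " " + new_text

    def add_index_entry(self, pgno, text):
        self.entries.append((pgno, text))


def index_text_area(index, pgno, paragraphs):
    """Mirror the flow in processAreaTextTag: collect, then add one entry per area."""
    entry_text = None
    for font, size, text in paragraphs:
        if index.check_for_index_entry(font, size):
            entry_text = SketchIndex.append_index_text(entry_text, text)
    if entry_text:
        index.add_index_entry(pgno, entry_text)


sketch = SketchIndex([("Arial Rounded MT Bold", 15)])
index_text_area(sketch, 3, [
    ("Arial Rounded MT Bold", 15, "Day 1"),
    ("Calibri", 11, "We arrived late in the evening."),
    ("Arial Rounded MT Bold", 15, "Oslo"),
])
index_text_area(sketch, 4, [("Calibri", 11, "No heading on this page.")])
```

In this sketch page 3 yields the single entry "Day 1 Oslo" and page 4 yields none, which matches the comment in the diff that concatenating multiple index paragraphs into one entry "seems to work in practice".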

cewe2pdf.pyproj

Lines changed: 19 additions & 0 deletions
@@ -144,6 +144,17 @@
     <Content Include="tests\testFontSubstitution\previous_result_pdfs\testfontsubstitution.mcf.20250411S.pdf" />
     <Content Include="tests\testfontsubstitution\testfontsubstitution.mcf" />
     <Content Include="tests\testfontsubstitution\testfontsubstitution_mcf-Dateien\folderid.xml" />
+    <Content Include="tests\testIndexLarge\additional_fonts.txt" />
+    <Content Include="tests\testIndexLarge\cewe2pdf.ini" />
+    <Content Include="tests\testIndexLarge\test_indexLarge.mcf" />
+    <Content Include="tests\testIndexLarge\test_indexLarge_mcf-Dateien\6ds3xtbb_1_20200306_111748.jpg" />
+    <Content Include="tests\testIndexLarge\test_indexLarge_mcf-Dateien\folderid.xml" />
+    <Content Include="tests\testIndexLarge\test_indexLarge_mcf-Dateien\folderid.xml~" />
+    <Content Include="tests\testIndexLarge\test_indexLarge_mcf-Dateien\ud3dqwdw_1_test_index.mcf.s.idx.png" />
+    <Content Include="tests\testIndex\additional_fonts.txt" />
+    <Content Include="tests\testIndex\cewe2pdf.ini" />
+    <Content Include="tests\testIndex\test_index.mcf" />
+    <Content Include="tests\testIndex\test_index_mcf-Dateien\folderid.xml" />
     <Content Include="tests\testPageNumbers\additional_fonts.txt" />
     <Content Include="tests\testPageNumbers\cewe2pdf.ini" />
     <Content Include="tests\testPageNumbers\test_pagenumbers.mcf" />
@@ -269,6 +280,7 @@
     <Compile Include="imageUtils.py">
       <SubType>Code</SubType>
     </Compile>
+    <Compile Include="index.py" />
     <Compile Include="lineScales.py">
       <SubType>Code</SubType>
     </Compile>
@@ -290,6 +302,8 @@
     <Compile Include="tests\testEmptyPageOne\test_emptyPageOne.py" />
     <Compile Include="tests\testFontDoesNotExist\test_fontDoesNotExist.py" />
     <Compile Include="tests\testFontSubstitution\test_fontSubstitution.py" />
+    <Compile Include="tests\testIndexLarge\test_indexLarge.py" />
+    <Compile Include="tests\testIndex\test_index.py" />
     <Compile Include="tests\testMcfxExtraction\test_McfxExtraction.py" />
     <Compile Include="tests\testPageNumbers\test_pagenumbers.py" />
     <Compile Include="tests\testTextArt\test_textart.py" />
@@ -345,6 +359,11 @@
     <Folder Include="tests\testfontsubstitution\" />
     <Folder Include="tests\testFontSubstitution\previous_result_pdfs\" />
     <Folder Include="tests\testfontsubstitution\testfontsubstitution_mcf-Dateien\" />
+    <Folder Include="tests\testIndexLarge\" />
+    <Folder Include="tests\testIndexLarge\previous_result_pdfs\" />
+    <Folder Include="tests\testIndexLarge\test_indexLarge_mcf-Dateien\" />
+    <Folder Include="tests\testIndex\" />
+    <Folder Include="tests\testIndex\test_index_mcf-Dateien\" />
     <Folder Include="tests\testPageNumbers\" />
     <Folder Include="tests\testPageNumbers\test_pagenumbers_mcf-Dateien\" />
     <Folder Include="tests\testTextArt\" />

configUtils.py

Lines changed: 15 additions & 1 deletion
@@ -14,11 +14,25 @@ def getConfigurationInt(configSection, itemName, defaultValue, minimumValue):
             returnValue = minimumValue
     return returnValue
 
+def getConfigurationFloat(configSection, itemName, defaultValue, minimumValue):
+    returnValue = minimumValue
+    if configSection is not None:
+        try:
+            # eg getConfigurationFloat(defaultConfigSection, 'pdfImageResolution', '1.15', 1.0)
+            returnValue = float(configSection.get(itemName, defaultValue))
+        except ValueError:
+            logging.error(f'Invalid configuration value supplied for {itemName}')
+            returnValue = float(defaultValue)
+        if returnValue < minimumValue:
+            logging.error(f'Configuration value supplied for {itemName} is less than {minimumValue}, using {minimumValue}')
+            returnValue = minimumValue
+    return returnValue
+
 def getConfigurationBool(configSection, itemName, defaultValue):
     returnValue = defaultValue
     if configSection is not None:
         try:
-            # eg getConfigurationBool(defaultConfigSection, 'insideCoverWhite', False)
+            # eg getConfigurationBool(defaultConfigSection, 'insideCoverWhite', 'False')
             bv = configSection.get(itemName, defaultValue)
             returnValue = bv.lower() == "true"
         except ValueError:
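
Reviewer's sketch (not part of the commit): a short usage example for the new `getConfigurationFloat` helper, showing the four paths through it. The function body is copied from the hunk above so that the example is self-contained; the sample `.ini` values and the minimum values passed in are illustrative assumptions, not the ones used by `index.py`.

```python
import configparser
import logging


def getConfigurationFloat(configSection, itemName, defaultValue, minimumValue):
    # Copied from the configUtils.py hunk above.
    returnValue = minimumValue
    if configSection is not None:
        try:
            returnValue = float(configSection.get(itemName, defaultValue))
        except ValueError:
            logging.error(f'Invalid configuration value supplied for {itemName}')
            returnValue = float(defaultValue)
        if returnValue < minimumValue:
            logging.error(f'Configuration value supplied for {itemName} is less than {minimumValue}, using {minimumValue}')
            returnValue = minimumValue
    return returnValue


# Illustrative section: one valid value, one unparsable value, one value below the minimum.
config = configparser.ConfigParser()
config.read_string("[INDEX]\nlineSpacing = 1.1\nindexFontSize = twelve\ntopMargin = -3\n")
section = config["INDEX"]

line_spacing = getConfigurationFloat(section, "lineSpacing", "1.0", 0.5)   # valid value is used
font_size = getConfigurationFloat(section, "indexFontSize", "12", 1.0)     # falls back to the default
top_margin = getConfigurationFloat(section, "topMargin", "5", 0.0)         # clamped to the minimum
no_section = getConfigurationFloat(None, "lineSpacing", "1.1", 1.0)        # no section at all
```

One thing worth noting from the last call: when `configSection` is `None` the helper returns `minimumValue` rather than `defaultValue`, so callers that can run without an `[INDEX]` section get the minimum, not the documented default.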
