129 changes: 46 additions & 83 deletions README.md
@@ -1,83 +1,46 @@
# htmldocx
Convert HTML to docx.

Dependencies: `python-docx` & `bs4`

### To install

`pip install htmldocx`

### Usage

Add strings of HTML to an existing `docx.Document` object:

```
from docx import Document
from htmldocx import HtmlToDocx

document = Document()
new_parser = HtmlToDocx()
# do stuff to document

html = '<h1>Hello world</h1>'
new_parser.add_html_to_document(html, document)

# do more stuff to document
document.save('your_file_name.docx')
```

Convert files directly

```
from htmldocx import HtmlToDocx

new_parser = HtmlToDocx()
new_parser.parse_html_file(input_html_file_path, output_docx_file_path)
```
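
To convert several files in one go, the call above can be wrapped in a loop. The following is a minimal sketch and not part of htmldocx: `convert_folder` and its `convert` parameter are hypothetical names introduced here. It assumes `parse_html_file` takes an input path and an output path as shown above, and that it appends `.docx` to the output name itself, so the sketch passes the name without an extension. Check both assumptions against your installed version.

```python
from pathlib import Path


def convert_folder(src_dir, out_dir, convert=None):
    """Convert every *.html file in src_dir and return the output names used.

    `convert` is a hypothetical hook taking (input_path, output_path).
    When omitted, HtmlToDocx().parse_html_file is used.
    """
    if convert is None:
        # Imported lazily so the helper can be exercised without htmldocx.
        from htmldocx import HtmlToDocx
        convert = HtmlToDocx().parse_html_file

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    outputs = []
    for html_path in sorted(Path(src_dir).glob("*.html")):
        # Assumption: parse_html_file appends ".docx" itself, so the
        # output name is passed without an extension.
        target = out_dir / html_path.stem
        convert(str(html_path), str(target))
        outputs.append(str(target))
    return outputs
```

Passing the converter in as an argument keeps the loop independent of the library, which makes it easy to swap in a different conversion function or to try the helper without writing any files.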

Convert an HTML string to a document

```
from htmldocx import HtmlToDocx

new_parser = HtmlToDocx()
document = new_parser.parse_html_string(input_html_string)
```
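
`parse_html_string` hands back a document instead of writing a file, so the result still has to be saved. A minimal sketch, assuming the returned object is a python-docx `Document` with a `save` method; `html_string_to_docx` and its `parser` parameter are hypothetical names used only for illustration.

```python
def html_string_to_docx(html, output_path, parser=None):
    """Convert an HTML string and save the result to output_path."""
    if parser is None:
        # Imported lazily so the helper can be exercised without htmldocx.
        from htmldocx import HtmlToDocx
        parser = HtmlToDocx()

    # Assumption: the return value is a python-docx Document.
    document = parser.parse_html_string(html)
    document.save(output_path)
    return document
```

For example, `html_string_to_docx('<h1>Hello world</h1>', 'hello.docx')` would be expected to write `hello.docx` to the current directory.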

Change table styles

Tables are not styled by default. Use the `table_style` attribute on the parser to set a table
style. The style is used for all tables.

```
from htmldocx import HtmlToDocx

new_parser = HtmlToDocx()
new_parser.table_style = 'Light Shading Accent 4'
```

To add borders to tables, use the `TableGrid` style:

```
new_parser.table_style = 'TableGrid'
```

Default table styles can be found
here: https://python-docx.readthedocs.io/en/latest/user/styles-understanding.html#table-styles-in-default-template

Change default paragraph style

No style is applied to paragraphs by default. Use the `paragraph_style` attribute on the parser
to set a default paragraph style. The style is used for all paragraphs. If additional styling
(color, background color, alignment, ...) is defined in the HTML, it is applied after the
paragraph style.

```
from htmldocx import HtmlToDocx

new_parser = HtmlToDocx()
new_parser.paragraph_style = 'Quote'
```

Default paragraph styles can be found
here: https://python-docx.readthedocs.io/en/latest/user/styles-understanding.html#paragraph-styles-in-default-template
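
Both style attributes can be set on the same parser. The sketch below is one possible way to bundle that; `make_styled_parser` and its parameters are hypothetical and not part of htmldocx. It only sets the `table_style` and `paragraph_style` attributes described above, and leaves an attribute untouched when no value is given.

```python
def make_styled_parser(table_style=None, paragraph_style=None, parser=None):
    """Return a parser with the given table and paragraph styles applied."""
    if parser is None:
        # Imported lazily so the helper can be exercised without htmldocx.
        from htmldocx import HtmlToDocx
        parser = HtmlToDocx()

    # Only override the attributes the caller actually asked for.
    if table_style is not None:
        parser.table_style = table_style
    if paragraph_style is not None:
        parser.paragraph_style = paragraph_style
    return parser
```

A call such as `make_styled_parser(table_style='TableGrid', paragraph_style='Quote')` would give a parser that draws table borders and uses the `Quote` style for paragraphs, assuming both styles exist in the document template.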
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<style>
body {
font-family: 'Times New Roman', Serif;
width: 21cm;
margin: 0 auto;
padding: 2cm;
line-height: 1.4;
color: #000;
text-align: justify;
}
.header {
text-align: center;
margin-bottom: 25px;
}
.header-top, .header-hospital {
font-weight: bold;
margin-bottom: 5px;
font-size: 14pt;
}
.header-hospital {
font-size: 16pt;
margin-top: 20px;
}
.motivation {
font-style: italic;
margin: 15px 0;
font-size: 12pt;
}
.document-number {
text-align: center;
font-weight: bold;
margin: 30px 0;
font-size: 12pt;
}
.document-title {
text-align: center;
font-weight: bold;
text-transform: uppercase;
font-size: 16pt;
margin-bottom: 40px;
text-decoration: underline