Exception handling #63

@crtnx

This invalid segment of real-life HTML crashes the entire module; the snippet and traceback are below. Better behavior would be to swallow the error and issue a warning.

<p style="margin:0;padding:0;margin: 0cm; margin-bottom: ..0001pt; -ms-word-wrap: break-word;"><span style="font-size: 10.0pt; font-family: \'Arial\',sans-serif; color: black;">
  File "/emails/venv/lib/python3.10/site-packages/inscriptis/__init__.py", line 104, in get_text
    return Inscriptis(html_tree, config).get_text() if html_tree is not None \
  File "/emails/venv/lib/python3.10/site-packages/inscriptis/html_engine.py", line 81, in __init__
    self._parse_html_tree(html_tree)
  File "/emails/venv/lib/python3.10/site-packages/inscriptis/html_engine.py", line 100, in _parse_html_tree
    self._parse_html_tree(node)
  File "/emails/venv/lib/python3.10/site-packages/inscriptis/html_engine.py", line 100, in _parse_html_tree
    self._parse_html_tree(node)
  File "/emails/venv/lib/python3.10/site-packages/inscriptis/html_engine.py", line 100, in _parse_html_tree
    self._parse_html_tree(node)
  [Previous line repeated 2 more times]
  File "/emails/venv/lib/python3.10/site-packages/inscriptis/html_engine.py", line 93, in _parse_html_tree
    self.handle_starttag(tree.tag, tree.attrib)
  File "/emails/venv/lib/python3.10/site-packages/inscriptis/html_engine.py", line 135, in handle_starttag
    self.apply_attributes(attrs, html_element=self.css.get(
  File "/emails/venv/lib/python3.10/site-packages/inscriptis/model/attribute.py", line 60, in apply_attributes
    self.attribute_mapping[attr_name](attr_value, html_element)
  File "/emails/venv/lib/python3.10/site-packages/inscriptis/model/css.py", line 43, in attr_style
    apply_style(value, html_element)
  File "/emails/venv/lib/python3.10/site-packages/inscriptis/model/css.py", line 101, in attr_margin_bottom
    html_element.margin_after = CssParse._get_em(value)
  File "/emails/venv/lib/python3.10/site-packages/inscriptis/model/css.py", line 61, in _get_em
    value = float(_m.group(1))
ValueError: could not convert string to float: '..0001'
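
To illustrate the requested behavior, here is a minimal sketch of a tolerant length parser that warns and falls back to a default instead of raising. The names `safe_length` and `_LENGTH` and the regular expression are assumptions made for this example, not inscriptis' actual code. The sketch only extracts the number and does not convert units.

```python
import re
import warnings

# Hypothetical pattern (not the library's own): a numeric part followed by
# an optional unit such as "pt", "cm" or "%".
_LENGTH = re.compile(r"^\s*(-?[0-9.]+)\s*([a-z%]*)\s*$")


def safe_length(value, default=0.0):
    """Return the numeric part of a CSS length, or `default` if it is malformed."""
    match = _LENGTH.match(value)
    if match is None:
        warnings.warn(f"Ignoring unparsable CSS length: {value!r}")
        return default
    try:
        return float(match.group(1))
    except ValueError:
        # Reached for inputs such as '..0001pt', where the regex matches but
        # the captured text is not a valid float.
        warnings.warn(f"Ignoring invalid CSS number: {value!r}")
        return default
```

With this approach a value like `..0001pt` would produce a warning and fall back to the default margin, so the rest of the document is still converted.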
