Error with xml_serialize()/xml_unserialize() roundtrip: Opening and ending tag mismatch [PATCH] #407

@HenrikBengtsson

Issue

An xml_serialize()/xml_unserialize() roundtrip fails with: "Opening and ending tag mismatch: link line 12 and head [76]"

I'd expect a roundtrip to always work.

Reproducible Example

doc <- xml2::read_html("https://www.r-project.org")
doc
#> {html_document}
#> <html lang="en">
#> [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
#> [2] <body>\n    <div class="container page">\n      <div class="row">\n       ...

raw <- xml2::xml_serialize(doc, connection = NULL)
doc2 <- xml2::xml_unserialize(raw)
#> Error in read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html,  : 
#>   Opening and ending tag mismatch: link line 12 and head [76]

Traceback:

> traceback()
4: read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html, 
       options = options)
3: read_xml.character(unclass(object), ...)
2: read_xml(unclass(object), ...)
1: xml2::xml_unserialize(raw)
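
The traceback shows xml_unserialize() passing the stored text to read_xml(), so the serialized HTML appears to be parsed as XML, where an unclosed <link> inside <head> is a tag mismatch. A minimal workaround sketch, assuming the serialized payload is equivalent to as.character(doc): convert the document to text and re-read it with read_html(). The inline HTML string below is a hypothetical stand-in for the page in the report, and this is not presented as the package's fix.

```r
library(xml2)

# Hypothetical stand-in for the page in the report: an HTML document whose
# <head> contains a void <link> element (no closing tag in HTML).
html <- paste0(
  "<html><head>",
  "<link rel=\"stylesheet\" href=\"style.css\">",
  "<title>Example</title>",
  "</head><body><p>Hello</p></body></html>"
)
doc <- read_html(html)

# Convert the document to text, then parse that text with the HTML parser
# (read_html) rather than the XML parser.
txt <- as.character(doc)
doc2 <- read_html(txt)

# Compare a few nodes of the original and the re-read document.
same_title <- identical(
  xml_text(xml_find_first(doc, "//title")),
  xml_text(xml_find_first(doc2, "//title"))
)
same_href <- identical(
  xml_attr(xml_find_first(doc, "//link"), "href"),
  xml_attr(xml_find_first(doc2, "//link"), "href")
)
```

The text in txt can be stored with base R's serialize() or saveRDS() if a raw or on-disk form is needed; the important step in this sketch is that the text is read back with the HTML parser.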

Session Info

> devtools::session_info()
Session info ─────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16)
 os       Ubuntu 22.04.3 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Los_Angeles
 date     2023-10-03
 pandoc   3.1.7 @ /home/henrik/shared/software/CBI/pandoc-3.1.7/bin/pandoc

Packages ──────────────────────
 package     * version date (UTC) lib source
 cachem        1.0.8   2023-05-01 [1] CRAN (R 4.3.0)
 callr         3.7.3   2022-11-02 [1] CRAN (R 4.3.0)
 cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
 crayon        1.5.2   2022-09-29 [1] RSPM (R 4.3.0)
 devtools      2.4.5   2022-10-11 [1] RSPM (R 4.3.0)
 digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.1)
 ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.3.0)
 fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
 fs            1.6.3   2023-07-20 [1] RSPM (R 4.3.0)
 glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
 htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.1)
 htmlwidgets   1.6.2   2023-03-17 [1] RSPM (R 4.3.0)
 httpuv        1.6.11  2023-05-11 [1] RSPM (R 4.3.0)
 later         1.3.1   2023-05-02 [1] CRAN (R 4.3.0)
 lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
 magrittr      2.0.3   2022-03-30 [1] RSPM (R 4.3.0)
 memoise       2.0.1   2021-11-26 [1] CRAN (R 4.3.0)
 mime          0.12    2021-09-28 [1] CRAN (R 4.3.0)
 miniUI        0.1.1.1 2018-05-18 [1] RSPM (R 4.3.0)
 pkgbuild      1.4.2   2023-06-26 [1] CRAN (R 4.3.1)
 pkgload       1.3.3   2023-09-22 [1] CRAN (R 4.3.1)
 prettyunits   1.2.0   2023-09-24 [1] RSPM (R 4.3.0)
 processx      3.8.2   2023-06-30 [1] CRAN (R 4.3.1)
 profvis       0.3.8   2023-05-02 [1] RSPM (R 4.3.0)
 promises      1.2.1   2023-08-10 [1] CRAN (R 4.3.1)
 ps            1.7.5   2023-04-18 [1] RSPM (R 4.3.0)
 purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.1)
 R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
 Rcpp          1.0.11  2023-07-06 [1] CRAN (R 4.3.1)
 remotes       2.4.2.1 2023-07-18 [1] CRAN (R 4.3.1)
 rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
 sessioninfo   1.2.2   2021-12-06 [1] RSPM (R 4.3.0)
 shiny         1.7.5   2023-08-12 [1] RSPM (R 4.3.0)
 stringi       1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
 stringr       1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
 urlchecker    1.0.1   2021-11-30 [1] RSPM (R 4.3.0)
 usethis       2.2.2   2023-07-06 [1] CRAN (R 4.3.1)
 vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
 xtable        1.8-4   2019-04-21 [1] RSPM (R 4.3.0)

 [1] /home/henrik/R/ubuntu22_04-x86_64-pc-linux-gnu-library/4.3-CBI-gcc11
 [2] /home/henrik/shared/software/CBI/_ubuntu22_04/R-4.3.1-gcc11/lib/R/library

Labels: bug (an unexpected problem or unintended behavior)