
Re-add curly braces in author names after latex parsing #293

Merged
michel-kraemer merged 8 commits into michel-kraemer:master from Siedlerchr:FixCurlyBracesHandlingInAuthor
Oct 22, 2025

Conversation

@Siedlerchr
Contributor

@Siedlerchr Siedlerchr commented Sep 13, 2025

Fixes #292

The reason is that the LaTeXParser in JBibTeX treats curly braces as group delimiters in its grammar: it neither treats a braced group as a single item nor preserves the braces.

@Siedlerchr Siedlerchr changed the title from "Readd curly braces in author names after latex parsing" to "Re-add curly braces in author names after latex parsing" Sep 13, 2025
…dlingInAuthor

* upstream/master:
  Add missing item types and variables from the CSL 1.0.2 specification
@Siedlerchr
Contributor Author

@michel-kraemer I finally found time to fix the failing tests as well, so that inner formatting braces are excluded.

@michel-kraemer
Owner

Thank you very much for raising this issue and for providing the pull request! Also, apologies for my late reply. As I said, I was out of office and didn't have access to a computer.

The PR works very well! I reviewed it carefully and have one final question before I merge it. If I understand it correctly, this is a workaround for a bug in JBibTex, right? I was wondering because I discovered that the following test fails:

    @Test
    public void curlyBracesAreReaddedOnlyForSecondAuthor() throws ParseException {
        String entry = "@online{testcitationkey,\n" +
                "  author = {Foo Bar and {Foo Bar}},\n" +
                "  journal = {Test journal},\n" +
                "  title = {Test title},\n" +
                "  year = {2025},\n" +
                "}";

        BibTeXDatabase db = new BibTeXParser().parse(new StringReader(entry));
        BibTeXConverter converter = new BibTeXConverter();
        Map<String, CSLItemData> items = converter.toItemData(db);
        CSLItemData item = items.get("testcitationkey");

        CSLName name1 = new CSLNameBuilder()
                .family("Bar")
                .given("Foo")
                .build();

        CSLName name2 = new CSLNameBuilder()
                .literal("Foo Bar")
                .build();

        assertEquals(name1, item.getAuthor()[0]); // <- FAILS HERE
        assertEquals(name2, item.getAuthor()[1]);
    }

The code that re-adds the curly braces doesn't distinguish between the first and the second occurrence of "Foo Bar", so both will be wrapped. I don't think this is a huge problem as the test case I constructed is artificial and most likely doesn't happen in practice. I was just wondering if it wouldn't be better to fix the actual bug in JBibTex (or maybe add a flag to change its behavior).

Since you have already put more thought into this than I did, I wanted to ask what your opinion is.
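The pitfall described in this comment can be reproduced without JBibTeX. The following is a minimal sketch, not the code from the pull request: the class `NaiveBraceReadd` and its method `readd` are hypothetical names, and the text-based matching is only an assumption about how a string-level re-add step could behave. It shows why searching the brace-free text for the content of a braced group cannot tell a braced name from an unbraced name with the same text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not the code from the pull request.
class NaiveBraceReadd {
    // Matches a brace group without nested braces, e.g. "{Foo Bar}".
    private static final Pattern GROUP = Pattern.compile("\\{([^{}]*)\\}");

    /**
     * Re-adds braces by searching the brace-free text for the content of
     * every group found in the original field value.
     */
    static String readd(String original, String stripped) {
        String result = stripped;
        Matcher m = GROUP.matcher(original);
        while (m.find()) {
            String content = m.group(1);
            if (content.isEmpty()) {
                continue; // nothing to search for
            }
            // String.replace substitutes every occurrence, so an unbraced
            // name with the same text gets wrapped as well.
            result = result.replace(content, "{" + content + "}");
        }
        return result;
    }

    public static void main(String[] args) {
        String original = "Foo Bar and {Foo Bar}";
        // What a printer that drops braces might return for the value above.
        String stripped = "Foo Bar and Foo Bar";
        System.out.println(readd(original, stripped)); // {Foo Bar} and {Foo Bar}
    }
}
```

Working on the parsed objects instead of the printed text avoids this ambiguity, because the position of each group is still known at that point.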

@michel-kraemer
Owner

michel-kraemer commented Oct 20, 2025

Ah! I think I get it now. I just added a println to BibTeXConverter.toItemData:

                List<LaTeXObject> objs = latexParser.parse(new StringReader(us));
                System.out.println(objs);
                us = latexPrinter.print(objs).replaceAll("\\n", " ").replaceAll("\\r", "").trim();

And got the following output:

[org.jbibtex.LaTeXString@4743a322, org.jbibtex.LaTeXGroup@79316f3a]
[org.jbibtex.LaTeXString@4cdb8504]
[org.jbibtex.LaTeXString@76db540e]
[org.jbibtex.LaTeXString@10358c32]

What if we looked for LaTeXGroup objects in this list and always wrapped them in curly braces? This would make the code much shorter and more bulletproof, wouldn't it?

@michel-kraemer
Owner

michel-kraemer commented Oct 20, 2025

The following seems to work with all your test cases and mine:

        for (Map.Entry<Key, Value> field : e.getFields().entrySet()) {
            String us = field.getValue().toUserString().replaceAll("\\r", "");

            // convert LaTeX string to normal text
            try {
                List<LaTeXObject> objs = latexParser.parse(new StringReader(us));
                List<LaTeXObject> newObjs;
                String keyLower = field.getKey().getValue().toLowerCase();
                if (FIELD_AUTHOR.equals(keyLower) || FIELD_EDITOR.equals(keyLower)) {
                    newObjs = new ArrayList<>();
                    for (LaTeXObject o : objs) {
                        if (o instanceof LaTeXGroup) {
                            List<LaTeXObject> children = new ArrayList<>();
                            children.add(new LaTeXString("{"));
                            children.addAll(((LaTeXGroup)o).getObjects());
                            children.add(new LaTeXString("}"));
                            LaTeXGroup g = new LaTeXGroup(children);
                            newObjs.add(g);
                        } else {
                            newObjs.add(o);
                        }
                    }
                } else {
                    newObjs = objs;
                }
                us = latexPrinter.print(newObjs).replaceAll("\\n", " ").replaceAll("\\r", "").trim();
            } catch (ParseException | TokenMgrException ex) {
                // ignore
            }

            entries.put(field.getKey().getValue().toLowerCase(), us);
        }

@Siedlerchr
Contributor Author

Siedlerchr commented Oct 20, 2025

@michel-kraemer Thanks for the update. Your code looks even simpler and, I agree, more future-proof.

Actually, this is not really a bug in JBibTeX. The behavior is correct: it is a LaTeX parser, so it creates two LaTeX objects out of the input. See:
jbibtex/jbibtex#33 (comment)

Feel free to push your changes to this branch

@Siedlerchr
Contributor Author

Siedlerchr commented Oct 20, 2025

@michel-kraemer I have now tested your approach, and it fails in the fixture tests (I had this initially as well) for diacritics in curly braces:

[2]F[. Giné, F. Solsona, P. Hernández, and E.] Luque, “Dealing wit...> but was:<..., 1992, p. 674.
[2]F[rancesc Gin{é} and Francesc Solsona and Porfidio Hern{á}ndez and Emilio] Luque, “Dealing wit...> 

I came up with a better solution: check for LaTeX commands inside the LaTeX objects.
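One way to read this idea is sketched below. It is not the merged code and does not use the JBibTeX API: `BraceSketch`, `Node`, `Text`, `Command`, and `Group` are hypothetical stand-ins for the parsed objects, and the sketch assumes that an accent such as the one in `Gin{é}` shows up as a command node inside its group. Under that assumption, a top-level group gets its braces back only when it contains no command, so literal names stay braced while diacritic groups print as plain text.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Stand-in object model, NOT the JBibTeX API: all names here are hypothetical.
class BraceSketch {
    interface Node {}

    static final class Text implements Node {
        final String value;
        Text(String value) { this.value = value; }
    }

    // A command (e.g. an accent) whose printed form is already resolved.
    static final class Command implements Node {
        final String rendered;
        Command(String rendered) { this.rendered = rendered; }
    }

    static final class Group implements Node {
        final List<Node> children;
        Group(Node... children) { this.children = Arrays.asList(children); }
    }

    // True if the group, or any group nested in it, contains a command.
    static boolean containsCommand(Group g) {
        for (Node n : g.children) {
            if (n instanceof Command) return true;
            if (n instanceof Group && containsCommand((Group) n)) return true;
        }
        return false;
    }

    // Prints nodes as plain text and drops all braces.
    static void print(List<Node> nodes, StringBuilder out) {
        for (Node n : nodes) {
            if (n instanceof Text) out.append(((Text) n).value);
            else if (n instanceof Command) out.append(((Command) n).rendered);
            else if (n instanceof Group) print(((Group) n).children, out);
        }
    }

    // Re-adds braces only around top-level groups that contain no command.
    static String printKeepingNameBraces(List<Node> topLevel) {
        StringBuilder out = new StringBuilder();
        for (Node n : topLevel) {
            if (n instanceof Group && !containsCommand((Group) n)) {
                out.append('{');
                print(((Group) n).children, out);
                out.append('}');
            } else {
                print(Collections.singletonList(n), out);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Node> literal = Arrays.asList(
                new Text("Foo Bar and "), new Group(new Text("Foo Bar")));
        List<Node> accent = Arrays.asList(
                new Text("Francesc Gin"), new Group(new Command("é")));
        System.out.println(printKeepingNameBraces(literal)); // Foo Bar and {Foo Bar}
        System.out.println(printKeepingNameBraces(accent));  // Francesc Giné
    }
}
```

Because the decision is made per parsed object, the first unbraced "Foo Bar" is never touched, which also addresses the artificial test case from earlier in the thread.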

@michel-kraemer michel-kraemer merged commit eba84ab into michel-kraemer:master Oct 22, 2025
1 of 2 checks passed
@michel-kraemer
Owner

@Siedlerchr Thanks for your effort! 👍



Development

Successfully merging this pull request may close these issues.

BibtexConverter does not preserve curly braces for authors

2 participants