Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Modules] Incorrect ODR checks caused by preferred_name attribute. #56490

Open
ChuanqiXu9 opened this issue Jul 12, 2022 · 22 comments · Fixed by #122726
Open

[Modules] Incorrect ODR checks caused by preferred_name attribute. #56490

ChuanqiXu9 opened this issue Jul 12, 2022 · 22 comments · Fixed by #122726
Assignees
Labels
clang:modules C++20 modules and Clang Header Modules

Comments

@ChuanqiXu9
Copy link
Member

ChuanqiXu9 commented Jul 12, 2022

I met an incorrect ODR checks caused by preferred_name. Here is a reduced example.

// string_view.h
template<class _CharT>
class basic_string_view;

typedef basic_string_view<char> string_view;

template<class _CharT>
class
__attribute__((__preferred_name__(string_view)))
basic_string_view {
public:
    basic_string_view() 
    {
    }
};

inline basic_string_view<char> foo()
{
  return basic_string_view<char>();
}
// A.cppm
module;
#include "string_view.h"
export module A;

// Use.cppm
module;
#include "string_view.h"
export module Use;
import A;

The input command line is:

clang++ -std=c++20 --precompile A.cppm -o A.pcm
clang++ -std=c++20 --precompile Use.cppm -fprebuilt-module-path=.  -fsyntax-only

The output would be the incorrect ODR warning:

string_view.h:11:5: error: 'basic_string_view<char>::basic_string_view' from module 'A.<global>' is not present in definition of 'string_view' provided earlier

I've been fighting with the problem for several days. My current process is that when the ASTReader reads the preferred_name attribute, it would read the string_view type and find an existing one. However, the current behavior would create another RecordType for string_view despite there is already one. So here is the problem.

@ChuanqiXu9 ChuanqiXu9 added the clang:modules C++20 modules and Clang Header Modules label Jul 12, 2022
@llvmbot
Copy link
Member

llvmbot commented Jul 12, 2022

@llvm/issue-subscribers-clang-modules

@ChuanqiXu9
Copy link
Member Author

I guess I located the problem. But it is really hard to handle. The problem here is that when the compiler reads __attribute__((__preferred_name__(string_view))), it would like to read the type of string_view. And when the compiler reads the type of string_view, the compiler would try to read basic_string_view and the compiler find that basic_string_view has an attribute __preferred_name__ so the compiler would read the attribute again...

Although due to the design of ASTReader, it wouldn't fall into the infinite reading loop. But the reading result is not correct somehow.

I don't find a good solution to fix this.

ChuanqiXu9 added a commit that referenced this issue Jul 26, 2022
…interface

Currently, the use of preferred_name would block implementing std
modules in libcxx. See #56490
for example.
The problem is pretty hard and it looks like we couldn't solve it in a
short time. So we sent this patch as a workaround to avoid blocking us
to modularize STL. This is intended to be fixed properly in the future.

Reviewed By: erichkeane, aaron.ballman, tahonermann

Differential Revision: https://reviews.llvm.org/D130331
@ChuanqiXu9
Copy link
Member Author

Sent a2772fc as a workaround but we still need to fix it.

mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
…interface

Currently, the use of preferred_name would block implementing std
modules in libcxx. See llvm/llvm-project#56490
for example.
The problem is pretty hard and it looks like we couldn't solve it in a
short time. So we sent this patch as a workaround to avoid blocking us
to modularize STL. This is intended to be fixed properly in the future.

Reviewed By: erichkeane, aaron.ballman, tahonermann

Differential Revision: https://reviews.llvm.org/D130331
arichardson pushed a commit to CTSRD-CHERI/llvm-project that referenced this issue Jan 9, 2024
…interface

Currently, the use of preferred_name would block implementing std
modules in libcxx. See llvm/llvm-project#56490
for example.
The problem is pretty hard and it looks like we couldn't solve it in a
short time. So we sent this patch as a workaround to avoid blocking us
to modularize STL. This is intended to be fixed properly in the future.

Reviewed By: erichkeane, aaron.ballman, tahonermann

Differential Revision: https://reviews.llvm.org/D130331
@zygoloid
Copy link
Collaborator

It seems like the problem here is that reading string_view should not result in us immediately reading the attributes on basic_string_view. Generally, reading a declaration shouldn't cause us to immediately read things that appear later, because that can result in cycles.

So either reading the basic_string_view<char> from string_view should only read the forward declaration of basic_string_view, without attributes, or we should defer loading attributes until we're done with recursive deserializations.

@ChuanqiXu9
Copy link
Member Author

We would read the local redeclarations for TagDecl. So the first suggestion might not be straight forward. The second suggestion looks promising.

@ChuanqiXu9
Copy link
Member Author

I did some initial experiments. And it shows it at least not straight forward since we may read some attributes during the deserialization. So if we continue this direction, we may have to divide the attributes as lazy attributes and eager attributes..

@ilya-biryukov
Copy link
Contributor

The same error should be happening for header modules, I'm surprised we're not hitting it at Google with libc++ already.
I guess everything would be fine if all the types would come from the same module, but I'm not sure.

@ilya-biryukov
Copy link
Contributor

Answering my own question here: we're not hitting it at Google because the preferred_name is only used in targets that are modularized themselves, so we never need to merge two independent types with preferred_name into one.

I've checked that if that actually happens, we also hit the same error with header modules.

@VitaNuo VitaNuo self-assigned this Dec 5, 2024
@VitaNuo
Copy link
Contributor

VitaNuo commented Dec 5, 2024

I've tried to compare the deserialisation of A in the versions of code with and without the preferred_name attribute.

The primary difference is that the version without the attribute does not visit the typedef decl for string_view. I don't know the exact reason for this, but it might be that module deserialization is lazy and as long as the typedef is not used, it's not loaded either.

During the merge step, the typedef imported from A cannot be found
in the decl context. The decl context is the translation unit decl for string_view.h. It does contain the string_view typedef, but the two typedefs (imported and existing) don't pass the same entity check.

Type of the existing typedef:

RecordType 0x555569490520 'string_view'
`-ClassTemplateSpecialization 0x555569490428 'basic_string_view'

Type of the imported typedef:

RecordType 0x5555694c5560 'string_view' imported
`-ClassTemplateSpecialization 0x555569490428 'basic_string_view'

They seem identical, but pointers are not the same, and so the typedefs aren't merged.

In the code version with the attribute, the basic_string_view constructor merge step seems to be triggered twice. The first time it seems to be associated with merging ClassTemplateDecl for basic_string_view.
This first constructor merge is the same both in the version with and without the attribute, and succeeds in both cases (without going through the ODR check mechanism at all).

In the code version with the attribute, the constructor merge step is then triggered again for this constructor decl

CXXConstructorDecl 0x5555694c5928 parent 0x555569490428 <string_view.h:11:5, col:18> col:5 imported in A.<global> hidden used basic_string_view 'void ()' implicit_instantiation implicit-inline instantiated_from 0x5555694c59e0

which I therefore assume is connected to the typedef merge step.
This merge triggers the failed ODR check.

These are probably still all symptoms rather than the actual problem.

So far I haven't identified the exact location where deserialization attempts to go on a loop, as described in the previous comments.

@ilya-biryukov
Copy link
Contributor

The primary difference is that the version without the attribute does not visit the typedef decl for string_view. I don't know the exact reason for this, but it might be that module deserialization is lazy and as long as the typedef is not used, it's not loaded either.

This seems totally expected as string_view is not mentioned anywhere in the template itself if you don't have the attribute specified.

They seem identical, but pointers are not the same, and so the typedefs aren't merged.

The fact that pointers are different should be the curlprit of the problem. This should be stemming from the fact that if we have the attribute, we need to deserialize the typedef when deserializing the template itself. I suspect we just don't have the information that

RecordType 0x5555694c5560 'foo' imported

is the same as (previously created)

RecordType 0x555569490520 'foo'

and so we end up with wrong results of merging.

From a different standpoint, without the attribute:

  1. we can independently deserialize the template and its pattern RecordDecl,
  2. deserializing the typedef requires deserializing the template and its pattern RecordDecl.

With the attribute, the (2) still holds, but now we also need to deserialize the typedef when deserializing the template itself. And I believe it's this new aspect that screws us up. The merging of the type inside typedef fails to canonicalize the type to point to the previously parsed RecordType 0x555569490520 'foo' (non-imported) and we end up failing the ODR check because the attributes are different?

To confirm this theory, I'd recommend looking at:

  • why the ODR check fails (I expect its the type inside the attribute being different),
  • tracing if the merging sees the previously parsed RecordType properly in all the internal attributes when deserializing the typedef.

Intuitively, this recursion coming from the attributes should be avoided and we could probably fix this by postponing the merging of this specific attribute to happen after the deserialization step for the corresponding RecordDecl finishes. But it'd be nice to get to the bottom of the issue too.

@ChuanqiXu9
Copy link
Member Author

I guess I located the problem. But it is really hard to handle. The problem here is that when the compiler reads __attribute__((__preferred_name__(string_view))), it would like to read the type of string_view. And when the compiler reads the type of string_view, the compiler would try to read basic_string_view and the compiler find that basic_string_view has an attribute __preferred_name__ so the compiler would read the attribute again...

Although due to the design of ASTReader, it wouldn't fall into the infinite reading loop. But the reading result is not correct somehow.

I don't find a good solution to fix this.

My memory to the problem got vanished slowly. But I think my previous analysis might be true

@VitaNuo
Copy link
Contributor

VitaNuo commented Dec 13, 2024

Some elaboration on the reason why deserialization of the string_view typedef doesn't loop.

At some point in the deserialization of the A module in Use.cppm, VisitCXXRecordDecl is called for the basic_string_view class.

Visiting a Decl triggers reading of its attributes.
Reading the preferred_name attribute triggers reading of TypeSourceInfo for the attribute, in this case the string_view typedef.

readTypedefType of string_view bubbles down to reading the decl (GetDecl) for the typedef declaration.
Down the stack in the GetDecl logic, there's a following snippet

if (!DeclsLoaded[Index]) {
    ReadDeclRecord(ID);
}

return DeclsLoaded[Index];

When the typedef decl is read for the first time, its entry is absent from the DeclsLoaded vector, and reading the decl record is triggered.

Reading the decl record for the typedef first sets it to loaded
DeclsLoaded[Index] = D;

and then issues the actual visitation (Reader.Visit(D) -> VisitTypedefDecl).

VisitTypedefDecl triggers reading the underlying type, which triggers the visitation of ClassTemplateSpecializationDecl for basic_string_view<char> (VisitClassTemplateSpecializationDecl).

Visiting this decl triggers the visitation of its attributes, and visiting the preferred_name attribute triggers reading the typesourceinfo for the attribute once again.

This again bubbles down to GetDecl for the typedef type, but this time around if (!DeclsLoaded[Index]) is false, so the existing (unfinished) type is returned:

ElaboratedType 0x5555694c5510 'foo' sugar imported
`-TypedefType 0x5555694c54e0 'foo'

Essentially, reading the typedef type is triggered while reading the typedef type in a cycle, but the literal cycle doesn't occur, because the decl is marked as loaded once before visititation, and at the second attempt to visit it the unfinished loaded decl is returned.

Then the ClassTemplateSpecialization decl visitation is finished with the result

ClassTemplateSpecializationDecl 0x5555694c54a0 prev 0x555569490428 <string_view.h:1:1, line:2:7> line:9:1 imported in A.<global> hidden class foo_templ implicit_instantiation
|-TemplateArgument type 'char'
| `-BuiltinType 0x555569438c00 'char'
`-PreferredNameAttr 0x5555694c5680 <line:8:16, col:38> string_view

which is also unfinished because of the unfinished attribute reading.

Once typedef visitation is finished, the resulting typedef decl is

TypedefDecl 0x5555694c5410 <string_view.h:4:1, col:25> col:25 imported in A.<global> hidden referenced foo 'basic_string_view<char>':'string_view'
`-ElaboratedType 0x5555694c57c0 'basic_string_view<char>' sugar imported
  `-TemplateSpecializationType 0x5555694c5770 'basic_string_view<char>' sugar imported
    |-name: 'foo_templ' qualified
    | `-ClassTemplateDecl 0x5555694b44b8 prev 0x5555694901c8 <line:1:1, line:2:7> col:7 imported in A.<global> hidden basic_string_view
    |-TemplateArgument type 'char'
    | `-BuiltinType 0x555569438c00 'char'
    `-RecordType 0x5555694c5570 'string_view' imported
      `-ClassTemplateSpecialization 0x555569490428 'basic_string_view'

After the typedef visitation, an existing typedef is found in the decl context, but the existing typedef and the new typedef above aren't merged, since they are deemed to be different entities.

And once the CXXRecordType visitation mentioned in the beginning of the post is finished, the result is

CXXRecordDecl 0x5555694b4808 prev 0x5555694b4698 <string_view.h:7:1, line:12:1> line:9:1 imported in A.<global> hidden class basic_string_view
|-also in A.<global>
`-PreferredNameAttr 0x5555694c5840 <line:8:16, col:38> string_view

which also seems to be unfinished.

@VitaNuo
Copy link
Contributor

VitaNuo commented Dec 16, 2024

I've been trying to find the location where the two typedefs (global and imported) should be linked as one (imported) being a redeclaration of the other (global), but this doesn't happen.

One idea I had is that they might be linked here in the VisitRedeclarable call for the imported typedef.

However, in the case of the imported typedef, reading the Decl ID inside the record returns 0 as a first decl ID, which is invalid, and the conclusion from this redeclarable visitation is that the imported typedef is the only declaration.

Reading through the code, it's probably the expected behavior. ASTRecordReader is constructed with Idx set to 0. GlobalDeclID of the imported decl (ID) is set to a non-zero value. The visitation of the typedef decl is triggered almost immediately after.

This makes me think that the behavior is probably expected and VisitRedeclarable can only link the redecls inside the imported module together.

@VitaNuo
Copy link
Contributor

VitaNuo commented Dec 16, 2024

Continuing on the previous comment, I believe that the linking of imported decls with their global counterparts happens in attachPreviousDecl, which is triggered by loading pending decl chains (via this code path).

For this specific snippet, attaching the previous decl is triggered for:

  • ClassTemplateDecl basic_string_view imported in A.<global> with ClassTemplateDecl basic_string_view in Use.<global>
  • CXXRecordDecl basic_string_view imported in A.<global> with CXXRecordDecl basic_string_view in Use.<global>

.. but not triggered for the TypedefDecl string_view.
In case of the TypedefDecl string_view, loading of the decl chain exits early due to offset being equal to 0.

This makes me think that the redeclarable visit for the typedef should have taken a different branch, but I can't tell at the moment why it didn't.

@ilya-biryukov
Copy link
Contributor

I don't think that's the place we should be looking at. The redecl chain that we are deserializing should be just one typedef (because there are no redeclarations in the module we are loading).

However, I think you've already mentioned that we fail to merge the typedef we are importing either here or here because their types are different, but should be the same.

We've also concluded they they are different because the TypeForDecl is assigned before we set the previous link for the redecl chain. Normally, this should not happen and it only happens because this same typedef is being referenced while we try to deserialize the type of the typedef itself.

I think that the obvious fix would be to postpone deserializing and setting the preferred_name attribute with a mechanism similar to PendingUpdateRecords. That way we ensure that we properly deserialized all structs before we start deserializaing the type inside preferred_name and avoid this cycle.

Could you try implementing something along those lines and see if that fixes the issue?

@VitaNuo
Copy link
Contributor

VitaNuo commented Dec 17, 2024

We've also concluded they they are different because the TypeForDecl is assigned before we set the previous link for the redecl chain. Normally, this should not happen and it only happens because this same typedef is being referenced while we try to deserialize the type of the typedef itself.

Thanks. I have tried to assert that this indeed holds

template<>
void ASTDeclReader::attachPreviousDeclImpl(ASTReader &Reader, Redeclarable<TagDecl> *D, Decl *Previous, Decl *Canon) {
  TagDecl* TD = static_cast<TagDecl *>(D);
  assert(TD->getTypeForDecl() == nullptr && "Type must be still null when link is set");
                                    
  D->RedeclLink.setPrevious(cast<TagDecl>(Previous));
  D->First = cast<TagDecl>(Previous)->First;
}

Unfortunately, the assertion that the type of TagDecl (parent of CXXRecordDecl for basic_string_view) is null before the link is set crashes both for the version of code with and without the preferred_name attribute.

In both case, type of the Decl is set to

InjectedClassNameType 0x5555694935a0 'basic_string_view<_CharT>' dependent
-CXXRecord 0x5555694938d8 'basic_string_view'

@VitaNuo
Copy link
Contributor

VitaNuo commented Dec 17, 2024

Also tried to assert that the types of D and the previous declaration are the same:

assert((TD->getTypeForDecl() == PTD->getTypeForDecl()) && "Types must be the same");

In this case, there's a difference of behavior between the code with attribute and the code without attribute.

Adding previous link to

ClassTemplateSpecializationDecl 0x555569490428 <string_view.h:1:1, line:2:7> line:9:1 in Use.<global> class basic_string_view definition implicit_instantiation

yields different types for D and Previous:

RecordType 0x555569490520 'string_view'
`-ClassTemplateSpecialization 0x555569490428 'basic_string_view'
RecordType 0x5555694c5560 'string_view' imported
`-ClassTemplateSpecialization 0x555569490428 'basic_string_view'

and the assertion triggers.

This doesn't happen for the version without the attribute.

@VitaNuo
Copy link
Contributor

VitaNuo commented Jan 7, 2025

I think that the obvious fix would be to postpone deserializing and setting the preferred_name attribute with a mechanism similar to PendingUpdateRecords.

I've looked in a bit more detail into the PendingUpdateRecords. I am not sure this exact mechanism is what we need to fix the problem.

The way I understand the mechanism:

  • PendingUpdateRecords are populated in ReadASTBlock while looping through the records and blocks of the module at binary level (reading records out of the module input stream).
  • In the case of a specific type of a record (in this case UPDATE_VISIBLE, I'm not sure what this means), the record is pushed into PendingUpdateRecords. In our case, the record with ID 1 (corresponding to the TranslationUnitDecl of the entire module) is pushed for delayed processing.
  • After ReadAST is finished, the destructor calls finishedDeserialising->finishPendingActions, which triggers the traversal of the entire TranslationUnitDecl for the module.

As mentioned, I'm not sure what UPDATE_VISIBLE means and why the entire translation unit processing is delayed, but it seems to me that PendingUpdateRecords mechanism is too low-level for the purpose of delaying a single decl like the typedef decl for the preferred_name attribute, since we'd like to defer the processing of the typedef earlier than the binary record processing time.

There are some more Pending.. structures involved in AST reading that might be better suited to delay processing at the level of a decl, it might be worth looking into those.

@ChuanqiXu9
Copy link
Member Author

I agree generally. BTW #121245 may be related.

@VitaNuo
Copy link
Contributor

VitaNuo commented Jan 8, 2025

The other mechanisms (other than PendingUpdateRecords) are, e.g., PendingDeducedVarTypes or PendingDeclContextInfos.

In my understanding, they allow to skip deserialization of a specific Decl or QualType. The way it works:

  • during primary deserialization, the decl or type ID is read from the record (usually with Record. readInt()). This allows to keep reading the record without breaking deserialization, since record is a stream.
  • the ID is stored in the corresponding Pending.. structure, together with the Decl to link it to later (e.g., the decl to set the type on)
  • deserialization continues
  • in finishPendingActions, the pending decls and decl/type IDs are retrieved from the Pending.. structure and deserialized using GetDecl or GetType in the ASTReader.

What's different about the preferred_name attribute:

  • it's not a single type or a single decl . The record contains a lot of entries for the attribute (attribute name, and many more). All these entries need to be read immediately in order to not break record deserialization (here's the place it'd break).
  • the piece that can (?) be delayed is reading the type source for the attribute. In my understanding, it requires reading only one ID from the record (the type ID for the typedef), so technically we could read this ID, store in a Pending.. structure, and defer the creation of the full type source and attaching it to the preferred name attribute until later.
  • an exact mechanism like this isn't present in the ASTReader as of now.
  • reading type source info is contained in the generated AttrPCHRead.inc, so we'd probably need to take it out of there to experiment with deferring the type deserialization.

@VitaNuo
Copy link
Contributor

VitaNuo commented Jan 10, 2025

The linked PR delays reading the type source info of the preferred_name attribute.
It does so by reading the type ID of the typedef from the Record stream, as well as the relevant pieces of the type loc (elaborated typedef loc + typedef loc).
In finishPendingActions, the type of the attribute (typedef) is read using the stored ID, and the typeloc is recreated using the previously stored data.

Missing piece:

  • if we delay deserialization of the preferred_name attribute and do it in a manual/custom way, there's probably no need anymore for the handling in the generated AttrPCHRead.inc. However, I haven't removed it yet, until we are agreed on the overall approach.

@hokein
Copy link
Collaborator

hokein commented Feb 17, 2025

It looks like the fix was reverted in f63e8ed. I think we'd better to re-open the issue.

@VitaNuo VitaNuo reopened this Feb 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules
Projects
None yet
6 participants