Skip to content

Comments

Add optional filelists_ext metadata#316

Merged
kontura merged 1 commit intorpm-software-management:masterfrom
aplanas:fix_imadata
Feb 16, 2023
Merged

Add optional filelists_ext metadata#316
kontura merged 1 commit intorpm-software-management:masterfrom
aplanas:fix_imadata

Conversation

@aplanas
Copy link
Contributor

@aplanas aplanas commented Mar 24, 2022

Using the parameter '--filelists_ext', a new metadata file
(filelists_ext) will be created. This new metadata is a copy of
filelists, but will include a new attribute for the 'file' element,
named 'hash', that contain the registered hash (SHA256 for example)
for this file.

Fix #313

Signed-off-by: Alberto Planas [email protected]

@aplanas
Copy link
Contributor Author

aplanas commented Mar 24, 2022

There are some bits missing, mostly tests, some adjustments in the Python bindings, and all the sqlite code.

But I want to present this patch as early as possible, to see if the approach will be OK

@mlschroe
Copy link

The name "--hash" is somewhat confusing. Rpm calls those "digest", so why not --digest? And why is the filename "filelists_ext" if the option is called "hash"? Shouldn't it be "filehashes" or "filedigests"?

@mlschroe
Copy link

No testcases?

@aplanas
Copy link
Contributor Author

aplanas commented Mar 25, 2022

No testcases?

The first comment point other missing bits too. The patch was getting big without any review, so I decided to publish the code to have the chance of fixing issues before too long.

Tests and SQLIte schema change will be added in this PR before removing the draft flag.

@aplanas
Copy link
Contributor Author

aplanas commented Mar 25, 2022

The name "--hash" is somewhat confusing. Rpm calls those "digest", so why not --digest?

Sure, digest is used in the internal package structure. I can use also --filelists_ext

And why is the filename "filelists_ext" if the option is called "hash"? Shouldn't it be "filehashes" or "filedigests"?

filelists_ext.xml is a strict superset of filelists.xml. For reasons of why is that see #313 (comment)

I think I will go for --filelists_ext then.

@aplanas
Copy link
Contributor Author

aplanas commented Mar 28, 2022

The name "--hash" is somewhat confusing.

Done. I renamed it to --filelists_ext

@aplanas aplanas force-pushed the fix_imadata branch 5 times, most recently from 1f5504c to 11c274c Compare March 29, 2022 13:11
@kontura
Copy link
Contributor

kontura commented Mar 29, 2022

Given that the filelists-ext fully contains regular filelists how about instead of adding all the new parsing APIs we just use the existing ones? I think it would just work since the element was added to xml_parser_filelists.c.

@aplanas
Copy link
Contributor Author

aplanas commented Mar 29, 2022

Given that the filelists-ext fully contains regular filelists how about instead of adding all the new parsing APIs we just use the existing ones? I think it would just work since the element was added to xml_parser_filelists.c.

I like the idea. The parser state machine is shared, I will see how I can unify the API. I must admit that when I do not understand the code, I make an explicit copy for the filelists_ext version.

@aplanas aplanas force-pushed the fix_imadata branch 3 times, most recently from b875d3f to 8d9db76 Compare March 29, 2022 14:34
@aplanas
Copy link
Contributor Author

aplanas commented Mar 29, 2022

Given that the filelists-ext fully contains regular filelists how about instead of adding all the new parsing APIs we just use the existing ones?

Ok, I think that I know how to do this, thanks for the idea as it will remove a lot of issues.

@aplanas aplanas force-pushed the fix_imadata branch 2 times, most recently from a59da74 to 34762c5 Compare March 30, 2022 13:41
@aplanas aplanas marked this pull request as ready for review March 30, 2022 14:09
@aplanas
Copy link
Contributor Author

aplanas commented Mar 30, 2022

I still plan to add some tests, but I am thinking to drop the changes in sqlite for now. Meanwhile I moved the PR to ready for reviews.

@aplanas aplanas force-pushed the fix_imadata branch 4 times, most recently from 8506aad to e6bad40 Compare April 1, 2022 12:37
@aplanas
Copy link
Contributor Author

aplanas commented Apr 1, 2022

I updated some tests, and I decided not to break the cr_PackageFiles Python tuple, and not include the digest there.

IMHO I am done.

@dralley
Copy link
Contributor

dralley commented Apr 10, 2022

I still plan to add some tests, but I am thinking to drop the changes in sqlite for now. Meanwhile I moved the PR to ready for reviews.

Yes, what I've now realized is that .sqlite metadata is de-facto deprecated even though it's still provided in most repos. None of the tools that consume it are mainstream anymore, except for legacy yum in el7, which has only a few years to live, and of course this new feature isn't super relevant to it.

I filed this issue to turn it off in a future Fedora release. https://pagure.io/releng/issue/10745

So no changes to the sqlite metadata is likely preferable.

}

if (filelists_ext && entry->digest && entry->digest[0] != '\0') {
cr_xmlNewProp(file_node, BAD_CAST "hash", BAD_CAST entry->digest);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably should have an attribute to distinguish the digest algorithm used alongside the digest itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a new XML field for each package, that indicate the algorithm (<checksum/>), this tag is the file digest itself

cr_Package *package = NULL;

cr_PkgIterator *pkg_iterator = cr_PkgIterator_new(
TEST_REPO_04_PRIMARY, TEST_REPO_04_FILELISTS_EXT, TEST_REPO_04_OTHER, NULL, NULL, NULL, NULL, &tmp_err);
Copy link
Contributor

@dralley dralley Apr 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of this API then, is that if you want the full metadata, you substitute the path to filelists_ext.xml instead of filelists.xml?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the parser, this is the idea.

@Conan-Kudo
Copy link
Member

@aplanas Are you still interested in this? It needs rebasing...

@aplanas
Copy link
Contributor Author

aplanas commented Dec 8, 2022

@Conan-Kudo Yes. Rebased!

Is there anything actionable that I can do to bootstrap the discussion? Is the goal of this PR clear? Maybe there is a better way to connect the signature of the entity that create the repository with the value of the hashes?

@aplanas aplanas force-pushed the fix_imadata branch 2 times, most recently from b3e6d79 to a347d6a Compare December 15, 2022 15:59
@aplanas
Copy link
Contributor Author

aplanas commented Jan 4, 2023

@Conan-Kudo ping. Can you remove the "decaying" tag?

Copy link
Contributor

@kontura kontura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, sorry for the delay.

Unless I am wrong this breaks API only with the two functions I annotated, please can you rather create new ones? (the Python API should be fine as the new params are optional).

Otherwise in my opinion this is good to go and I would merge it.

Using the parameter '--filelists_ext', a new metadata file
(filelists_ext) will be created.  This new metadata is a copy of
filelists, but will include a new attribute for the 'file' element,
named 'hash', that contain the registered hash (SHA256 for example)
for this file.

Fix rpm-software-management#313

Signed-off-by: Alberto Planas <[email protected]>
@aplanas
Copy link
Contributor Author

aplanas commented Feb 15, 2023

Another option instead to duplicate similar functions, can be to drop --filelist_ext and generate it always, and is the user who decides to ignore it or not

@dralley
Copy link
Contributor

dralley commented Feb 16, 2023

You probably don't need to "duplicate" per se, just have one implementation with a different signature which is called by the existing function, which would be turned into just a stub.

@aplanas
Copy link
Contributor Author

aplanas commented Feb 16, 2023

just have one implementation with a different signature which is called by the existing function

Right, I used this approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IMA: Extend filelists.xml to include a hash attribute inside the file elements.

5 participants