Add Diar-az #39

Merged
wq2012 merged 1 commit into wq2012:master from ArnarFreyrKristinsson:master on Sep 20, 2024

Conversation

@ArnarFreyrKristinsson
Contributor

Diar-az creates files for a (diarization) corpus from Gecko and provides organization, cleaning and correction of data for Kaldi to Gecko to Kaldi/corpus and back.

@wq2012
Owner

wq2012 commented Aug 20, 2024

I think this should fall into "Other software" instead of "Diarization dataset".

This is not a new dataset. It's just a format conversion tool, is that correct?

@judyfong
Contributor

It's a tool specifically for the ruv-di dataset.

@wq2012
Owner

wq2012 commented Aug 20, 2024

If so, we should add ruv as a dataset, and this repo as "Other Software".

@judyfong
Contributor

judyfong commented Aug 20, 2024 via email

@ArnarFreyrKristinsson
Contributor Author

ArnarFreyrKristinsson commented Aug 20, 2024

Yes, I think "Other software" works and is maybe a better fit, as it's not really a dataset but rather a tool to support the ruv-di dataset. To correct this, should this pull request be just updated or a new one created?

The dataset was never published, only the resulting models. Also, yes, that dataset should be added, but it was lost in a cybersecurity attack on Reykjavik University's servers in January 2024. If you want, you could put placeholder text for the RÚV-DI dataset here in this repo and we could try to recreate the dataset. We have a license that lists all the shows and episodes contained within the dataset, so we could recreate it from that. "Other software" works in my opinion.

@wq2012
Owner

wq2012 commented Aug 23, 2024

> To correct this, should this pull request be just updated or a new one created?

I'm OK either way.

@ArnarFreyrKristinsson
Contributor Author

> > To correct this, should this pull request be just updated or a new one created?
>
> I'm OK either way.

Fixed, added to other software.

@judyfong
Contributor

@afk0901 I believe you also need to add the placeholder text for the dataset for this PR to be properly closed.

In terms of recreating the dataset, I believe it's actually best if @wq2012 recreates the dataset with daan and pet of Google, and @afk0901 finishes our writeup of this dataset creation. When we are both done, we compare notes on arXiv and write the dataset paper together for Interspeech, ICASSP, or sand2025, or wand in October.

@judyfong
Contributor

For continuity and clarity, I believe it's best if my second paragraph is dealt with separately, not in this PR. Thus I have created a new issue for it within this repo.

@wq2012
Owner

wq2012 commented Sep 20, 2024

> > > To correct this, should this pull request be just updated or a new one created?
> >
> > I'm OK either way.
>
> Fixed, added to other software.

I didn't see the change.

ArnarFreyrKristinsson force-pushed the master branch 2 times, most recently from f32caa9 to 8d1a453, on September 20, 2024 19:29
README.md Outdated
| [VoxConverse](https://github.com/joonson/voxconverse) | TBD | TBD | Free | VoxConverse is an audio-visual diarisation dataset consisting of over 50 hours of multispeaker clips of human speech, extracted from YouTube videos |
| [MiniVox Benchmark](https://github.com/doerlbh/MiniVox) | [MiniVox Benchmark](https://github.com/doerlbh/MiniVox) | en | Free | MiniVox is an automatic framework to transform any speaker-labelled dataset into continuous speech datastream with episodically revealed label feedbacks. |
| [The AliMeeting Corpus](https://github.com/yufan-aslp/AliMeeting) | Together with audios | zh | Free | |
| RÚV-DI dataset | TBD | is | TBD | |
Owner

please remove this

Contributor Author

Removed.

Add Diar-az
3 participants