Feature: Transcription via aTrain integration #40

Merged: alessandrobelli merged 58 commits into main from aTrain-integration on Jun 13, 2024

Conversation

@jankapunkt
Member

This implements a new transform service "aTrain" to integrate aTrain as our transcription service.

Warning: this introduces a tight coupling as we still haven't finalized the plugin specs. Later the service will be decoupled from the main system! (see #18).

@jankapunkt added the "enhancement" (New feature or request) label on Jun 4, 2024
@jankapunkt added this to the Pre-Release milestone on Jun 4, 2024
@alessandrobelli
Contributor

I got this error while trying to run Docker:

10.35 Collecting pandas
10.36   Downloading pandas-2.2.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (15.6 MB)
10.53      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.6/15.6 MB 83.9 MB/s eta 0:00:00
10.70 Collecting ffmpeg-python>=0.2
10.72   Downloading ffmpeg_python-0.2.0-py3-none-any.whl (25 kB)
11.37 ERROR: Could not find a version that satisfies the requirement torch==2.2.0+cu121 (from atrain-core) (from versions: 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2, 2.3.0)
11.37 ERROR: No matching distribution found for torch==2.2.0+cu121
11.37 
11.37 [notice] A new release of pip is available: 23.0.1 -> 24.0
11.37 [notice] To update, run: pip install --upgrade pip

I'm not sure whether it's related to aTrain or not.
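
One way to read the log above, offered as a sketch rather than a confirmed diagnosis: the pinned requirement `torch==2.2.0+cu121` carries a local version label (`+cu121`), and the list of versions pip found contains a plain `2.2.0` but nothing with that label, so an exact pin cannot be satisfied from the index that was queried. The helper below only parses the error line quoted in this thread; the function name `diagnose` and its return keys are hypothetical and not part of pip or aTrain.

```python
import re

# The error line exactly as quoted in the log above.
LOG_LINE = (
    "ERROR: Could not find a version that satisfies the requirement "
    "torch==2.2.0+cu121 (from atrain-core) (from versions: 1.10.2, 1.11.0, "
    "1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, "
    "2.2.0, 2.2.1, 2.2.2, 2.3.0)"
)


def diagnose(line: str) -> dict:
    """Hypothetical helper: compare the pinned version with what pip listed."""
    # Version that follows "==" in the failing requirement.
    pinned = re.search(r"requirement \S+==(\S+)", line).group(1)
    # Candidate versions pip reported for the queried index.
    versions = re.search(r"from versions: ([^)]*)\)", line).group(1).split(", ")
    # Split "2.2.0+cu121" into the public part and the local label.
    public, _, local = pinned.partition("+")
    return {
        "pinned": pinned,
        "local_label": local or None,
        "exact_match": pinned in versions,
        "public_version_available": public in versions,
    }


if __name__ == "__main__":
    for key, value in diagnose(LOG_LINE).items():
        print(f"{key}: {value}")
```

If this reading is right, the failure comes from how the `torch` build is pinned and where pip looks for it, rather than from the integration code in this pull request. The thread itself does not confirm the cause.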

@alessandrobelli self-assigned this on Jun 4, 2024
Alessandro Belli and others added 4 commits June 4, 2024 17:56
1. file is collected from frontend and saved in sources
2. source status for the audio is converting. file is then sent to aTrain
3. after the file is received, create a new source status 'converted:txt'
4. set the audio source id to converted.
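
The four steps above can be sketched as a small in-memory status flow. This is an illustration under assumptions, not the code from this pull request: the `Source` class, `fake_transcribe`, and `process_upload` are hypothetical names, the aTrain call is replaced by a stand-in, and step 4 is read as "mark the audio source as converted".

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # simple id generator for the sketch


@dataclass
class Source:
    """Hypothetical stand-in for a stored source record."""
    filename: str
    status: str
    content: str = ""
    id: int = field(default_factory=lambda: next(_ids))


def fake_transcribe(audio: Source) -> str:
    """Stand-in for sending the file to aTrain (assumption, not the real API)."""
    return f"transcript of {audio.filename}"


def process_upload(filename: str, sources: list[Source]) -> Source:
    # Steps 1 and 2: the uploaded file is saved in sources with status
    # 'converting', then handed to the transcription service.
    audio = Source(filename=filename, status="converting")
    sources.append(audio)
    text = fake_transcribe(audio)

    # Step 3: once the result arrives, create a new source 'converted:txt'.
    transcript = Source(
        filename=filename.rsplit(".", 1)[0] + ".txt",
        status="converted:txt",
        content=text,
    )
    sources.append(transcript)

    # Step 4 (as interpreted here): mark the audio source as converted.
    audio.status = "converted"
    return transcript


if __name__ == "__main__":
    store: list[Source] = []
    result = process_upload("interview.mp3", store)
    print([(s.id, s.filename, s.status) for s in store], result.content)
```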

Changed the source converted attribute to use LIKE matching, so that more converted file types are included
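
A minimal sketch of what LIKE-based matching on the status could look like, using an in-memory SQLite table. The table name `sources`, the column `status`, the pattern `converted:%`, and the function `converted_statuses` are assumptions made for illustration; the actual schema and query in this pull request are not shown in the thread.

```python
import sqlite3


def converted_statuses(conn: sqlite3.Connection) -> list[str]:
    """Return statuses that mark a converted file of any type (hypothetical query)."""
    rows = conn.execute(
        "SELECT status FROM sources WHERE status LIKE ? ORDER BY id",
        ("converted:%",),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sources (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO sources (status) VALUES (?)",
        [("converting",), ("converted:txt",), ("converted:html",)],
    )
    print(converted_statuses(conn))
```

Compared with an equality check against `converted:txt`, a prefix pattern keeps matching when a new output type is introduced.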
@jankapunkt
Member Author

@alessandrobelli I added minimal documentation with build instructions to the service. Please check whether it works now.

@jankapunkt linked an issue on Jun 4, 2024 that may be closed by this pull request
@jankapunkt changed the title from "A train integration" to "Transcription via aTrain integration" on Jun 4, 2024
Alessandro Belli added 3 commits June 5, 2024 14:36
- simple bash script to start queue workers and websocket server
- now user sees feedback on ui
- you can download the source file you uploaded
- secured routes
Alessandro Belli and others added 25 commits June 11, 2024 13:54
- now we delete the file from aTrain after we get the transcription
- new icon for audio document
- new button to retry the aTrain transcription
- save html instead of txt
- refactor Transcription job
- delete audio file on successful transcription
bugfix naming of rtf endpoints
@jankapunkt requested a review from hohse on Jun 13, 2024 14:52
Contributor

@alessandrobelli alessandrobelli left a comment

  • Approved as a proof of concept of the aTrain integration. The UI and UX need some improvements, but the base is there.

@alessandrobelli
Contributor

I'll merge now so I can do the merge to the main server, as per our timeline.

@alessandrobelli alessandrobelli merged commit b0f3825 into main Jun 13, 2024

Labels

enhancement: New feature or request
live (staging): This is live and ready to be field-tested on our staging system

Development

Successfully merging this pull request may close these issues.

aTrain integration

3 participants