Feature: Transcription via aTrain integration by jankapunkt · Pull Request #40 · openqda/openqda

jankapunkt · 2024-06-04T11:29:10Z

This implements a new transform service "aTrain" to integrate aTrain as our transcription service.

Warning: this introduces a tight coupling as we still haven't finalized the plugin specs. Later the service will be decoupled from the main system! (see #18).

alessandrobelli · 2024-06-04T15:35:53Z

Got this error while trying to run docker:

10.35 Collecting pandas
10.36   Downloading pandas-2.2.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (15.6 MB)
10.53      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.6/15.6 MB 83.9 MB/s eta 0:00:00
10.70 Collecting ffmpeg-python>=0.2
10.72   Downloading ffmpeg_python-0.2.0-py3-none-any.whl (25 kB)
11.37 ERROR: Could not find a version that satisfies the requirement torch==2.2.0+cu121 (from atrain-core) (from versions: 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2, 2.3.0)
11.37 ERROR: No matching distribution found for torch==2.2.0+cu121
11.37 
11.37 [notice] A new release of pip is available: 23.0.1 -> 24.0
11.37 [notice] To update, run: pip install --upgrade pip

I'm not sure if it's related to atrain or not
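
One possible reading of the log above: the pin `torch==2.2.0+cu121` carries a local version label (`+cu121`), while every version pip lists from the default index has none. Under PEP 440, an `==` pin with a local label only matches a candidate carrying exactly that label. Builds with CUDA labels are usually served from PyTorch's own wheel index rather than the default one, and the thread does not confirm whether such a wheel exists for this aarch64 platform. The sketch below is a simplified re-implementation of the matching rule for illustration only (it is not pip's code, and the helper names are hypothetical).

```python
from typing import Optional, Tuple

# Versions pip reported as available, copied from the log above.
AVAILABLE = [
    "1.10.2", "1.11.0", "1.12.0", "1.12.1", "1.13.0", "1.13.1",
    "2.0.0", "2.0.1", "2.1.0", "2.1.1", "2.1.2",
    "2.2.0", "2.2.1", "2.2.2", "2.3.0",
]


def split_local(version: str) -> Tuple[str, Optional[str]]:
    """Split '2.2.0+cu121' into ('2.2.0', 'cu121'); the label is None if absent."""
    public, _, local = version.partition("+")
    return public, (local or None)


def satisfies_exact_pin(candidate: str, pinned: str) -> bool:
    """Simplified PEP 440 '==' check (hypothetical helper, string comparison only)."""
    cand_public, cand_local = split_local(candidate)
    pin_public, pin_local = split_local(pinned)
    if cand_public != pin_public:
        return False
    # A pin without a local label ignores the candidate's label;
    # a pin with a local label requires the same label on the candidate.
    return pin_local is None or cand_local == pin_local


matches = [v for v in AVAILABLE if satisfies_exact_pin(v, "2.2.0+cu121")]
print(matches)  # [] : plain "2.2.0" does not satisfy the "+cu121" pin
```

Under this reading, the plain `2.2.0` in the list is not enough, which would explain why pip reports no matching distribution even though `2.2.0` appears among the available versions.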

1. The file is collected from the frontend and saved in sources.
2. The source status for the audio is set to converting, and the file is then sent to aTrain.
3. After the file is received, a new source status 'converted:txt' is created.
4. The audio source id is set to converted.

The source's converted attribute now uses LIKE so that it covers more converted file types.
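
The steps above can be sketched as a small status flow. This is a minimal illustration in Python, not the project's actual code: the `Source` class, its field names, and `fake_transcribe` are assumptions, and the prefix check only mimics what a SQL `LIKE 'converted%'` comparison would do.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Source:
    """Hypothetical stand-in for the Source model; field names are assumptions."""
    id: int
    filename: str
    statuses: List[str] = field(default_factory=list)

    @property
    def converted(self) -> bool:
        # Prefix match, comparable to SQL `status LIKE 'converted%'`,
        # so 'converted:txt' and other 'converted:*' variants all count.
        return any(s.startswith("converted") for s in self.statuses)


def fake_transcribe(filename: str) -> str:
    """Placeholder for the call to the aTrain service (not its real API)."""
    return f"transcript of {filename}"


def run_transcription(source: Source) -> str:
    source.statuses.append("converting")      # step 2: mark as converting, send file
    text = fake_transcribe(source.filename)   # wait for the transcription result
    source.statuses.append("converted:txt")   # step 3: record the converted status
    return text                               # step 4: source.converted is now True


audio = Source(id=1, filename="interview.mp3")  # step 1: uploaded file saved as a source
before = audio.converted
transcript = run_transcription(audio)
print(before, audio.statuses, audio.converted)
# False ['converting', 'converted:txt'] True
```

Matching on the `converted` prefix instead of one exact status is what lets additional converted file types be added later without touching the check.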

jankapunkt · 2024-06-04T17:26:01Z

@alessandrobelli I added a minimal documentation with build instructions to the service. Please check if it works now.

alessandrobelli

Approved as a proof of concept of the aTrain integration. It needs some UI and UX improvements, but the base is there.

alessandrobelli · 2024-06-13T15:05:44Z

I'll merge this now so I can do the merge to the main server, in line with our timeline.

Alessandro Belli and others added 8 commits May 29, 2024 13:36

first commit

dee6f9f

build(backend): move atrain url to .env

192a332

backend: load atrain url from .env config

a6d21ad

plugins: move convert-to-rtf service into /services/transform folder

d367aaf

build(docker): move docker compose file to top-level

c6c313b

feature(plugins): add aTrain transform service

0701b6c

build: merge services into web/docker-compose.yml

980d532

feature(plugins): save transcribed files from atrain service

d389b5f

jankapunkt added the enhancement label Jun 4, 2024

jankapunkt added this to the Pre-Release milestone Jun 4, 2024

alessandrobelli self-assigned this Jun 4, 2024

Alessandro Belli and others added 4 commits June 4, 2024 17:56

Merge branch 'main' into aTrain-integration

0fca9f5

lint

d48edd6

plugins(transform.aTrain): add minimal README

57d349f

plugin(transform.atrain): add README

f5472da

jankapunkt linked an issue Jun 4, 2024 that may be closed by this pull request

aTrain integration #11

Closed

jankapunkt changed the title ~~A train integration~~ Transcription via aTrain integration Jun 4, 2024

Alessandro Belli and others added 6 commits June 5, 2024 11:09

Update Source.php

9c555a7

- return false if not converted

fix(plugin.atrain): fix filepath and service communications

fb0c97b

Update SourceController.php

6e297bf

Merge branch 'aTrain-integration' of https://github.com/openqda/openqda into aTrain-integration

353ccc9

get feedback on the UI

e6f1916

- now you should see file converting on the UI
- LINT

fix: add ability to transcribe large files

537c08c

alessandrobelli reviewed Jun 5, 2024

services/transform/atrain/README.md

Alessandro Belli added 3 commits June 5, 2024 14:36

debug websockets locally

27d6575

- simple bash script to start queue workers and websocket server

Update TranscriptionJob.php

a2ee480

lint

Testing Atrain + download file

cfea51d

- now user sees feedback on ui
- you can download the source file you uploaded
- secured routes

Alessandro Belli and others added 25 commits June 11, 2024 13:54

delete file + new icons

ff6f3f7

- now we delete the file from aTrain after we get the transcription
- new icon for audio document
- new button to retry the ATRAIN transcription

build(backend/docker): remove unused debug script

cfda13a

fix(services.transform.atrain): add missing routes to .env.example

6e0d16a

fix(web): use new routes for aTrain integration in TranscriptionJob

1d8f069

fix(plugin.transform.atrain): new route system implemented

8cd84c3

fix(plugin.transform.atrain): added missing files git git rev

ac8851e

fix(client/lint): remove unused cypress eslint plugin

15eda6a

feature(transcription): push Failed event to client

483185b

fix(backend): use correct max size validation for uploaded files

4e1c64c

fix(backend): use a default timeout on Post process calls in TranscriptionJob

2f33d05

fix(backend): syntax fixed with pint

7148035

fix(backend): better logging in transcription job

76bba56

fix(docker): run services only with 2 workers

099b468

feature(plugin.transform.atrain): use .env for configurable runtime

79dfa3d

refactor(client): move visibility logic for fileslist actions into action definition (open-close)

1d56dce

fix(client): lint/format fix

93ab1bb

fix(client): better error handling on FilesUploader

bb4f2a3

feature(sources/jobs): retry transcription depending on job status

3a9d0fa

remove variables on delete sources

e283ba1

various improvements

cf7e6ba

- save html instead of txt
- refactor Transcription job
- delete audio file on successful transcription

feature(plugins.transform.atrain): coloured speakers and timestamps

57521e9

Merge branch 'aTrain-integration' of github.com:openqda/openqda into aTrain-integration

cbb2a30

fix(client): set failed to false when file conversion completed successfully

7e381d0

Update internalPlugins.php

472a66f

bugfix naming of rtf endpoints

Update TranscriptionJob.php

d450a88

jankapunkt requested a review from hohse June 13, 2024 14:52

Merge branch 'main' into aTrain-integration

227806b

alessandrobelli approved these changes Jun 13, 2024

alessandrobelli merged commit b0f3825 into main Jun 13, 2024

alessandrobelli merged 58 commits into main from aTrain-integration
