
Automatically Generate Adaptive Video Transcripts With AI

Why It Matters

Build accessible video experiences with AI-generated transcripts and adaptive streaming using Cloudinary and Next.js.

Ensuring that videos perform well at different internet speeds and meet accessibility needs is key to a great video experience. There are two boxes to tick: adaptive video streaming, which delivers a smooth viewing experience, and transcripts and subtitles (younger generations overwhelmingly prefer to watch content with subtitles on).

Cloudinary, the image and video platform for storing, transforming, uploading, and delivering visual media, provides AI-powered tools to automate transcription and subtitling for video, improving accessibility and the viewing experience. This blog post shows you how to create a flexible, adaptive video solution with automatic AI-generated transcripts using Next.js and Cloudinary.

You can find the complete project in the GitHub repository.

To follow along, you should have a Cloudinary account and Node.js with npm installed.

First, clone the starter project into your preferred folder and check out the starter branch using the Git commands below. The starter project is a Next.js application that uses the app directory configuration, with Tailwind CSS for styling.

```bash
git clone https://github.com/Olanetsoft/cloudinary-adaptive-video-with-ai-transcripts.git
cd cloudinary-adaptive-video-with-ai-transcripts
git checkout starter
```

The starter project comes pre-configured with several essential packages that you’ll use throughout this tutorial. When you run npm install, the following packages and their dependencies will be installed, as specified in the package.json:

  • Dependencies:
    • Next.js (^14)
    • React (^18) and React DOM (^18)
    • Cloudinary (^2.5.1)
    • React Toastify (^10.0.6)
    • Lucide React (^0.453.0)
  • Dev dependencies:
    • Tailwind CSS (^3.4.1)
    • ESLint (^8) and ESLint Config Next (14.2.16)
    • PostCSS (^8)
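
For reference, the relevant sections of the starter's package.json look roughly like this (a sketch assembled from the version list above):

```json
{
  "dependencies": {
    "next": "^14",
    "react": "^18",
    "react-dom": "^18",
    "cloudinary": "^2.5.1",
    "react-toastify": "^10.0.6",
    "lucide-react": "^0.453.0"
  },
  "devDependencies": {
    "tailwindcss": "^3.4.1",
    "eslint": "^8",
    "eslint-config-next": "14.2.16",
    "postcss": "^8"
  }
}
```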

Next, run the following command to install all dependencies with the npm package manager and start the project at http://localhost:3000.

```bash
npm install && npm run dev
```

When you run the starter project, you’ll see the basic UI for the upload component on the home page. The upload area is displayed, but the functionality to handle uploads is yet to be implemented.

Note:

The upload input is disabled, and there’s no functionality connected to it yet. Throughout this tutorial, you’ll implement the functionality step by step.

You need environment credentials from your Cloudinary dashboard to use Cloudinary image transformation and create an adaptive video solution with automatic AI-generated transcripts. 

Log in to your Cloudinary dashboard to retrieve environment credentials: the cloud name, API key, and API secret.


Then, create a .env.local file in the project’s root folder using the following command:

For macOS and Linux:

```bash
touch .env.local
```

For Windows (Command Prompt):

```bash
type NUL > .env.local
```

For Windows (PowerShell):

```powershell
New-Item -Path .env.local -ItemType File
```

Add your environment credentials:

```
NEXT_PUBLIC_CLOUDINARY_CLOUD_NAME="*********************"
NEXT_PUBLIC_CLOUDINARY_API_KEY="*********************"
CLOUDINARY_API_SECRET="*********************"
```

Replace ********************* with your actual Cloudinary credentials.

Note:

The NEXT_PUBLIC_ prefix exposes a variable to the browser, so never use it for sensitive values like the API secret. CLOUDINARY_API_SECRET has no prefix, so it stays server-only: it's readable in route handlers but never shipped to the client bundle.

Your project structure should look like this:

```
cloudinary-adaptive-video-with-ai-transcripts/
├── node_modules/
├── public/
├── app/
├── .env.local
├── package.json
└── next.config.js
```

Next, you’ll activate the Google AI Video Transcription add-on in the following section.

Before creating the transcript that will be used for the video subtitles in this guide, you’ll need to activate Cloudinary’s Google AI Video Transcription add-on. The add-on automatically generates speech-to-text transcripts of videos and supports transcription in a wide range of languages.

Navigate to the Add-ons section on your Cloudinary dashboard, and click the Google AI Video Transcription add-on.


Next, click to subscribe to the free plan or any plan of your choice.


Now that you’ve activated the Google Video Transcription add-on, you’ll create an API route that handles video uploads to Cloudinary and requests automatic subtitle generation.

In the app/ directory, create a new folder named api/, then add an upload/ folder within it, and finally create a route.js file inside the upload/ folder:

```bash
mkdir -p app/api/upload
touch app/api/upload/route.js
```

Add the following code to app/api/upload/route.js:

```javascript
// app/api/upload/route.js
"use server";

import { NextResponse } from "next/server";
import { v2 as cloudinary } from "cloudinary";

cloudinary.config({
  cloud_name: process.env.NEXT_PUBLIC_CLOUDINARY_CLOUD_NAME,
  api_key: process.env.NEXT_PUBLIC_CLOUDINARY_API_KEY,
  api_secret: process.env.CLOUDINARY_API_SECRET,
});

export async function POST(req) {
  try {
    // Read the uploaded file from the multipart form data.
    const formData = await req.formData();
    const file = formData.get("video_file");
    const buffer = Buffer.from(await file.arrayBuffer());
    const base64Video = `data:${file.type};base64,${buffer.toString("base64")}`;

    // Upload with VTT generation.
    const uploadResult = await cloudinary.uploader.upload(base64Video, {
      resource_type: "video",
      public_id: `videos/${Date.now()}`,
      raw_convert: "google_speech:vtt", // Request VTT format
    });

    const cloudName = process.env.NEXT_PUBLIC_CLOUDINARY_CLOUD_NAME;

    // Construct URLs following the documentation pattern. The generated VTT
    // is delivered as a raw asset one version after the video upload.
    const videoUrl = uploadResult.secure_url;
    const vttUrl = `https://res.cloudinary.com/${cloudName}/raw/upload/v${
      parseInt(uploadResult.version) + 1
    }/${uploadResult.public_id}.vtt`;

    return NextResponse.json({ videoUrl, vttUrl });
  } catch (error) {
    console.error("Upload error:", error);
    return NextResponse.json(
      { error: "Failed to upload video" },
      { status: 500 }
    );
  }
}
```

In the code above, you:

  • Configured Cloudinary with the credentials from the environment variables.
  • Extracted the video file from the form data.
  • Converted the video file to a base64-encoded string.
  • Uploaded the video to Cloudinary, requesting automatic subtitle generation (raw_convert: 'google_speech:vtt').
  • Constructed the URLs for the uploaded video and the generated VTT subtitle file and returned a JSON response with the videoUrl and vttUrl.
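
Before wiring up the frontend, you can sanity-check the route directly. A minimal example with curl, assuming a local sample.mp4 and the dev server running on port 3000:

```bash
curl -F "video_file=@sample.mp4" http://localhost:3000/api/upload

# Expected response shape (placeholder values):
# {"videoUrl":"https://res.cloudinary.com/<cloud_name>/video/upload/v.../videos/<timestamp>.mp4",
#  "vttUrl":"https://res.cloudinary.com/<cloud_name>/raw/upload/v.../videos/<timestamp>.vtt"}
```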

Now, you’ll implement the frontend component that allows users to upload videos.

In the page.js file inside the app/ folder, you’ll enable the upload input and implement its functionality. At the top of the page.js file, import the following packages, which were installed earlier with the starter project:

```javascript
// app/page.js
"use client";

import { useState, useEffect, useRef } from "react";
import { Upload, Loader2 } from "lucide-react";
import { ToastContainer, toast } from "react-toastify";
import "react-toastify/dist/ReactToastify.css";

export default function Home() {
  //...
}
```

Inside the Home component, define the following state variables:

```javascript
// app/page.js
export default function Home() {
  const [videoUrl, setVideoUrl] = useState("");
  const [vttUrl, setVttUrl] = useState("");
  const [isLoading, setIsLoading] = useState(false);

  // Your code continues...
}
```

Add the handleUpload function inside the Home component:

```javascript
// app/page.js
//...
export default function Home() {
  //...
  const handleUpload = async (e) => {
    const file = e.target?.files?.[0];
    if (!file) return;

    setIsLoading(true);
    setVideoUrl("");
    setVttUrl("");

    const formData = new FormData();
    formData.append("video_file", file);

    try {
      const response = await fetch("/api/upload", {
        method: "POST",
        body: formData,
      });

      if (!response.ok) throw new Error("Upload failed");

      const data = await response.json();
      setVideoUrl(data.videoUrl);
      setVttUrl(data.vttUrl);
      toast.success("Video uploaded successfully!");
    } catch (err) {
      const message =
        err instanceof Error ? err.message : "Failed to upload video";
      toast.error(message);
    } finally {
      setIsLoading(false);
    }
  };
}
```

In the code above, you:

  • Defined the handleUpload function to retrieve the selected file from the input change event.
  • Checked whether a file was selected, exiting early if not.
  • Set the loading state to true, reset URLs, created a FormData Object, and appended the video file.
  • Made a POST request to the /api/upload endpoint with FormData.
  • Parsed the JSON response to obtain videoUrl and vttUrl.
  • Updated state variables with the received URLs and displayed a success notification to the user.
  • Caught and handled any errors during the upload process and reset the loading state after completion.

Modify the upload input in the JSX to use the handleUpload function:

```javascript
// app/page.js
//...
export default function Home() {
  //...
  <div className="border-2 border-dashed border-gray-700 rounded-lg p-8 text-center">
    <label className="cursor-pointer group">
      <input
        type="file"
        accept="video/*"
        onChange={handleUpload}
        className="hidden"
        disabled={isLoading}
      />
      <div className="space-y-4">
        {/* ... */}
      </div>
    </label>
  </div>
}
```

Remove the disabled attribute from the input once you enable the functionality, and at the bottom of your JSX, include the ToastContainer for toast notifications in the application:

```javascript
// app/page.js
//...
export default function Home() {
  //...
  return (
    <div className="min-h-screen bg-gray-900 text-white p-4 md:p-8">
      <div className="max-w-5xl mx-auto">
        {/* ... */}
        <ToastContainer theme="dark" />
      </div>
    </div>
  );
}
```

To test the upload functionality, you can start your development server if it’s not already running, navigate to http://localhost:3000 in your browser, and try uploading a video file. If the upload is successful, you should see a success notification.

Now that you can upload videos and receive URLs, let’s add the video player. Below the upload component, add the following code snippet to:

  • Display the video player when a videoUrl is available.
  • Add a handleVideoPlay function that shows subtitle generation notifications.
```javascript
// app/page.js
//...
export default function Home() {
  //...
  // Handle video play
  const handleVideoPlay = () => {
    if (vttUrl) {
      toast.info(
        "Generating subtitles. They will appear automatically when ready."
      );
    }
  };

  return (
    <div className="min-h-screen bg-gray-900 text-white p-4 md:p-8">
      <div className="max-w-5xl mx-auto">
        {/* ... */}
        {!videoUrl ? (
          <div className="border-2 border-dashed border-gray-700 rounded-lg p-8 text-center">
            {/* ... */}
          </div>
        ) : (
          <div className="space-y-6">
            <div className="relative rounded-lg overflow-hidden bg-black">
              <video
                ref={videoRef}
                className="w-full aspect-video"
                controls
                crossOrigin="anonymous"
                playsInline
                autoBuffer
                muted
                onPlay={handleVideoPlay}
              >
                <source src={videoUrl} type="video/mp4" />
              </video>
            </div>
          </div>
        )}
        <ToastContainer theme="dark" />
      </div>
    </div>
  );
}
```

Next, you’ll add the subtitles to the video player. The vttUrl returned by the API route points to the subtitle file generated by the Google AI Video Transcription add-on. Because the subtitle file is served from Cloudinary’s domain, the crossOrigin="anonymous" attribute on the <video> element is what lets the browser load the cross-origin text track.
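
For context, the generated file is standard WebVTT. The cues below are only illustrative; the actual text and timings depend on your video:

```
WEBVTT

00:00:00.000 --> 00:00:03.500
Welcome to this demo video.

00:00:03.500 --> 00:00:07.200
Today we'll look at adaptive streaming.
```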

Modify the <video> element to include the subtitles with the following code snippet:

```javascript
// app/page.js
//...
export default function Home() {
  //...
  return (
    {/* ... */}
    <video
      className="w-full aspect-video"
      controls
      crossOrigin="anonymous"
      playsInline
      autoBuffer
      muted
      onPlay={handleVideoPlay}
    >
      <source src={videoUrl} type="video/mp4" />
      {vttUrl && (
        <track
          label="English"
          kind="subtitles"
          srcLang="en"
          src={vttUrl}
          default
        />
      )}
    </video>
  );
}
```

In the following step, you will implement adaptive streaming simulations.

You can simulate adaptive streaming by toggling the video quality based on user interaction. Add the following state variables for throttling at the top of your Home component.

```javascript
// app/page.js
//...
export default function Home() {
  //...
  const [isThrottled, setIsThrottled] = useState(false);
  const [currentQuality, setCurrentQuality] = useState("auto");

  return (
    // ...
  );
}
```

To simulate how the video player adapts to different bandwidth conditions, you’ll implement the simulateThrottling function. This function toggles between normal and throttled bandwidth states. When throttling is enabled, you’ll modify the video URL to request a lower-quality video version by adding transformation parameters to the Cloudinary URL (q_auto:low). This simulates a low-bandwidth environment, causing the video player to load a lower-quality video stream. 

You’ll also update the UI to reflect the current quality and provide feedback to the user using notifications. To do that, add the following function inside the Home component:

```javascript
// app/page.js
//...
export default function Home() {
  //...
  const simulateThrottling = () => {
    setIsThrottled(!isThrottled);

    // `isThrottled` still holds the pre-toggle value here, so !isThrottled
    // means "we are about to throttle".
    if (!isThrottled) {
      const throttledUrl = videoUrl.replace("/upload/", "/upload/q_auto:low/");
      setCurrentQuality("360p");
      videoRef.current.src = throttledUrl;
    } else {
      setCurrentQuality("auto");
      videoRef.current.src = videoUrl;
    }

    toast.info(
      isThrottled ? "Restored normal bandwidth" : "Simulating low bandwidth"
    );
  };

  return (
    //...
  );
}
```
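
To make the transformation concrete, here’s what the URL swap looks like (placeholder cloud name and public ID):

```
# Original delivery URL
https://res.cloudinary.com/<cloud_name>/video/upload/v1700000000/videos/1700000000000.mp4

# Throttled URL with the q_auto:low transformation injected
https://res.cloudinary.com/<cloud_name>/video/upload/q_auto:low/v1700000000/videos/1700000000000.mp4
```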

Next, add a throttling button whose onClick handler calls the simulateThrottling function, along with the quality dropdown:

```javascript
// app/page.js
"use client";
//...
export default function Home() {
  //...
  return (
    <div className="min-h-screen bg-gray-900 text-white p-4 md:p-8">
      <div className="max-w-5xl mx-auto">
        {/* ... */}
        {!videoUrl ? (
          {/* ... */}
        ) : (
          <div className="space-y-6">
            {/* ... */}
            <div className="flex justify-end gap-4">
              <button
                onClick={simulateThrottling}
                className={`px-4 py-2 rounded-md text-sm flex items-center gap-2 transition-colors ${
                  isThrottled ? "bg-red-500/70" : "bg-black/70"
                }`}
              >
                {isThrottled ? "Throttled" : "Normal"} Bandwidth
              </button>
              <select
                value={currentQuality}
                onChange={(e) => setCurrentQuality(e.target.value)}
                className="bg-black/70 text-white px-4 py-2 rounded-md text-sm"
              >
                <option value="auto">Auto Quality</option>
                <option value="1080p">1080p</option>
                <option value="720p">720p</option>
                <option value="480p">480p</option>
                <option value="360p">360p</option>
              </select>
            </div>
          </div>
        )}
        <ToastContainer theme="dark" />
      </div>
    </div>
  );
}
```
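
Note that the dropdown above only updates the currentQuality label. If you want the selection to change the actual stream, one option is to map each choice to a Cloudinary height transformation (h_1080, h_720, and so on). The sketch below is an assumption on top of the tutorial code; handleQualityChange is a helper introduced here, not part of the starter:

```javascript
// Hypothetical helper: swap the video source based on the selected quality.
const QUALITY_TRANSFORMS = {
  "1080p": "h_1080",
  "720p": "h_720",
  "480p": "h_480",
  "360p": "h_360",
};

const handleQualityChange = (quality) => {
  setCurrentQuality(quality);
  if (!videoRef.current) return;
  const transform = QUALITY_TRANSFORMS[quality];
  // "auto" (or any unmapped value) falls back to the original URL.
  videoRef.current.src = transform
    ? videoUrl.replace("/upload/", `/upload/${transform}/`)
    : videoUrl;
};
```

To use it, change the select’s onChange to onChange={(e) => handleQualityChange(e.target.value)}.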

Next, you’ll implement bandwidth monitoring to display playback statistics such as current video quality, connection status, and estimated bandwidth. This feature provides real-time feedback to the user about their playback experience and demonstrates how the video player adapts to different network conditions.

First, add a state variable for the measured bandwidth and a ref for the video element:

```javascript
// app/page.js
//...
export default function Home() {
  //...
  const [bandwidth, setBandwidth] = useState(null);
  const videoRef = useRef(null);

  return (
    //...
  );
}
```

Next, add the following useEffect hook inside the Home component:

```javascript
// app/page.js
//...
export default function Home() {
  //...
  useEffect(() => {
    if (!videoRef.current) return;

    const video = videoRef.current;
    let lastLoadedBytes = 0;
    let lastLoadedTime = Date.now();

    const updateBandwidth = () => {
      if (!video.buffered.length) return;

      // Rough estimate: treat each buffered second as ~1 MB of video data.
      const loadedBytes =
        video.buffered.end(video.buffered.length - 1) * 1024 * 1024;
      const currentTime = Date.now();
      const timeDiff = (currentTime - lastLoadedTime) / 1000;
      const bytesDiff = loadedBytes - lastLoadedBytes;

      if (timeDiff > 0) {
        const bandwidthMbps = (bytesDiff / timeDiff / (1024 * 1024)).toFixed(2);
        setBandwidth(bandwidthMbps);
      }

      lastLoadedBytes = loadedBytes;
      lastLoadedTime = currentTime;
    };

    const intervalId = setInterval(updateBandwidth, 1000);
    return () => clearInterval(intervalId);
  }, [videoUrl]);

  return (
    //...
  );
}
```

In the code snippet above, you:

  • Implemented a useEffect hook that runs whenever videoUrl changes.
  • Checked that the video element exists and initialized variables for the loaded bytes and time.
  • Defined an updateBandwidth function that estimates bandwidth from the growth of the buffered range.
  • Set up an interval to call updateBandwidth every second.
  • Cleaned up the interval when the component unmounts or videoUrl changes.
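
The buffered-bytes figure is only a rough estimate. Where the browser supports it, the Network Information API exposes its own downlink estimate; the sketch below could complement the heuristic above, with the caveat that navigator.connection isn’t available in all browsers (notably Safari):

```javascript
// Hedged sketch: use the browser's own bandwidth estimate when available.
useEffect(() => {
  const connection = navigator.connection;
  if (!connection) return; // unsupported browser: keep the heuristic above

  const updateFromConnection = () =>
    setBandwidth(connection.downlink.toFixed(2)); // downlink is reported in Mbps

  updateFromConnection();
  connection.addEventListener("change", updateFromConnection);
  return () => connection.removeEventListener("change", updateFromConnection);
}, []);
```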

Next, attach the ref to the video element by updating your <video> element to include the ref attribute:

```javascript
<video
  ref={videoRef}
  className="w-full aspect-video"
  controls
  crossOrigin="anonymous"
  playsInline
  autoBuffer
  muted
  onPlay={handleVideoPlay}
>
  {/* ... */}
</video>
```

Below the throttling button, add the following JSX code to display the playback statistics:

```javascript
// app/page.js
//...
export default function Home() {
  //...
  return (
    <div className="min-h-screen bg-gray-900 text-white p-4 md:p-8">
      <div className="max-w-5xl mx-auto">
        <header className="mb-8">
          {/* ... */}
        </header>
        {!videoUrl ? (
          {/* ... */}
        ) : (
          <>
            {/* ... */}
            <div className="bg-gray-800/50 rounded-lg p-4">
              <h3 className="text-sm font-semibold mb-2">Playback Stats</h3>
              <div className="text-sm text-gray-400 space-y-2">
                <p>Current Quality: {currentQuality}</p>
                <p>Connection: {isThrottled ? "Throttled" : "Normal"}</p>
                <p>
                  Bandwidth: {bandwidth ? `${bandwidth} Mbps` : "Measuring..."}
                </p>
              </div>
            </div>
          </>
        )}
        <ToastContainer theme="dark" />
      </div>
    </div>
  );
}
```

Congratulations! You’ve successfully implemented an adaptive video player with automatic subtitles using Next.js and Cloudinary.

You can now test the entire application: upload a video, play it, toggle throttling to simulate low bandwidth, and observe the bandwidth measurement and playback statistics.

Note:

After implementation, if the subtitles for your video don’t show up right away, wait a few seconds or minutes and retry playback. Don’t refresh the page.
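
If you’d rather have the app attach the subtitles as soon as they’re ready, you could poll the VTT URL until the transcription finishes. The sketch below is an assumption, not part of the tutorial code; waitForSubtitles is a hypothetical helper:

```javascript
// Hypothetical helper: resolve once the generated VTT file becomes available.
async function waitForSubtitles(vttUrl, { intervalMs = 5000, maxAttempts = 24 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(vttUrl, { method: "HEAD" });
    if (res.ok) return true; // transcript is ready
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // give up and let the user retry manually
}
```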

You now have a fully functional adaptive video player with AI-generated subtitles. This blog post showed you how to use Next.js and Cloudinary to create a flexible, adaptive video solution with automatic AI-generated transcripts.

To learn more about how to build accessible video experiences that work well across different network conditions with Cloudinary, contact us today.
