
Generate a Video Summary With OpenAI

Manually summarizing videos is time-consuming, making it hard to extract key information quickly. Copying and pasting video transcripts into a note-taking tool, or hunting down written summaries, is one way to go about it, but it's not exactly efficient or, let's be honest, fun. This is where AI technology steps in to do the heavy lifting for us.

In this blog post, we’ll use Cloudinary’s video transcription services and OpenAI’s large language model (LLM) capabilities to generate accurate transcriptions and concise summaries. 

To follow this tutorial, you'll need a Cloudinary account to handle video uploads and transcriptions, an OpenAI account with an API key to access the text generation API (API usage is billed separately from ChatGPT), and a basic understanding of Next.js and TypeScript.

The complete source code for the project is available on GitHub.

To get started, create a Next.js app by running the following command:

npx create-next-app@14.2 cloudinary-video-summary

When prompted, select TypeScript, Tailwind CSS, and App Router to set them up in your application. 

Next, navigate to the newly created app folder:

cd cloudinary-video-summary

Install the required dependencies, including the Cloudinary Node.js SDK and the OpenAI client library:

npm install cloudinary openai

Once your project is set up, run the development server:

npm run dev

Your Next.js project should now be running at http://localhost:3000.

After initializing the project, navigate to Cloudinary’s developer dashboard to get all the required credentials for this project.

Cloudinary Dashboard

To access the text generation capabilities, you'll need an OpenAI API key. Go to the OpenAI API keys page and click Create new secret key. This key will let your app interact with the OpenAI API for text generation.

Create an API Key

In your project's root directory, create a .env.local file and store your OpenAI and Cloudinary credentials, as shown below. These variables are only read on the server, so they don't need the NEXT_PUBLIC_ prefix (which would expose them to the browser).

// .env.local

OPENAI_API_KEY=<openai_api_key>
CLOUDINARY_CLOUD_NAME=<cloudinary_cloud_name>
CLOUDINARY_API_KEY=<cloudinary_api_key>
CLOUDINARY_API_SECRET=<cloudinary_api_secret>
Note:

Never share your credentials publicly or commit them to version control.

Before creating a transcript, you'll need to activate the Cloudinary Google AI Video Transcription add-on. It generates speech-to-text transcripts of your videos using Google's Cloud Speech-to-Text API and supports a wide range of languages.

Navigate to the Add-on section in Cloudinary, click the Google AI Video Transcription service, and subscribe to the free plan.

Head to the Add-on section
Subscribe to the Free plan

Activating the add-on sets up the transcription service you'll use to transcribe the uploaded video in the next step.

Next, you’ll need to upload a video file to initiate the transcription process. To do this, you’ll create a form with an input element to accept the video file you want to transcribe and a button to trigger the upload process. 

Navigate to the page.tsx file in the src/app folder and create the form and input element, as shown below.

// src/app/page.tsx

"use client";

export default function Home() {
  return (
    <div className="grid grid-rows-[20px_1fr_20px] items-center justify-items-center min-h-screen p-8 pb-20 gap-16 sm:p-20 font-[family-name:var(--font-geist-sans)]">
      <nav>
        <div className="flex justify-center my-10 items-center rounded-md border-[1px] px-20 py-2 border-blue-800">
          <form>
            <input type="file" name="video" accept="video/*" />
            <button className="bg-blue-800 text-white p-2 rounded-md uppercase tracking-wider text-sm">
              Upload
            </button>
          </form>
        </div>
        {/* Once the summary state is added later, wrap this span in {!summary && ...} */}
        <span className="text-xl tracking-wide font-semibold text-center">
          Upload a video to get an instant summary.
        </span>
      </nav>
    </div>
  );
}

Next, create a function within the same page.tsx file that reads the video file from the form and sends it to an API route. Then, connect the function to the form's submit event so it runs on submission.

// src/app/page.tsx

import { useState } from 'react';

export default function Home() {
  const [isLoading, setIsLoading] = useState<boolean>(false);

  const handleUpload = async (e: React.FormEvent<HTMLFormElement>) => {
    e.preventDefault(); // Prevents the form from refreshing the page
    setIsLoading(true);
    const formData = new FormData(e.currentTarget);
    try {
      const response = await fetch('/api/upload', {
        method: 'POST',
        body: formData,
      });
      if (response.ok) {
        const data = await response.json();
        console.log('Upload successful', data);
      }
    } catch (error) {
      console.error('Error uploading file:', error);
    }
  };

  return (
    ...
    <form onSubmit={handleUpload}>
      ... // input and button elements
    </form>
    ...
  );
}

In the code above, a FormData object is created to extract the video from the form automatically. Then, the form data is sent to the /api/upload endpoint on the server via a POST request. Finally, the response is logged to the console to see if the upload is successful.

Next, you'll define an API route handler to upload the video to Cloudinary. Navigate to the src/app folder and create a folder called api. In the api folder, create another folder called upload. Then, create a file called route.ts with the following code:

// src/app/api/upload/route.ts

import { NextResponse } from 'next/server';
import cloudinary from 'cloudinary';

// Configure Cloudinary with your account details
cloudinary.v2.config({
  cloud_name: process.env.CLOUDINARY_CLOUD_NAME,
  api_key: process.env.CLOUDINARY_API_KEY,
  api_secret: process.env.CLOUDINARY_API_SECRET,
});

export async function POST(req: Request) {
  const formData = await req.formData();
  const file = formData.get('video') as File;
  const buffer: Buffer = Buffer.from(await file.arrayBuffer());
  const cloud_name: string | undefined = process.env.CLOUDINARY_CLOUD_NAME;
  const videoFile: string = `data:${file.type};base64,${buffer.toString('base64')}`;

  // Upload the video and construct the transcript file URL
  try {
    const uploadResult = await cloudinary.v2.uploader.upload(videoFile, {
      public_id: `videos/${Date.now()}`,
      resource_type: 'video',
      raw_convert: 'google_speech',
    });
    const videoUrl = uploadResult.secure_url;
    const transcriptFileUrl = `https://res.cloudinary.com/${cloud_name}/raw/upload/v${
      uploadResult.version + 1
    }/${uploadResult.public_id}.transcript`;
    return NextResponse.json(
      { uploadResult, transcriptFileUrl, videoUrl },
      { status: 200 }
    );
  } catch (error: any) {
    return NextResponse.json({ message: error.message }, { status: 500 });
  }
}

Let’s break down the actions taken in each section of the code snippet above:

  • Imports NextResponse from Next.js to build JSON responses, along with the Cloudinary Node.js SDK.
  • cloudinary.v2.config({...}) configures the Cloudinary SDK (version 2) with the account details from the environment variables set earlier.
  • Parses the incoming request's form data, creates a Buffer object from the uploaded file's binary data, and converts the file content into a base64-encoded data URI for upload.
  • try {...} sends the video file to Cloudinary using the upload method with these options:
    • public_id: A unique public ID generated from the current timestamp.
    • resource_type: Set to video to indicate the upload type.
    • raw_convert: Set to google_speech to trigger a call to Google's Cloud Speech-to-Text API, transcribing the video.
  • const videoUrl stores the secure URL of the uploaded video.
  • const transcriptFileUrl constructs the URL of the transcript file from the public ID and the upload's version number (see the example after this list).
  • return NextResponse.json(...) handles a successful upload by sending a 200 response containing the transcript file URL, the video URL, and the upload result object from Cloudinary.
  • catch (error: any) {...} handles any errors that occur during the upload by returning a 500 response.
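
For concreteness, here's what the constructed transcript URL looks like with illustrative values (your cloud name, version number, and timestamp will differ):

// Illustrative values only: given cloud_name "demo", uploadResult.version
// 1723456789, and public_id "videos/1723456000000", transcriptFileUrl becomes:
// https://res.cloudinary.com/demo/raw/upload/v1723456790/videos/1723456000000.transcript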

This setup allows for easy video upload and automatic transcription generation using Cloudinary’s features, leveraging Next.js API routes to handle requests.

The transcript file can't be accessed immediately after uploading. Cloudinary generates transcripts asynchronously, so the file initially remains in a pending state; how long it takes depends on the duration of the video.

To handle this, you'll need to check the transcription status at predefined intervals. Add a function called generateSummary to the src/app/page.tsx file, as shown below.

// src/app/page.tsx

export default function Home() {
  const [isLoading, setIsLoading] = useState<boolean>(false);
  const [videoUrl, setVideoUrl] = useState<string>('');
  const POLLING_INTERVAL = 30000;

  const generateSummary = async (url: string) => {
    try {
      const response = await fetch(
        `/api/summary?url=${encodeURIComponent(url)}`
      );
      const data = await response.json();
      console.log(data);
      if (data.available) {
        // the summary will be updated here
        setIsLoading(false);
      } else {
        setTimeout(() => generateSummary(url), POLLING_INTERVAL);
      }
    } catch (error: any) {
      console.error('Error checking transcription status:', error);
    }
  };

  const handleUpload = async (e: React.FormEvent<HTMLFormElement>) => {
    ...
      if (response.ok) {
        ...
        generateSummary(data.transcriptFileUrl);
        setVideoUrl(data.videoUrl);
      }
    } catch (error) {
      console.error('Error uploading file:', error);
    }
  };

  return (
    ...
  );
}

The function uses a polling mechanism, set to 30000 milliseconds (30 seconds), to check for the availability of the transcript file. It sends a GET request to the /api/summary endpoint, passing the transcript file URL as the url parameter, encoded with encodeURIComponent so it's safe for use in a query string. The generateSummary function is triggered upon a successful video upload, receiving the transcript file URL as input. The uploaded video's URL from the response is also stored in state via setVideoUrl, so it can be rendered once the transcript file is available. The app rechecks the transcript file's status every 30 seconds until it's ready.
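
As written, generateSummary will poll forever if transcription never completes. If you want a safety net, here's a minimal sketch of the same function with a capped number of attempts; the retriesLeft parameter and its default of 20 are our additions, not part of the tutorial code:

// Sketch: generateSummary with a polling cap (retriesLeft is an assumption)
const generateSummary = async (url: string, retriesLeft: number = 20) => {
  if (retriesLeft <= 0) {
    console.error('Gave up waiting for the transcript.');
    setIsLoading(false);
    return;
  }
  try {
    const response = await fetch(`/api/summary?url=${encodeURIComponent(url)}`);
    const data = await response.json();
    if (data.available) {
      // the summary will be updated here
      setIsLoading(false);
    } else {
      // Retry with one fewer attempt remaining
      setTimeout(() => generateSummary(url, retriesLeft - 1), POLLING_INTERVAL);
    }
  } catch (error: any) {
    console.error('Error checking transcription status:', error);
  }
};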

The next step is to define the structure of the transcript file and its words, and to create a transcript parser that processes the transcript data into clear, readable text. To define the structure, navigate to the src folder, create a folder called types, and add a file called transcript-data.type.ts with the following code:

// src/types/transcript-data.type.ts

export interface Word {
  word: string;
  start_time: number;
  end_time: number;
}

export interface TranscriptData {
  transcript: string;
  confidence: number;
  words: Word[];
}

These interfaces are used to type-check and model transcript data, ensuring the format is consistent throughout the application.
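
For reference, the .transcript file the add-on produces is a JSON array matching these interfaces. Here's a shortened, illustrative sample (the values are made up) typed against them:

import { TranscriptData } from '@/types/transcript-data.type';

// Illustrative sample of the transcript file's shape; the values are made up
const sample: TranscriptData[] = [
  {
    transcript: 'Welcome to the demo.',
    confidence: 0.92,
    words: [
      { word: 'Welcome', start_time: 0.0, end_time: 0.4 },
      { word: 'to', start_time: 0.4, end_time: 0.5 },
      { word: 'the', start_time: 0.5, end_time: 0.6 },
      { word: 'demo.', start_time: 0.6, end_time: 1.1 },
    ],
  },
];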

Next, you’ll create the transcript parser. To do this, head to the src folder and create a file called transcript.ts, then add the code snippet shown below.

// src/transcript.ts

import { TranscriptData } from '@/types/transcript-data.type';

export const parseTranscriptData = async (
  data: TranscriptData[]
): Promise<string> => {
  let transcript: string = '';
  data.forEach(
    (line: TranscriptData) => (transcript = transcript + ` ${line.transcript}`)
  );
  return transcript;
};

The code snippet above takes an array of TranscriptData objects and concatenates the transcript strings from each object into a single string.
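
As a quick sanity check, running the parser over the illustrative sample from the previous section yields the concatenated text (note the leading space left by the concatenation):

// Quick check using the illustrative sample array from above
// (run inside an async context)
const text = await parseTranscriptData(sample);
console.log(text); // " Welcome to the demo."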

Additionally, you'll create a file for the /api/summary endpoint, where the transcript summary is generated once the transcript file is ready. Navigate to the src/app/api folder and create a new folder called summary. In this folder, create a new file called route.ts and add the code shown below.

// src/app/api/summary/route.ts

import { parseTranscriptData } from '@/transcript';
import { TranscriptData } from '@/types/transcript-data.type';
import { NextResponse, type NextRequest } from 'next/server';

export async function GET(req: NextRequest) {
  const searchParams = req.nextUrl.searchParams;
  const url: string | null = searchParams.get('url');
  try {
    const response = await fetch(url as string);
    if (response.ok) {
      const transcriptData: TranscriptData[] = await response.json();
      // The parsed transcript will be sent to OpenAI in the next step
      const transcript: string = await parseTranscriptData(transcriptData);
      return NextResponse.json({ available: true }, { status: 200 });
    } else {
      // The transcript isn't ready yet, so tell the client to keep polling
      return NextResponse.json({ available: false }, { status: 200 });
    }
  } catch (error: any) {
    return NextResponse.json({ message: error.message }, { status: 500 });
  }
}

The code snippet above begins by importing the parseTranscriptData method and the TranscriptData interface. It then extracts the transcript URL from the incoming request’s query parameters.

Then, it fetches the transcript data from the transcript file URL and parses the JSON response. The parsed data is processed using the parseTranscriptData function to extract the relevant transcript information. 

Finally, a JSON response is returned, indicating whether the transcript is available, and if not, the app will continue polling the /api/summary endpoint until the transcript file is generated.

Once the transcript file is available, the next step is to send it to the OpenAI Text Generation model for summarization and return the transcript summary. Update the code base in the summary/route.ts file, as shown below.

// src/app/api/summary/route.ts

import OpenAI from "openai";

// The OpenAI client reads the OPENAI_API_KEY environment variable by default
const openai = new OpenAI();

export async function GET(req: NextRequest) {
  ... // extract the url and fetch the transcript file, as before
    if (response.ok) {
      ...
      const completion = await openai.chat.completions.create({
        model: "gpt-4", // Choose the appropriate model
        messages: [
          { role: "system", content: "You are a helpful assistant that summarizes texts." },
          {
            role: "user",
            content: `Please summarize the following transcript:${transcript}`,
          },
        ],
      });
      const summary = completion.choices[0].message.content;
      return NextResponse.json(
        { available: true, summary },
        { status: 200 }
      );
    } else {
      return NextResponse.json({ available: false }, { status: 200 });
    }
  } catch (error: any) {
    return NextResponse.json({ message: error.message }, { status: 500 });
  }
}

Here's a breakdown of the code snippet above:

  • The code fetches the transcript data from the transcript file URL and parses it into a suitable format.
  • The parsed transcript is passed to OpenAI's chat completions API, sending a prompt that asks the language model to summarize the provided transcript (see the note on long transcripts after this list).
  • The endpoint then returns a response containing the generated summary along with an available flag set to true, indicating that the transcript is ready and telling the app to stop polling.
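
One caveat worth keeping in mind: very long transcripts can exceed the model's context window. A minimal guard is to truncate the transcript before sending it; the 12,000-character cutoff below is an arbitrary assumption, not an OpenAI figure, so tune it for the model you use:

// Rough guard against oversized prompts. MAX_TRANSCRIPT_CHARS is an
// assumption; adjust it for your model's context window.
const MAX_TRANSCRIPT_CHARS = 12000;
const trimmedTranscript = transcript.slice(0, MAX_TRANSCRIPT_CHARS);
// ...then pass trimmedTranscript instead of transcript in the user message.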

Finally, you'll display the uploaded video and the summary of the transcript. Head to the page.tsx file in the src/app directory and update it with the code snippet below.

// src/app/page.tsx

import { useState } from 'react';

export default function Home() {
  const [videoUrl, setVideoUrl] = useState<string>('');
  const [summary, setSummary] = useState('');
  const [isLoading, setIsLoading] = useState<boolean>(false);

  const generateSummary = async (url: string) => {
    try {
      ...
      if (data.available) {
        setSummary(data.summary);
        setIsLoading(false);
      } else {
        setTimeout(() => generateSummary(url), POLLING_INTERVAL);
      }
    } catch (error: any) {
      console.error('Error checking transcription status:', error);
    }
  };

  ... // upload function

  return (
    <div className="grid grid-rows-[20px_1fr_20px] items-center justify-items-center min-h-screen p-8 pb-20 gap-16 sm:p-20 font-[family-name:var(--font-geist-sans)]">
      ... // form
      <main className="flex flex-col gap-8 row-start-2 items-center sm:items-start">
        {/* Loader is any spinner component of your choice */}
        {isLoading && <Loader size={100} />}
        {summary && (
          <>
            <h1 className="text-3xl font-semibold text-center mb-3">Video</h1>
            {/* Simple Tailwind sizing in place of the original CSS module class */}
            <div className="w-full max-w-2xl">
              <video crossOrigin='anonymous' controls muted>
                <source src={videoUrl} type='video/mp4' />
              </video>
            </div>
            <div className="px-10">
              <h1 className="text-3xl font-semibold text-center mb-3">Video Summary</h1>
              <span>{summary}</span>
            </div>
          </>
        )}
      </main>
    </div>
  );
}

To test the app, select and upload a video; Cloudinary processes it for transcription during the upload. Once the upload completes, the transcript file is fetched, parsed into readable text, and sent to the OpenAI API for summarization.

By integrating Cloudinary and OpenAI, we've built a practical alternative to manual video summarization. You now have the tools to effortlessly upload videos, generate transcriptions, and create summaries using advanced AI capabilities. Sign up for a free Cloudinary account today and get started.
