
Model Report

The document outlines the process of creating a dataset for video analysis using a mediapipe holistic model to detect landmarks for left hand, right hand, and pose from individual video frames. It details the challenges faced in detecting landmarks due to visibility issues and the methods used to address these, including image sharpening and filling undetected frames with previously detected landmarks. Finally, the processed data is organized into numpy arrays for training LSTM and 3D CNN models.

We first create a dataset with the columns ['video_path', 'sign', 'total_number_of_frames', 'video_duration_in_sec', 'video_duration_in_msec'].
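The dataset above can be sketched as a pandas DataFrame; the rows shown here are hypothetical placeholders, not values from the report:

```python
import pandas as pd

# Hypothetical example rows; the real dataset is built by scanning the video files.
dataset = pd.DataFrame(
    [
        ["videos/hello_01.mp4", "hello", 72, 2.4, 2400.0],
        ["videos/thanks_01.mp4", "thanks", 60, 2.0, 2000.0],
    ],
    columns=[
        "video_path",
        "sign",
        "total_number_of_frames",
        "video_duration_in_sec",
        "video_duration_in_msec",
    ],
)
```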

We then define a MediaPipe Holistic model, which is used to detect landmarks for the left hand, right hand, and pose.

After defining the model, we use the OpenCV library (hereafter cv2) to open each video path, read the individual frames of the video, and detect landmarks in each frame with the MediaPipe Holistic model. The same procedure is applied to every video in the dataset.

After detection, the landmarks are stored in the dataset separately for the left hand, right hand, and pose. Each row of the dataset holds, for each of these body parts, a list of that part's landmarks ordered by the frames in which they were detected, so a single list contains all the landmarks detected across all frames of that row's video. (This step was done specifically for experimental use of the detected data, so that we do not have to rerun MediaPipe landmark detection every time the landmarks are needed.)
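One way to build these per-row lists, also recording the serial numbers of the frames in which each part was detected, is sketched below; both helper names are hypothetical:

```python
def landmarks_to_list(landmark_list):
    """Flatten a MediaPipe landmark message into [x0, y0, z0, x1, ...];
    returns None when the part was not detected in the frame."""
    if landmark_list is None:
        return None
    return [c for lm in landmark_list.landmark for c in (lm.x, lm.y, lm.z)]

def collect_row_landmarks(frame_results):
    """Build, per body part, the frame-ordered list of landmarks for one video,
    plus the indices of the frames in which that part was detected."""
    parts = ("left_hand", "right_hand", "pose")
    row = {p: [] for p in parts}
    row.update({p + "_frames": [] for p in parts})
    for i, res in enumerate(frame_results):
        for part, attr in (("left_hand", "left_hand_landmarks"),
                           ("right_hand", "right_hand_landmarks"),
                           ("pose", "pose_landmarks")):
            lms = landmarks_to_list(getattr(res, attr))
            row[part].append(lms)
            if lms is not None:
                row[part + "_frames"].append(i)
    return row
```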

We also created columns recording the serial numbers of the frames in which landmarks were detected. On inspecting these columns, we noticed that many frames yield no left-hand or right-hand landmarks, either because those body parts are not visible or because the frames are too blurry.

Our first idea was therefore to sharpen each frame using a sharpening kernel and [Link], but this did not provide a significant improvement. We have nevertheless kept the sharpening step in the landmark-detection pipeline for the slight improvement it does provide.

We then decided to use the detected landmarks to supply landmarks for every frame: frames whose landmarks were detected are used to fill in the undetected frames. Concretely, the landmarks of the most recent detected frame are copied into the following undetected frames until the next detected frame is encountered, whose landmarks are then used for the frames after it, and so on. The same methodology was applied separately to the left-hand, right-hand, and pose landmarks.
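This forward-fill can be sketched as a small helper (`forward_fill` is a hypothetical name). Note that frames before the first detection have nothing to copy from, and the report does not say how they are handled, so this sketch leaves them as None:

```python
def forward_fill(landmarks_per_frame):
    """Replace each None entry with the most recently detected landmarks."""
    filled, last = [], None
    for lms in landmarks_per_frame:
        if lms is not None:
            last = lms
        filled.append(last)
    return filled
```

The same helper would be called once each on the left-hand, right-hand, and pose landmark lists of every video.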

The resulting data were then arranged into NumPy arrays, combining all three sets of landmarks (left hand, right hand, and pose) into the data format required for model training. The videos were also cropped to a uniform number of frames, keeping the middle of each video as the center of the crop.
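A sketch of this final arrangement, assuming each per-frame landmark list has a fixed length after forward-filling; the helper names and the target frame count are assumptions:

```python
import numpy as np

def center_crop_frames(frames, target_len):
    """Keep target_len frames centred on the middle frame of the video."""
    start = max(0, (len(frames) - target_len) // 2)
    return frames[start:start + target_len]

def build_feature_array(left, right, pose, target_len):
    """Concatenate left-hand, right-hand, and pose landmarks frame by frame
    into a (target_len, num_features) float32 array."""
    per_frame = [lh + rh + ps for lh, rh, ps in zip(left, right, pose)]
    return np.asarray(center_crop_frames(per_frame, target_len), dtype=np.float32)
```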

Finally, LSTM and 3D CNN models were trained on these arrays.
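A minimal sketch of what the LSTM classifier could look like in Keras; the layer sizes, frame count, feature count, and number of sign classes are all assumptions, since the report does not specify the architectures:

```python
import tensorflow as tf

NUM_FRAMES = 60     # assumed uniform frame count after cropping
NUM_FEATURES = 225  # 2 hands x 21 landmarks x 3 coords + 33 pose landmarks x 3 coords
NUM_SIGNS = 10      # hypothetical number of sign classes

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_FRAMES, NUM_FEATURES)),
    tf.keras.layers.LSTM(128, return_sequences=True),  # per-frame hidden states
    tf.keras.layers.LSTM(64),                          # summary of the sequence
    tf.keras.layers.Dense(NUM_SIGNS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```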
