Implementing LiteRT on Google Tensor

1. Overview

The Google Tensor SDK is used to compile LiteRT models for Pixel devices. The compiled models can be deployed on Pixel devices for enhanced ML inference performance. To use the SDK, you must first convert your model into a LiteRT (tflite) model.

This codelab is based on the general LiteRT AOT Compilation Tutorial Colab on GitHub.

Objective

Learn how to use the LiteRT AOT (ahead-of-time) compiler to compile a selfie segmentation TFLite model into a LiteRT model that is optimized for on-device EdgeTPUs.

This colab also walks you through the steps for preparing models with Play for On-device AI (PODAI).

PODAI delivers custom models for on-device AI features more efficiently. It simplifies the process for launching, targeting, versioning, and downloading your AI models. When combined with LiteRT EdgeTPU AOT compilation, it lets developers deliver compiled ML models for various devices without the need to know which EdgeTPUs the end user's phone contains.

Models used

The model used in this codelab was originally published in the MediaPipe Image segmentation guide. Here are some details about it:

  • SelfieMulticlass: A LiteRT model that takes an image of a person, locates areas such as hair, skin, and clothing, and outputs an image segmentation map for these items.

2. Get started

Follow these steps to gain access to the Google Tensor SDK and get started:

  1. Sign up to get access to the Google Tensor SDK. Before proceeding, you need to wait for an email from Google containing the download link for the compiler plugin.
  2. Download the compiler plugin (litert_plugin_compiler.tar.gz) and place it in a folder of your choice.
  3. Set the GOOGLE_TENSOR_SDK_BETA environment variable to the local system path of the downloaded file.
    You can run this command in your bash terminal:
    export GOOGLE_TENSOR_SDK_BETA=/path/to/downloaded/compiler
    
    Or you can run this in your Colab notebook:
    %env GOOGLE_TENSOR_SDK_BETA=/path/to/downloaded/compiler
    
  4. Then run this command to install the package:
    pip install ai-edge-litert-sdk-google-tensor
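
To confirm that the variable from step 3 is visible to Python and points at something that exists on disk, a small check like the following may help. It is a minimal sketch using only the standard library. The helper name check_sdk_path is hypothetical and not part of the SDK, and the only thing it assumes is what step 3 states: that GOOGLE_TENSOR_SDK_BETA holds a local path.

```python
import os


def check_sdk_path(env=None, var_name="GOOGLE_TENSOR_SDK_BETA"):
  """Return the path stored in var_name, or raise if unset or missing on disk.

  Hypothetical helper for illustration; not part of the Google Tensor SDK.
  """
  env = os.environ if env is None else env
  path = env.get(var_name)
  if not path:
    raise RuntimeError(f"{var_name} is not set")
  if not os.path.exists(path):
    raise FileNotFoundError(f"{var_name} points to a missing path: {path}")
  return path
```

Calling check_sdk_path() right after the export or %env line surfaces a typo in the path before the compiler is ever invoked.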
    

3. Install required packages

Start by installing the required packages, including ai-edge-litert-nightly, which contains the EdgeTPU AOT compiler, and other libraries you use for model conversion.

Use this package to install the LiteRT backend for Google Tensor: ai-edge-litert-sdk-google-tensor.

After you install the packages, restart the session and continue with the steps that follow the installation. Don't repeat the installation.

If you plan to perform the setup on your system, we recommend that you use a Python virtual environment (venv) and run these commands within the virtual environment.
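
If you are unsure whether your shell or notebook is actually running inside a virtual environment, the standard library can tell you. The sketch below relies on the documented behavior that sys.prefix differs from sys.base_prefix inside an environment created with venv. The function name in_virtualenv is only an illustrative choice.

```python
import sys


def in_virtualenv():
  """Return True when the interpreter runs inside a venv-style environment."""
  return sys.prefix != sys.base_prefix


if __name__ == "__main__":
  print("virtual environment active:", in_virtualenv())
```

Running this before the pip commands is a cheap way to avoid installing the nightly packages into a system-wide Python by accident.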

Uninstall certain packages

First, uninstall tensorflow, which comes with the Colab runtime by default, along with any existing ai-edge-litert package.

pip uninstall -y tensorflow ai-edge-litert

Install all libraries

Install the LiteRT backend for Google Tensor

pip install ai-edge-litert-sdk-google-tensor

Install remaining packages

pip install matplotlib huggingface-hub ai-edge-litert-nightly
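
After restarting the session, you may want to confirm that the packages resolved as expected. The following sketch uses importlib.metadata from the standard library. The distribution names are the ones from the pip commands above, and the helper installed_versions is a hypothetical name introduced for illustration.

```python
from importlib import metadata


def installed_versions(package_names):
  """Map each distribution name to its installed version, or None if absent."""
  versions = {}
  for name in package_names:
    try:
      versions[name] = metadata.version(name)
    except metadata.PackageNotFoundError:
      versions[name] = None
  return versions


if __name__ == "__main__":
  wanted = [
      "ai-edge-litert-sdk-google-tensor",
      "ai-edge-litert-nightly",
      "matplotlib",
      "huggingface-hub",
  ]
  for name, version in installed_versions(wanted).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

A None entry means pip did not install that distribution into the interpreter you are currently running.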

4. Import all the libraries

Proceed to the main execution after the installation completes.

Import the required packages:

import os
import shutil

from ai_edge_litert.aot import aot_compile as aot_lib
from ai_edge_litert.aot.ai_pack import export_lib as ai_pack_export
from ai_edge_litert.aot.vendors.google_tensor import target as gt_target
import huggingface_hub
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import requests

5. Compile a LiteRT model

This section covers a more advanced use case: compiling a LiteRT (TFLite) model directly.

EdgeTPU compilation from TFLite model

This step requires a TFLite model. If you don't have a TFLite model, convert your model into the TFLite format.

Get the TFLite Model

We use the MediaPipe MultiClass Segmentation model for this use case.

The TFLite model is available from the MediaPipe Image segmentation page.

work_dir = '.'

model_url = 'https://storage.googleapis.com/mediapipe-models/image_segmenter/selfie_multiclass_256x256/float32/latest/selfie_multiclass_256x256.tflite'
tflite_model_path = os.path.join(work_dir, 'selfie_multiclass_256x256.tflite')

# Download the model and fail early on an HTTP error.
model_content = requests.get(model_url)
model_content.raise_for_status()

with open(tflite_model_path, 'wb') as fout:
  fout.write(model_content.content)
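
A failed or redirected download can leave an HTML error page on disk under a .tflite name. One quick sanity check, sketched here under the assumption that the file is a standard TFLite FlatBuffer (whose file identifier is the four bytes TFL3 at offset 4), is to inspect the header. Both helper names are hypothetical.

```python
def looks_like_tflite(data):
  """Return True if data carries the TFLite FlatBuffer identifier 'TFL3'."""
  # FlatBuffers store a 4-byte root offset first, then the 4-byte identifier.
  return len(data) >= 8 and data[4:8] == b"TFL3"


def check_model_file(path):
  """Read the first 8 bytes of a file and test them for the identifier."""
  with open(path, "rb") as fin:
    return looks_like_tflite(fin.read(8))
```

Calling check_model_file(tflite_model_path) after the download is expected to return True for a valid model file. The check only looks at the header, so it does not prove the model is complete or loadable.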

Quickly verify the TFLite model using the LiteRT Python API

The following example displays both the mask image and the blended result.

# Download the test image

test_image = huggingface_hub.hf_hub_download(
    repo_id="litert-community/MediaPipe-Selfie-Segmentation",
    filename="test_img.png",
)
pil_image = Image.open(test_image).convert("RGB").resize((256, 256))

from ai_edge_litert.compiled_model import CompiledModel

SEGMENT_COLORS = [
    (0, 0, 0),
    (255, 0, 0),
    (0, 255, 0),
    (0, 0, 255),
    (255, 255, 0),
    (255, 0, 255),
]
INPUT_SIZE = (256, 256)
NUM_CLASSES = 6

# Load the model and image
model = CompiledModel.from_file(tflite_model_path)
original_image = np.array(Image.open(test_image).convert('RGB'))
img_array = np.array(pil_image).astype(np.float32)

# Normalize the image
normalized = (img_array - 127.5) / 127.5
normalized = np.ascontiguousarray(normalized, dtype=np.float32)

# Run inference
sig_idx = 0
input_buffers = model.create_input_buffers(sig_idx)
output_buffers = model.create_output_buffers(sig_idx)
input_data = normalized.reshape(-1)
input_buffers[0].write(input_data)
model.run_by_index(sig_idx, input_buffers, output_buffers)

# Get output data
height, width = INPUT_SIZE
output_size = height * width * NUM_CLASSES
output_data = output_buffers[0].read(output_size, np.float32)
output_data = output_data.reshape(height, width, NUM_CLASSES)
mask = np.argmax(output_data, axis=2).astype(np.uint8)

# Create colored mask
colored_mask = np.zeros((height, width, 3), dtype=np.uint8)
for label_idx in range(NUM_CLASSES):
  class_mask = mask == label_idx
  color = SEGMENT_COLORS[label_idx]
  colored_mask[class_mask] = color

# Blend with original image
# Resize colored mask to match original image if necessary
if original_image.shape[:2] != colored_mask.shape[:2]:
  colored_mask_pil = Image.fromarray(colored_mask)
  colored_mask_pil = colored_mask_pil.resize(
      (original_image.shape[1], original_image.shape[0])
  )
  colored_mask = np.array(colored_mask_pil)

# Blend images with alpha 0.5
alpha = 0.5
blended_image = (
    original_image * (1 - alpha) + colored_mask * alpha
).astype(np.uint8)

# Display them
fig, axes = plt.subplots(1, 3, figsize=(9, 3))

for idx, (title, image) in enumerate([
    ('Original Image', original_image),
    ('Colored Mask', colored_mask),
    ('Blended Image', blended_image),
]):
  axes[idx].imshow(image)
  axes[idx].set_title(title)
  axes[idx].axis('off')

plt.tight_layout()
plt.show()
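
The array operations above are easier to follow on a single pixel. The sketch below repeats the three steps (normalization to [-1, 1], argmax over class scores, and alpha blending) in plain Python. The class scores are made up for illustration, and the helper names are not part of any library.

```python
def normalize(value):
  """Map an 8-bit channel value in [0, 255] to the range [-1.0, 1.0]."""
  return (value - 127.5) / 127.5


def argmax(scores):
  """Return the index of the largest score (the predicted class)."""
  return max(range(len(scores)), key=lambda i: scores[i])


def blend(pixel, color, alpha=0.5):
  """Alpha-blend two RGB tuples, truncating to integers like astype(uint8)."""
  return tuple(int(p * (1 - alpha) + c * alpha) for p, c in zip(pixel, color))


if __name__ == "__main__":
  print(normalize(0), normalize(255))  # -1.0 1.0
  scores = [0.05, 0.10, 0.70, 0.05, 0.05, 0.05]  # made-up class scores
  print(argmax(scores))  # 2
  # Class 2 maps to green, (0, 255, 0), in SEGMENT_COLORS.
  print(blend((200, 100, 50), (0, 255, 0)))  # (100, 177, 25)
```

The NumPy code performs the same arithmetic for every pixel at once, which is why the mask ends up with one class label per pixel and the blended image keeps the shape of the original.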

Convert to a LiteRT model with EdgeTPU AOT compilation

We use the APIs from ai_edge_litert.aot to compile the model.

compiled_models = aot_lib.aot_compile(tflite_model_path, keep_going=True)

# This variable will be used later to create the AI Pack.
all_google_tensor_compiled_models = compiled_models

# Print Compilation Report
print(all_google_tensor_compiled_models.compilation_report())

# Saving compiled models to disk. This saves all the compiled models, and a CPU
# fallback model.
all_google_tensor_compiled_models.export(
    work_dir, model_name='selfie_segmentation'
)

When the compilation finishes, call export on the compilation result (all_google_tensor_compiled_models.export in the code above) to write all models to disk.

By default, the models are stored in a flat structure in the output directory, with each model name suffixed with the backend ID.

For example:

Model filename                               Backend  SoC        Note
selfie_segmentation_fallback.tflite          CPU/GPU  N/A        N/A
selfie_segmentation_Google_Tensor_G3.tflite  Google   Tensor_G3  Google Tensor G3
selfie_segmentation_Google_Tensor_G4.tflite  Google   Tensor_G4  Google Tensor G4
selfie_segmentation_Google_Tensor_G5.tflite  Google   Tensor_G5  Google Tensor G5
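
If you need to tell the exported files apart in a script, the suffix can be recovered from the filename. This is a minimal sketch that assumes the naming pattern shown in the example above (model name, underscore, backend suffix, then .tflite). The helper backend_suffix is hypothetical and not part of ai_edge_litert.

```python
def backend_suffix(filename, model_name):
  """Return the backend part of an exported filename, or None if no match."""
  prefix = model_name + "_"
  extension = ".tflite"
  if not (filename.startswith(prefix) and filename.endswith(extension)):
    return None
  return filename[len(prefix):-len(extension)]


if __name__ == "__main__":
  names = [
      "selfie_segmentation_fallback.tflite",
      "selfie_segmentation_Google_Tensor_G5.tflite",
      "selfie_multiclass_256x256.tflite",  # the original download, not an export
  ]
  for name in names:
    print(name, "->", backend_suffix(name, "selfie_segmentation"))
```

In practice you could feed it the result of os.listdir(work_dir) to separate the fallback model from the SoC-specific ones.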

6. Export and validate on CPU

Once the compilation is complete, verify the TFLite model on the CPU. Do this using the "fallback model" generated during compilation.

# Run LiteRT with test image
from ai_edge_litert.compiled_model import CompiledModel

# Normalize the image to [-1, 1]
img_array = np.array(pil_image, dtype=np.float32)
normalized = (img_array - 127.5) / 127.5
numpy_array = np.ascontiguousarray(normalized)[None, ...]

cpu_model_path = os.path.join(work_dir, "selfie_segmentation_fallback.tflite")
cm_model = CompiledModel.from_file(cpu_model_path)
sig_idx = 0
input_buffers = cm_model.create_input_buffers(sig_idx)
output_buffers = cm_model.create_output_buffers(sig_idx)
input_buffers[0].write(numpy_array)
cm_model.run_by_index(sig_idx, input_buffers, output_buffers)

# Read the 6-channel output and apply argmax
output_data = output_buffers[0].read(256 * 256 * 6, np.float32)
output_data = output_data.reshape((256, 256, 6))
mask = np.argmax(output_data, axis=2).astype(np.uint8)

# Create a colored mask using the previously defined SEGMENT_COLORS
colored_mask = np.zeros((256, 256, 3), dtype=np.uint8)
for label_idx in range(6):
  class_mask = mask == label_idx
  color = SEGMENT_COLORS[label_idx]
  colored_mask[class_mask] = color

mask_image = Image.fromarray(colored_mask)

# Show output results
fig, axes = plt.subplots(1, 2, figsize=(9, 3))

for idx, (title, image) in enumerate([
    ('Test Image', pil_image),
    ('TFLite Mask Image', mask_image),
]):
  axes[idx].imshow(image)
  axes[idx].set_title(title)
  axes[idx].axis('off')

plt.tight_layout()
plt.show()
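
The plot is a visual check. To get a number as well, one option is to compare the per-pixel class labels of two masks, for example the mask computed from the original TFLite model in section 5 and the mask from the fallback model. The sketch below works on flat Python sequences of labels. The function name mask_agreement is hypothetical, and this codelab does not prescribe any particular agreement threshold.

```python
def mask_agreement(mask_a, mask_b):
  """Return the fraction of positions where two flat label sequences match."""
  if len(mask_a) != len(mask_b):
    raise ValueError("masks must have the same number of pixels")
  if len(mask_a) == 0:
    raise ValueError("masks must not be empty")
  matches = sum(1 for a, b in zip(mask_a, mask_b) if a == b)
  return matches / len(mask_a)


if __name__ == "__main__":
  # Two tiny made-up masks that differ in one of four pixels.
  print(mask_agreement([0, 1, 2, 2], [0, 1, 2, 3]))  # 0.75
```

For NumPy masks of the same shape, np.mean(mask_a == mask_b) computes the same fraction, and mask.ravel().tolist() converts a mask into the flat form used here.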

7. Export Models for PODAI

With your models verified, the next step is preparing them for deployment. This section details how to package your compiled models for upload to Google Play, enabling delivery to user devices through Play for On-device AI (PODAI).

The ai_edge_litert AOT (ahead-of-time) module provides ai_pack utilities for this purpose. These utilities create an AI Pack, which bundles your compiled models with device-targeting configurations so that the correct models and assets are delivered to the appropriate user devices. This matters most for NPU (Neural Processing Unit) compilations, because it ensures that models optimized for a specific System-on-Chip (SoC) reach only the devices equipped with that SoC.

# Configuring the AI Pack
os.makedirs('selfie_multiclass', exist_ok=True)
ai_pack_dir = os.path.join(work_dir, 'ai_pack')
ai_pack_name = 'selfie_segmentation'
litert_model_name = 'segmentation_model'

# Clean up
shutil.rmtree(ai_pack_dir, ignore_errors=True)

# Export
ai_pack_export.export(
    all_google_tensor_compiled_models,
    ai_pack_dir,
    ai_pack_name,
    litert_model_name
)

Inspecting AI Pack source

def list_files(startpath):
  """Function to print out the tree structure of a directory."""
  for root, dirs, files in os.walk(startpath):
    level = root.replace(startpath, '').count(os.sep)
    indent = ' ' * 4 * (level)
    print('{}{}/'.format(indent, os.path.basename(root)))
    subindent = ' ' * 4 * (level + 1)
    for f in files:
      print('{}{}'.format(subindent, f))


# View the files generated within the AI Pack directory.
list_files(ai_pack_dir)

8. Configure advanced options

NPU Compilation for specific device or EdgeTPU

By default, LiteRT AOT compilation compiles to all registered backends. For local development, you might only want to compile for specific devices, such as development phones. Achieve this by providing the compilation targets explicitly.

The following example compiles to Google Tensor G5.

# Specifying the compilation target
tensor_g5_target = gt_target.Target(gt_target.SocModel.TENSOR_G5)

# Compile from the TFLite model for a specific target
compiled_models = aot_lib.aot_compile(
    tflite_model_path,
    target=[tensor_g5_target],
    keep_going=False,  # We want to error out when there's failure.
)

print(compiled_models.compilation_report())

Compilation flags for Google Tensor

You can customize the compilation process with compilation flags. The example below passes the flag google_tensor_truncation_type="half".

When compiling a TFLite model

compiled_models = aot_lib.aot_compile(
    tflite_model_path,
    target=[tensor_g5_target],
    keep_going=False,
    google_tensor_truncation_type="half"
)

9. Next steps

Congratulations!

Your models are ready for consumption by PODAI!

Now move on to Android Studio for the following steps; see LiteRT image segmentation samples for details.