- Easy to use: low-code APIs such as mediapipe-python.
- Low overhead: no unnecessary data copies, allocations, or frees during processing.
- Flexible: Users can use custom media bytes as input.
- For TfLite models, the library supports not only all models downloadable from MediaPipe Solutions, but also TF Hub models and custom models that carry the essential model information.

Supported tasks:
- Object Detection
- Image Classification
- Image Segmentation
- Interactive Image Segmentation
- Gesture Recognition
- Hand Landmark Detection
- Image Embedding
- Face Detection
- Face Landmark Detection
- Pose Landmark Detection
- Audio Classification
- Text Classification
- Text Embedding
- Language Detection
Every task has three types: `XxxBuilder`, `Xxx`, and `XxxSession` (where `Xxx` is the task name).

- `XxxBuilder` is used to create a task instance `Xxx` and provides the options to configure it.

  Example: use `ImageClassifierBuilder` to build an `ImageClassifier` task.

  ```rust
  let classifier = ImageClassifierBuilder::new()
      .model_asset_path(model_path) // set model path
      .max_results(3) // set max result
      .category_deny_list(vec!["denied label".into()]) // set deny list
      .gpu() // set running device
      .finalize()?; // create an image classifier
  ```

- `Xxx` is a task instance, which contains the task information and model information.

  Example: use `ImageClassifier` to create a new `ImageClassifierSession`.

  ```rust
  let classifier_session = classifier.new_session()?;
  ```

- `XxxSession` is a running session that performs pre-processing, inference, and post-processing, and holds buffers to store intermediate results.

  Example: use `ImageClassifierSession` to run the image classification task and return classification results:

  ```rust
  let classification_result = classifier_session.classify(&image::open(img_path)?)?;
  ```

  Note: a session can be reused to speed up repeated inference. If the code uses the session only once, it can call the task's wrapper function instead:

  ```rust
  // let classifier_session = classifier.new_session()?;
  // let classification_result = classifier_session.classify(&image::open(img_path)?)?;
  // The above two lines are equivalent to:
  let classification_result = classifier.classify(&image::open(img_path)?)?;
  ```
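Conversely, when there are many inputs, creating the session explicitly and reusing it avoids repeated buffer allocation. A minimal sketch building on the `classifier` above (the image paths are placeholders):

```rust
// Reuse one session across several inputs; its buffers are allocated once.
let mut classifier_session = classifier.new_session()?;
for img_path in ["bird1.jpg", "bird2.jpg"] {
    let classification_result = classifier_session.classify(&image::open(img_path)?)?;
    println!("{}", classification_result);
}
```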
Available tasks and their types:

- vision:
  - gesture recognition: `GestureRecognizerBuilder` -> `GestureRecognizer` -> `GestureRecognizerSession`
  - hand detection: `HandDetectorBuilder` -> `HandDetector` -> `HandDetectorSession`
  - image classification: `ImageClassifierBuilder` -> `ImageClassifier` -> `ImageClassifierSession`
  - image embedding: `ImageEmbedderBuilder` -> `ImageEmbedder` -> `ImageEmbedderSession`
  - image segmentation: `ImageSegmenterBuilder` -> `ImageSegmenter` -> `ImageSegmenterSession`
  - object detection: `ObjectDetectorBuilder` -> `ObjectDetector` -> `ObjectDetectorSession`
- audio:
  - audio classification: `AudioClassifierBuilder` -> `AudioClassifier` -> `AudioClassifierSession`
- text:
  - text classification: `TextClassifierBuilder` -> `TextClassifier` -> `TextClassifierSession`
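The examples below obtain the model path and input path from the command line through a `parse_args()` helper, which is not part of mediapipe-rs and is omitted from the original examples. Here is a minimal sketch of such a helper (the name follows the examples; variants returning one or three values are analogous):

```rust
// Hypothetical helper used by the examples below (not part of mediapipe-rs):
// reads the model path and the input path from the command-line arguments.
fn parse_args() -> Result<(String, String), Box<dyn std::error::Error>> {
    let mut args = std::env::args().skip(1);
    let model_path = args.next().ok_or("missing model path argument")?;
    let input_path = args.next().ok_or("missing input path argument")?;
    Ok((model_path, input_path))
}
```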
Example: Image Classification

```rust
use mediapipe_rs::tasks::vision::ImageClassifierBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (model_path, img_path) = parse_args()?;

    let classification_result = ImageClassifierBuilder::new()
        .model_asset_path(model_path) // set model path
        .max_results(3) // set max result
        .finalize()? // create an image classifier
        .classify(&image::open(img_path)?)?; // do inference and generate results

    // show formatted result message
    println!("{}", classification_result);

    Ok(())
}
```

Example output in console:

```
$ cargo run --release --example image_classification -- ./assets/models/image_classification/lite-model_aiy_vision_classifier_birds_V1_3.tflite ./assets/testdata/img/bird.jpg
Finished release [optimized] target(s) in 0.00s
Running `/mediapipe-rs/./scripts/wasmedge-runner.sh target/wasm32-wasi/release/examples/image_classification.wasm ./assets/models/image_classification/lite-model_aiy_vision_classifier_birds_V1_3.tflite ./assets/testdata/img/bird.jpg`
ClassificationResult:
Classification #0:
Category #0:
Category name: "/m/01bwb9"
Display name: "Passer domesticus"
Score: 0.91015625
Index: 671
Category #1:
Category name: "/m/0bwm6m"
Display name: "Passer italiae"
Score: 0.00390625
Index: 495
Category #2:
Category name: "/m/020f2v"
Display name: "Haemorhous cassinii"
Score: 0
Index: 0
```

Example: Object Detection

```rust
use mediapipe_rs::postprocess::utils::draw_detection;
use mediapipe_rs::tasks::vision::ObjectDetectorBuilder;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (model_path, img_path, output_path) = parse_args()?;
    let mut input_img = image::open(img_path)?;

    let detection_result = ObjectDetectorBuilder::new()
        .model_asset_path(model_path) // set model path
        .max_results(2) // set max result
        .finalize()? // create an object detector
        .detect(&input_img)?; // do inference and generate results

    // show formatted result message
    println!("{}", detection_result);

    if let Some(output_path) = output_path {
        // draw detection result onto the image
        draw_detection(&mut input_img, &detection_result);
        // save output image
        input_img.save(output_path)?;
    }

    Ok(())
}
```

Example output in console:

```
$ cargo run --release --example object_detection -- ./assets/models/object_detection/efficientdet_lite0_fp32.tflite ./assets/testdata/img/cat_and_dog.jpg
Finished release [optimized] target(s) in 0.00s
Running `/mediapipe-rs/./scripts/wasmedge-runner.sh target/wasm32-wasi/release/examples/object_detection.wasm ./assets/models/object_detection/efficientdet_lite0_fp32.tflite ./assets/testdata/img/cat_and_dog.jpg`
DetectionResult:
Detection #0:
Box: (left: 0.12283102, top: 0.38476586, right: 0.51069236, bottom: 0.851197)
Category #0:
Category name: "cat"
Display name: None
Score: 0.8460574
Index: 16
Detection #1:
Box: (left: 0.47926134, top: 0.06873521, right: 0.8711677, bottom: 0.87927735)
Category #0:
Category name: "dog"
Display name: None
Score: 0.8375256
Index: 17
```

Example: Text Classification

```rust
use mediapipe_rs::tasks::text::TextClassifierBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let model_path = parse_args()?;

    let text_classifier = TextClassifierBuilder::new()
        .model_asset_path(model_path) // set model path
        .max_results(1) // set max result
        .finalize()?; // create a text classifier

    let positive_str = "I love coding so much!";
    let negative_str = "I don't like raining.";

    // classify and show the formatted result message
    let result = text_classifier.classify(&positive_str)?;
    println!("`{}` -- {}", positive_str, result);

    let result = text_classifier.classify(&negative_str)?;
    println!("`{}` -- {}", negative_str, result);

    Ok(())
}
```

Example output in console (using the BERT model):

```
$ cargo run --release --example text_classification -- ./assets/models/text_classification/bert_text_classifier.tflite
Finished release [optimized] target(s) in 0.01s
Running `/mediapipe-rs/./scripts/wasmedge-runner.sh target/wasm32-wasi/release/examples/text_classification.wasm ./assets/models/text_classification/bert_text_classifier.tflite`
`I love coding so much!` -- ClassificationResult:
Classification #0:
Category #0:
Category name: "positive"
Display name: None
Score: 0.99990463
Index: 1
`I don't like raining.` -- ClassificationResult:
Classification #0:
Category #0:
Category name: "negative"
Display name: None
Score: 0.99541473
Index: 0
```

Example: Gesture Recognition

```rust
use mediapipe_rs::tasks::vision::GestureRecognizerBuilder;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (model_path, img_path) = parse_args()?;

    let gesture_recognition_results = GestureRecognizerBuilder::new()
        .model_asset_path(model_path) // set model path
        .num_hands(1) // only recognize one hand
        .max_results(1) // set max result
        .finalize()? // create a task instance
        .recognize(&image::open(img_path)?)?; // do inference and generate results

    for g in gesture_recognition_results {
        println!("{}", g.gestures.classifications[0].categories[0]);
    }

    Ok(())
}
```

Example output in console:

```
$ cargo run --release --example gesture_recognition -- ./assets/models/gesture_recognition/gesture_recognizer.task ./assets/testdata/img/gesture_recognition_google_samples/victory.jpg
Finished release [optimized] target(s) in 0.02s
Running `/mediapipe-rs/./scripts/wasmedge-runner.sh target/wasm32-wasi/release/examples/gesture_recognition.wasm ./assets/models/gesture_recognition/gesture_recognizer.task ./assets/testdata/img/gesture_recognition_google_samples/victory.jpg`
Category name: "Victory"
Display name: None
Score: 0.9322255
Index: 6
```

Any audio media that implements the trait `AudioData` can be used as input for the audio tasks. The library has built-in implementations to support symphonia, ffmpeg, and raw audio data as input.

Example: Audio Classification

```rust
use mediapipe_rs::tasks::audio::AudioClassifierBuilder;
#[cfg(feature = "ffmpeg")]
use mediapipe_rs::preprocess::audio::FFMpegAudioData;
#[cfg(not(feature = "ffmpeg"))]
use mediapipe_rs::preprocess::audio::SymphoniaAudioData;

#[cfg(not(feature = "ffmpeg"))]
fn read_audio_using_symphonia(audio_path: String) -> SymphoniaAudioData {
    let file = std::fs::File::open(audio_path).unwrap();
    let probed = symphonia::default::get_probe()
        .format(
            &Default::default(),
            symphonia::core::io::MediaSourceStream::new(Box::new(file), Default::default()),
            &Default::default(),
            &Default::default(),
        )
        .unwrap();
    let codec_params = &probed.format.default_track().unwrap().codec_params;
    let decoder = symphonia::default::get_codecs()
        .make(codec_params, &Default::default())
        .unwrap();
    SymphoniaAudioData::new(probed.format, decoder)
}

#[cfg(feature = "ffmpeg")]
fn read_audio_using_ffmpeg(audio_path: String) -> FFMpegAudioData {
    ffmpeg_next::init().unwrap();
    FFMpegAudioData::new(ffmpeg_next::format::input(&audio_path.as_str()).unwrap()).unwrap()
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (model_path, audio_path) = parse_args()?;

    #[cfg(not(feature = "ffmpeg"))]
    let audio = read_audio_using_symphonia(audio_path);
    #[cfg(feature = "ffmpeg")]
    let audio = read_audio_using_ffmpeg(audio_path);

    let classification_results = AudioClassifierBuilder::new()
        .model_asset_path(model_path) // set model path
        .max_results(3) // set max result
        .finalize()? // create an audio classifier
        .classify(audio)?; // do inference and generate results

    // show formatted result message
    for c in classification_results {
        println!("{}", c);
    }

    Ok(())
}
```

The session includes the inference sessions (such as the TfLite interpreter) and the input and output buffers. Explicitly creating a session lets these resources be reused, which speeds up repeated inference.
Original:

```rust
use mediapipe_rs::tasks::text::TextClassifier;
use mediapipe_rs::postprocess::ClassificationResult;
use mediapipe_rs::Error;
fn inference(
    text_classifier: &TextClassifier,
    inputs: &Vec<String>,
) -> Result<Vec<ClassificationResult>, Error> {
    let mut res = Vec::with_capacity(inputs.len());
    for input in inputs {
        // text_classifier creates a new session every time
        res.push(text_classifier.classify(input.as_str())?);
    }
    Ok(res)
}
```

Use the session to speed up:

```rust
use mediapipe_rs::tasks::text::TextClassifier;
use mediapipe_rs::postprocess::ClassificationResult;
use mediapipe_rs::Error;
fn inference(
    text_classifier: &TextClassifier,
    inputs: &Vec<String>,
) -> Result<Vec<ClassificationResult>, Error> {
    let mut res = Vec::with_capacity(inputs.len());
    // create only one session and reuse its resources
    let mut session = text_classifier.new_session()?;
    for input in inputs {
        res.push(session.classify(input.as_str())?);
    }
    Ok(res)
}
```

The default device is CPU. Users can use the builder APIs to choose the device to use:

```rust
use mediapipe_rs::tasks::vision::ObjectDetectorBuilder;
fn create_gpu(model_blob: Vec<u8>) {
    let detector_gpu = ObjectDetectorBuilder::new()
        .model_asset_buffer(model_blob)
        .gpu()
        .finalize()
        .unwrap();
}

fn create_tpu(model_blob: Vec<u8>) {
    let detector_tpu = ObjectDetectorBuilder::new()
        .model_asset_buffer(model_blob)
        .tpu()
        .finalize()
        .unwrap();
}
```

Related links:

- LFX Workspace: A Rust library crate for mediapipe models for WasmEdge NN
- WasmEdge
- MediaPipe
- wasi-nn safe
- wasi-nn specification
- wasi-nn
This project is licensed under the Apache 2.0 license. See LICENSE for more details.