Fast Open source AI
This API uses various open-source models to serve different use cases. The following categories are currently available, and more will be added soon:

1. Chat API
2. Text to Image Generation
3. Image Classification
4. Speech Recognition
5. Object Detection in Image
6. Audio Classification
Fast Open source AI endpoints
| Method | Endpoint | Description |
|---|---|---|
| Chat API | | |
| POST | gemma-7b `/google/gemma-7b` | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. |
| Text to Image Generation | | |
| POST | stable-diffusion-xl-base-1.0 `/stabilityai/stable-diffusion-xl-base-1.0` | Returns 1024x1024 images generated from the input prompt. |
| Image Classification | | |
| POST | mobilenetv3 `/timm/mobilenetv3_large_100.ra_in1k` | A MobileNet-v3 image classification model trained on ImageNet-1k. |
| POST | resnet-50 `/microsoft/resnet-50` | ResNet model pre-trained on ImageNet-1k at resolution 224x224. |
| POST | vit `/google/vit-base-patch16-224` | Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000… |
| POST | nsfw_image_detection `/Falconsai/nsfw_image_detection` | Fine-tuned Vision Transformer (ViT) for NSFW image classification. |
| POST | image-captioning `/Salesforce/blip-image-captioning-large` | Image captioning model pretrained on the COCO dataset; base architecture (with ViT large backbone). Captions the input image with a relevant description. |
| POST | vit-gpt2-image-captioning `/nlpconnect/vit-gpt2-image-captioning` | Captions the input image with a relevant description. |
| Speech Recognition | | |
| POST | whisper-v3 `/openai/whisper-large-v3` | Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability… |
| Object Detection in Image | | |
| POST | detr-resnet-50 `/facebook/detr-resnet-50` | DEtection TRansformer (DETR) model trained end-to-end on COCO 2017 object detection (118k annotated images). |
| POST | yolos-tiny `/hustvl/yolos-tiny` | YOLOS model fine-tuned on COCO 2017 object detection (118k annotated images). It was introduced in the paper "You Only Look at One Sequence: Rethinking Transformer in Vision… |
| POST | trocr-base-handwritten `/microsoft/trocr-base-handwritten` | The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder and a text Transformer as decoder. The model is used to detect characters for hand… |
| Audio Classification | | |
| POST | ast-finetuned-audioset `/MIT/ast-finetuned-audioset-10-10-0.4593` | The Audio Spectrogram Transformer is equivalent to ViT, but applied on audio. Audio is first turned into an image (as a spectrogram), after which a Vision Transformer is applied.… |
| POST | speech-emotion-recognition `/MIT/ast-finetuned-audioset-10-10-0.4593` | The model is a fine-tuned version of jonatasgrosman/wav2vec2-large-xlsr-53-english for a Speech Emotion Recognition (SER) task. The dataset used to fine-tune the original… |
| POST | Musical-genres-Classification `/SeyedAli/Musical-genres-Classification-Hubert-V1` | This model is a fine-tuned version of ntu-spml/distilhubert on the GTZAN dataset. Genres include hiphop, reggae, blues, disco, and jazz, among others. |
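The listing does not document the base URL, authentication scheme, or request body for these endpoints. The following is a minimal Python sketch of building a POST request for one of the paths above, assuming the API is served through RapidAPI (the BASIC/PRO/ULTRA/MEGA plan names follow RapidAPI's convention) and is therefore authenticated with the `X-RapidAPI-Key` and `X-RapidAPI-Host` headers. The host `fast-open-source-ai.p.rapidapi.com` and the `{"inputs": ...}` body are hypothetical placeholders; substitute the values shown on the endpoint's own page.

```python
import json
import urllib.request

# HYPOTHETICAL host: the listing does not state the real base URL.
API_HOST = "fast-open-source-ai.p.rapidapi.com"


def build_request(endpoint: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for one of the listed endpoint paths.

    `endpoint` is a path from the table, e.g. "/google/gemma-7b".
    The JSON body shape is an assumption; image and audio endpoints may
    expect a different payload.
    """
    if not endpoint.startswith("/"):
        raise ValueError("endpoint must start with '/'")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{API_HOST}{endpoint}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed RapidAPI-style authentication headers.
            "X-RapidAPI-Key": api_key,
            "X-RapidAPI-Host": API_HOST,
        },
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> object:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(build_request("/google/gemma-7b", {"inputs": "Hello"}, api_key="YOUR_KEY"))` would post a prompt to the Gemma chat endpoint under these assumptions. Image and audio endpoints may instead expect raw binary data or a file URL, which this listing does not specify.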
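The listing does not show response schemas. If the classification and detection endpoints return the same shapes as the Hugging Face Inference API does for these models (an assumption), image classification yields a list of `{"label", "score"}` objects, and object detection yields objects with `label`, `score`, and a `box` holding `xmin`/`ymin`/`xmax`/`ymax` pixel coordinates. The hypothetical helpers `top_labels` and `filter_detections` below sketch typical post-processing under that assumption: ranking labels by score and dropping low-confidence boxes.

```python
from typing import Any

# ASSUMED response shapes (mirroring the Hugging Face Inference API; not
# confirmed by this listing):
#   classification: [{"label": str, "score": float}, ...]
#   detection:      [{"label": str, "score": float,
#                     "box": {"xmin": int, "ymin": int, "xmax": int, "ymax": int}}, ...]


def top_labels(predictions: list[dict[str, Any]], k: int = 3,
               min_score: float = 0.0) -> list[tuple[str, float]]:
    """Return up to k (label, score) pairs at or above min_score, highest first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    kept = [(p["label"], float(p["score"])) for p in ranked if p["score"] >= min_score]
    return kept[:k]


def filter_detections(detections: list[dict[str, Any]],
                      threshold: float = 0.9) -> list[tuple[str, tuple[int, int, int, int]]]:
    """Keep detections scoring at least `threshold` as (label, (xmin, ymin, xmax, ymax))."""
    kept = []
    for det in detections:
        if det["score"] >= threshold:
            box = det["box"]
            kept.append((det["label"], (box["xmin"], box["ymin"], box["xmax"], box["ymax"])))
    return kept
```

The default threshold of 0.9 is an arbitrary illustration; a suitable cutoff depends on the model and on how costly false positives are for the application.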
Fast Open source AI pricing
| Plan | Price | Rate limit | Quotas |
|---|---|---|---|
| BASIC | Free | — | |
| PRO | $4 / month | 1 / second | |
| ULTRA | $25 / month | 1 / second | |
| MEGA | $55 / month | 1 / second | |
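The paid plans list a rate limit of 1 / second, which presumably means one request per second. A client can stay under that limit by spacing out its own calls. The Python sketch below (`MinIntervalThrottle` is a hypothetical helper, not part of the API) enforces a minimum interval between calls; its clock and sleep functions are injectable so it can be tested without real waiting. How the server actually counts requests (fixed or sliding window, any burst allowance) is not documented here, so treat this as a conservative client-side guard.

```python
import time
from typing import Callable


class MinIntervalThrottle:
    """Client-side throttle that spaces calls at least `interval` seconds apart.

    HYPOTHETICAL helper sketched for a "1 / second" plan limit; it does not
    model the server's own enforcement, which this listing does not describe.
    """

    def __init__(self, interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        # Earliest time the next call may proceed; -inf lets the first call through.
        self._next_allowed = float("-inf")

    def wait(self) -> float:
        """Block until a call is allowed; return the number of seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # The call happens at max(now, next_allowed); schedule the following slot.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay
```

Calling `throttle.wait()` immediately before each API request keeps consecutive requests at least one second apart. A single shared instance is needed if several parts of a program call the API, since separate instances do not coordinate with each other.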