MLX LLM Inference AP
Fast, affordable LLM inference powered by Apple Silicon. OpenAI-compatible chat completions API.
1 subscribers
4 endpoints
The in-depth APIMemo review for this API hasn't been published yet —
the data below comes straight from the public marketplace listing.
MLX LLM Inference AP endpoints
| Method | Endpoint | Description |
|---|---|---|
| v1 | ||
| GET |
listModels /v1/models |
Returns all downloaded MLX models with active status indicator. |
| GET |
listTiers /v1/tiers |
Returns available subscription tiers and their limits. |
| POST |
chatCompletions /v1/chat/completions |
Generate a chat response from the loaded LLM. OpenAI-compatible format. |
| health | ||
| GET |
healthCheck /health |
Returns server status, loaded model, and performance metrics. |
MLX LLM Inference AP pricing
| Plan | Price | Rate limit | Quotas |
|---|---|---|---|
| BASIC | Free | — |
|
| PRO Recommended | $29 / month | — |
|
| ULTRA | $99 / month | — |
|
| MEGA | $199 / month | — |
|