GPT Vision
GPT-4 with Vision lets the model take in images and answer questions about them. Historically, language model systems have been limited to a single input modality: text. For many use cases, this constrained where models like GPT-4 could be used. GPT-4 with Vision adds image input as an additional set of capabilities for the model.
GPT Vision endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | GPT Vision / | Allows the model to take in images and answer questions about them |
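
As a rough illustration of calling a POST endpoint like this one, the sketch below builds (but does not send) a JSON request carrying a base64-encoded image and a question. The listing does not state the base URL, the authentication scheme, or the request schema, so `BASE_URL`, `API_KEY`, the `Authorization` header, and the `question` and `image_base64` field names are all hypothetical placeholders to be replaced with the values from the provider's documentation.

```python
import base64
import json
import urllib.request

# Hypothetical placeholders: the listing gives no base URL, auth scheme, or schema.
BASE_URL = "https://example.invalid/"
API_KEY = "YOUR_API_KEY"


def build_vision_request(image_bytes: bytes, question: str) -> urllib.request.Request:
    """Build a POST request with an image and a question (assumed payload shape)."""
    payload = {
        "question": question,  # assumed field name
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),  # assumed field name
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed auth header
        },
        method="POST",
    )


# Build the request only; sending it would be urllib.request.urlopen(request).
request = build_vision_request(b"abc", "What is in this image?")
```

Building the request separately from sending it makes it easy to inspect the payload before pointing it at the real endpoint.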
GPT Vision pricing
| Plan | Price | Rate limit | Quotas |
|---|---|---|---|
| BASIC | $1 / month | — | |
| PRO | $5 / month | — | |
| ULTRA | $25 / month | — | |
| MEGA (Recommended) | $75 / month | — | |