
Overview
Embed 3 translates text and images into numerical vectors that models can understand. The most advanced generative AI apps rely on high-performing embedding models to understand the nuances of user inputs, search results, and documents. This Embed model produces 1024-dimensional vectors. It is also a multilingual model that supports 100+ languages and can be used to search within a language (e.g., a French query over French documents) and across languages (e.g., a Chinese query over Finnish documents). As of July 2025, the minimum requirement to deploy this model is CUDA driver 12.2 and NVIDIA driver 535.
Highlights
- Embed is the market-leading multimodal (text and image) representation model, used for semantic search, retrieval-augmented generation, classification, and clustering. As of November 2023, these models achieve state-of-the-art performance among 90+ models on the Massive Text Embedding Benchmark and state-of-the-art zero-shot dense retrieval on BEIR. As of September 2024, they achieve state-of-the-art performance on a variety of text-to-image retrieval benchmarks.
- Our optimized containers enable low-latency inference on a diverse set of hardware accelerators available on AWS, providing different cost and performance points for SageMaker customers.
- Embeddings, Semantic Search, Retrieval-Augmented Generation (RAG), Text Classification, Clustering, Multilingual
Details
Pricing
Free trial
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.g4dn.12xlarge Inference (Batch), Recommended | Model inference on the ml.g4dn.12xlarge instance type, batch mode | $19.80 |
| ml.g5.xlarge Inference (Real-Time), Recommended | Model inference on the ml.g5.xlarge instance type, real-time mode | $5.71 |
| ml.p3.2xlarge Inference (Real-Time) | Model inference on the ml.p3.2xlarge instance type, real-time mode | $15.49 |
| ml.g5.2xlarge Inference (Real-Time) | Model inference on the ml.g5.2xlarge instance type, real-time mode | $6.16 |
| ml.g4dn.xlarge Inference (Real-Time) | Model inference on the ml.g4dn.xlarge instance type, real-time mode | $2.98 |
| ml.g4dn.2xlarge Inference (Real-Time) | Model inference on the ml.g4dn.2xlarge instance type, real-time mode | $3.81 |
Vendor refund policy
No refunds. Please contact support+aws@cohere.com for further assistance.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
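The deployment flow above can be sketched in code. The following is a minimal sketch, assuming placeholder model-package and role ARNs and hypothetical helper names (none of these values come from the listing); it uses the standard SageMaker `create_model`, `create_endpoint_config`, and `create_endpoint` calls:

```python
def build_model_params(model_name, model_package_arn, role_arn):
    # Parameters for sagemaker.create_model() when deploying a subscribed
    # Marketplace model package (pre-trained, no additional training needed).
    return {
        "ModelName": model_name,
        "PrimaryContainer": {"ModelPackageName": model_package_arn},
        "ExecutionRoleArn": role_arn,
        # Marketplace model packages typically run network-isolated.
        "EnableNetworkIsolation": True,
    }


def deploy_realtime(model_name, model_package_arn, role_arn,
                    instance_type="ml.g5.xlarge"):
    # boto3 is imported lazily so the pure helper above has no AWS dependency.
    import boto3
    sm = boto3.client("sagemaker")
    sm.create_model(**build_model_params(model_name, model_package_arn, role_arn))
    sm.create_endpoint_config(
        EndpointConfigName=f"{model_name}-config",
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
        }],
    )
    sm.create_endpoint(EndpointName=model_name,
                       EndpointConfigName=f"{model_name}-config")
```

The default `ml.g5.xlarge` here mirrors the recommended real-time instance type from the pricing table; for batch transform, the listing recommends ml.g4dn.12xlarge instead.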
Version release notes
A key feature update adjusts the default maximum token limit for per-model reranking to balance performance and resource use, with customization options available via API or configuration. Critical bug fixes resolve the "Empty EncodedTexts" issue in Rerank and Embed endpoints by improving chunking logic for oversized inputs and adding safeguards to ensure valid outputs.
Additional details
Inputs
- Summary
The model accepts JSON requests that specify either the input texts or a data URL of a base64-encoded image to be embedded. The model does not accept both text and images in the same request.
```json
{
  "texts": ["hello", "goodbye"],
  "input_type": "search_query",
  "truncate": "END"
}
```

Or, for images (an image converted to base64 and formatted as a data URL; the string here is truncated):

```json
{
  "images": ["data:image/png;base64,/9j/4betRXhpZgA....."],
  "input_type": "search_query",
  "truncate": "END"
}
```
- Input MIME type
- application/json
Input data descriptions
The following table describes supported input data fields for real-time inference and batch transform.
| Field name | Description | Constraints | Required |
|---|---|---|---|
| texts | An array of strings for the model to embed. The maximum number of texts per call is 1024. We recommend keeping each text under 512 tokens for optimal quality. | Default value: [] Type: FreeText | No |
| images | An array of base64-encoded data URLs, as strings, to embed. The maximum number of images per call is 1. | Default value: [] Type: FreeText | No |
| input_type | Prepends special tokens to differentiate each type from the others. Do not mix different types in one request. The only exception is search and retrieval: embed your corpus with the type search_document, then embed your queries with the type search_query. | Type: Categorical Allowed values: search_document, search_query, classification, clustering | Yes |
| truncate | One of NONE, LEFT, or RIGHT, specifying how the API handles inputs longer than the maximum token length. LEFT discards the start of the input; RIGHT discards the end. In both cases, input is discarded until the remainder is exactly the model's maximum input token length. With NONE, an error is returned when the input exceeds the maximum input token length. | Default value: NONE Type: Categorical Allowed values: NONE, LEFT, RIGHT | No |
| embeddings_type | Specifies the types of embeddings to return. Optional; if unspecified, the float response type is returned. Can be one or more of the allowed values. | Default value: NONE Type: Categorical Allowed values: float, int8, uint8, binary, ubinary | No |
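The constraints in the table above can be enforced client-side before invoking the endpoint. A minimal sketch, assuming hypothetical helper names and a placeholder endpoint name; the field names and allowed values come from the table:

```python
ALLOWED_INPUT_TYPES = {"search_document", "search_query", "classification", "clustering"}
ALLOWED_TRUNCATE = {"NONE", "LEFT", "RIGHT"}
ALLOWED_EMBEDDING_TYPES = {"float", "int8", "uint8", "binary", "ubinary"}


def build_embed_request(texts=None, images=None, input_type="search_document",
                        truncate="NONE", embeddings_type=None):
    # Build a JSON body matching the field constraints described above.
    if bool(texts) == bool(images):
        raise ValueError("provide exactly one of texts or images")
    if texts and len(texts) > 1024:
        raise ValueError("at most 1024 texts per call")
    if images and len(images) > 1:
        raise ValueError("at most 1 image per call")
    if input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(f"invalid input_type: {input_type}")
    if truncate not in ALLOWED_TRUNCATE:
        raise ValueError(f"invalid truncate: {truncate}")
    body = {"input_type": input_type, "truncate": truncate}
    if texts:
        body["texts"] = list(texts)
    else:
        body["images"] = list(images)
    if embeddings_type:
        bad = set(embeddings_type) - ALLOWED_EMBEDDING_TYPES
        if bad:
            raise ValueError(f"invalid embeddings_type values: {bad}")
        body["embeddings_type"] = list(embeddings_type)
    return body


def embed(endpoint_name, **kwargs):
    # boto3 is imported lazily so the validator above has no AWS dependency.
    import json
    import boto3
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(build_embed_request(**kwargs)),
    )
    return json.loads(response["Body"].read())
```

Validating locally means a request with both texts and images, or with more than 1024 texts, fails fast instead of costing a round trip to the endpoint.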
Resources
Vendor resources
Support
Vendor support
Contact us at support+aws@cohere.com or join our Discord community at https://discord.com/invite/co-mmunity
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
Customer reviews
Controlled text generation has supported secure workflows and governed data privacy
What is our primary use case?
We adopted Cohere primarily for their command model to support enterprise-grade text generation and NLP workflows.
There was a use case for one of our customers where they required automated text generation and summarization of long documents and draft creation for internal content, so we used Cohere's command model with AWS Bedrock.
For another customer, there was a similar use case but they also wanted semantic search and RAG, and instruction-based responses for chat and workflow automation were required, so we used Cohere's command model for that.
What is most valuable?
Cohere's command model is particularly useful for scenarios where consistent, controlled output matters more than creative responses; I think the command model fits best in those cases. We also found it well suited to structured enterprise tasks such as policy drafting, knowledge extraction, and generating standardized text for operational workflows.
It strikes a good balance between fluency and predictability, which is valuable for our business-critical applications and gives our team better insight.
One of the major benefits I saw was data isolation and governance since implementing Cohere.
Consistent output quality, strong instruction following, and excellent embedding performance for retrieval tasks have benefited our organization. The model is also offered through Amazon Bedrock, and this complete offering, combined with enterprise-friendly deployment options such as VPC support and data isolation, helped our customers significantly.
Data privacy was a major concern because we operate from Asia-Pacific, where there is strong data-privacy governance, so compliance with data-privacy requirements was the major factor for us.
What needs improvement?
Cohere could improve in areas where the command model is not as creative as some larger LLMs available in the market, which is expected but noticeable in open-ended generative tasks.
Reporting and analytics in the dashboard could be more detailed and fine-tuned, which would enhance the experience.
Fine-tuning could be simplified to support broader teams without deep ML expertise.
As suggested above, more creativity and improved dashboard reporting and analytics would help teams without machine learning expertise and speed up their end goals.
For how long have I used the solution?
We have been using Cohere for around one year.
What do I think about the stability of the solution?
Cohere is stable.
What do I think about the scalability of the solution?
The scalability and performance are quite good.
How are customer service and support?
We have not reached out to customer support yet, but once we encounter an issue and need to raise a ticket, we will provide feedback.
How would you rate customer service and support?
Negative
What was our ROI?
Cohere helped us with all three aspects: money is saved, time is saved, and we needed fewer resources to meet our end goals.
What's my experience with pricing, setup cost, and licensing?
Compared to models available in the market, Cohere's pricing, setup cost, and licensing are better.
Which other solutions did I evaluate?
We have tried multiple models, but we found that Cohere's command was a better fit for our needs.
We explored models from Anthropic and AWS native models such as AWS Titan Text before choosing Cohere.
What other advice do I have?
Cohere offers great customization options.
If governance, consistency, and data privacy are priorities, Cohere meets our organization's requirements well.
I recommend that anyone, especially in environments where governance, consistency, and data privacy are priorities, should choose Cohere, particularly the command model for teams looking for a controlled enterprise-safe alternative for text generation, summarization, and instruction automation.
Currently, we have used Cohere from the AWS Bedrock offering only, but since AWS has changed their third-party model availability from partner accounts, in the future, we are going to be a reseller for Cohere.
The documentation and learning resources were very helpful.
Our overall review rating for Cohere is 8 out of 10.
Fast document processing has improved tender workflows but documentation still needs work
What is our primary use case?
My main use case for Cohere is for LLM and chatbot development.
I use Cohere to fill boxes about documents, specifically about tenders.
Cohere helps me fill boxes about documents, and I work with docx documents for a private company.
What is most valuable?
The best features Cohere offers are its speed and overall quality.
Speed has helped me in my day-to-day work, and I really notice the difference because it responds very quickly to LLM requests.
Cohere has positively impacted my organization because I use it with Oracle services, which has let me offer clients a single place to develop and use LLMs in an enterprise setting.
What needs improvement?
I am uncertain about how Cohere can be improved.
The documentation and support could be improved, as there is limited documentation available on the web.
What do I think about the stability of the solution?
Cohere is stable.
What do I think about the scalability of the solution?
I am uncertain about Cohere's scalability.
How are customer service and support?
I am uncertain about customer support.
Which solution did I use previously and why did I switch?
I used GPT-4 before Cohere, and it is great; it was the specific alternative I evaluated before choosing Cohere.
What was our ROI?
I am uncertain if I have seen a return on investment or any relevant metrics such as time saved, money saved, or fewer employees needed.
What's my experience with pricing, setup cost, and licensing?
My experience with pricing, setup cost, and licensing is that it is expensive to use all Oracle services.
What other advice do I have?
I do not want to add anything else about the features, including anything about accuracy or ease of use.
I do not have specific advice to give to others looking into using Cohere. I gave this review a rating of 6.
Reranking has boosted retrieval quality and has improved performance in my information systems
What is our primary use case?
My main use case for Cohere is Retrieval Augmented Generation.
A specific example of how I use Retrieval Augmented Generation with Cohere is for information retrieval systems.
What is most valuable?
The best feature Cohere offers is the Reranking model.
What stands out for me about the reranking model is that it improved performance in my work.
Cohere positively impacted my organization by improving the performance of my RAG system.
I noticed a 10% improvement in my RAG system after using Cohere.
What needs improvement?
Cohere is good enough as it is, though I think it can still be improved.
For how long have I used the solution?
I have been using Cohere for two years.
What do I think about the stability of the solution?
Cohere is stable.
What do I think about the scalability of the solution?
The scalability of Cohere is good.
How are customer service and support?
The customer support for Cohere is good.
How would you rate customer service and support?
Negative
What was our ROI?
I have not seen metrics for return on investment, and I have no metrics to share.
What's my experience with pricing, setup cost, and licensing?
My experience with pricing, setup cost, and licensing for Cohere is good.
What other advice do I have?
My advice to others looking into using Cohere is to try it.
My company does not have a business relationship with this vendor other than being a customer.
I gave this review a rating of 8.
Have improved project workflows using faster response times and reduced data embedding costs
What is our primary use case?
I have used Cohere in a RAG use case where I had to vectorize some data. I used multiple models in RAG to find a better model that could give superior results. I was trying to find a cloud-hosted model, and Cohere's Embed English v3.0 is a cloud-hosted model that took less time to embed the textual data. When I was trying to get the similarity search after embedding that data, Cohere provided much better results.
Suppose I had to embed 100 documents at a time. Most other models, including all-MiniLM-L6-v2, took more time to embed them. When I tried Cohere, it was much faster, more than 50 to 60% faster than those models. It was even somewhat faster than OpenAI's text-embedding-3. So Cohere helped reduce development and embedding times.
What is most valuable?
I believe Cohere offers excellent features, especially the cloud-hosted model and the API calls. The number of times I can call the API within a minute is very good. The ping is great; I have started a request to Cohere model, and it was very quick to respond. The best part was the free tier because most models do not provide a free tier.
Regarding benefits, Cohere is less costly than other models; OpenAI and Google embedding models charge considerably more. Regarding training data, Cohere's Embed English v3.0 has been trained on much more English data than other models, including OpenAI's, which gives my organization an extra benefit.
What needs improvement?
One thing Cohere can improve relates to the distances I get from similarity search. Suppose I have provided textual data that has been embedded; I then have to apply an extra numpy step after embedding with the model. With OpenAI embedding models I do not need that extra step, and they return lower distances than my results from Cohere. I was sometimes getting distances of approximately 0.005 from OpenAI, but with Cohere the distances were around 0.5 or more. I think that can be improved. It was possibly because of some configuration or the way I was using it, but I am not exactly sure.
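For context, the extra post-processing step described above typically amounts to computing cosine distances between embedding vectors. A minimal stdlib-only sketch (the toy 3-dimensional vectors below are made up and stand in for real 1024-dimensional embeddings):

```python
import math


def cosine_distance(a, b):
    # 1 - cosine similarity; 0.0 means the vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


# Toy vectors standing in for real embeddings returned by the model.
query = [0.1, 0.3, 0.5]
docs = {"doc_a": [0.1, 0.3, 0.5], "doc_b": [0.9, 0.1, 0.0]}

# Rank documents by distance to the query (smallest distance first).
ranked = sorted(docs, key=lambda d: cosine_distance(query, docs[d]))
```

Note that absolute distance values are not directly comparable across embedding models, since each model shapes its own vector space; only the relative ordering within one model is meaningful.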
For how long have I used the solution?
I have been using Cohere for the last seven or eight months.
What do I think about the scalability of the solution?
The scalability was very good because of the response time. Even though I do not need that much processing at a time, I have had a good experience with Cohere so far.
Which solution did I use previously and why did I switch?
Previously, I was using all-MiniLM-L6-v2 and switched to Cohere because all-MiniLM-L6-v2 needed to be deployed and processed locally, and I was not satisfied with its results even though it is open source. That is why I switched to Cohere.
What was our ROI?
I can highlight two benefits. First, Cohere charges less than OpenAI, so it saves cost. Second, timing: Cohere's Embed English model took less time to embed than OpenAI's text-embedding-ada-002 model, so it also saves time.
Which other solutions did I evaluate?
I have evaluated Cohere's Embed English v3 and OpenAI's text-embedding-3 models, as well as several models from Hugging Face.
What other advice do I have?
Cohere provides a free tier, and any developer who is starting their journey can use Cohere for RAG use cases. They can utilize the model benefits. After using Cohere, I got distances after the similarity search that were much lower compared to other vectorization and embedding models. The only model that performed better than Cohere was OpenAI's text-embedding-3-large. It was good, but Cohere was the second-best performing model in my use case.
I think Cohere's use cases are excellent, and I would suggest Cohere to others because of its lower response time and the time it saves in the process. It is also cheaper than other models. I would give this review a rating of eight out of ten.
Has improved customer interaction speeds and supports flexible model switching
What is our primary use case?
My main use case for Cohere is to use a Cohere embedded model to create our own vector databases and check conversations.
A specific example of how I use Cohere's embedding model for our vector databases and conversation checking involves workflows that take customer approvals and convert that information into vectors. I save this information in our own systems and also store small vectors on customer devices for use during custom customer requests.
My use case involves indexing and saving small portions of information.
What is most valuable?
In my experience, Cohere offers reliable embedding models for customers who do not want to use standard OpenAI models.
I find that the choice of embedding models is limited, and Cohere is available on Azure, which makes it a good alternative for customers who prefer not to use OpenAI.
Cohere has positively impacted my organization by helping our customers work more efficiently when creating requests, and the embedding results are of very high quality.
What needs improvement?
I believe Cohere can be improved technically by providing more feedback, logs, and metrics for embedding requests, as it currently appears to be a black box without any understanding of quality. Quality can only be understood after using it with customer requests, and during the embedding process, measurable metrics are not visible.
There are no particularly unique features distinguishing Cohere from other solutions.
For how long have I used the solution?
I have been using Cohere for approximately nine to ten months.
What do I think about the stability of the solution?
Cohere is stable in my experience.
What do I think about the scalability of the solution?
Regarding scalability, Cohere became slower after we sent a large amount of information for embedding, though we do not use any special scaling solution.
How are customer service and support?
I have not interacted with Cohere's support team. However, I contacted Azure about the slowness, and we decided to use smaller chunks of information during the embedding process.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
I previously used embedding models from OpenAI. I switched to Cohere because customers wanted to use something other than OpenAI models.
How was the initial setup?
I did not purchase Cohere through the Azure Marketplace. I deployed unmanaged models and shared models.
What was our ROI?
I do not have relevant metrics about the return on investment from using Cohere yet because the customer's application is in a pilot stage and has not been released. However, I understand that it is performing well, and we plan to continue with it.
What's my experience with pricing, setup cost, and licensing?
My experience with pricing, setup cost, and licensing indicates that it does not require a special license, and the prices are competitive compared to competitors.
Which other solutions did I evaluate?
I did not evaluate other options before choosing Cohere. I looked at prices, and since we used Azure cloud, it did not provide many models for selection. Only OpenAI and Cohere were available for embedding.
What other advice do I have?
For others looking into using Cohere, I advise that it is a good model for people who want to be agnostic when using models and creating something flexible to switch from one model to another. I would rate this product an eight out of ten.