Gemini and other generative AI models process input and output at a granularity
called a token.
For Gemini models, a token is equivalent to about 4 characters, and 100 tokens
correspond to roughly 60-80 English words.
About tokens
Tokens can be single characters like z or whole words like cat. Long words
are broken up into several tokens. The set of all tokens used by the model is
called the vocabulary, and the process of splitting text into tokens is called
tokenization.
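To see the difference in practice, you can compare counts for a short word and
a long one. This is a minimal sketch using the Python SDK's count_tokens
method; the exact counts depend on the model's vocabulary, so treat the output
as illustrative.

from google import genai

client = genai.Client()

# A short, common word usually maps to a single token, while a long or rare
# word is split into several tokens. Exact counts depend on the vocabulary.
for text in ["cat", "antidisestablishmentarianism"]:
    result = client.models.count_tokens(model="gemini-2.0-flash", contents=text)
    print(text, "->", result.total_tokens)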
When billing is enabled, the cost of a call to the Gemini API is
determined in part by the number of input and output tokens, so knowing how to
count tokens can be helpful.
The models available through the Gemini API have context windows that are
measured in tokens. The context window defines how much input you can provide
and how much output the model can generate. You can determine the size of the
context window by calling the getModels endpoint or
by looking in the models documentation.
In the following example, you can see that the gemini-2.0-flash model has an
input limit of about 1,000,000 tokens and an output limit of about 8,000 tokens,
which means its context window is 1,000,000 tokens.
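A minimal sketch using the Python SDK's client.models.get (the printed limits
are illustrative):

from google import genai

client = genai.Client()

# Retrieve the model's metadata, including its token limits.
model_info = client.models.get(model="gemini-2.0-flash")
print(f"{model_info.input_token_limit=}")
print(f"{model_info.output_token_limit=}")
# ( e.g., input_token_limit=1048576, output_token_limit=8192 )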
All input to and output from the Gemini API is tokenized, including text, image
files, and other non-text modalities.
You can count tokens in the following ways: call count_tokens with the input of
the request to check its size before sending it, or inspect the usage_metadata
attribute on the response object after calling generate_content.
Count text tokens
from google import genai

client = genai.Client()

prompt = "The quick brown fox jumps over the lazy dog."

# Count tokens using the new client method.
total_tokens = client.models.count_tokens(
    model="gemini-2.0-flash", contents=prompt
)
print("total_tokens: ", total_tokens)
# ( e.g., total_tokens: 10 )

response = client.models.generate_content(
    model="gemini-2.0-flash", contents=prompt
)

# The usage_metadata provides detailed token counts.
print(response.usage_metadata)
# ( e.g., prompt_token_count: 11, candidates_token_count: 73, total_token_count: 84 )
from google import genai
from google.genai import types

client = genai.Client()

chat = client.chats.create(
    model="gemini-2.0-flash",
    history=[
        types.Content(role="user", parts=[types.Part(text="Hi my name is Bob")]),
        types.Content(role="model", parts=[types.Part(text="Hi Bob!")]),
    ],
)

# Count tokens for the chat history.
print(
    client.models.count_tokens(
        model="gemini-2.0-flash", contents=chat.get_history()
    )
)
# ( e.g., total_tokens: 10 )

response = chat.send_message(
    message="In one sentence, explain how a computer works to a young child."
)
print(response.usage_metadata)
# ( e.g., prompt_token_count: 25, candidates_token_count: 21, total_token_count: 46 )

# You can count tokens for the combined history and a new message.
extra = types.UserContent(
    parts=[
        types.Part(
            text="What is the meaning of life?",
        )
    ]
)
history = chat.get_history()
history.append(extra)
print(client.models.count_tokens(model="gemini-2.0-flash", contents=history))
# ( e.g., total_tokens: 56 )
Count multimodal tokens
All input to the Gemini API is tokenized, including text, image files, and other
non-text modalities. Note the following high-level points about tokenization of
multimodal input:
With Gemini 2.0, image inputs with both dimensions <=384 pixels are counted as
258 tokens. Images larger in one or both dimensions are cropped and scaled as
needed into tiles of 768x768 pixels, each counted as 258 tokens. Prior to Gemini
2.0, every image was counted as a fixed 258 tokens regardless of size.
Video and audio files are converted to tokens at the following fixed rates:
video at 263 tokens per second and audio at 32 tokens per second.
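Taken together, these rates let you roughly estimate multimodal token cost
before uploading anything. The sketch below applies the rules above; the tile
count for large images is only an approximation, since the exact cropping and
scaling is an API implementation detail, and the helper names are illustrative.

import math

IMAGE_UNIT_TOKENS = 258        # per small image or per 768x768 tile
VIDEO_TOKENS_PER_SECOND = 263
AUDIO_TOKENS_PER_SECOND = 32

def estimate_image_tokens(width: int, height: int) -> int:
    # Images with both dimensions <=384 pixels count as a single 258-token unit.
    if width <= 384 and height <= 384:
        return IMAGE_UNIT_TOKENS
    # Larger images are cropped/scaled into 768x768 tiles; this tiling is
    # approximate, since the API decides the exact cropping and scaling.
    tiles = math.ceil(width / 768) * math.ceil(height / 768)
    return tiles * IMAGE_UNIT_TOKENS

def estimate_av_tokens(video_seconds: float = 0.0, audio_seconds: float = 0.0) -> int:
    return int(video_seconds * VIDEO_TOKENS_PER_SECOND +
               audio_seconds * AUDIO_TOKENS_PER_SECOND)

print(estimate_image_tokens(300, 300))       # 258
print(estimate_image_tokens(1024, 768))      # 516 (2 tiles), approximate
print(estimate_av_tokens(video_seconds=10))  # 2630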
Media resolutions
Gemini 3 Pro Preview introduces granular control over multimodal vision
processing with the media_resolution parameter, which determines the maximum
number of tokens allocated per input image or video frame. Higher resolutions
improve the model's ability to read fine text or identify small details, but
they increase token usage and latency.
For more details about the parameter and how it can impact token calculations,
see the media resolution guide.
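As a sketch of what this might look like with the Python SDK, assuming the
types.MediaResolution enum and the media_resolution field on
GenerateContentConfig (the model name and image file below are placeholders):

from google import genai
from google.genai import types

client = genai.Client()

# Placeholder file name; any image upload works here.
your_image_file = client.files.upload(file="dense_text.png")

# Assumption: a higher media resolution allocates more tokens per image,
# improving fine-detail reading at the cost of token usage and latency.
response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents=["Read the fine print in this image", your_image_file],
    config=types.GenerateContentConfig(
        media_resolution=types.MediaResolution.MEDIA_RESOLUTION_HIGH,
    ),
)
print(response.usage_metadata)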
Image files
Example that uses an uploaded image from the File API:
from google import genai
import pathlib

client = genai.Client()

# `media` is assumed to be a pathlib.Path pointing at your media directory.
media = pathlib.Path("path/to/your/media")

prompt = "Tell me about this image"
your_image_file = client.files.upload(file=media / "organ.jpg")

print(
    client.models.count_tokens(
        model="gemini-2.0-flash", contents=[prompt, your_image_file]
    )
)
# ( e.g., total_tokens: 263 )

response = client.models.generate_content(
    model="gemini-2.0-flash", contents=[prompt, your_image_file]
)
print(response.usage_metadata)
# ( e.g., prompt_token_count: 264, candidates_token_count: 80, total_token_count: 345 )
from google import genai
import pathlib
import PIL.Image

client = genai.Client()

# `media` is assumed to be a pathlib.Path pointing at your media directory.
media = pathlib.Path("path/to/your/media")

prompt = "Tell me about this image"
your_image_file = PIL.Image.open(media / "organ.jpg")

# Count tokens for combined text and inline image.
print(
    client.models.count_tokens(
        model="gemini-2.0-flash", contents=[prompt, your_image_file]
    )
)
# ( e.g., total_tokens: 263 )

response = client.models.generate_content(
    model="gemini-2.0-flash", contents=[prompt, your_image_file]
)
print(response.usage_metadata)
# ( e.g., prompt_token_count: 264, candidates_token_count: 80, total_token_count: 345 )
Video and audio files
Audio and video files are each converted to tokens at the following fixed rates:
Video: 263 tokens per second
Audio: 32 tokens per second
from google import genai
import pathlib
import time

client = genai.Client()

# `media` is assumed to be a pathlib.Path pointing at your media directory.
media = pathlib.Path("path/to/your/media")

prompt = "Tell me about this video"
your_file = client.files.upload(file=media / "Big_Buck_Bunny.mp4")

# Poll until the video file is completely processed (state becomes ACTIVE).
while not your_file.state or your_file.state.name != "ACTIVE":
    print("Processing video...")
    print("File state:", your_file.state)
    time.sleep(5)
    your_file = client.files.get(name=your_file.name)

print(
    client.models.count_tokens(
        model="gemini-2.0-flash", contents=[prompt, your_file]
    )
)
# ( e.g., total_tokens: 300 )

response = client.models.generate_content(
    model="gemini-2.0-flash", contents=[prompt, your_file]
)
print(response.usage_metadata)
# ( e.g., prompt_token_count: 301, candidates_token_count: 60, total_token_count: 361 )
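Audio files can be counted the same way; the media path and sample.mp3 below
are placeholders.

from google import genai
import pathlib

client = genai.Client()

# `media` and "sample.mp3" are placeholders; point them at a real audio file.
media = pathlib.Path("path/to/your/media")
prompt = "Describe this audio clip"
your_file = client.files.upload(file=media / "sample.mp3")

# At 32 tokens per second, a 60-second clip counts as roughly 1,920 tokens
# in addition to the text prompt.
print(
    client.models.count_tokens(
        model="gemini-2.0-flash", contents=[prompt, your_file]
    )
)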