I have created an application to extract text from images using AI

In this article, I am going to demonstrate how text is extracted from an image file using the Gemini AI model. I will also explain how the application was developed with the help of ChatGPT and some custom code.



First, I went to the ChatGPT application and entered the prompts below to get starter code for my application's structure. The following screenshots show the prompts I wrote and the answers ChatGPT gave.

[Screenshot: ChatGPT prompt 1]

[Screenshot: ChatGPT prompt 2]


I copied the code ChatGPT gave me and modified it to fit my requirements. In other words, ChatGPT provided roughly 70 to 80 percent of the code for the application.

The following prerequisites are required to develop this application:

  1. Visual Studio Code
  2. Python
  3. A Gemini API key, which can be obtained from https://aistudio.google.com
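Rather than hard-coding the API key in the source file, it is safer to keep it in an environment variable. The sketch below assumes the key is stored in `GOOGLE_API_KEY` (the same name the app code uses); `get_api_key` is a hypothetical helper of my own, not part of any library.

```python
import os


def get_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Hypothetical helper: return the Gemini API key from the environment.

    Raises a clear error instead of sending an empty key to the API.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your Gemini API key")
    return key
```

The returned value could then be passed to `genai.configure(api_key=get_api_key())` once the libraries below are installed.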

Open Visual Studio Code and create a file named "imagetextextractapp.py". Then go to the Terminal menu and click New Terminal to open a new terminal. In the terminal, enter the command below to install the Google Generative AI, Pillow, and Streamlit libraries on your machine.


 pip install streamlit pillow google-generativeai  
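To confirm the installation worked, a small check like the following can be run with Python. It is a minimal sketch: note that the pip names differ from the import names (`pillow` is imported as `PIL`, `google-generativeai` as `google.generativeai`), and `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# pip package name -> module name used in the import statement
REQUIRED = {
    "streamlit": "streamlit",
    "pillow": "PIL",
    "google-generativeai": "google.generativeai",
}


def missing_packages(required=REQUIRED):
    """Hypothetical helper: return the pip names whose modules cannot be found."""
    missing = []
    for pip_name, module_name in required.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except ModuleNotFoundError:
            # find_spec imports the parent of a dotted name, which fails if
            # the parent package (e.g. "google") is not installed at all
            found = False
        if not found:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    gaps = missing_packages()
    print("All packages installed" if not gaps else "Missing: " + ", ".join(gaps))
```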


The following is the complete code I used after modification. Copy the code below into the "imagetextextractapp.py" file.

 import os
 import streamlit as st
 from PIL import Image
 import google.generativeai as genai
 # Configure the Gemini client; the key is read from the GOOGLE_API_KEY environment
 # variable, otherwise replace "your api key" with your own key
 genai.configure(api_key=os.environ.get("GOOGLE_API_KEY", "your api key"))
 # Send the image and the prompt to the Gemini vision model and return its text reply
 def query_gemini_llm(image, question):
   # set the vision model used in this article (it may have been retired since;
   # if so, substitute a current Gemini model name)
   model = genai.GenerativeModel("gemini-pro-vision")
   # generate_content accepts a list that mixes text and PIL images
   response = model.generate_content([question, image])
   return response.text
 # Streamlit app
 def main():
   st.title("Extract Text from Image with Gemini LLM")
   # Image upload
   uploaded_file = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
   if uploaded_file is not None:
     # Display the uploaded image
     image = Image.open(uploaded_file)
     st.image(image, caption="Uploaded Image", use_column_width=True)
     # Text input for the question (prompt)
     question = st.text_area("Enter your question related to the image", "Extract the text data from the given invoice image in JSON format, and extract the item data separately and accurately")
     # Button to submit the query
     if st.button("Submit"):
       if question:
         # Query the Gemini LLM model and show its answer
         response = query_gemini_llm(image, question)
         st.write(response)
       else:
         st.write("Please enter a question.")
 if __name__ == "__main__":
   main()

In the code above, I used Gemini's vision model "gemini-pro-vision" to get the information in the image in text form. The "query_gemini_llm" method takes the uploaded image and the prompt and returns the model's response as text. I call "query_gemini_llm" in the Streamlit button click handler, passing the image uploaded by the user, and then print the result on screen using Streamlit's write method.
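Because the default prompt asks for JSON, the text reply usually needs to be turned into a Python object before it can be reused elsewhere. Gemini models often wrap JSON in a Markdown code fence, so here is a minimal sketch, assuming the reply contains a single JSON document that may or may not be fenced. `parse_json_reply` is a hypothetical helper, not part of the google-generativeai library.

```python
import json
import re

# Matches a Markdown code fence such as three backticks + "json" ... three backticks.
# The backticks are written as `{3} so the pattern itself contains no literal fence.
FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL | re.IGNORECASE)


def parse_json_reply(reply_text: str):
    """Parse the JSON payload of a model reply, stripping a Markdown fence if present.

    Raises json.JSONDecodeError if the payload is not valid JSON.
    """
    match = FENCE_RE.search(reply_text)
    payload = match.group(1) if match else reply_text.strip()
    return json.loads(payload)
```

In the app, this could be applied to the value returned by `query_gemini_llm`, for example `st.json(parse_json_reply(response))`, wrapped in a `try`/`except json.JSONDecodeError` for the case where the model answers in plain text.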

Now run the following command to test the application.

 python -m streamlit run imagetextextractapp.py  

The following is the output of the program from my testing.

[Screenshot: output 1]
[Screenshot: output 2]

[Screenshot: output 3]

As you can see in the output above, the Gemini vision model extracted the data from the image correctly, so the output can be used for data entry in other applications, which will be a common use case for this application. Users can also change the prompt in the text area to fine-tune the output to their requirements.
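When tuning the prompt, it can help to name exactly which fields you want back. The sketch below builds such a prompt from a list of field names; the function `build_extraction_prompt`, the `items` key, and the field names in the tests are my own illustrative assumptions, not something Gemini requires.

```python
def build_extraction_prompt(document_type: str, fields: list[str], item_fields: list[str]) -> str:
    """Hypothetical helper: build a prompt asking the vision model for specific fields as JSON."""
    lines = [
        f"Extract the text data from the given {document_type} image in JSON format.",
        "Return only JSON with these top-level keys: " + ", ".join(fields) + ".",
    ]
    if item_fields:
        # Ask for line items (e.g. invoice rows) as a separate array
        lines.append('Put the line items in an "items" array; each item should have: '
                     + ", ".join(item_fields) + ".")
    lines.append("Use null for any value that is not visible in the image.")
    return "\n".join(lines)
```

The resulting string could be used as the default value of the `st.text_area` in place of the hand-written prompt.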

Thank you for reading the article.

