I have created an application to extract text from an image using AI

- June 29, 2024

In this article, I demonstrate how text is extracted from an image file using the Gemini AI model. I also explain how the application was developed with the help of ChatGPT and some custom code.

First, I went to ChatGPT and entered the prompt below to get starter code for the application structure. The following screenshot shows the prompt I wrote and the answer ChatGPT gave.

I copied the code given by ChatGPT and modified it as per my requirements. In other words, ChatGPT gave me about 70 to 80 percent of the code needed to develop the application.

The following are the prerequisites for developing this application:

Visual Studio Code
Python
A Gemini API key, which can be obtained from https://aistudio.google.com
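
Once you have the API key, one option is to keep it out of the source file and read it from an environment variable. The snippet below is only a minimal sketch of that idea: the helper name "load_api_key" is my own and is not part of any library, and it assumes the key is stored in a variable called GOOGLE_API_KEY.

```python
import os


def load_api_key(env=os.environ, name="GOOGLE_API_KEY"):
    """Return the API key stored under `name`, or raise a clear error.

    `load_api_key` is a hypothetical helper for this article, not a library function.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {name} environment variable before starting the app."
        )
    return key
```

With this helper, the value returned by load_api_key() could be passed to genai.configure(api_key=...) instead of typing the key into the code.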

Open Visual Studio Code and create a file named "imagetextextractapp.py". Then go to the Terminal menu and click New Terminal to open a new terminal. In the terminal, enter the command below to install the Streamlit, Pillow and Google Generative AI libraries on your machine.

 pip install streamlit pillow google-generativeai
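
After the install finishes, it can help to confirm that Python can find all three libraries. Note that two of them are imported under a different name than the one used with pip (pillow is imported as PIL, and google-generativeai as google.generativeai). The following is a small sketch using only the standard library; the function names "is_installed" and "check_requirements" are my own.

```python
import importlib.util

# pip package name -> name used in the import statement
REQUIRED = {
    "streamlit": "streamlit",
    "pillow": "PIL",
    "google-generativeai": "google.generativeai",
}


def is_installed(module_name):
    """Return True if Python can locate the module."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (for example "google") is missing
        return False


def check_requirements(required=REQUIRED):
    """Map each pip package name to True or False depending on availability."""
    return {package: is_installed(module) for package, module in required.items()}


if __name__ == "__main__":
    for package, ok in check_requirements().items():
        print(f"{package}: {'installed' if ok else 'MISSING'}")
```

If any line prints MISSING, run the pip command again in the same terminal.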

The following is the complete code I used after modification. Copy it into the "imagetextextractapp.py" file.

 import os
 import streamlit as st
 from PIL import Image
 import google.generativeai as genai

 # Send the image and the prompt to the Gemini vision model and return its text reply
 def query_gemini_llm(image, question):
   # Replace the placeholder with your own API key
   os.environ['GOOGLE_API_KEY'] = "your api key"
   genai.configure(api_key=os.environ['GOOGLE_API_KEY'])

   # Set the vision model (the name used when this article was written;
   # check the current Gemini model list if the API rejects it)
   vision_model = genai.GenerativeModel('gemini-pro-vision')

   # The prompt and the PIL image are passed together as one list
   response = vision_model.generate_content([question, image])

   # response.text holds the model's reply as a plain string
   return response.text
   
 # Streamlit app  
 def main():  
   st.title("Extract Text from Image with Gemini LLM")  
   
   # Image upload  
   uploaded_file = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])  
   
   if uploaded_file is not None:  
     # Display the uploaded image  
     image = Image.open(uploaded_file)  
     st.image(image, caption="Uploaded Image", use_column_width=True)   
   
     # Text input for the question  
     question = st.text_area("Enter your question related to the image","Extract the text data in json format of given image of invoice which having Item data which should extract separately with accuracy")  
   
     # Button to submit the query  
     if st.button("Submit"):  
       if question:  
         # Query the Gemini LLM model  
         response = query_gemini_llm(image, question)  
         st.write(response)  
       else:  
         st.write("Please enter a question.")  
   
 if __name__ == "__main__":  
   main()

The code above uses Gemini's vision model "gemini-pro-vision" to get the information in the image in textual format. The "query_gemini_llm" method takes the uploaded image and the prompt and returns the model's text response. I call "query_gemini_llm" in the Streamlit button click event, passing the image uploaded by the user, and then print the result on screen using the write method of the Streamlit library.

Now run the command below to test the application.

 python -m streamlit run imagetextextractapp.py

The following is the output of the program from my test.

As the output above shows, Gemini's vision model extracts the text from the image correctly. A common use case for this application is data entry, where the extracted output is fed into other applications. The user can change the prompt in the text area and fine-tune the output as required.
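
To use the result for data entry, the text returned by the model first has to be turned into a Python object. The following is a rough sketch of how that could be done, assuming the prompt asked for JSON and that the model may wrap its answer in a Markdown code fence. The function name "parse_model_json" and the sample invoice data are made up for illustration and are not real output from the application.

```python
import json

FENCE = "`" * 3  # three backticks, the Markdown code fence marker


def parse_model_json(reply):
    """Strip an optional Markdown code fence and parse the remaining JSON."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (it may carry a tag such as "json")
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # Drop the closing fence if it is present
        if text.rstrip().endswith(FENCE):
            text = text.rstrip()[: -len(FENCE)]
    return json.loads(text)


# Hypothetical reply, written by hand to show the expected shape
sample_reply = (
    FENCE
    + 'json\n{"invoice_no": "INV-001", "items": [{"name": "Pen", "qty": 2}]}\n'
    + FENCE
)
invoice = parse_model_json(sample_reply)
print(invoice["items"][0]["name"])  # prints: Pen
```

If the model returns something that is not valid JSON, json.loads raises json.JSONDecodeError, so a real data entry flow should catch that error and ask the user to retry or adjust the prompt.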

Thank you for reading the article.
