I have created an application to extract text from images using AI
In this article, I am going to demonstrate how text can be extracted from an image file using the Gemini AI model. I will also explain how the application was developed with the help of ChatGPT and some custom code.
First, I went to the ChatGPT application and entered the prompts below, describing the structure of my application, so that I would get starter code for it. Following is the screenshot of the prompt I wrote and the answer given by ChatGPT.
I copied the code given by ChatGPT and modified it as per my requirements. In other words, ChatGPT gave me about 70 to 80 percent of the code needed to develop the application.
Following are the prerequisites required to develop this application:
- Visual Studio Code
- Python
- A Gemini API key, which can be obtained from https://aistudio.google.com
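Rather than pasting the key directly into the source file, you can keep it in an environment variable. The sketch below assumes the variable is named GOOGLE_API_KEY (the same name the application code uses); the helper name `load_api_key` is hypothetical and not part of any library.

```python
import os


def load_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Return the Gemini API key from the environment (hypothetical helper).

    Assumes the key was exported beforehand, e.g. `export GOOGLE_API_KEY=...`
    on Linux/macOS or `set GOOGLE_API_KEY=...` in a Windows command prompt.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key


if __name__ == "__main__":
    try:
        load_api_key()
        print("API key found")
    except RuntimeError as err:
        print(err)
```

Keeping the key out of the file also means it will not be shared by accident if you publish or send the code to someone else.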
Open Visual Studio Code and create a file named "imagetextextractapp.py". Then go to the Terminal menu and click New Terminal, which opens a new terminal. In the terminal, enter the command below to install the Google Generative AI, Pillow and Streamlit libraries on your machine.
pip install streamlit pillow google-generativeai
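To confirm the installation worked before writing the application, a short check like the one below can be run with `python`. It is a sketch that only tests whether each module can be found; the module list is an assumption based on the pip command above (note that Pillow is imported as `PIL`), and `missing_modules` is a hypothetical helper name.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for dotted names such as "google.generativeai"
            # when the parent package itself is absent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    required = ["streamlit", "PIL", "google.generativeai"]
    gaps = missing_modules(required)
    print("Missing: " + ", ".join(gaps) if gaps else "All packages installed")
```

If anything is reported as missing, run the pip command again in the same terminal (and the same Python environment) that Visual Studio Code is using.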
Following is the complete code I used after modification. Copy the code below into the "imagetextextractapp.py" file.
import os

import streamlit as st
from PIL import Image
import google.generativeai as genai


# Send the image and the prompt to a Gemini vision model and return its text answer
def query_gemini_llm(image, question):
    # Replace the placeholder with your key, or set the GOOGLE_API_KEY environment variable
    api_key = os.environ.get("GOOGLE_API_KEY", "your api key")
    genai.configure(api_key=api_key)
    # Set the vision model. Google has since deprecated "gemini-pro-vision";
    # if it is unavailable, try a newer multimodal model such as "gemini-1.5-flash".
    vision_model = genai.GenerativeModel("gemini-pro-vision")
    # The SDK accepts a list mixing text prompts and PIL images
    response = vision_model.generate_content([question, image])
    return response.text


# Streamlit app
def main():
    st.title("Extract Text from Image with Gemini LLM")

    # Image upload
    uploaded_file = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    if uploaded_file is not None:
        # Display the uploaded image
        image = Image.open(uploaded_file)
        st.image(image, caption="Uploaded Image", use_column_width=True)

        # Text input for the question (prompt), pre-filled with a default
        question = st.text_area(
            "Enter your question related to the image",
            "Extract the text data of the given invoice image in JSON format, "
            "with the item data extracted separately and accurately",
        )

        # Button to submit the query
        if st.button("Submit"):
            if question.strip():
                # Query the Gemini LLM model and show its answer
                response = query_gemini_llm(image, question)
                st.write(response)
            else:
                st.write("Please enter a question.")


if __name__ == "__main__":
    main()
In the above code, I have used Gemini's vision model "gemini-pro-vision" to get the information in the image in textual format. The "query_gemini_llm" method takes the image and the prompt and returns the model's answer as text. I call the "query_gemini_llm" method in the Streamlit button click handler, passing the image uploaded by the user (opened with Pillow), and then print the result on screen using Streamlit's write method.
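Because "query_gemini_llm" only needs an object with a `generate_content` method that returns something with a `.text` attribute, the model call can be separated from the API setup. The sketch below is a hypothetical refactoring, not the code used above: `ask_vision_model`, `FakeModel` and `FakeResponse` are illustrative names, and the fake object stands in for `genai.GenerativeModel` so the logic can be tried without an API key or network access.

```python
from dataclasses import dataclass


def ask_vision_model(model, image, question):
    """Send a prompt plus an image to a Gemini-style model and return its text.

    `model` is assumed to expose generate_content(list) returning an object
    with a `.text` attribute, which is how the application uses
    genai.GenerativeModel.
    """
    if not question or not question.strip():
        raise ValueError("Please enter a question.")
    response = model.generate_content([question, image])
    return response.text


@dataclass
class FakeResponse:
    text: str


class FakeModel:
    """Hypothetical stand-in for genai.GenerativeModel, for offline testing."""

    def __init__(self):
        self.last_contents = None

    def generate_content(self, contents):
        # Record what was sent and return a canned answer
        self.last_contents = contents
        return FakeResponse(text='{"invoice_number": "INV-001"}')


if __name__ == "__main__":
    fake = FakeModel()
    print(ask_vision_model(fake, image="<image placeholder>", question="Extract the invoice number"))
```

In the Streamlit app, the real model would be passed in instead, for example `ask_vision_model(genai.GenerativeModel("gemini-pro-vision"), image, question)`.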
Now run the following command to test the application.
python -m streamlit run imagetextextractapp.py
Following is the output of the program from my test.
As you can see in the output above, Gemini's vision model extracted the data from the image correctly. This output can be used for data entry in other applications, which will be a common use case for this application. The user can also change the prompt in the text area field to refine the output as per their requirements.
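Since the default prompt asks for JSON, the answer can be turned into Python data before passing it to another application. Models often wrap JSON in a Markdown code fence, so the sketch below removes such a fence if present before parsing. This is an assumption about the response format, not something the Gemini API guarantees, and `parse_json_answer` is a hypothetical helper.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built from single backticks


def parse_json_answer(text):
    """Parse a model answer that should contain JSON (hypothetical helper).

    If the JSON is wrapped in a Markdown code fence (optionally tagged "json"),
    the fence is removed first. Raises json.JSONDecodeError if the remaining
    text is not valid JSON.
    """
    cleaned = text.strip()
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, cleaned, re.DOTALL)
    if match:
        cleaned = match.group(1)
    return json.loads(cleaned)


if __name__ == "__main__":
    answer = FENCE + 'json\n{"items": [{"name": "Pen", "qty": 2}]}\n' + FENCE
    data = parse_json_answer(answer)
    for item in data["items"]:
        print(item["name"], item["qty"])
```

In the Streamlit app, the parsed result could then be displayed with `st.json(data)` instead of `st.write`, or saved for another system to import.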
Thank you for reading the article.