How to extract text from images and video with OpenCV and pytesseract in Python
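Make sure you have installed pytesseract (and the Tesseract OCR program it wraps) before running this code.
Note: this line of code tells pytesseract where the Tesseract executable is installed on my laptop, so that it can be called when our code runs. Change the path to match your own installation.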
pytesseract.pytesseract.tesseract_cmd ='C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
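The hard-coded path above only works on a machine that has Tesseract in that exact folder. Here is a minimal sketch, assuming you would rather look the executable up at run time: it checks the system PATH first and falls back to the path used in this article. The names find_tesseract and configure_pytesseract are my own helpers, not part of pytesseract.

```python
import os
import shutil

# Fallback taken from this article; adjust it to your own installation.
DEFAULT_WINDOWS_PATH = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'


def find_tesseract(fallback=DEFAULT_WINDOWS_PATH):
    """Return a path to the Tesseract executable, or None if none is found."""
    on_path = shutil.which('tesseract')  # searches the PATH environment variable
    if on_path:
        return on_path
    if os.path.isfile(fallback):
        return fallback
    return None


def configure_pytesseract():
    """Point pytesseract at the executable found by find_tesseract()."""
    import pytesseract  # imported here so find_tesseract() works without it

    path = find_tesseract()
    if path is None:
        raise FileNotFoundError('Tesseract executable not found; install it or fix the path')
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling configure_pytesseract() once at the top of a script would then replace the hard-coded assignment.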
Image as input
First, we will see how to extract English text from an image.
import cv2
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd ='C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
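# Read the image as grayscale. IMREAD_GRAYSCALE is the imread flag; COLOR_BGR2GRAY is a cvtColor code and does not belong here.
img = cv2.imread('scanned.png', cv2.IMREAD_GRAYSCALE)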
print(pytesseract.image_to_string(img))
cv2.imshow('Result',img)
cv2.waitKey(0)
Output : NEW DELHI: The Central Board of Secondary Education (CBSE) and Council for the
Indian School Certificate Examinations (CISCE) on Thursday submitted the
assessment system for class 12 students in the Supreme Court and said the results
will be declared by July 31.
Attorney general (AG) K K Venugopal placed the scheme for CBSE which said
performance of students in class 10, 11 and 12 examinations will be considered.
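One thing to watch for in the code above: cv2.imread does not raise an error when the file is missing or cannot be decoded. It returns None, and the failure only shows up later inside pytesseract. Here is a small sketch of a loader that fails early instead (load_gray is a hypothetical helper name of mine):

```python
import os


def load_gray(path):
    """Load an image as grayscale, raising a clear error if that fails."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f'No such image file: {path}')

    import cv2  # imported here so the path check works even without OpenCV

    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        # imread returns None when it cannot decode the file
        raise ValueError(f'OpenCV could not decode: {path}')
    return img
```

The returned array can be passed to pytesseract.image_to_string in the same way as img above.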
Extracting non-English text from an image. In this case, I will extract Hindi text.
Before we start, make sure you have installed the Tesseract language data for the language you want to extract; in my case it is Hindi.
In the code below, notice the call pytesseract.image_to_string(Image.open('erw.jpg'), lang='hin'). Here lang means language and 'hin' is the code for Hindi.
To find the code for another language, run print(pytesseract.get_languages(config='')). It lists every language you have installed. If your desired language is not in that list, install it first, otherwise this will not work.
Image as input
import cv2
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd ='C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
img = cv2.imread('erw.jpg')
print(pytesseract.image_to_string(Image.open('erw.jpg'), lang='hin'))
cv2.imshow('Result',img)
cv2.waitKey(0)
Output : मकड़ियों की चौंकाने वाली तस्वीरें: ऑस्ट्रेलिया के बाढ़ग्रस्त
इलाके में मकड़ियों ने जाल की चादर डाली, पानी से बचने के
लिए दूर तक बड़ा और ट्रांसपेरेंट जाल बनाया
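The get_languages check mentioned earlier can also be done in code before running OCR. This is a rough sketch, not the only way to do it: missing_languages and ocr_with_language are names I made up, and the '+' handling relies on Tesseract accepting several languages joined with '+' (for example 'hin+eng').

```python
def missing_languages(requested, installed):
    """Return the language codes in `requested` that are not in `installed`.

    `requested` uses Tesseract's lang syntax, where '+' joins several
    languages, for example 'hin+eng'.
    """
    return [code for code in requested.split('+') if code not in installed]


def ocr_with_language(image_path, lang):
    """Run OCR only if every requested language is installed."""
    import pytesseract  # imported here so missing_languages() has no dependencies
    from PIL import Image

    installed = pytesseract.get_languages(config='')
    missing = missing_languages(lang, installed)
    if missing:
        raise RuntimeError(f'Tesseract language data not installed: {missing}')
    return pytesseract.image_to_string(Image.open(image_path), lang=lang)
```

For the example in this section you would call ocr_with_language('erw.jpg', 'hin').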
You can also use the same code to read the text on a car number plate.
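For number plates, a sketch along these lines may help. It assumes the image is already cropped to the plate and that the plate uses only uppercase Latin letters and digits. '--psm 7' (treat the image as a single line of text) and tessedit_char_whitelist are Tesseract options, but whether the whitelist is honoured depends on your Tesseract version and engine, so treat this as a starting point. PLATE_CONFIG, clean_plate and read_plate are hypothetical names.

```python
import string

# Assumed plate alphabet: uppercase Latin letters and digits only.
PLATE_CHARS = string.ascii_uppercase + string.digits

# '--psm 7' asks Tesseract to treat the image as a single line of text;
# tessedit_char_whitelist restricts the characters it may output.
PLATE_CONFIG = f'--psm 7 -c tessedit_char_whitelist={PLATE_CHARS}'


def clean_plate(text):
    """Uppercase the OCR result and drop everything except letters and digits."""
    return ''.join(ch for ch in text.upper() if ch in PLATE_CHARS)


def read_plate(image_path):
    """OCR a cropped number-plate image (the path is supplied by the caller)."""
    import cv2  # imported here so clean_plate() can be used on its own
    import pytesseract

    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise ValueError(f'Could not read image: {image_path}')
    raw = pytesseract.image_to_string(img, config=PLATE_CONFIG)
    return clean_plate(raw)
```

clean_plate is applied last because OCR output often contains stray spaces, dashes and a trailing newline.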
Extracting text from a webcam :
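# Warning: on a slow system the output will lag, because OCR runs on every frame
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd ='C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        # Stop if the camera did not return a frame
        break
    imgH, imgW, _ = frame.shape
    x1, y1, w1, h1 = 0, 0, imgW, imgH
    imgchar = pytesseract.image_to_string(frame)
    # Use the current frame here (the original passed an undefined variable, img)
    imgboxes = pytesseract.image_to_boxes(frame)
    for boxes in imgboxes.splitlines():
        boxes = boxes.split(' ')
        # Each line is: character, left, bottom, right, top, measured from the bottom-left corner
        x, y, w, h = int(boxes[1]), int(boxes[2]), int(boxes[3]), int(boxes[4])
        # OpenCV measures y from the top, so subtract from the image height
        cv2.rectangle(frame, (x, imgH - y), (w, imgH - h), (0, 0, 255), 3)
    # Draw the recognised text near the top-left corner of the frame
    cv2.putText(frame, imgchar, (x1 + int(w1 / 50), y1 + int(h1 / 50)), cv2.FONT_HERSHEY_COMPLEX, 0.7, (0, 0, 255), 2)
    cv2.imshow('text', frame)
    # Press q to quit
    if cv2.waitKey(2) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()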
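The box handling in the webcam loop is easy to get wrong, because image_to_boxes reports coordinates from the bottom-left corner while OpenCV draws from the top-left. Here is a small sketch that isolates that conversion so it can be checked without a camera (parse_boxes is my own helper name, not a pytesseract function):

```python
def parse_boxes(boxes_text, img_height):
    """Convert image_to_boxes output into OpenCV rectangle corners.

    Each input line looks like 'A 10 20 30 40 0': the character, then
    left, bottom, right, top measured from the bottom-left corner, then
    the page number.
    """
    result = []
    for line in boxes_text.splitlines():
        parts = line.split(' ')
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        char = parts[0]
        left, bottom, right, top = (int(p) for p in parts[1:5])
        # OpenCV measures y from the top edge, so flip both y values.
        result.append((char, (left, img_height - bottom), (right, img_height - top)))
    return result
```

Inside the loop, each (char, pt1, pt2) tuple returned by parse_boxes(imgboxes, imgH) can be passed to cv2.rectangle(frame, pt1, pt2, (0, 0, 255), 3).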
A shorter version of this code, for a single image:
import cv2
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd ='C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
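# Read the image as grayscale. IMREAD_GRAYSCALE is the imread flag; COLOR_BGR2GRAY is a cvtColor code and does not belong here.
img = cv2.imread('scanned.png', cv2.IMREAD_GRAYSCALE)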
print(pytesseract.image_to_string(img))
cv2.imshow('Result',img)
cv2.waitKey(0)
If anything here is unclear, just ask. I am glad to help.