

For example, some forms have blocks, some have lines, and some are empty. We used both in-built functions load_img and img_to_array functions and stored the pixels of all images in a list X.įor Image classification, we explored different algorithms, since the dataset is very small, and the classification task is an easy problem as all the four types of forms have distinguished features. We can use the img_to_array() Keras function to convert the loaded data. The pixel data needs to be converted to a NumPy array for use in Keras. Keras provides the load_img() function that can be used to load the image files directly as an array of pixels.
#Target text extractor code#
We write a code that labeled the images from 0 to 3 on the basis of their types and saved the labels in an array Y. We created a directory path in which we saved the path of our folders writes code for labeling. Some have blocks for fields, some have lines, and some are empty and some of them are handwritten so we put them in separate sub-folders. Problem Statement 2: Fields Extraction from formsīased on our group discussion and understanding, we differentiated the forms into 4 different categories. In order to represent the flow in the form of a block diagram we have created the three component-based flow chart which is as follows:

It involves challenging issues, including difficulties in defining standard data sets and standardized performance metrics, the difficulty of comparing multiple document classifiers, and the difficulty of separating classifier performance from pre-processor performance.
