An Intelligent Indic OCR System in a Client-Server Environment
Pratik Bhattacharjee, Dept. of CSE,
An OCR system converts a document image (a scanned document) into an editable form. The recent introduction of e-governance programs by government organizations at the state and national levels requires a huge volume of manual documents to be converted into digital form. In addition, under the Right to Information Act, information must be available in a quick and ready-to-use form. Conversion on this scale requires documents to be processed at multiple points and in a multi-user environment. In most cases, OCR software demands substantial CPU and memory resources, which may not be available on individual workstations and which, even if available, would not be an economical solution.
The proposed system examines the document at the client’s end and passes the entire document, or part of it, to the server only if it is OCRable [1,2]. The important issues to be addressed during this process are compression and the effective use of bandwidth. The system therefore performs three tasks: it examines the document for OCR compatibility, compresses it with a suitable compression algorithm, and finally sends it over the network.
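The compress-and-send step can be sketched as follows. This is only an illustration: the text does not name a compression algorithm, so zlib is used as a stand-in, and the 4-byte length prefix and the function names `pack_segment` and `unpack_segment` are assumptions made for the example.

```python
import struct
import zlib


def pack_segment(pixels: bytes, level: int = 6) -> bytes:
    """Compress raw segment bytes and prepend a 4-byte big-endian length."""
    payload = zlib.compress(pixels, level)
    return struct.pack(">I", len(payload)) + payload


def unpack_segment(frame: bytes) -> bytes:
    """Inverse of pack_segment: read the length prefix, then decompress."""
    (length,) = struct.unpack(">I", frame[:4])
    return zlib.decompress(frame[4:4 + length])


# A mostly white scanned region (hypothetical data) compresses well.
segment = bytes([255] * 4000 + [0] * 96)
frame = pack_segment(segment)
restored = unpack_segment(frame)
```

The length prefix lets the receiver know how many bytes belong to one segment when several segments share a connection.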
Study on the existing works
A few English OCR systems are available that provide a multi-user environment to some extent. Indian languages, however, with their complicated scripts (relative to English), are much harder to OCR. At present, no system for Indic languages is available that can operate in a client-server environment.
Examine the document for OCR compatibility
The first step in analyzing the image is image segmentation. Each segment is then examined to see whether it is OCRable. Traditional systems generally reject the entire document if any part of it is not OCRable, whereas this system examines the document segment by segment and reports which parts, if not the whole document, are OCRable [3-6]. This is done at the client side, without sending the document to the server.
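A minimal sketch of the segment-by-segment check is given below. It is not the method of the proposed system: fixed-height strips stand in for the real segmentation, and the ink-ratio test with its thresholds is a hypothetical OCRability criterion chosen only to show how a per-segment report can be produced on the client.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # 1 = ink, 0 = background


def split_into_strips(page: Bitmap, strip_height: int) -> List[Bitmap]:
    """Cut the page into horizontal strips (a stand-in for real segmentation)."""
    return [page[i:i + strip_height] for i in range(0, len(page), strip_height)]


def ink_ratio(segment: Bitmap) -> float:
    """Fraction of pixels in the segment that are ink."""
    total = sum(len(row) for row in segment)
    return sum(sum(row) for row in segment) / total if total else 0.0


def is_ocrable(segment: Bitmap, low: float = 0.02, high: float = 0.5) -> bool:
    """Hypothetical criterion: text-like segments are neither blank nor solid."""
    return low <= ink_ratio(segment) <= high


def ocrable_report(page: Bitmap, strip_height: int) -> List[Tuple[int, bool]]:
    """Return (segment index, OCRable?) for every strip of the page."""
    strips = split_into_strips(page, strip_height)
    return [(idx, is_ocrable(seg)) for idx, seg in enumerate(strips)]


# Toy page: a text-like strip, a solid black strip, and a blank strip.
page = [[0, 1, 0, 0, 1, 0, 0, 0]] * 2 + [[1] * 8] * 2 + [[0] * 8] * 2
report = ocrable_report(page, strip_height=2)
```

Only the segments flagged as OCRable would then be forwarded to the server.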
Image segmentation is the process by which the original image is partitioned into meaningful regions; it is an important initial task for higher-level image processing such as object recognition or object tracking. Traditional image segmentation algorithms show their shortcomings particularly in speed. Large images, such as OCR images, are multi-spectral and multi-scale, and so incur much higher computational complexity during segmentation.
Wavelet-Based Image Segmentation
Wavelet theory rests on strong mathematical foundations and employs established tools, including pyramidal image processing, sub-band coding, and quadrature mirror filtering. One of its advantages is multi-resolution analysis, which makes it possible to exploit signal or image characteristics matched to a particular scale that might go undetected by other analysis techniques. This multi-resolution processing supports successful analysis of various kinds of texture. Large images, such as remote sensing images, typically contain a great deal of texture information. Texture can be defined as an attribute representing the spatial arrangement of the gray levels of the pixels in a region. Scale has been identified as one of the most important aspects of texture description. If descriptive features corresponding to a texture can be collected at various scales, different textures in an image can be distinguished. The wavelet transform provides a unified way of performing multi-resolution analysis.
Figure 1: Sequence of operations for Discrete wavelet-based image segmentation
A wavelet transform decomposes a 1D signal f(x) onto a basis of wavelet functions:
The basis, which is usually taken to be complete and orthogonal, is obtained by translating and dilating a mother wavelet:
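For reference, the decomposition and the basis construction described in the two sentences above are usually written as follows. These are the standard textbook forms and are assumed here; the symbols W, a (dilation), b (translation) and the complex conjugate marker are the conventional ones, not notation confirmed by the original text.

```latex
% Standard continuous wavelet transform of a 1D signal f(x) (assumed form)
(W f)(a, b) = \int_{-\infty}^{\infty} f(x)\, \psi^{*}_{a,b}(x)\, dx

% Basis obtained by translating (b) and dilating (a) the mother wavelet \psi
\psi_{a,b}(x) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{x - b}{a}\right), \qquad a > 0

% Zero-mean constraint on the mother wavelet
\int_{-\infty}^{\infty} \psi(x)\, dx = 0
```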
The mother wavelet ψ is localized in both the spatial and frequency domains and has to satisfy a zero-mean constraint. When a and b are restricted to a discrete lattice (a = 2^n, b ∈ Z), the discrete wavelet transform (DWT) is obtained. The DWT has an efficient implementation in real space that uses quadrature mirror filters, in which every wavelet corresponds to a high-pass and a low-pass filter. For the most common case, with dilations by a factor of two, the scheme is called the "dyadic" wavelet transform.
The wavelet decomposition of a 2D image v(x) can be obtained by performing the filtering consecutively along the horizontal and vertical directions (a separable filter bank). This is depicted schematically in Fig. 2.
Figure 2. First level of 2D wavelet decomposition in three steps: 1. Low and High pass filtering in horizontal direction, 2. the same in vertical direction, 3. subsampling.
This yields four subimages for one level of decomposition. Every subimage can be subsampled by a factor of 2 while retaining the possibility of complete reconstruction. This leads to a representation with the same number of pixels as the original image. To construct a multilevel decomposition, the process is repeated iteratively on the low-pass subimages. The result is the standard pyramidal wavelet decomposition. When the detail images are also decomposed further, the tree-structured or wavelet packet decomposition is obtained (Fig. 3). The wavelet decomposition of an image v(x) at the k-th level can be written as:
Figure 3. First two levels of a pyramidal (top) and packet (bottom) wavelet decomposition.
The wavelet image decomposition provides a representation that is easy to interpret. Every subimage contains information of a specific scale and orientation, which is conveniently separated. Spatial information is retained within the subimages[7-9].
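One level of the separable decomposition can be sketched with the Haar wavelet, the simplest filter pair. The text does not say which wavelet the system uses, so Haar is an assumption, as is the averaging normalization (sum and difference divided by 2) in place of the orthonormal 1/√2 scaling. The image is assumed to have even width and height.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]


def haar_rows(img: Image) -> Tuple[Image, Image]:
    """Low-pass and high-pass filter each row, subsampling by 2."""
    low = [[(r[i] + r[i + 1]) / 2 for i in range(0, len(r), 2)] for r in img]
    high = [[(r[i] - r[i + 1]) / 2 for i in range(0, len(r), 2)] for r in img]
    return low, high


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def haar_level(img: Image) -> Dict[str, Image]:
    """One decomposition level; key letters are (horizontal, vertical) filters."""
    low, high = haar_rows(img)                 # horizontal pass
    ll_t, lh_t = haar_rows(transpose(low))     # vertical pass on low band
    hl_t, hh_t = haar_rows(transpose(high))    # vertical pass on high band
    return {"LL": transpose(ll_t), "LH": transpose(lh_t),
            "HL": transpose(hl_t), "HH": transpose(hh_t)}


image = [[4, 2, 5, 5],
         [4, 2, 5, 5],
         [2, 2, 8, 8],
         [2, 2, 8, 8]]
bands = haar_level(image)
pixel_count = sum(len(b) * len(b[0]) for b in bands.values())
```

Applying `haar_level` again to `bands["LL"]` gives the next level of the pyramidal decomposition. Applying it to the detail bands as well gives the packet decomposition.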
Interactive, Client-Server OCR System
In this section, I illustrate a client-server architecture for the optical character recognition system. My first experience with a small client-server image processing application is summarized below.
Systems for interpreting images of scanned documents most often work on a pixel map of the scanned image (i.e., one byte per pixel). For ANSI ‘E’ size drawings scanned at 300 dpi, this requires about 160 MBytes of memory to hold one copy of the image [10-14]. More efficient programs work on the bitmap representation of an image (one bit per pixel), which cuts the image size down to about 20 MBytes. This is still a large amount of memory, especially if the software needs to use several intermediate copies of the image. For such systems, running one copy of the recognition process ties up most system resources, and running several copies is extremely difficult. Such software is also slowed by the large amounts of data transfer involved. Therefore, these systems are not suited to running on an image recognition server that serves several clients.
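The memory figures above can be checked with a short calculation. The text gives no sheet dimensions, so they are assumed here: a 36 x 48 inch sheet reproduces the quoted figures of roughly 160 MBytes and 20 MBytes, while the ANSI E sheet proper (34 x 44 inches) gives somewhat smaller values. Row padding and file headers are ignored.

```python
def image_bytes(width_in: float, height_in: float, dpi: int,
                bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of a scan with the given geometry."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Assumed 36 x 48 inch sheet at 300 dpi.
pixmap_36x48 = image_bytes(36, 48, 300, bits_per_pixel=8)   # one byte per pixel
bitmap_36x48 = image_bytes(36, 48, 300, bits_per_pixel=1)   # one bit per pixel

# ANSI E (34 x 44 inch) sheet at 300 dpi, for comparison.
pixmap_ansi_e = image_bytes(34, 44, 300, bits_per_pixel=8)
bitmap_ansi_e = image_bytes(34, 44, 300, bits_per_pixel=1)

for name, size in [("pixmap 36x48", pixmap_36x48), ("bitmap 36x48", bitmap_36x48),
                   ("pixmap ANSI E", pixmap_ansi_e), ("bitmap ANSI E", bitmap_ansi_e)]:
    print(f"{name}: {size / 1e6:.1f} MB")
```

In both cases the bitmap is exactly one eighth the size of the pixel map, which matches the 160-to-20 ratio quoted in the text.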
The document recognition system is built using very fast and memory-efficient line finding techniques [16-17]. The fast methods make it possible to build an interactive system. The memory-efficient nature of the line finding methods makes them well suited to deployment on a network server. One such image recognition server can serve several clients. In other words, the server can run several instances of the image recognition software with a moderate amount of memory. To make the drop in speed (with several recognition processes running simultaneously) less noticeable, I can use a multi-processor server. The system is designed in such a way that all the recognition happens on the recognition server, and all the user interaction and correction happens on client PCs. The client application requests an initial download of the next image from the server. Soon after the image download, the recognition results are also transmitted to the client as a sequence of ASCII transactions. Subsequently, the client and the server exchange several graphical editing transactions in both directions.
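To illustrate why ASCII transactions keep network traffic small, the sketch below encodes and decodes one transaction. The text does not specify the format, so the layout used here (a verb, an entity type, and integer arguments on one newline-terminated line) and the names `Transaction`, `encode`, and `decode` are hypothetical.

```python
from typing import NamedTuple, Tuple


class Transaction(NamedTuple):
    verb: str                 # hypothetical, e.g. "ADD" or "DELETE"
    entity: str               # hypothetical, e.g. "LINE" or "CHAR"
    args: Tuple[int, ...]     # coordinates or identifiers


def encode(t: Transaction) -> bytes:
    """Serialize a transaction as one space-separated ASCII line."""
    fields = [t.verb, t.entity, *map(str, t.args)]
    return (" ".join(fields) + "\n").encode("ascii")


def decode(line: bytes) -> Transaction:
    """Parse one ASCII line back into a Transaction."""
    verb, entity, *rest = line.decode("ascii").split()
    return Transaction(verb, entity, tuple(int(v) for v in rest))


edit = Transaction("ADD", "LINE", (120, 45, 300, 45))
wire = encode(edit)
```

A single edit of this kind occupies a few dozen bytes on the wire, compared with the megabytes needed for the image itself.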
The communication channel used for transmitting the transactions is the socket interface. The server process creates a socket, maps it to a local address, and waits for client requests. Each client request for image recognition starts a new recognition process on the server and a communication channel is established between the process and the client.
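The socket sequence described above (create, bind, wait, and start one worker per client) can be sketched as follows. The actual server is written in C and starts a new recognition process per request. This Python sketch uses a thread per client as a stand-in, and the `RECOGNIZE` request and `OK` reply are hypothetical messages, not the system's protocol.

```python
import socket
import threading


def handle_client(conn: socket.socket) -> None:
    """Stand-in for one recognition process: acknowledge a single request."""
    with conn, conn.makefile("rb") as reader:
        request = reader.readline()
        conn.sendall(b"OK " + request)


def serve(listener: socket.socket, max_clients: int) -> None:
    """Accept clients and start one worker per client request."""
    for _ in range(max_clients):
        conn, _addr = listener.accept()
        threading.Thread(target=handle_client, args=(conn,), daemon=True).start()


# Server side: create a socket, map it to a local address, wait for clients.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))          # port 0 lets the OS pick a free port
listener.listen()
port = listener.getsockname()[1]
threading.Thread(target=serve, args=(listener, 1), daemon=True).start()

# Client side: connect, send one request, read the reply.
with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
    client.sendall(b"RECOGNIZE page-001\n")
    with client.makefile("rb") as reader:
        reply = reader.readline()
listener.close()
```

Each accepted connection gets its own worker, so a slow client does not block the others.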
The image recognition server is written in the C language. It runs on the Sun Solaris or Linux operating systems. I am developing two prototype clients for monitoring and editing the recognition results. One is written in C++ for a PC running Windows (95/98 or NT), and the other is written in Java and can run on any platform. Screen shots of the two prototypes are given below. The Java language offers portability and ease of maintenance of applications. However, Java currently lacks support for displaying and manipulating large compressed images. Therefore, the current implementation of the client software in Java only displays the results of graphics recognition. It does not show the image. Hopefully, this support will be added to the language soon. In case this doesn’t happen soon enough, I will have the C++ client software to fall back on.
The Client Interface
Here I briefly describe some salient features of the C++ client interface. Using this interface, the operator is able to see both the original scanned document and the results of the conversion. I decided to create two scrolling windows, one displaying the original document and the other the conversion. The two windows are completely synchronized; whenever the user scrolls one of them, the other scrolls by the same amount. Likewise, when an area is selected in one, the corresponding area is selected in the other. While the client side was written using the Microsoft Visual C++ environment and the Microsoft Foundation Classes (MFC), these two windows were written using the Win32 API because of their special nature. The application also includes a toolbar (created through MFC) and other auxiliary windows, including one displaying the complete document at a highly reduced scale.
Results: The result shows the original image (a) at the server and the recognized image (b) at the client end. Only the character part is recognized; the image part is not taken into consideration.
As stated above, the two basic requirements for implementing an image conversion system using a client-server architecture are that the graphics recognition algorithms be interactive (i.e., fast) and be designed for very efficient use of memory. As long as these two requirements are met, the client-server architecture offers the following benefits.
The workload is split between the client and the server. The server does the graphics recognition, and the client handles the monitoring and correction of the intermediate and final results. Since the client does no recognition computing, inexpensive PCs can be used as clients. Since the communication protocol was designed as mostly ASCII transactions, the application generates very little network traffic. In fact, in tests comparing a 10BASE-T Ethernet LAN connection with a modem connection, there was no noticeable difference in the performance (speed) of the application, except for a longer initial delay in the modem scenario for the one-time image download.
Running the recognition software on a central server, and collecting the results of user corrections on the same server, gives us the ability to learn about the weaknesses of the recognition algorithms. We can use this knowledge to improve the recognition algorithms, and can update the algorithms on the server in a manner transparent to the users. Similarly, we can modify the operating parameters of the recognition software without disrupting the users. We can upgrade the recognition software and operating parameters without re-installing any software on users’ desktop PC’s. Further, this architecture gives us the ability to do large scale data collection for training our character recognition software.
The database transactions to fetch the next image to be converted, and to check the converted data file into the database, are completely transparent to the client PC. The database transactions take place between the image recognition server and the middleware or database server. Similarly, the client PC does not need to know where to get the image from. The image recognition server negotiates that with the middleware or image file server and downloads the image to the client PC.
In addition, for image recognition software developed on Unix (or any platform other than Windows 95/98/NT), this architecture obviates the need to port a large amount of code to the Windows environment.
As the internet develops rapidly across the country, client-server architectures are fast becoming the standard for applications, in preference to their stand-alone versions. This work was my effort to extend this facility to an existing stand-alone system. However, such a system will require more mature technology and language support in the future.
References
1. R. Kasturi, S. T. Bow, W. El-Masri, J. R. Gattiker, and U. B. Mokate. A system for interpretation of line drawings. IEEE Trans. Pattern Analysis and Machine Intelligence, 12(10):978–992, 1990.
2. D. Antoine. CIPLAN: A model-based system with original features for understanding French plats. In Proc. 1st International Conference on Document Analysis and Recognition, pages 647–655, St. Malo, France, 1991.
3. D. Antoine, S. Collin, and K. Tombre. Analysis of technical documents: The REDRAW System. In H. S. Baird, H. Bunke, and K. Yamamoto, editors, Structured Document Image Analysis, pages 385–402. Springer Verlag, Berlin/Heidelberg, 1992.
4. S. H. Joseph and T. P. Pridmore. Knowledge-directed interpretation of mechanical engineering drawings. IEEE Trans. Pattern Analysis and Machine Intelligence, 14(9):928–940, 1992.
5. P. Vaxivière and K. Tombre. CELESSTIN: CAD conversion of mechanical drawings. IEEE Computer Magazine, 25(7):46–54, July 1992.
6. L. Wenyin and D. Dori. Automated CAD conversion with the machine image understanding system. In Proc. IAPR Workshop on Document Analysis Systems,
9. J. F. Arias, A. Prasad, R. Kasturi, and A. Chhabra. Interpretation of telephone company central office equipment drawings. In Proc. 12th IAPR International Conference on Pattern Recognition, pages B310–B314,
10. J. F. Arias, S. Balasubramanian, A. Prasad, R. Kasturi, and A. Chhabra. Information extraction from telephone company drawings. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, pages 729–732,
11. J. F. Arias, R. Kasturi, and A. Chhabra. Efficient techniques for telephone company line image interpretation. In Proc. of 3rd Int. Conf. on Document Analysis and Recognition, pages 795–798, Montréal,
12. J. F. Arias, A. Chhabra, and V. Misra. Interpreting and representing tabular documents. In Proc. of CVPR, pages 600–605,
13. J. F. Arias, A. Chhabra, and V. Misra. Efficient interpretation of tabular documents. In Proc. International Conference on Pattern Recognition, volume III,
14. H. Luo, R. Kasturi, J. F. Arias, and A. Chhabra. Interpretation of lines in distributing frame drawings. In Proc. International Conference on Document Analysis and Recognition, volume I,
15. J. F. Arias, A. Chhabra, and V. Misra. A practical application of graphics recognition: Helping with the extraction of information from telephone company drawings. In K. Tombre and A. Chhabra, editors, Graphics Recognition – Algorithms and Systems, volume 1389 of Lecture Notes in Computer Science,
16. A. K. Chhabra, V. Misra, and J. Arias. Detection of horizontal lines in noisy run length encoded images: The FAST method. In R. Kasturi and K. Tombre, editors, Graphics Recognition – Methods and Applications, volume 1072 of Lecture Notes in Computer Science, pages 35–48.
17. J. F. Arias, A. Chhabra, and V. Misra. Finding straight lines in drawings. In Proc. International Conference on Document Analysis and Recognition, volume II, pages 788–791,