We have already seen that the default assumption in Linux and UNIX is that everything is a file, ideally one that consists of human- and machine-readable text. As a result, we have a very wide variety of powerful tools for manipulating and analyzing text files, so it makes sense to try to convert our sources into text files whenever possible. In the previous post we used optical character recognition (OCR) to convert pictures of text into text files. Here we will use command line tools to extract text, images, page images and full pages from Adobe Acrobat PDF files.

Since we will be working with pictures of text as well as raw text files, we need to use a window manager or desktop environment. Start your windowing system and open a terminal. I assume that you already have Tesseract OCR and ImageMagick installed from the previous lesson. Now we need to install tools for working with Adobe Acrobat PDF documents. If you don't get a man page for xpdf, then install it. If you don't get a man page for pdftk, then install it. If you don't get a man page for pdftotext, then install the Poppler Utilities. This package includes a number of useful tools.

gImageReader is a simple Gtk / Qt front-end for tesseract-ocr that simplifies the entire process of extracting printed text from images. It allows us to work with files, scanned images, PDFs, pasted clipboard items, etc. This makes it a good option for getting the text out of our images easily and quickly. It is a multiplatform application that works on both GNU/Linux and Windows. In the following lines we will see the gImageReader installation process on Ubuntu 18.04, as indicated on the project's GitHub page.

Install gImageReader

To get this software we first need to add the PPA repository to our system. We do this by opening a terminal (Ctrl + Alt + T) and typing the following command:

sudo add-apt-repository ppa:sandromani/gimagereader

Once the list of available software has been updated, we can install the application by typing in the same terminal:

sudo apt-get install gimagereader tesseract-ocr tesseract-ocr-eng

With all of the above, gImageReader should be installed on Ubuntu, and we should be able to start the program on our computer.

If we want to uninstall gImageReader, we open a terminal (Ctrl + Alt + T) and remove the package with apt-get. To finish removing the program, we can also execute:

sudo apt-get autoremove

The PPA that we used for the installation can be removed from our system by typing in the same terminal:

sudo add-apt-repository -r ppa:sandromani/gimagereader
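The checks for pdftotext, pdftk and xpdf mentioned in this post can also be scripted. The following is a minimal sketch and not part of the original instructions: it uses command -v rather than man to test whether each program is on the PATH, and the helper name missing_tools is hypothetical. It only reports what is missing and installs nothing.

```shell
#!/bin/sh
# Hypothetical helper: print each command from the argument list
# that cannot be found on the PATH, one name per line.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# The PDF tools named in this post. Empty output means all three are present.
missing_tools pdftotext pdftk xpdf
```

On Debian and Ubuntu, pdftotext ships in the poppler-utils package. The package names for pdftk and xpdf vary between releases, so check your distribution before installing anything the script reports.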