Example of implementing image processing and recognition methods on Android
When you program applications for Android OS, there are always interesting ideas you want to try, or a body of theoretical knowledge you want to put into practice; this combination of factors is how the project described here came about.

There are many articles about text recognition, computer vision and individual recognition algorithms. This publication shows an attempt to implement a task related to finding a key word in an image of text, which would make it possible, for example, to find the right place to start reading in a DjVu document without recognizing the text itself.

The example is implemented as an Android application, and the source image is a screenshot of text together with an entered key word; various processing and image-understanding algorithms are applied to solve the task.

Task


Suppose we have an image of some text. It can be a photo, a scanned page or a screenshot, and in this image we need to find some phrase or word as quickly as possible in order to retrieve the required piece of information. This is where image processing and recognition algorithms come to the rescue.

The stages of creating the Android application will not be described here in detail, nor will a detailed theoretical description of the algorithms be given. With a minimal test application interface, the main goals are:

  • Getting acquainted with some methods of image processing and pattern recognition;
  • Getting acquainted with the possibilities and the complexity of implementing these methods on Android.

Capturing the image


To obtain the image to be examined, we create an Activity containing only three elements:

1. EditText — for entering a key word;
2. TextView — for displaying the text in which this word has to be found;
3. A button for taking a screenshot and moving to the next screen.

! All code is purely for demonstration purposes and is not meant as a guide to correct practice.

XML code for items 1 and 2
<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:id="@+id/main_content_layout"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:background="#ddd"
    android:orientation="vertical"
    android:paddingBottom="@dimen/activity_vertical_margin"
    android:paddingLeft="@dimen/activity_horizontal_margin"
    android:paddingRight="@dimen/activity_horizontal_margin"
    android:paddingTop="@dimen/activity_vertical_margin"
    app:layout_behavior="@string/appbar_scrolling_view_behavior"
    tools:context=".MainActivity"
    tools:showIn="@layout/activity_main">

    <EditText
        android:layout_width="fill_parent"
        android:layout_height="40dp"
        android:singleLine="true"
        android:textColor="#000" />

   <TextView
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="@string/main_text"
        android:textColor="#000"
        android:ellipsize="end"/>
</LinearLayout>


Layout with the button, which includes the layout above
<?xml version="1.0" encoding="utf-8"?>
<android.support.design.widget.CoordinatorLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:fitsSystemWindows="true"
    tools:context=".MainActivity">

    <android.support.design.widget.AppBarLayout
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:theme="@style/AppTheme.AppBarOverlay">

        <android.support.v7.widget.Toolbar
            android:id="@+id/toolbar"
            android:layout_width="match_parent"
            android:layout_height="?attr/actionBarSize"
            android:background="?attr/colorPrimary"
            app:popupTheme="@style/AppTheme.PopupOverlay" />

    </android.support.design.widget.AppBarLayout>

    <include layout="@layout/content_main" />

    <android.support.design.widget.FloatingActionButton
        android:id="@+id/fab"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_gravity="bottom|end"
        android:layout_margin="@dimen/fab_margin"
        android:src="@android:drawable/ic_menu_camera" />

</android.support.design.widget.CoordinatorLayout>


It will look approximately like this:

[Screenshot: the initial layout with the input field and the text]

As the word to search for, we will enter, for example, the word "dreams":

[Screenshot: the layout with the key word "dreams" entered]

Thus we get a single key word, followed by the text below it. It is worth noting that the key word and the text have different font sizes (to make the task a bit harder, so to speak).

We press the button and get a screenshot of the area containing the key word and the text.

The screenshot method called when the button is clicked
private void takeScreenshot() {
        //The current time is used to make file names unique
        Date now = new Date();
        //Format the date/time according to the pattern; the result is used in the file name
        CharSequence timestamp = android.text.format.DateFormat.format("yyyy-MM-dd_hh:mm:ss", now);

        try {
            //Build the path for saving the image and prepare the file
            String path = Environment.getExternalStorageDirectory().toString() + "/" + timestamp + ".png";
            File imageFile = new File(path);

            //Find the part of the layout that contains the key word and the text
            View v1 = findViewById(R.id.main_content_layout);

            //Grab the image from the view's drawing cache
            v1.setDrawingCacheEnabled(true);
            Bitmap bitmap = Bitmap.createBitmap(v1.getDrawingCache());
            v1.setDrawingCacheEnabled(false);
            
            //Save the image to the file (PNG format, matching the extension)
            FileOutputStream outputStream = new FileOutputStream(imageFile);
            int quality = 100;
            bitmap.compress(Bitmap.CompressFormat.PNG, quality, outputStream);
            outputStream.flush();
            outputStream.close();

            //Move on to the next Activity
            openScreenshotActivity(now);
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }


The captured screenshot is opened in a new Activity, where the sequence of operations is collected in a NavigationDrawer. In a real application some of these operations could be merged into one to avoid redundant passes over the image.
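For completeness, here is a minimal sketch of what openScreenshotActivity might look like (the ScreenshotActivity class name and the "image_path" extra key are placeholders of mine; the actual implementation is in the project sources on GitHub). The idea is simply to rebuild the path the screenshot was saved to and hand it to the next screen via an Intent extra:

private void openScreenshotActivity(Date now) {
        //Rebuild the same path the screenshot was written to
        CharSequence timestamp = android.text.format.DateFormat.format("yyyy-MM-dd_hh:mm:ss", now);
        String path = Environment.getExternalStorageDirectory().toString() + "/" + timestamp + ".png";

        //Pass the path to the screenshot screen via an Intent extra
        Intent intent = new Intent(this, ScreenshotActivity.class);
        intent.putExtra("image_path", path);
        startActivity(intent);
    }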

Preprocessing the captured image


To begin with, the captured color image has to be converted to grayscale.

Converting the image from color to grayscale

The conversion uses the RGB to YUV color scheme.

In our case only the intensity (luminance) is needed, and it can be obtained with the formula:

Y = 0.299*R + 0.587*G + 0.114*B, where R, G, B are the red, green and blue channels respectively.

The Color class is, unsurprisingly, useful for working with colors, and in particular its static red, green and blue methods, which use bitwise shifts to extract the required color channel from the int color value of a pixel.
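For reference, these Color methods boil down to simple bitwise shifts on the packed ARGB int value (shown here only as an illustration of what they do):

//Equivalent of Color.red/green/blue for a packed ARGB pixel value
int r = (color >> 16) & 0xFF; //red channel
int g = (color >> 8) & 0xFF;  //green channel
int b = color & 0xFF;         //blue channel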

Code sample for converting a color pixel to luminance:

//Loop over the matrix of color pixels (size = number of pixels = width*height of the image)
for (int i = 0; i < size; i++) {
            int color = pixels[i];

            //red channel value
            int r = Color.red(color);

            //green channel value
            int g = Color.green(color);

            //blue channel value
            int b = Color.blue(color);

            //compute the grayscale luminance using the RGB to YUV conversion formula
            double luminance = 0.299 * r + 0.587 * g + 0.114 * b;
        }

Performing segmentation on a grayscale image is not much easier than on a color one, so the next step is to convert the grayscale image into a binary one (pixel brightness takes only two values, 0 and 1).

Binarizing the grayscale image

For binarization in this task a fairly simple threshold method is used, with a default threshold of 128.

The threshold can then be tuned experimentally to improve the result (the application lets the user set the threshold).

To obtain the binary brightness value, the previously computed grayscale luminance is checked against the threshold:

 pixels[i] = luminance > threshold ? Color.WHITE : Color.BLACK;

Where threshold is the threshold, and Color.WHITE and Color.BLACK are constants used for convenience, so as not to get confused between 0 and 1.

After the conversion to grayscale and the subsequent binarization we get the following result:

[Image: the screenshot after grayscale conversion and binarization]

Since these two image-processing steps are very closely related, they are combined into a single method.

Example of a method performing the grayscale conversion and binarization of the image
/**
* imagePath - path to the image
* threshold - binarization threshold
*/
public void binarizeByThreshold(String imagePath, int threshold) {
        //Open the image at the given path
        Bitmap bitmap = BitmapFactory.decodeFile(imagePath);
        
        //Get the image dimensions
        int width = bitmap.getWidth();
        int height = bitmap.getHeight();
        int size = width * height;
        
        //Get the image's pixel matrix
        int[] pixels = new int[size];
        bitmap.getPixels(pixels, 0, width, 0, 0, width, height);
        bitmap.recycle();

        //Walk over all pixels in the matrix, converting to grayscale and binarizing by the threshold
        for (int i = 0; i < size; i++) {
            int color = pixels[i];
            int r = Color.red(color);
            int g = Color.green(color);
            int b = Color.blue(color);
            double luminance = 0.299 * r + 0.587 * g + 0.114 * b;
            pixels[i] = luminance > threshold ? Color.WHITE : Color.BLACK;
        }

        Utils.saveBitmap(imagePath, width, height, pixels);
    }


Segmentation


To find the key word, several segmentation phases have to be completed after binarization: finding the lines of the text, finding the words within the lines, and selecting among the found words the candidate words with the matching number of letters (to narrow down the set of words to be recognized).

Line segmentation

To determine the lines of text, we need to find the parts of the histogram with a number of black pixels greater than zero that lie between parts with zero pixels. To do this, a histogram of the number of black pixels in each one-pixel-high image row is built and then processed to obtain the vertical coordinates of the lines (essentially the row numbers in the histogram). Knowing these coordinates and the overall image dimensions, we can select the area containing a line of words.

"Inline" the histogram for descriptive reasons (at the left) looks, for example, so:

[Image: the row histogram displayed alongside the text]

Method for building the row histogram
public ArrayList<GystMember> getRowsGystogram(String imagePath) {
        //Open the image, get its dimensions and pixel matrix
        Bitmap bitmap = BitmapFactory.decodeFile(imagePath);
        int width = bitmap.getWidth();
        int height = bitmap.getHeight();
        int size = width * height;
        int[] pixels = new int[size];
        bitmap.getPixels(pixels, 0, width, 0, 0, width, height);
        bitmap.recycle();

        ArrayList<GystMember> gystogram = new ArrayList<>();

        //Build the histogram
        for (int x = 0; x < height; x++) {
            gystogram.add(new GystMember(x));
            for (int y = 0; y < width; y++) {
                int color = pixels[y + x * width];
                if (color == Color.BLACK) {
                    gystogram.get(x).add();
                }
            }
        }
        return gystogram;
    }

/* The class used here and later on as a convenient way to build histograms as a collection */
public class GystMember implements Serializable {
    public int grayValue;
    public int count;

    public GystMember(int grayValue) {
        this.grayValue = grayValue;
        this.count = 0;
    }

    public void add() {
        count++;
    }
}



For display convenience the histogram is normalized.

Example of histogram normalization in onDraw of a custom View
if (mGystogram != null) {
            float max = Integer.MIN_VALUE;
            for (GystMember gystMember : mGystogram) {
                if (gystMember.count > max) {
                    max = gystMember.count;
                }
            }
            int pixelSize = getWidth();
            int coef = (int) (max / pixelSize);
            if (coef == 0){
                coef = 1;
            }
            int y = 0;
            for (GystMember gystMember : mGystogram) {
                int value = gystMember.count;
                canvas.drawLine(0, y, value / coef, y, mPaint);
                y++;
            }
        }
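The row histogram is then processed to obtain the boundaries of the text lines. Here is a minimal sketch of that step (my own illustration; the same run-detection logic is used inside the word-segmentation method in the next section): a run of non-empty histogram entries bounded by empty ones is treated as one line.

//Each int[] in the result holds {yStart, yEnd} of one text line
public ArrayList<int[]> getRowBounds(ArrayList<GystMember> rowsGystogram) {
        ArrayList<int[]> rows = new ArrayList<>();
        int yStart = 0;
        boolean inLine = false;
        for (int y = 0; y < rowsGystogram.size(); y++) {
            int blackPixels = rowsGystogram.get(y).count;
            if (blackPixels > 0 && !inLine) {
                //first non-empty row: a text line starts here
                inLine = true;
                yStart = y;
            } else if (blackPixels == 0 && inLine) {
                //first empty row after a line: the text line ends here
                inLine = false;
                rows.add(new int[]{yStart, y});
            }
        }
        return rows;
    }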


Once the positions of the lines in the image are known, we can move on to finding the words.

Word segmentation

To understand where the words are and where there are merely adjacent characters, we need to decide which distances between characters should be treated as distances between words and which as distances between characters within one word. Here the method of modes comes to the rescue, or rather an adapted variant of it: this method is most often encountered in binarization, but its essence can be applied in this situation as well.

To start, just as for the lines, we build a histogram within a line (it was vertical before, now it is horizontal); we need to understand where the characters are and where the white gaps are.

An example of building such a histogram is given below and is similar to the previous one.

By processing this histogram we can obtain a new histogram of the gaps between black pixels: its first gradation is the number of "spaces" one pixel wide, the second gradation is the number of "spaces" two pixels wide, and so on.

Example of building the histogram of gaps
public ArrayList<GystMember> getSpacesInRowsGystogram(String imagePath, ArrayList<GystMember> rowsGystogram) {
        //Load the image, get its dimensions and pixel matrix
        Bitmap bitmap = BitmapFactory.decodeFile(imagePath);
        int width = bitmap.getWidth();
        int height = bitmap.getHeight();
        int size = width * height;
        int[] pixels = new int[size];
        bitmap.getPixels(pixels, 0, width, 0, 0, width, height);
        bitmap.recycle();

        //The same GystMember is used for this histogram, only the meaning of its fields changes slightly.
        ArrayList<GystMember> oneRowGystogram = new ArrayList<>();
        ArrayList<Integer> spaces = new ArrayList<>();
        ArrayList<GystMember> spacesInRowsGystogram = new ArrayList<>();

        int yStart = 0, yEnd = 0, yIter = -1;
        boolean inLine = false;

        //Walk through the rows sequentially
        for (GystMember gystMember : rowsGystogram) {
            yIter++;

            //When black pixels are encountered (in the histogram, that is), a line "starts"
            if (gystMember.count > 0 && !inLine) {
                inLine = true;
                yStart = yIter;
            } else if (gystMember.count == 0 && inLine) { //When an empty gradation (no black pixels) is encountered, the line "ends"
                inLine = false;
                yEnd = yIter;

                //Build the histogram inside the line, similarly to the row histogram
                //(cleared first so that gaps from previously processed lines are not counted again)
                oneRowGystogram.clear();
                for (int x = 0; x < width; x++) {
                    GystMember member = new GystMember(x);
                    for (int y = yStart; y < yEnd; y++) {
                        int color = pixels[x + y * width];
                        if (color == Color.BLACK) {
                            member.add();
                        }
                    }
                    oneRowGystogram.add(member);
                }

                int xStart = 0, xEnd = 0, xIter = -1;
                boolean inRow = false;

                //Horizontal walk along one line
                for (GystMember oneRowMember : oneRowGystogram) {
                    xIter++;

                    //Finding characters is analogous to finding lines
                    if (oneRowMember.count == 0 && !inRow) {
                        inRow = true;
                        xStart = xIter;
                    } else if ((oneRowMember.count > 0 || xIter == oneRowGystogram.size()-1) && inRow) {
                        inRow = false;
                        xEnd = xIter;

                        //Compute the number of pixels between adjacent characters and add it to the list of gaps
                        int xValue = xEnd - xStart;
                        spaces.add(xValue);
                    }
                }
            }
        }

        //Sort the gaps and build the histogram of gaps
        Collections.sort(spaces);
        int lastSpace = -1;
        GystMember gystMember = null;
        for (Integer space : spaces) {
            if (space > lastSpace) {
                if (gystMember != null) {
                    spacesInRowsGystogram.add(gystMember);
                }
                gystMember = new GystMember(space);
            }
            gystMember.add();
            lastSpace = space;
        }
        //Do not lose the last accumulated gradation
        if (gystMember != null) {
            spacesInRowsGystogram.add(gystMember);
        }

        return spacesInRowsGystogram;
    }


The result will be approximately something like this:

[Image: the histogram of gaps between black pixels]

Following the logic of the method of modes, we find two pronounced peaks (they are obvious in the image above); everything that lies near the first peak is a distance between characters within a word, and everything near the second is a distance between words.
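As an illustration, the threshold separating the two kinds of gaps can be derived from this histogram roughly as follows (a deliberately simplified sketch of the adapted method of modes, not the article's exact code; in practice neighboring gradations belonging to the same peak may have to be merged first):

public int getSpaceThreshold(ArrayList<GystMember> spacesInRowsGystogram) {
        int firstPeakWidth = 0, firstPeakCount = 0;
        int secondPeakWidth = 0, secondPeakCount = 0;
        //Find the two gradations with the largest counts (the two peaks);
        //grayValue here holds the gap width, count how many times it occurs
        for (GystMember member : spacesInRowsGystogram) {
            if (member.count > firstPeakCount) {
                secondPeakWidth = firstPeakWidth;
                secondPeakCount = firstPeakCount;
                firstPeakWidth = member.grayValue;
                firstPeakCount = member.count;
            } else if (member.count > secondPeakCount) {
                secondPeakWidth = member.grayValue;
                secondPeakCount = member.count;
            }
        }
        //Gaps narrower than the threshold are treated as gaps between characters,
        //wider ones as gaps between words
        return (firstPeakWidth + secondPeakWidth) / 2;
    }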

With this information about the "spaces" and the histogram of a line, we can select the image regions that contain words, and also count how many characters each of these words contains.

The complete code for this step can be found in the source code.
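As a rough sketch of that step (a simplified illustration of mine; the real code in the repository works with PartImageMember objects and also carries the vertical coordinates of the line), within one line the columns of black pixels are grouped into words: a gap narrower than the threshold separates characters of the same word, while a wider gap finishes the word.

//Each int[] in the result holds {xStart, xEnd, characterCount} of one word in the line
public ArrayList<int[]> splitRowIntoWords(ArrayList<GystMember> oneRowGystogram, int spaceThreshold) {
        ArrayList<int[]> words = new ArrayList<>();
        int wordStart = -1, lastInkX = -1, charCount = 0;
        for (int x = 0; x < oneRowGystogram.size(); x++) {
            boolean hasInk = oneRowGystogram.get(x).count > 0;
            if (hasInk) {
                if (wordStart == -1) {
                    //first non-empty column: a new word starts
                    wordStart = x;
                    charCount = 1;
                } else if (x - lastInkX - 1 >= spaceThreshold) {
                    //the gap is wide enough: close the previous word and start a new one
                    words.add(new int[]{wordStart, lastInkX, charCount});
                    wordStart = x;
                    charCount = 1;
                } else if (x - lastInkX > 1) {
                    //a narrow gap inside the word: the next character begins
                    charCount++;
                }
                lastInkX = x;
            }
        }
        if (wordStart != -1) {
            //close the last word of the line
            words.add(new int[]{wordStart, lastInkX, charCount});
        }
        return words;
    }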

To visually evaluate how the above set of algorithms works, the words in the text are colored in alternating colors, while the candidate words (that is, as a reminder, those whose number of characters matches the number of characters in the key word) remain black:

[Image: words colored in alternating colors, candidate words left black]

So, at this stage we have image fragments (or rather the coordinates of these fragments) that contain the key word and the candidate words selected by character count, and we can try to find the key word among the candidates.

Recognition


Note that recognition here means recognizing the key word among the candidates; there is no attempt to recognize which characters they consist of.

For recognition a set of informative features is selected: it can be the number of endpoints, the number of junction points, the number of pixels with 3, 4 or 5 black neighbors, and others. It has been established in practice that a large number of features loses its value, because they "overlap" one another.

At this stage we will settle on the number of endpoints, taking their position into account (in the upper and the lower half of the image; the feature is counted separately for each half).

For each word the feature values are computed, and then a nearest-neighbor search based on the Euclidean distance is performed.

Example of building the structure with the computed features
   private void generateRecognizeMembers() {
        mRecognizeMembers.clear();
        for (PartImageMember pretendent : mPretendents) {
            RecognizeMember recognizeMember = new RecognizeMember(pretendent);

            int pretendentWidth = pretendent.endX - pretendent.startX;
            int pretendentHeight = pretendent.endY - pretendent.startY;
            int[][] workPixels = new int[pretendentWidth][pretendentHeight];

            //Extract the candidate's pixel matrix from the big matrix of the whole image
            for (int pY = pretendent.startY, py1 = 0; pY < pretendent.endY; pY++, py1++)
                for (int pX = pretendent.startX, px1 = 0; pX < pretendent.endX; pX++, px1++) {
                    workPixels[px1][py1] = mImagePixels[pX + pY * mImageWidth];
                }

            int half = pretendentHeight / 2;
            //Walk from top to bottom
            for (int ly = 0; ly < pretendentHeight; ly++) {
                //Walk from left to right
                for (int lx = 0; lx < pretendentWidth; lx++) {
                    int currentColor = workPixels[lx][ly];

                    //Only object (black) pixels are of interest when computing the characteristic numbers.
                    if (currentColor != Color.WHITE) {
                        //Fill the 3x3 matrix of the pixel's neighbors
                        int[][] pixelNeibours = Utils.fill3x3Matrix(pretendentWidth, pretendentHeight, workPixels, ly, lx);
                        int[] pixelsLine = Utils.getLineFromMatrixByCircle(pixelNeibours);      

                        //Compute the characteristic numbers that determine whether the pixel is an endpoint
                        int A4 = getA4(pixelsLine);
                        int A8 = getA8(pixelsLine);
                        int B8 = getB8(pixelsLine);
                        int C8 = getC8(pixelsLine);
                        int Nc4 = A4 - C8;
                        int CN = A8 - B8;

                        recognizeMember.A4.add(A4);
                        recognizeMember.A8.add(A8);
                        recognizeMember.Cn.add(CN);

                        //If these conditions hold, the pixel is considered an endpoint
                        if (A8 == 1 && Nc4 == 1 && CN == 1) {
                            if (ly < half) {
                                recognizeMember.endsCount++;
                            } else {
                                recognizeMember.endsCount2++;
                            }
                        }
                    }
                }
            }
            mRecognizeMembers.add(recognizeMember);
        }
    }

   private int getA4(int[] pixelsLine) {
        int result = 0;
        for (int k = 1; k < 5; k++) {
            result += pixelsLine[2 * k - 2];
        }
        return result;
    }

    private int getA8(int[] pixelsLine) {
        int result = 0;
        for (int k = 1; k < 9; k++) {
            result += pixelsLine[k - 1];
        }
        return result;
    }

    private int getB8(int[] pixelsLine) {
        int result = 0;
        for (int k = 1; k < 9; k++) {
            result += pixelsLine[k - 1] * pixelsLine[k];
        }
        return result;
    }

    private int getC8(int[] pixelsLine) {
        int result = 0;
        for (int k = 1; k < 5; k++) {
            result += pixelsLine[2 * k - 2] * pixelsLine[2 * k - 1] * pixelsLine[2 * k];
        }
        return result;
    }


Additional information about all the characteristic numbers A4, A8 and so on can easily be found elsewhere.
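As a quick sanity check, here is a worked example of my own (assuming, as the arithmetic of getA4/getB8 suggests, that Utils.getLineFromMatrixByCircle returns the eight neighbors as 0/1 values with 1 meaning black, walked in a circle starting from an edge neighbor, with the first neighbor repeated as the ninth element). A pixel at the tip of a vertical stroke has exactly one black neighbor:

int[] pixelsLine = {1, 0, 0, 0, 0, 0, 0, 0, 1};

int A4 = getA4(pixelsLine); //= 1: one of the four edge neighbors is black
int A8 = getA8(pixelsLine); //= 1: one black pixel among all eight neighbors
int B8 = getB8(pixelsLine); //= 0: no two adjacent neighbors are both black
int C8 = getC8(pixelsLine); //= 0
int Nc4 = A4 - C8;          //= 1
int CN = A8 - B8;           //= 1
//A8 == 1, Nc4 == 1 and CN == 1 all hold, so this pixel is counted as an endpoint.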

Recognition code based on the Euclidean distance
private void recognize() {
        //Compute the characteristic numbers (described above)
        generateRecognizeMembers();

        mResultPartImageMembers.clear();
        ArrayList<Double> keys = new ArrayList<>();

        //Build the results based on the distances
        RecognizeMember firstMember = mRecognizeMembers.get(0);
        mRecognizeMembers.remove(firstMember);
        for (RecognizeMember recognizeMember : mRecognizeMembers) {
            if (recognizeMember.getPretendent() != firstMember.getPretendent()) {
                double keyR = firstMember.equalsR(recognizeMember);
                recognizeMember.R = keyR;
                keys.add(keyR);
            }
        }

        //Sort the obtained results
        Collections.sort(mRecognizeMembers, new Comparator<RecognizeMember>() {
            @Override
            public int compare(RecognizeMember lhs, RecognizeMember rhs) {
                //compare the distances without rounding away small differences
                return Double.compare(lhs.R, rhs.R);
            }
        });

        //Shape the results into the final answer, taking repeats into account (all repeats are included) and keeping the first two neighbors
        double firstKey = -1;
        double secondKey = -1;
        for (RecognizeMember member : mRecognizeMembers) {
            double key = member.R;
            if (firstKey == -1) {
                firstKey = key;
                mResultPartImageMembers.add(member.getPretendent());
            } else if (key == firstKey) {
                mResultPartImageMembers.add(member.getPretendent());
            } else if (secondKey == -1) {
                secondKey = key;
                mResultPartImageMembers.add(member.getPretendent());
            } else if (secondKey == key) {
                mResultPartImageMembers.add(member.getPretendent());
            }
        }
    }

<...>
//Compute the Euclidean distance over the numbers of endpoints in the upper and lower (2) parts of the image
public double equalsR(RecognizeMember o) {
        return Math.sqrt(Math.pow(this.endsCount - o.endsCount, 2)
                + Math.pow(this.endsCount2 - o.endsCount2, 2));
    }
<...>


Taking repeated words and a margin of error into account, the result is the 2-3 nearest neighbors, among which the key word will be found (highlighted in red in the image):

[Image: the recognition result with the nearest candidate words highlighted in red]

The image also shows that the required word "dreams" is among the red words.

To improve recognition quality, one can try a different binarization threshold, or select other informative features, adding, for example, probes.

Conclusion


The goals were achieved: several image processing and understanding methods were tried out, and implementing them on Android poses no additional difficulties; you just have to keep memory consumption in mind and avoid holding several large Bitmaps at the same time.

I hope this information will be useful to those starting out with image-related tasks on Android.

Descriptions of all the algorithms used, the computation of the characteristic numbers and so on can easily be found in open access, both on habrahabr and in numerous textbooks and online resources.

The complete project is available on GitHub

P.S.: This implementation of the algorithms is not optimal and requires further work and optimization; it is merely the minimum necessary example for getting acquainted with, visualizing and evaluating the methods described above on Android.

This article is a translation of the original post at habrahabr.ru/post/274323/