Lupashevskyi Vladyslav

German Technical Faculty
Computer Engineering Department (CE)
Specialty «Computer Systems and Networks»

Hand gesture recognition based on segmentation methods

Supervisor: Ph.D., prof. Anoprienko Alexander

Abstract

Contents

  1. Introduction
  2. Main tasks and planned results
  3. Review of research and developments
  4. Skin color method
  5. Depth method
  6. Histogram of Oriented Gradients method
  7. Conclusion
  8. References

Introduction

An urgent problem in information technology and robotics today is human-computer interaction without special peripherals such as a keyboard or mouse. Organizing human-computer interaction through means that are natural for people, such as speech, gestures and vision, is one of the major tasks in the development of modern computer technology. Vision plays a key role, since a person is known to receive 80-90% of information about the world visually. One of the most pressing tasks in the field of machine vision is the recognition of faces and human hands.
Computer vision technologies began to develop in the 1960s, and in the 1970s the first fundamental works appeared in which computer vision was treated as an integral part of artificial intelligence systems [1, 2]. At the beginning of the new millennium the problem remained largely unsolved, but considerable progress in this area is reflected in a number of new fundamental works [3, 4]. At the same time, an understanding gradually formed of computer vision as the general technology of computer perception of visual information, and of machine vision as a specialized technology focused on particular industrial processes. For example, a key task in reducing the percentage of defective parts in production is controlling the assembly sequence performed by an employee. Studies have shown that while performing monotonous, repetitive sequences of actions a person can make unconscious errors. A system that monitors the assembly process and immediately reports errors can reduce the percentage of defective parts by at least 50%. Since assembly is often done by hand, such a system should locate the hands in real time using cameras.
This problem is still classified as highly non-trivial, since the shape of the hands can vary greatly, hands can be partially occluded by other objects, fingers can be articulated differently, and so on. Solving this problem on the basis of modern technologies would provide a complete hand gesture detector focused mainly on controlling the sequence of operations during assembly in manufacturing. With sufficient flexibility, such a detector could also be useful for recognizing sign language for people with disabilities and in other areas.
The rapid progress of computer technology allows us to hope for significant new results in this field [5-8]. The results obtained can be used in the development of supersensor computing concepts [9, 10] and augmented reality [11], as well as in various simulator systems, including those developed at DonNTU [12-14].

Main tasks and planned results

The main objective of this master's work is to develop software that can identify a human hand in an image in real time and provide information about the position of the center of the palm. It should be noted that the main requirement is to obtain coordinates of the palm center that are as stable as possible. In addition, time permitting, it is planned to detect fingers and provide this information together with the position of the palm center.
It is also planned to implement all functions in a separate module: the module will take a raw image as input and output information about the positions of the human hands in that image. This approach allows these functions to be used in any application, regardless of its architecture and complexity.
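The planned module boundary can be sketched as a small interface. This is only an illustration of the input/output contract described above; the names `HandInfo` and `detect_hands` are hypothetical, not part of any existing library.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class HandInfo:
    """Per-hand result: palm center is mandatory, fingertips are optional."""
    palm_center: Tuple[int, int]                                  # (x, y) in pixels
    fingertips: List[Tuple[int, int]] = field(default_factory=list)

def detect_hands(image) -> List[HandInfo]:
    """Take a raw image, return one HandInfo per detected hand."""
    raise NotImplementedError  # to be filled in by the chosen algorithm

# The caller depends only on this contract, not on the detection algorithm.
hand = HandInfo(palm_center=(320, 240))
print(hand.palm_center)  # (320, 240)
```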
To achieve these goals it is necessary, first, to examine the existing algorithms for identifying human hands in images, as well as algorithms for tracking moving objects. Next, each of these algorithms must be tested and its advantages and disadvantages highlighted; then, where possible, combinations of several algorithms can be tried so that they compensate for each other's disadvantages. In addition, the performance and stability of each algorithm must be tested in real-time operation.
It is then necessary to develop the software itself, a set of functions packaged as a separate library, together with a demonstration application that reveals the full functionality of the developed algorithm.

Review of research and developments

Currently, there is a large amount of research in the field of object recognition using machine vision technology. The largest amount of information on these studies can be found in international Internet resources.
The work «Real-Time Hand Gesture Recognition Using Finger Segmentation» [31] was reviewed. It proposes a novel real-time method for hand gesture recognition. In this framework, the hand region is extracted from the background with the background subtraction method. Then the palm and fingers are segmented so as to detect and recognize the fingers. Finally, a rule classifier is applied to predict the labels of hand gestures. Experiments on a data set of 1300 images show that the method performs well and is highly efficient. Moreover, the method outperforms a state-of-the-art method on another data set of hand gestures. An example of hand identification and gesture recognition is shown in Figure 1.
Figure 1. Hand identification and gesture recognition based on background subtraction method [31]

A hand gesture recognition system using the Microsoft Kinect sensor was developed in the work «Robust Hand Gesture Recognition Based on Finger-Earth Mover’s Distance with a Commodity Depth Camera» [30]. To handle the noisy hand shape obtained from the Kinect sensor, a novel distance metric for measuring hand dissimilarity is proposed, called Finger-Earth Mover’s Distance (FEMD). Since it matches only the fingers rather than the whole hand shape, it can better distinguish hand gestures with slight differences. Extensive experiments demonstrate the accuracy, efficiency, and robustness of this hand gesture recognition system. Figure 2 represents the basic essence of the algorithm.
Figure 2. FEMD algorithm [30]

The authors of «A New Framework for Sign Language Recognition based on 3D Handshape Identification and Linguistic Modeling» [29] use a completely different approach from those mentioned above: they recognize hand gestures using a 3D hand model. This methodology provides results that are less dependent on the background and on occlusion by other objects, and it also increases the efficiency of hand tracking. In addition, a 3D model provides information that can be used for tasks such as sign language recognition. A processing result is shown in Figure 3. The accuracy of the algorithm is about 80-85%.
Figure 3. Identifying hands in the image and obtaining their 3D models [29]

Three methods of hand gesture recognition using machine vision technologies are presented below.

Skin color method

The essence of this method lies in extracting from the raw image the fragments whose color falls within the range of human skin color [24]. In the HSV model, human skin color corresponds to Hue values of 0.05-0.17, Saturation of 0.1-0.3, and Value of 0.09-0.15 [25].
First, the raw image is converted into the HSV model. Then the V axis is projected onto the HS plane. After that, the EM algorithm is used to fit a mixture of Gaussians, and Gaussians whose color is not in the skin-color range are removed from the image. The resulting image is filtered to remove noise, yielding an image consisting of the regions of the person's face and hands.
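The range-thresholding step can be sketched with plain NumPy, assuming the image is already in the HSV model with all three channels scaled to [0, 1] and using the skin ranges from [25] quoted above. This is only the thresholding part of the pipeline; the EM mixture fitting and noise filtering are omitted.

```python
import numpy as np

# Skin-color ranges from [25], on the 0-1 HSV scale used in the text.
H_RANGE = (0.05, 0.17)
S_RANGE = (0.10, 0.30)
V_RANGE = (0.09, 0.15)

def skin_mask(hsv):
    """Boolean mask of pixels whose HSV values fall inside the skin ranges.

    hsv: float array of shape (height, width, 3), channels in [0, 1].
    """
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    return ((H_RANGE[0] <= h) & (h <= H_RANGE[1]) &
            (S_RANGE[0] <= s) & (s <= S_RANGE[1]) &
            (V_RANGE[0] <= v) & (v <= V_RANGE[1]))

# A 1x2 image: the first pixel is skin-colored, the second has a non-skin hue.
img = np.array([[[0.10, 0.20, 0.12],
                 [0.50, 0.20, 0.12]]])
print(skin_mask(img))  # [[ True False]]
```

Note that libraries such as OpenCV store HSV on different scales (H in 0-179, S and V in 0-255), so the ranges would need rescaling there.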
The k-means algorithm is used to cluster the regions obtained in the previous stage. It is assumed that the face region is much larger than the hand regions, which makes it possible to filter the regions so that only the hand segments remain.
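The area-based filtering rule above reduces to dropping the largest skin-colored region. A minimal sketch, assuming the clustering stage has already produced labeled regions with pixel areas:

```python
def hand_regions(regions):
    """Drop the largest skin region, assumed (per the text) to be the face.

    regions: list of (label, area_in_pixels) pairs for skin-colored blobs.
    Returns the remaining regions, i.e. the hand candidates.
    """
    if len(regions) < 2:
        return []  # nothing left once the face is removed
    face = max(regions, key=lambda r: r[1])
    return [r for r in regions if r is not face]

print(hand_regions([("face", 5000), ("left hand", 1200), ("right hand", 1100)]))
# [('left hand', 1200), ('right hand', 1100)]
```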
However, this simple method has a number of disadvantages. It cannot operate properly when a hand overlaps the face or the other hand. Recognition problems can also arise when the background is very complex, especially when it has the same color as human skin.

Depth method

Representatives of this type of camera are the Microsoft Kinect, Leap Motion, and Creative depth cameras. These cameras acquire data using infrared sensors and return a monochrome image in which each pixel encodes the distance from the camera to the object that reflected the infrared rays.
The main condition for this method is the assumption that the hand is the closest object to the camera. It follows that, to determine the hand contour, it is necessary to find the pixels with the smallest depth values, up to a certain threshold. Then, because the hand has a distinctive geometric shape, the resulting image is compared with a set of hand contours. After that, the hand region is obtained in the original image.
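The nearest-object thresholding step can be sketched as follows, under the assumptions that depth is given in millimeters (as for the Kinect), that zero marks pixels with no infrared return, and that the depth band of 50 mm is an illustrative value, not one prescribed by the method:

```python
import numpy as np

def nearest_object_mask(depth, band=50):
    """Select pixels within `band` depth units of the closest valid point.

    depth: 2-D array of camera-to-object distances; zeros mark pixels
    with no infrared return and are never selected.
    """
    valid = depth > 0
    if not valid.any():
        return np.zeros(depth.shape, dtype=bool)
    nearest = depth[valid].min()
    return valid & (depth <= nearest + band)

depth = np.array([[600, 620, 900],
                  [610, 950,   0]])   # 0 = no reading
mask = nearest_object_mask(depth)
# Only the 600-620 mm cluster (the presumed hand) is selected.
```

The resulting mask would then be passed to contour extraction and matched against the set of hand contours.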
The obvious disadvantage of this method is the requirement that the hand be the closest object to the camera.

Histogram of Oriented Gradients method

The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image. This method is similar to that of edge orientation histograms, scale-invariant feature transform descriptors, and shape contexts, but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy.
The essential thought behind the histogram of oriented gradients descriptor is that local object appearance and shape within an image can be described by the distribution of intensity gradients or edge directions. The image is divided into small connected regions called cells, and for the pixels within each cell, a histogram of gradient directions is compiled. The descriptor is the concatenation of these histograms. For improved accuracy, the local histograms can be contrast-normalized by calculating a measure of the intensity across a larger region of the image, called a block, and then using this value to normalize all cells within the block. This normalization results in better invariance to changes in illumination and shadowing.
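The cell-histogram stage described above can be sketched in NumPy. This is a minimal illustration of the idea, not a production HOG implementation: gradients are computed per pixel, unsigned orientations are binned over 0-180 degrees, and magnitudes are accumulated per cell; block-level contrast normalization is omitted for brevity.

```python
import numpy as np

def hog_cell_histograms(image, cell=8, bins=9):
    """Orientation histograms over cell x cell regions of a grayscale image."""
    gy, gx = np.gradient(image.astype(float))        # per-pixel gradients
    mag = np.hypot(gx, gy)                           # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0     # unsigned orientation
    rows, cols = image.shape[0] // cell, image.shape[1] // cell
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    hist = np.zeros((rows, cols, bins))
    for i in range(rows):
        for j in range(cols):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            b = bin_idx[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            for k in range(bins):
                hist[i, j, k] = m[b == k].sum()
    # The final descriptor is the concatenation of the (block-normalized)
    # cell histograms.
    return hist

# A 16x16 ramp image: every pixel has the same gradient direction,
# so all magnitude lands in a single orientation bin per cell.
h = hog_cell_histograms(np.arange(256).reshape(16, 16))
```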
The main idea of this method is to train an artificial neural network on a large set of hand templates that represent the variety of ways a hand can appear in an image. After training, the neural network is ready to recognize human hands in a static image.
The main disadvantage of this method is its low speed, which makes it impossible to implement in real-time systems.

Conclusion

The main goals and objectives of this work were set out. In addition, a number of works related to the theme of this work, as well as three methods for recognizing human hands in images, were reviewed. Each method has its advantages and disadvantages. In the future, a more detailed study of each method is planned, along with a search for other ways to identify hand gestures in images. It is also planned to investigate combinations of several methods that compensate for each other's disadvantages in order to obtain optimal results.

References

1. Хант Э. Искусственный интеллект. – М.: Мир, 1978. 558 с.
2. Дуда Р., Харт П. Распознавание образов и анализ сцен. – М.: Мир, 1976. 512 с.
3. Форсайт Д., Понс Ж. Компьютерное зрение. Современный подход. – М.: Вильямс, 2004. 928 с.
4. Шапиро Л., Стокман Дж. Компьютерное зрение. – М.: БИНОМ. Лаборатория знаний, 2006. – 752 с.
5. Аноприенко А.Я. Периодическая система развития компьютерных систем и перспективы нанокомпьютеризации // Инновационные перспективы Донбасса: Материалы международной научно-практической конференции. Донецк, 20-22 мая 2015 г. Том 5. Компьютерные науки и технологии. – Донецк: Донецкий национальный технический университет, 2015. С. 5-13.
6. Аноприенко А.Я. Системодинамика ноотехносферы: основные закономерности // «Системный анализ в науках о природе и обществе». – Донецк: ДонНТУ, 2014, №1(6)-2(7). С. 11-29.
7. Аноприенко О.Я., Варзар Р.Л., Иваница С.В. Закономерности развития аналого-цифровых преобразователей и перспективы использования постбинарного кодирования // Научные труды Донецкого национального технического университета. Серия: «Информатика, кибернетика и вычислительная техника» (ИКВТ-2014). Выпуск 1 (19). – Донецк: ДонНТУ, 2014. С. 5-10.
8. Аноприенко А.Я. Модели эволюции компьютерных систем и средств компьютерного моделирования // Материалы пятой международной научно-технической конференции «Моделирование и компьютерная графика» 24-27 сентября 2013 года, Донецк, ДонНТУ, 2013. C. 403-423.
9. Аноприенко А.Я., Варзар Р.Л. Разработка прототипа суперсенсорного компьютера: особенности реализации и визуализации результатов измерений // Материалы пятой международной научно-технической конференции «Моделирование и компьютерная графика» 24-27 сентября 2013 года, Донецк, ДонНТУ, 2013. C. 218-229.
10. Варзар Р.Л., Аноприенко А.Я. Суперсенсорный компьютер для измерения и анализа параметров окружающей среды // Информатика и компьютерные технологии / Сборник трудов VIII международной научно-технической конференции 18-19 сентября 2012 г., Донецк, ДонНТУ. – 2012. В 2-х томах. Т. 2. С. 156-161.
11. Дуденко М.В., Аноприенко А.Я. Расширенная реальность // Материалы III международной научно-технической конференции «Информатика и компьютерные технологии – 2007», 11-13 декабря 2007 года, Донецк, ДонНТУ, 2007. С. 106-109.
12. Бабенко Е.В., Аноприенко А.Я. Организация модульного интерактивного приложения для трехмерного моделирования угольных шахт // Материалы III всеукраинской научно-технической конференции «Информационные управляющие системы и компьютерный мониторинг (ИУС и КМ 2012)» – 17-18 апреля 2012 г., Донецк, ДонНТУ, 2012. С. 680-684.
13. Аноприенко А.Я., Забровский С.В., Каневский А.Д. Опыт реинжиниринга системы моделирования сложных технологических процессов // Научные труды Донецкого национального технического университета. Выпуск 20. Серия «Вычислительная техника и автоматизация». – Донецк, ДонГТУ, 2000. С. 139-148.
14. Аноприенко А.Я., Забровский С.В., Потапенко В.А. Современные тенденции развития тренажерных систем и их модельного обеспечения // «Прогрессивные технологии и системы машиностроения»: Международный сборник научных трудов. Вып. 10. – Донецк: ДонГТУ, 2000, с. 3-7.
15. Аноприенко А.Я., Кривошеев С.В., Приходько Т.А. Тетракоды в кодировании и распознавании образов // Сборник научных трудов ДонГТУ. Серия «Информатика, кибернетика и вычислительная техника». Выпуск 1 (ИКВТ-97). – Донецк: ДонГТУ. – 1997. С. 99-104.
16. Федяев О.И., Бондаренко И.Ю. Нечёткое сопоставление образов с оптимальным временным выравниванием для однодикторного и многодикторного распознавания изолированных слов // Научные труды Донецкого национального технического университета, серия «Информатика, кибернетика и вычислительная техника», вып. 8 (120), Донецк, ДонНТУ, 2007. – С.273-281.
17. Алфимцев А.Н. Современные тенденции принятия управляющих решений на основе распознавания жестов // Информационные технологии и системы: Сб. трудов Всерос. конф.- М., 2007. – С. 152- 157.
18. Девятков В.В., Алфимцев А.Н. Распознавание динамических жестов // Применение теории динамических систем в приоритетных направлениях науки и техники: Сб. трудов Всерос. конф.- Ижевск, 2007. – С. 15-23.
19. Девятков В.В., Алфимцев А.Н. Распознавание манипулятивных жестов // Вестник МГТУ им. Н.Э.Баумана. Сер. Приборостроение. – 2007. Т. 68, № 3. - С.56-75.
20. Болотова Ю.А., Федотова Л.C., Спицын В.Г. Алгоритм детектирования областей лиц и рук на изображении на основе метода Виолы-Джонса и алгоритма цветовой сегментации // Фундаментальные исследования. – 2014. – № 11-10. – С. 2130-2134.
21. Куракин А. В. Распознавание жестов ладони в реальном времени на основе плоских и пространственных скелетных моделей // Информатика и ее применения. 2012. Т. 6, № 1. С. 114-121.
22. Kurakin A., Zhang Z., Liu Z. A Real Time System for Dynamic Hand Gesture Recognition with a Depth Sensor // EUSIPCO-2012: Proceedings of the 20th European Signal Processing Conference. 2012. P. 1975-1979.
23. Нагапетян В.Э. Обнаружение пальцев руки в дальностных изображениях // Искусственный интеллект и принятие решений, №1, 2012. — С. 90-95.
24. Нюнькин К.М. Использование цвета при распознавании жестов // «Искусственный интеллект», 2002, №4. С. 503-511.
25. Хомяков М.Ю. Классификация цвета кожи человека на цветных изображениях // Компьютерная оптика, 2011, том 35, №3. С.373-379.
26. Глушко Ю.Э., Бабков В.С. Оценка возможности применения платформы Microsoft Kinect в составе виртуальных тренажеров // Информационные управляющие системы и компьютерный мониторинг. - Донецк: ДонНТУ, 2012. - С. 368 - 372
27. Бабков В.С., Соболев Е.Г. Разработка подсистемы интерактивного взаимодействия в составе тренажерной системы с использованием платформы Microsoft Kinect // Информационные управляющие системы и компьютерный мониторинг. - Донецк: ДонНТУ, 2012. - С. 353 - 357.
28. Пеньков А.С., Бабков В.С. Анализ методов распознавания жестов руки с использованием камеры глубины // Информационные управляющие системы и компьютерный мониторинг. – Донецк: ДонНТУ, 2013. - С. 334 - 337.
Copyright © 2016 Lupashevskyi Vladyslav
All rights reserved