中国科技论文在线
A Light Eyetracking System for Webpage Designing
Chen Daiwu, Zhang Honggang, Guo Jie, Zhang Nannan
(Beijing University of Posts and Telecommunications, Beijing, 100876)
Foundations: National Natural Science Foundation of China under Grant (, 61175011 and 61171193); the 111 Project under Grant (); the Fundamental Research Funds for the Central Universities under Grant (No. 2013XZ11); EU FP7 IRSES Mobile Cloud Project under Grant ( )
Brief author introduction: Chen Daiwu (1989-), Male, Beijing Key Laboratory of Pattern Recognition & Intelligent System
Corresponding author: Zhang Honggang (1974-), Male, Associate Professor, Image Processing. E
Abstract: Eye-tracking technology has long been considered one of the most important methods in fields concerned with human behaviour and cognition. This paper designs and implements a cheap real-time eye-tracking system that depends mainly on a web camera and software processing, to meet the rising needs of webpage design. We introduce a new tracking pattern, new eye feature sets and an SVM (Support Vector Machine) classifier in this system. Experiments based on the proposed method show satisfactory precision.
Key words: Pattern Recognition and Intelligent System; eye-tracking; fixation area; dependence on software; SVM
0 Introduction
The earliest eye-tracking research was conducted entirely through human observation [1]. Through the endeavor of many outstanding researchers from fields including computer science and somatology, combined with growing demand from multiple industries over the last decade, the technology now depends mostly on specially designed equipment and accurate computation by computer.
Building on the large body of accumulated theory and the rapidly growing computing capacity of computers, real-time eye-tracking with good accuracy is now practical. However, these systems are highly hardware dependent, e.g. on infrared equipment, and some approaches are even invasive, e.g. using a heavy headset or holding the eyelids open with small gadgets, which causes considerable discomfort for users. To overcome these shortcomings, researchers at GazeHawk, a company acquired by Facebook in 2012, did a great job of performing eye-tracking with a web camera. At present, eye-tracking is mainly used in labs and institutions, as well as to help people with limb disabilities or to alert sleepy drivers. Some applications can also be seen on mobile devices, such as the Galaxy S4 (a smartphone produced by Samsung).
The foreseeable usefulness of eye-tracking technology shows how large a contribution it can make to academic research and daily life. As mentioned earlier, the eye-tracking equipment made specially for user studies on the market is very expensive and complicated, although it does deliver high accuracy. IT SMEs (small and medium-sized enterprises) are growing rapidly, and many want to redesign their products' graphical layout to provide a better user experience and improve business performance, but such equipment may be too expensive for them to conduct user studies by themselves.
In this paper we build a very cheap real-time eye-tracking system, based on a personal computer with a web camera, that estimates the user's fixation area on the screen, as shown in the figure captioned "Tracking the fixation area". It may not have the best tracking precision, but it is effective enough for ordinary user-study cases in which the user faces the screen directly.
Tracking the fixation area
The rest of this paper is organized as follows: Section 1 introduces popular methods for target detection and specifies the physiological eye features we select. Section 2 explains in detail the structure of the eye-tracking system and how it works. Section 3 analyzes the performance of the system. Finally, a summary is provided in Section 4.
1 Related Work
Face and Eye Detection
Two popular approaches to face detection are eigenfaces and AdaBoost based on Haar features. The eigenface method was first worked out by Sirovich and Kirby in 1987 [2] and applied to face detection by M. Turk in 1991 [4]. It is time efficient but sensitive to illumination and view angle, and works well when recognising frontal faces under fixed illumination.
The AdaBoost algorithm was introduced by Freund and Schapire in 1997 [3], and was then applied to target detection and improved by many researchers. For example, P. Viola and M. Jones combined it with Haar features for face detection, which is one of the most popular methods, being fast and fairly robust to illumination changes. We use the version optimized by Viola and Jones in 2001 [5], which is available in the OpenCV library.
A new method combining the Hough transform was introduced by Al-Rahayfeh to perform real-time eye-tracking on a PC [7]. A novel approach using particle filters was realized by Campos in 2013 [8].
New applications can also be seen in internet products: one excellent system created by Guangyu Piao was applied to social media [9], and one application on mobile devices [10] is quite inspiring too.
Features of Eye Motion
From abundant research, we know that the fixation of the human eye is the point where the axis of the eyeball intersects the surface being gazed at. Fixations are not continuous, as we might expect when looking at something smoothly; instead the fixation jumps from one point to another at intervals of milliseconds [10]. After studying those experimental results and closely observing video clips from our user-study cases, we defined several physiological features that support the two main traits used in this paper.
More details are given in Section 2.
2 Eye-tracking System for Webpage Designing
To reduce cost, we aim to use as little hardware as possible. Our approach needs only two pieces of hardware: a web camera for acquiring raw images and a computer for image processing, so even a laptop can run our system. The approach also causes no discomfort to the user: test subjects need not even perceive that their eyes are being tracked during the user study, which removes the pressure of the experimental situation and yields more realistic data sets.
We make the rough line-of-sight (fixation) estimate purely through image processing, which lets us dispense with other hardware such as an infrared camera. We programmed the system in C++ with the OpenCV library, and obtained a really fast and robust system.
First, we acquire an image from the camera and pre-process it by scaling and graying. We then find the face sub-image, find the two eye sub-images, extract features, and finally classify the features with an SVM to estimate the gazing area on the screen. A classification model was trained in advance with the SVM on a sample collection of over 20,000 frames.
The system framework is shown in Fig. 2.
System framework
Pre-processing
To reduce the detection time for the face and eyes, we resize the original frame from 320*240 to 80*60 right after the graying operation. Because face and eye detection is based on Haar features, appropriate image contrast helps the later detection process, so we apply histogram equalization.
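The pre-processing chain described above (graying, downscaling by a factor of four, histogram equalization) can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' C++/OpenCV code: the function names, the BT.601 gray weights and the block-averaging downscale are assumptions chosen for clarity.

```python
def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to gray levels 0..255 (BT.601 weights, an assumption)."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb]


def downscale(gray, factor):
    """Shrink the image by an integer factor, averaging each factor x factor block."""
    height, width = len(gray) // factor, len(gray[0]) // factor
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            block = [gray[y * factor + dy][x * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def equalize_hist(gray):
    """Histogram equalization: remap gray levels through the normalized cumulative histogram."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Only one gray level is present, so there is no contrast to stretch.
        return [row[:] for row in gray]
    lut = [max(0, round((c - cdf_min) * 255 / (total - cdf_min))) for c in cdf]
    return [[lut[v] for v in row] for row in gray]
```

Applied in the order `to_gray`, `downscale(..., 4)`, `equalize_hist`, a 320*240 colour frame becomes an 80*60 gray image whose darkest level maps to 0 and brightest to 255.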
Target Detection
The features we want to extract are contained in the eye image, so finding the eye regions in the raw frames is very important to this system. Users mostly face the screen directly, which makes face detection much easier because only frontal faces need to be found. We cut the face region out of the original frame according to the face detection results, i.e. position and size, and run eye detection on that sub-image. We then cut the eye regions out of the original frame, giving two eye sub-images, and perform pupil center detection on these two images. To make the system robust and minimize mis-detection, we run eye detection only when one face is found in the raw frame, and extract features only when two eyes are detected in that frame. After the target detection stage we therefore have two images of the user's eyes. The results are shown in Fig. 3.
a: Raw frames, b: preprocessed, c: detection results
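The gating rule in this stage (continue only with one face, then only with two eyes) can be sketched as follows. The detectors are passed in as callables so the example stays self-contained; `detect_faces` and `detect_eyes` are hypothetical stand-ins for the Haar-cascade detectors, assumed to return lists of `(x, y, w, h)` rectangles, and interpreting "one face" as exactly one is also an assumption.

```python
def crop(image, rect):
    """Cut the (x, y, w, h) rectangle out of a row-major 2-D image."""
    x, y, w, h = rect
    return [row[x:x + w] for row in image[y:y + h]]


def select_eye_images(frame, detect_faces, detect_eyes):
    """Return [left_image, right_image] (ordered by x), or None if the frame is rejected.

    detect_faces(frame) and detect_eyes(face_image) are hypothetical detectors
    that return lists of (x, y, w, h) rectangles.
    """
    faces = detect_faces(frame)
    if len(faces) != 1:
        return None  # no face or several faces: skip this frame
    face_x, face_y, _, _ = faces[0]
    eyes = detect_eyes(crop(frame, faces[0]))
    if len(eyes) != 2:
        return None  # features are extracted only when both eyes are found
    # Eye rectangles are relative to the face image; shift them back into
    # frame coordinates so the eyes are cut from the original frame.
    eyes_in_frame = sorted((face_x + ex, face_y + ey, ew, eh) for ex, ey, ew, eh in eyes)
    return [crop(frame, rect) for rect in eyes_in_frame]
```

Rejecting a frame early keeps the later stages from running on partial detections, at the cost of dropping some frames.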
Feature Extraction
We describe the two main features from a physiological perspective. First, the muscle structure around the eye gives us three nearly fixed reference cues: the lower eyelid and the outer and inner eye corners, which stay almost still even when the eyeball moves quickly and widely [11]. Furthermore, we found that the eyeball's position, more specifically the pupil center, is directly related to every kind of eye motion, as shown in Fig. 4. The upper eyelid comes second: its position changes slightly when the fixation moves horizontally within a normal range and varies a lot when the fixation moves vertically, as shown in Fig. 5. After careful study of this information, we decided to estimate the fixation on a two-dimensional plane by extracting the relations between the moving parts and the three reference parts. The distance between the upper and lower eyelids denotes how wide the eye is open and can be used to describe the y coordinate of the fixation point, while the horizontal distances between the pupil and the inner and outer eye corners give information about the x coordinate of the fixation.
Critical points of human eyes
h varies when fixation moves vertically
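As a small illustration of how the reference points relate to the moving parts, the sketch below computes the horizontal distances and the eye opening h from landmark coordinates. The function name, the argument layout and the `(x, y)` pixel convention are assumptions made for this example only.

```python
def eye_geometry(outer_corner, inner_corner, pupil_center, upper_lid_y, lower_lid_y):
    """Distances between the moving parts and the fixed reference points.

    Corners and pupil center are (x, y) pixel positions; the lid arguments are
    the y positions of the upper and lower eyelids above and below the pupil.
    """
    return {
        "D1": abs(pupil_center[0] - outer_corner[0]),  # outer corner to pupil (x cue)
        "D2": abs(inner_corner[0] - pupil_center[0]),  # inner corner to pupil (x cue)
        "D3": abs(inner_corner[0] - outer_corner[0]),  # corner to corner
        "h": abs(lower_lid_y - upper_lid_y),           # eye opening (y cue)
    }
```

When the pupil lies between the two corners, D1 + D2 equals D3, so a horizontal shift of the fixation shows up as a change in the balance between D1 and D2.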
We choose 7 features to extract:
Area of the eye region between the upper and lower eyelids, Se;
Area of the left white part of the eyeball, Swl;
Area of the right white part of the eyeball, Swr;
Height of the visible part of the pupil, Hp;
Horizontal distance between the outer eye corner and the pupil center, D1;
Horizontal distance between the inner eye corner and the pupil center, D2;
Horizontal distance between the outer eye corner and the inner eye corner, D3.
The features listed above are extracted mainly with simple approaches to ensure real-time performance, which proved effective in our experiments.
First we have to locate the pupil center. We obtain the threshold image of the eye region, then perform integral projection in the horizontal direction to locate the two horizontal boundaries of the pupil, and then perform projection in the vertical direction, within the boundaries found in the previous step, to locate the two vertical boundaries of the pupil. With these four boundaries, the pupil center is located, as shown in Fig. 6.
Locate pupil center
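The projection-based pupil localization can be sketched as below. The paper does not state how a boundary is read off a projection profile, so the half-of-peak rule (`frac=0.5`) is an assumption, as are the function names and the reading of "horizontal boundaries" as the left and right pupil edges.

```python
def threshold_dark(gray, level):
    """Binary mask: 1 where the pixel is darker than `level`, else 0."""
    return [[1 if v < level else 0 for v in row] for row in gray]


def span_above(profile, frac=0.5):
    """First and last index where the profile reaches `frac` of its peak (assumed rule)."""
    peak = max(profile)
    if peak == 0:
        return None
    hits = [i for i, p in enumerate(profile) if p >= frac * peak]
    return hits[0], hits[-1]


def locate_pupil_center(mask):
    """Return ((cx, cy), (left, right, top, bottom)), or None if the mask is empty."""
    # Projection along x: count dark pixels in every column.
    column_profile = [sum(row[x] for row in mask) for x in range(len(mask[0]))]
    columns = span_above(column_profile)
    if columns is None:
        return None
    left, right = columns
    # Projection along y, restricted to the columns found in the previous step.
    row_profile = [sum(row[left:right + 1]) for row in mask]
    top, bottom = span_above(row_profile)
    center = ((left + right) / 2, (top + bottom) / 2)
    return center, (left, right, top, bottom)
```

Restricting the second projection to the columns found first keeps dark eyelashes or shadows elsewhere in the eye image from shifting the vertical boundaries.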
We then threshold the gray image, trace the contour of the eye region, and close any unconnected contour with straight line segments, as shown in Fig. 7. Knowing the position of the pupil center and the contour of the eye region, we can easily extract the 7 features listed above, all within the contour.
Contouring eye to extract feature
To eliminate the scale problem caused by the changing distance between the user's head and the camera, the 5 features we use here are ratios: Sp/Se, Swl/Se, Swr/Se, D1/D2 and Hp/D3. To reduce the interference caused by varying illumination, we extract the features from 3 different threshold images, which gives 15 (3*5) feature values each for the left eye and the right eye, so we have a 30-dimensional feature vector to input into the SVM.
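The construction of the 30-dimensional vector can be sketched as follows. The dictionary layout and function names are assumptions for illustration; Sp is not defined in the feature list above and is treated here as the visible pupil area, which is also an assumption.

```python
def ratio_features(m):
    """Five scale-free ratios from one set of raw measurements.

    `m` maps the names Sp, Se, Swl, Swr, Hp, D1, D2, D3 to numbers measured
    on one threshold image of one eye. Se, D2 and D3 must be non-zero.
    """
    return [
        m["Sp"] / m["Se"],
        m["Swl"] / m["Se"],
        m["Swr"] / m["Se"],
        m["D1"] / m["D2"],
        m["Hp"] / m["D3"],
    ]


def build_feature_vector(left_eye, right_eye):
    """Concatenate 2 eyes x 3 threshold images x 5 ratios into 30 values."""
    vector = []
    for per_threshold in (left_eye, right_eye):
        if len(per_threshold) != 3:
            raise ValueError("expected measurements from 3 threshold images per eye")
        for measurements in per_threshold:
            vector.extend(ratio_features(measurements))
    return vector
```

Because every entry divides an area by an area or a length by a length, moving the head closer to the camera (lengths times k, areas times k squared) leaves the vector unchanged.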
Classification by SVM
A static prediction made by the trained model in our experiments has an accuracy of around percent, which is acceptable for real-time prediction in user-study cases. The cross-validation accuracy is over 85%, as shown in Fig. 8.
One cross validation results
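For readers unfamiliar with the term, cross-validation accuracy is obtained by holding out each part of the data in turn, training on the rest and scoring the held-out part. The sketch below shows the idea with a pluggable classifier; it does not use LIBSVM, and the 1-nearest-neighbour stand-in, the interleaved fold layout and all names are assumptions so that the example runs with the standard library alone.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k interleaved folds."""
    return [list(range(start, n_samples, k)) for start in range(k)]


def cross_val_accuracy(features, labels, k, train, predict):
    """Fraction of samples predicted correctly while held out of training."""
    correct = 0
    for fold in k_fold_indices(len(features), k):
        held_out = set(fold)
        train_idx = [i for i in range(len(features)) if i not in held_out]
        model = train([features[i] for i in train_idx], [labels[i] for i in train_idx])
        correct += sum(predict(model, features[i]) == labels[i] for i in fold)
    return correct / len(features)


def train_1nn(features, labels):
    """Stand-in classifier (not an SVM): just remember the training pairs."""
    return list(zip(features, labels))


def predict_1nn(model, x):
    """Return the label of the closest remembered sample."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(model, key=lambda pair: sq_dist(pair[0], x))[1]
```

A call such as `cross_val_accuracy(X, y, 5, train_1nn, predict_1nn)` returns a value between 0 and 1; replacing the two callables with wrappers around an SVM library would give the figure reported for the SVM model.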
3 Results
We use LIBSVM, provided by Chih-Jen Lin, for training and prediction, and its integrated tool 'easy' for cross validation. We normalize the data set before inputting it into the SVM. We then trained the model with over 10,000 frames, including 1200+ frames recorded while the user gazed at the center of each of the 9 areas on the screen. The whole system was implemented on a laptop with a T6600 CPU, 2 GB of RAM and the Windows 7 OS.
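The normalization step can be illustrated with a simple per-feature min-max scaling. The target range of [-1, 1] and the function names are assumptions; the paper does not state which range or tool options were used. The point of the sketch is that the ranges are fitted on the training data once and reused unchanged at prediction time.

```python
def fit_ranges(rows):
    """Per-feature (min, max) computed over the training rows."""
    return [(min(column), max(column)) for column in zip(*rows)]


def scale_row(row, ranges, lower=-1.0, upper=1.0):
    """Map each feature linearly so its training min/max land on lower/upper."""
    scaled = []
    for value, (lo, hi) in zip(row, ranges):
        if hi == lo:
            scaled.append(0.0)  # constant feature: no information, map to 0
        else:
            scaled.append(lower + (upper - lower) * (value - lo) / (hi - lo))
    return scaled
```

Values seen at prediction time that fall outside the training range simply map outside [-1, 1]; they are not clipped in this sketch.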
Time Consumption
Face detection on a 320*240 image usually costs 20ms-50ms. Eye detection uses information about the nose and eyebrows; if the face image is not cropped properly, it costs up to 200ms, which defeats the real-time character of the system. We therefore crop the face image so that it keeps the nose and eyebrow parts, which shortens eye detection to 20ms-50ms. Pupil detection and feature extraction for the two eyes take less than 10ms, owing to the small size of the eye images (typically 40*20 pixels) and the simplicity of the eye-motion features we selected. The one-time loading of the SVM model file costs about 2s-3s, while a single prediction needs only 5ms. Overall, our system processes one frame in 50ms on average, which gives good real-time performance.
Feature Input
We made many attempts to optimize the eye-motion features used in this system. We introduced the SVM after failing to classify the fixation area with empirical thresholds.
At the beginning, we represented the features with an 8-dimensional vector, but the SVM classification performance was not very good, as shown in Fig. 9.
Results of simple representation of features
We thought there might be some interference between the features of the two eyes, so we tried the features obtained from the left eye alone, a set of 4-dimensional vectors, but the outcome was worse, as shown in Fig. 10.
Results of left eye features only
So we used the features of both eyes and increased the dimension of the feature vector with the strategy of describing each feature in several ways. After some experiments, we kept the 30-dimensional vector introduced in Section 2 to represent the eyes' motion features, and the performance is quite good, as shown in .
4 Conclusions
In this paper, we proposed a light real-time eye-tracking system that estimates the user's fixation area on a computer screen using features extracted purely from raw web-camera frames, which enables cheap user studies that meet small organisations' webpage design needs. Future work is needed to improve the precision by exploring more effective features, and to strengthen the robustness by eliminating the interference caused by illumination, glasses, etc.
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China under Grant , 61175011 and 61171193, the 111 Project under Grant , the Fundamental Research Funds for the Central Universities under Grant No. 2013XZ11, and the EU FP7 IRSES Mobile Cloud Project (Grant ).
References
[1] Ilg U J, Churan J, Schumann S. The Physiological Basis for Visual Motion Perception and Visually Guided Eye Movements[J]. The Primate Visual System: A Comparative Approach, 2005: 285-310.
[2] Sirovich L, Kirby M. Low-dimensional procedure for the characterization of human faces[J]. JOSA A, 1987, 4(3): 519-524.
[3] Freund Y, Schapire R E. A decision-theoretic generalization of on-line learning and an application to boosting[A]. Computational learning theory[C]. Springer Berlin Heidelberg,-37.
[4] Turk M A, Pentland A P. Face recognition using eigenfaces[A]. Computer Vision and Pattern Recognition, 1991. Proceedings CVPR'91., IEEE Computer Society Conference on[C]. Maui, HI: IEEE, -591.
[5] Viola P, Jones M. Rapid object detection using a boosted cascade of simple features[A]. Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on[C]. IEEE,-518.
[6] Murakami I. Eye movements during fixation as velocity noise in minimum motion detection[J]. Japanese Psychological Research, 2010, 52(2): 54-66.
[7] Al-Rahayfeh A, Faezipour M. Enhanced frame rate for real-time eye tracking using circular hough transform[A]. Systems, Applications and Technology Conference (LISAT), 2013 IEEE Long Island[C]. IEEE, -6.
[8] Campos R, Santos C, Sequeira J. Eye tracking system using particle filters[A]. Bioengineering (ENBENG), 2013 IEEE 3rd Portuguese Meeting in[C]. Braga: IEEE, -4.
[9] Piao G, Jin Q, Zhou X, et al. Eye-Tracking Experiment Design for Extraction of Viewing Patterns in Social Media[A]. Ubiquitous Intelligence and Computing, 2013 IEEE 10th International Conference on and 10th International Conference on Autonomic and Trusted Computing (UIC/ATC)[C]. IEEE, 2013. 308-313.
[10] Kunze K, Kawaichi H, Yoshimura K, et al. The Wordometer--Estimating the Number of Words Read Using Document Image Retrieval and Mobile Eye Tracking[A]. Document Analysis and Recognition (ICDAR), 2013 12th International Conference on[C]. Washington, DC: IEEE, 2013. 25-29.
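
A Lightweight Eye-Tracking System for Webpage Design
Chen Daiwu, Zhang Honggang, Guo Jie, Zhang Nannan
(School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, 100876)
Abstract: In the many fields related to human behaviour and cognition, eye-tracking technology has always been regarded as the most important research method. This paper designs and implements a lightweight eye-tracking system based on a web camera and image processing techniques; the system targets internet webpage design and is inexpensive. The paper introduces the new tracking pattern, the new eye features and the application of SVM (Support Vector Machine) adopted in the system. The system implemented according to this design showed good precision in experiments.
Key words: pattern recognition and intelligent systems; eye-tracking; fixation point; software dependence; SVM
CLC number: TP399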