Authors:
Zdenek Mikovec
(Using BCs thesis of Martin Klima and Dusan Pavlica)
Czech Technical University
Faculty of Electrical Engineering
Department of Computer Science and Engineering
Prague December 1998
The main goal of this project is to develop software tools for quick and easy picture "reading" for the blind. These tools can be divided into two parts: browsing tools for reading the picture, and description editor tools for creating the description of the picture, which is later read by the browsing tools.
When analyzing the problem of blind users vs. Information
Technologies (IT), we found that the development of IT at the end
of the 20th century brings two main changes for the blind. The
first, positive one is large-scale electronic communication (the
Internet), which gives the blind the ability to work and
communicate with the rest of the world at nearly the same level
as sighted people. The second, negative change (for the blind) is
the more intensive use of graphical user interfaces, pictures and
visual effects. Graphical information is very difficult to
present to the blind, and a GUI is difficult for the blind to
orient themselves in.
Our motivation is to suppress these negative aspects of new IT
and to support the positive changes in IT development for the
blind.
The general goal of this project is to develop tools for quick and easy picture "reading" for the blind.

Figure 1: BIS (Blind Information System) - Subject Layer
This system of tools called Blind Information System (BIS - see Figure 1) is divided into three parts:
We want to give the blind the possibility to get the important information hidden in a picture. This is why the pictures we focus on are information pictures such as:
Our system is designed for the kinds of picture mentioned above and will not be suitable for describing virtual reality or art pictures.
When designing our BIS system we are facing these problem areas:
When analyzing this problem, we first wanted to map the activities going on in the world. We focused on activities around blind users, their problems with access to information and their problems with structuring information.
Then we started to define our own approach to this problem.
In general, none of the existing approaches focuses intensively on describing graphical information.
Their methodologies look at the picture as one object to be described. That leads to describing the subjective feeling from the picture and not the real information hidden in it.
Only technical methods of how to describe the picture are defined, for example ALT text in HTML pages or links to pages with a description of the picture. These methods bring inconsistency into the final document, because the picture and its description are in separate files. Creating and updating these documents is more difficult.
There is no methodology for how to create a good description of a picture.
Advantages of this approach:
Disadvantages:
The basic idea of this methodology is to give the blind the ability to move freely inside the picture. So we do not explain what the blind should see in the picture, but describe each part of the picture (objects) and the relations between them, and let the blind create their own vision of the picture.
When creating the methodology we focused on making browsing through the picture as easy as possible for the blind, and on making it easy to create the description of the picture for the blind.
To fulfill this idea we chose the object-oriented paradigm. That means the entire world consists of objects with their descriptions; these objects have behavior that influences other objects.
So if we would like to describe any situation, we need to describe the objects, their descriptions and their relations to other objects (see chapters below).
First of all we have to analyze what objects are around us and could be in the picture, and how the blind understand the world and objects. This is why we started to cooperate with the national organization for the blind, SONS (Czech Blind United). The results of this consultation are these categories of objects:
The problem of describing an object is that each description creator will describe the object in a different way and will include different characteristics than another one. To eliminate differences in object descriptions and to speed up describing, we have defined a standard for object description. We defined basic categories of information that describe objects, their behavior and relations. The browsing user then knows what characteristics of objects he can get and how to ask for them. For example, the browsing user knows that he can ask for the position of an object by choosing the characteristic with type "position".
The basic categories of description are:
Example: see Figure 6: Woman and Boy - Common view.
Boy is running to door.
Relation 1: Object "Boy" is described by Action "is running to". This Action relates to object "door".
Between objects there are special relations which have to be described too:
Example: see Figure 6: Woman and Boy - Common view.
Object "Door" is in object "House".
Object "Head" is in object "Boy".
Now we will demonstrate the object-oriented approach by means of an example - Woman and Boy (Appendix 1).
There are 4 hierarchical levels of objects. On the first (root) level there are the objects House, Boy, Tree and Cap. Hierarchical relations are represented by lines with full arrowheads. Then you can see three other relations coupled with actions: running to, waving to, falling down of. These relations are represented by dotted lines with arrowheads and numbers.
So the main information about the picture is the objects, their hierarchical relations and the most significant actions with their relations.
Of course each object could have other descriptions like position, color, etc., but this information is not crucial for understanding the information value of most pictures (this conclusion was made after several tests with blind and non-blind users).
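The object model described above can be sketched in Java roughly as follows. This is only a minimal illustration: the class and method names (PictureModelSketch, PictureObject, addPart, describe) are hypothetical and are not taken from the BIS library, and storing the hierarchical relation as a parent link is an assumption about the encoding.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: these class and method names are hypothetical,
// not the API of the BIS library.
class PictureModelSketch {

    /** A typed description of an object, optionally related to another object. */
    static class Description {
        final String type;          // category, e.g. "action" or "color"
        final String value;         // e.g. "is running to" or "black"
        final PictureObject target; // related object, or null if there is none

        Description(String type, String value, PictureObject target) {
            this.type = type;
            this.value = value;
            this.target = target;
        }
    }

    /** An object in the picture with its parent, parts and descriptions. */
    static class PictureObject {
        final String name;
        PictureObject parent; // null for root-level objects (assumed encoding)
        final List<PictureObject> parts = new ArrayList<>();
        final List<Description> descriptions = new ArrayList<>();

        PictureObject(String name) {
            this.name = name;
        }

        /** Records the hierarchical relation "part is in this object". */
        PictureObject addPart(PictureObject part) {
            part.parent = this;
            parts.add(part);
            return part;
        }

        /** Attaches a description; target may be null. */
        void describe(String type, String value, PictureObject target) {
            descriptions.add(new Description(type, value, target));
        }

        /** Renders the hierarchy and the descriptions as short sentences. */
        List<String> sentences() {
            List<String> out = new ArrayList<>();
            if (parent != null) {
                out.add(name + " is in " + parent.name);
            }
            for (Description d : descriptions) {
                if (d.target != null) {
                    out.add(name + " " + d.value + " " + d.target.name);
                } else {
                    out.add(name + " " + d.type + ": " + d.value);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        PictureObject house = new PictureObject("House");
        PictureObject door = house.addPart(new PictureObject("Door"));
        PictureObject boy = new PictureObject("Boy");
        boy.describe("action", "is running to", door);
        System.out.println(door.sentences()); // hierarchical relation of the door
        System.out.println(boy.sentences());  // action relation of the boy
    }
}
```

The sketch keeps the two kinds of relation apart: the hierarchical "is in" relation lives in the tree, while an action is an ordinary description that points at another object.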
The problem of browsing through the picture is the large amount of complex information that could be browsed.
If we want the blind to grasp the whole information in the picture, we must filter this information. We have prepared these methods of filtering:
Since we have defined categories of object description, the browsing user can choose the group of description categories he wants to "see". Applying this filter eliminates the other descriptions and the objects that don't match the chosen categories.
When analyzing the creation of picture descriptions we found that a picture can be understood from very different points of view, which means different objects and different descriptions.
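A category filter of this kind could look like the following Java sketch. It assumes a simplified model in which each object name maps to a list of typed descriptions; the names CategoryFilterSketch and filter are hypothetical and are not part of the BIS library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: a simplified model, not the BIS library API.
class CategoryFilterSketch {

    /** A description with its category (type) and value. */
    static class Description {
        final String type;
        final String value;

        Description(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    /**
     * Keeps only the descriptions whose type is among the chosen categories.
     * Objects left without any matching description are dropped as well.
     */
    static Map<String, List<Description>> filter(
            Map<String, List<Description>> objects, Set<String> chosen) {
        Map<String, List<Description>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Description>> entry : objects.entrySet()) {
            List<Description> kept = new ArrayList<>();
            for (Description d : entry.getValue()) {
                if (chosen.contains(d.type)) {
                    kept.add(d);
                }
            }
            if (!kept.isEmpty()) {
                result.put(entry.getKey(), kept);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Description>> objects = new LinkedHashMap<>();
        objects.put("boy", Arrays.asList(
                new Description("action", "running to"),
                new Description("color", "black")));
        objects.put("house", Arrays.asList(new Description("color", "white")));

        Set<String> chosen = new HashSet<>(Arrays.asList("action"));
        // Only "boy" survives, and only with its "action" description.
        System.out.println(filter(objects, chosen).keySet());
    }
}
```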
So we defined two types of view:
From the above problem description it is clear that graphical
information has a very complex structure. A suitable means for
handling such a complex structure is the use of a grammar. We
use a formal grammar in the sense of the theory of formal
languages, where a grammar is defined as a quadruple G=(N,T,R,S)
of nonterminals N, terminals T, rewriting rules R and a start
symbol S.
This grammar is developed with the idea of generating a universal
description language for representing any objects, their behavior
and relations.
The description of an object is defined by types (categories),
which allows intelligent filtering.
picture ::= view+
view ::= object*
object ::= description*
description ::= description*
Picture defines the name of picture. View defines type of view. Object defines each object in the picture. Description describes information about object. Each element (picture, view,...) has several attributes.
picture
Example:
picture(idpicture="pictA" name="Woman and boy")
view
Example:
view(idview="viewAA" type="common" language="english")
object
Example:
object(idobject="objAA" name="house")
object(idobject="objAB" name="boy")
description
Example:
description(iddescription="descAB" type="action" value="running to" obj="objAF")
description(iddescription="descAE" type="color" value="black")
Both description documents and object libraries will be
created following the same grammar (see below).
When implementing the grammar we can define our own grammar or
use a defined international standard.
When defining the structure of the description document we wanted
to use a common method known to a large group of people.
This is why we chose XML (Extensible Markup Language) for
defining the structure of the document. The main advantage of the
XML approach (compared with other markup languages such as
HTML) is that XML is a language for defining languages. This
means that first we define the grammar of our language for
describing pictures (DTD files) and then we can generate XML
documents that follow this grammar.
We chose the programming language for the development of BIS
very carefully. The main reasons why we chose Java are:
<!ELEMENT picture (view+)>
<!ATTLIST picture
idpicture ID #REQUIRED
name CDATA #IMPLIED>
<!ELEMENT view (object+)>
<!ATTLIST view
idview ID #REQUIRED
type (common|special) #REQUIRED
name CDATA #IMPLIED
language CDATA #REQUIRED>
<!ELEMENT object (description*)>
<!ATTLIST object
idobject ID #REQUIRED
name CDATA #REQUIRED>
<!ELEMENT description (description*)>
<!ATTLIST description
iddescription ID #REQUIRED
type CDATA #REQUIRED
value CDATA #REQUIRED
obj IDREF #IMPLIED>
You can see what the picture description in an XML file looks like in Appendix 1 - Picture description examples.
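As an illustration of how such a description document can be checked against the DTD above, here is a minimal Java sketch using the standard JAXP validating parser. The class name DescriptionParserSketch and the sample document are hypothetical; in particular, the sample assumes that the identifier objAF belongs to the door, which the examples above do not state.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Illustrative sketch only: not the parser used by the BIS library.
class DescriptionParserSketch {

    // The DTD from the text, embedded as an internal subset.
    static final String DTD =
          "<!ELEMENT picture (view+)>\n"
        + "<!ATTLIST picture idpicture ID #REQUIRED name CDATA #IMPLIED>\n"
        + "<!ELEMENT view (object+)>\n"
        + "<!ATTLIST view idview ID #REQUIRED type (common|special) #REQUIRED\n"
        + "  name CDATA #IMPLIED language CDATA #REQUIRED>\n"
        + "<!ELEMENT object (description*)>\n"
        + "<!ATTLIST object idobject ID #REQUIRED name CDATA #REQUIRED>\n"
        + "<!ELEMENT description (description*)>\n"
        + "<!ATTLIST description iddescription ID #REQUIRED type CDATA #REQUIRED\n"
        + "  value CDATA #REQUIRED obj IDREF #IMPLIED>\n";

    // Hypothetical sample; that objAF is the door is an assumption.
    static final String SAMPLE =
          "<picture idpicture=\"pictA\" name=\"Woman and boy\">\n"
        + " <view idview=\"viewAA\" type=\"common\" language=\"english\">\n"
        + "  <object idobject=\"objAA\" name=\"house\"/>\n"
        + "  <object idobject=\"objAB\" name=\"boy\">\n"
        + "   <description iddescription=\"descAB\" type=\"action\"\n"
        + "     value=\"running to\" obj=\"objAF\"/>\n"
        + "  </object>\n"
        + "  <object idobject=\"objAF\" name=\"door\"/>\n"
        + " </view>\n"
        + "</picture>\n";

    /** Parses and validates a picture element; throws if it violates the DTD. */
    static Document parse(String pictureXml) {
        String xml = "<!DOCTYPE picture [\n" + DTD + "]>\n" + pictureXml;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // enable DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Turn validation errors into exceptions instead of console output.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(
                    "Invalid picture description: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        NodeList objects = doc.getElementsByTagName("object");
        for (int i = 0; i < objects.getLength(); i++) {
            Element object = (Element) objects.item(i);
            System.out.println(object.getAttribute("idobject")
                    + " = " + object.getAttribute("name"));
        }
    }
}
```

Embedding the DTD as an internal subset keeps the sketch self-contained; with separate DTD files, as mentioned in the text, the DOCTYPE would reference the file instead.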
We have to design a library of objects which implements the
basic methods for creating and browsing through a picture
description.
All objects (picture, view, object, description) are stored
in a hierarchical tree corresponding to the grammar.
The main functionality of this library is:
When programming the basic library of methods around the defined grammar we chose object-oriented programming tools. This is why the objects are prepared for future extension and reuse. The object approach was very useful when coordinating three bachelor theses together.
The functionality of this library can be seen in Appendix 2 -
Library of methods.
A detailed description of this library is in the HTML document
(http://cs.felk.cvut.cz/~xmikovec/BIS/BisLib/JavaDoc/tree.html).
Figure 2: BIS Creator Interface
The following instructions show how to describe a picture.

Figure 3: Creation of description (step by step)
Figure 4a: BIS Browser Interface
Figure 4b: Set Filter dialog window
The browser workspace is divided into three main regions.
When a description is loaded, all attributes are enabled by
default. All attributes may be switched off using menu Attributes
-> Hide Attributes or switched on again using menu Attributes
-> Show Attributes. Both functions are reachable via the
shortcuts F6 & F5.
The current item is displayed in the form:
item_name ( att1 att2 …..) when attribute display is
on or
item_name ( ) when attribute display is off.
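The display form above can be sketched as a small Java helper. The names ItemDisplaySketch and format are hypothetical, the exact spacing inside the parentheses is a guess, and treating an item without attributes like the "off" case is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: not the rendering code of the BIS browser.
class ItemDisplaySketch {

    /** Formats an item as "name ( att1 att2 )" or "name ( )". */
    static String format(String itemName, List<String> attributes, boolean showAttributes) {
        if (!showAttributes || attributes.isEmpty()) {
            return itemName + " ( )";
        }
        return itemName + " ( " + String.join(" ", attributes) + " )";
    }

    public static void main(String[] args) {
        List<String> attributes = Arrays.asList("action", "color");
        System.out.println(format("boy", attributes, true));
        System.out.println(format("boy", attributes, false));
    }
}
```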
More precise filtering is available under menu Attributes
-> Set Filter (F4) (see Figure 4b).
There are two list boxes in the left half of this dialog. They
list the attribute types available for each object type.
If an item is in the right list box, it is included in the
types displayed. When it is in the left list box, it is excluded.
The types of attributes differ according to the object type. The
object type is selected via a drop-down menu.
The right half of the dialog window consists of two list boxes too.
These hold all attribute values available in the current
description. So we can include or exclude some additional values
to be displayed even if we don’t want the whole type to be
visible.
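One possible reading of this dialog as a visibility rule is sketched below in Java. It is an interpretation, not the browser's actual logic: the class SetFilterSketch is hypothetical, the per-object-type selection is left out, and the precedence (an explicitly included value wins, then an explicitly excluded value, then the type setting) is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: one interpretation of the Set Filter dialog.
class SetFilterSketch {
    final Set<String> includedTypes = new HashSet<>();  // types in the "included" list
    final Set<String> includedValues = new HashSet<>(); // values shown even if the type is hidden
    final Set<String> excludedValues = new HashSet<>(); // values hidden even if the type is shown

    /** Decides whether an attribute with the given type and value is displayed. */
    boolean isVisible(String type, String value) {
        if (includedValues.contains(value)) {
            return true;
        }
        if (excludedValues.contains(value)) {
            return false;
        }
        return includedTypes.contains(type);
    }

    public static void main(String[] args) {
        SetFilterSketch filter = new SetFilterSketch();
        filter.includedTypes.add("action");
        filter.includedValues.add("black");
        filter.excludedValues.add("waving to");
        System.out.println(filter.isVisible("action", "running to")); // type is included
        System.out.println(filter.isVisible("color", "black"));       // value is included
        System.out.println(filter.isVisible("color", "red"));         // neither is included
        System.out.println(filter.isVisible("action", "waving to"));  // value is excluded
    }
}
```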
The first thing we usually want to learn is the list of views in the description. A simple picture usually consists of only one view, a more complicated one of several views. Using menu View -> List or function key F7 we can quickly reach the view level of the hierarchy and then browse through it using the up and down arrow keys.
The next view may easily be reached through menu View -> Next or the shortcut F8.
Our approach is based on the universal object-oriented paradigm,
which allows us to define one general methodology for describing
any picture.
Describing a picture as separate objects with descriptions (using
categories) helps to quickly define a uniform description of the
picture.
Using the tools for information filtering, the browsing user can
very quickly go through the picture and get its information message.
The XML architecture of the grammar and description documents makes
this methodology and implementation open to any future
modifications and extensions.
We have removed all disadvantages described in the chapter
Solutions in the World.
In this chapter we want to mention several future strategies which we want to realize in the next school year 1998/1999 as diploma work.
Develop tools for "reading" any information (pictures, text, formulas, ...) in a structured hierarchical way. These tools will make graphical UIs much more accessible for blind people. They will improve orientation on the screen and the understandability of the displayed information.
Prepare a UI with no visual interface - no visual presentation and feedback. This UI must be object-oriented and structured - there must be a definition of objects, behavior and relations. For this UI we want to design a grammar and use the universal reading tools for navigating through it.
Zdenek Mikovec, xmikovec@fel.cvut.cz, http://cs.felk.cvut.cz/~xmikovec/bis
Martin Klima, xklima@hwlab.felk.cvut.cz
Dusan Pavlica, D.Pavlica@sh.cvut.cz
Blind Forum: http://www.humanware.com/Blindlinks.html
SGML definition: http://www.sil.org/sgml/sgml.html
Online reference - implement. and develop.: http://www.lists.ic.ac.uk/hypermail/xml-dev/
DTD examples: http://www.sil.org/sgml/sgml.html, http://www.ucc.ie/cgi-bin/PUBLIC?-//IETF//DTD

Figure 5
Common view description Special view description
Figure 6: Woman and Boy - Common view
Figure 7: Woman and Boy - Special view
<picture idpicture="pictA" name="Woman and boy">
</picture>

Figure 8