Picture description for Blind Users

Authors:
Zdenek Mikovec
(Using the BSc theses of Martin Klima and Dusan Pavlica)

Czech Technical University
Faculty of Electrical Engineering
Department of Computer Science and Engineering

Prague December 1998

Abstract

The main goal of this project is to develop software tools for quick and easy picture "reading" for the blind. These tools can be divided into two parts: browsing tools for reading the picture, and description editor tools for creating the picture description that the browsing tools later read.

Introduction

When analyzing the relationship between blind users and information technologies (IT), we found that the development of IT at the end of the 20th century brings two main changes for the blind. The first, positive change is large-scale electronic communication (the Internet), which gives the blind the ability to work and communicate with the rest of the world at nearly the same level as sighted people. The second, negative change (for the blind) is the more intensive use of graphical user interfaces (GUI), pictures and visual effects. Graphical information is very difficult to present to the blind, and a GUI is difficult for the blind to orient in.
Our motivation is to suppress these negative aspects of new IT and to support the positive changes in IT development for the blind.

The general goal of this project is to develop tools for quick and easy picture "reading" for the blind.

Figure 1: BIS (Blind Information System) - Subject Layer

This system of tools, called the Blind Information System (BIS, see Figure 1), is divided into three parts:

Problem specification

We want to give the blind the possibility of getting the important information hidden in a picture. This is why we focus on information pictures such as:

Our system is designed for the kinds of picture mentioned above and will not be suitable for describing virtual reality or art pictures.

When designing our BIS system we face these problem areas:

Solution

When analyzing this problem, we first wanted to map existing activities in the world. We focused on activities around blind users, their problems with access to information, and their problems with structuring information.

Then we started to define our own approach to this problem.

Solutions in the world

In general, none of the existing approaches focuses intensively on describing graphical information.

Their methodologies look at the picture as a single object to be described. This leads to describing the subjective feeling from the picture and not the real information hidden in it.

Only technical methods for attaching a description to a picture are defined, for example ALT text in HTML pages or links to pages containing the description of the picture. These methods bring inconsistency to the final document, because the picture and its description are in separate files. Creating and updating these documents is therefore more difficult.

There is no methodology for creating a good description of a picture.

Advantages of this approach:

Disadvantages:

Our Solution - new methodology

The basic idea of this methodology is to give the blind the ability to move freely inside the picture. Instead of explaining what the blind should see in the picture, we describe each part of the picture (the objects) and the relations between them, and let the blind create their own vision of the picture.

When creating the methodology we focused on making browsing through the picture as easy as possible for the blind, and on making the description easy to create.

To fulfill this idea we chose the object-oriented paradigm. In this paradigm the entire world consists of objects with their descriptions, and these objects have behavior that influences other objects.

So if we want to describe any situation, we need to describe the objects, their descriptions and their relations to other objects (see the chapters below).

Objects classification

First of all we have to analyze what objects are around us and could be in the picture, and how the blind understand the world and objects. This is why we started to cooperate with the national organization for the blind, SONS (Czech Blind United). The results of this consultation are these categories of objects:

Object description

The problem with describing an object is that each description creator will describe the object in a different way and will include different characteristics than another creator would. To eliminate differences in object descriptions and to speed up describing, we have defined a standard for object description. We defined basic categories of information that describe objects, their behavior and relations. The browsing user then knows what characteristics of objects he can get and how to ask for them. For example, the browsing user knows that he can ask for the position of an object by choosing the characteristic with type "position".

The basic categories of description are:

Example: see Figure 6: Woman and Boy - Common view.
Boy is running to door.
Relation 1: Object "Boy" is described by Action "is running to". This Action relates to object "door".

Between objects there are special relations which have to be described too:

Example: see Figure 6: Woman and Boy - Common view.
Object "Door" is in object "House".
Object "Head" is in object "Boy".
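
The description categories and relations above can be sketched as a small Java data model. This is only an illustration under stated assumptions: the names ObjectModelSketch, PicObject, Description and sentences are hypothetical and are not taken from the BIS library.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only; all names here are assumptions, not the BIS API. */
final class ObjectModelSketch {

    /** One typed piece of information about an object, e.g. type "action". */
    static final class Description {
        final String type;
        final String value;
        final PicObject target; // related object, or null if there is none

        Description(String type, String value, PicObject target) {
            this.type = type;
            this.value = value;
            this.target = target;
        }
    }

    /** A named object in the picture together with its descriptions. */
    static final class PicObject {
        final String name;
        final List<Description> descriptions = new ArrayList<>();

        PicObject(String name) {
            this.name = name;
        }

        /** Adds a description and returns this object to allow chaining. */
        PicObject describe(String type, String value, PicObject target) {
            descriptions.add(new Description(type, value, target));
            return this;
        }
    }

    /** Renders every description of an object as a short sentence. */
    static List<String> sentences(PicObject object) {
        List<String> out = new ArrayList<>();
        for (Description d : object.descriptions) {
            String tail = (d.target == null) ? "" : " " + d.target.name;
            out.add(object.name + " " + d.value + tail);
        }
        return out;
    }

    public static void main(String[] args) {
        PicObject house = new PicObject("House");
        PicObject door = new PicObject("Door").describe("hierarchical", "is in", house);
        PicObject boy = new PicObject("Boy").describe("action", "is running to", door);
        sentences(boy).forEach(System.out::println);  // Boy is running to Door
        sentences(door).forEach(System.out::println); // Door is in House
    }
}
```

Keeping the relation target as an object reference rather than a name means a browsing tool could follow the relation directly from "Boy" to "Door".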

Object-oriented view on picture

Now we will demonstrate the object-oriented approach by means of an example, Woman and Boy (Appendix 1).

There are 4 hierarchical levels of objects. On the first (root) level there are the objects House, Boy, Tree and Cap. Hierarchical relations are represented by lines with full arrowheads. Then you can see three other relations coupled with actions: running to, waving to, falling down from. These relations are represented by dotted lines with arrowheads and numbers.

So the main information about the picture consists of the objects, their hierarchical relations, and the most significant actions with their relations.

Of course each object could have other descriptions such as position, color, etc., but this information is not crucial for understanding the information value of most pictures (this conclusion was made after several tests with blind and non-blind users).
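
The hierarchical levels can be computed from the "is in" relations alone. The following Java sketch assumes the relations are available as a child-to-parent map. The class and method names are hypothetical, and the map holds a simplified subset of the Woman and Boy example in which each object name appears once.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: names are assumptions, not part of the BIS library. */
final class HierarchySketch {

    /** Child -> parent pairs for a subset of the "is in" relations of the example. */
    static Map<String, String> womanAndBoy() {
        Map<String, String> isIn = new LinkedHashMap<>();
        isIn.put("door", "house");
        isIn.put("window", "house");
        isIn.put("roof", "house");
        isIn.put("chimney", "roof");
        isIn.put("mother", "window");
        isIn.put("face", "mother");
        isIn.put("hand", "mother");
        isIn.put("head", "boy");
        isIn.put("treetop", "tree");
        isIn.put("trunk", "tree");
        return isIn;
    }

    /** Level of an object: 1 for a root object, otherwise the parent's level plus 1. */
    static int level(String object, Map<String, String> isIn) {
        int level = 1;
        String current = object;
        while (isIn.containsKey(current)) {
            current = isIn.get(current);
            level++;
            // A chain longer than the number of relations can only come from a cycle.
            if (level > isIn.size() + 1) {
                throw new IllegalStateException("cycle in 'is in' relations");
            }
        }
        return level;
    }

    /** Number of hierarchical levels in the whole picture. */
    static int depth(Map<String, String> isIn) {
        int depth = 1;
        for (String child : isIn.keySet()) {
            depth = Math.max(depth, level(child, isIn));
        }
        return depth;
    }

    public static void main(String[] args) {
        Map<String, String> isIn = womanAndBoy();
        System.out.println("cap is on level " + level("cap", isIn));
        System.out.println("face is on level " + level("face", isIn));
        System.out.println("levels in picture: " + depth(isIn));
    }
}
```

Objects that never appear as a child, such as House, Boy, Tree and Cap, come out on the root level, and the chain face, mother, window, house gives the fourth level.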

Information filtering

The problem with browsing through a picture is the large amount of complex information that can be browsed.

If we want the blind to grasp all the information in the picture, we must filter this information. We have prepared these methods of filtering:

Description filtering

Because we have defined categories of object description, the browsing user can choose the group of description categories he wants to "see". Applying this filter eliminates descriptions and objects that do not match the chosen categories.
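
A minimal Java sketch of such a filter follows. It assumes the picture is held as a map from object name to its list of typed descriptions. The names DescriptionFilterSketch, Description, filter and sample are hypothetical, and the sample data is a small subset of the Woman and Boy example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: names and data layout are assumptions, not the BIS API. */
final class DescriptionFilterSketch {

    /** A description with its category (type) and value. */
    static final class Description {
        final String type;
        final String value;

        Description(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    /**
     * Keeps only descriptions whose type was chosen by the user.
     * Objects left without any description are dropped from the result.
     */
    static Map<String, List<Description>> filter(
            Map<String, List<Description>> picture, Set<String> chosenTypes) {
        Map<String, List<Description>> visible = new LinkedHashMap<>();
        for (Map.Entry<String, List<Description>> object : picture.entrySet()) {
            List<Description> kept = new ArrayList<>();
            for (Description d : object.getValue()) {
                if (chosenTypes.contains(d.type)) {
                    kept.add(d);
                }
            }
            if (!kept.isEmpty()) {
                visible.put(object.getKey(), kept);
            }
        }
        return visible;
    }

    /** A few objects from the Woman and Boy example. */
    static Map<String, List<Description>> sample() {
        Map<String, List<Description>> picture = new LinkedHashMap<>();
        picture.put("boy", Arrays.asList(
                new Description("action", "running to"),
                new Description("group", "groupB")));
        picture.put("tree", Arrays.asList(
                new Description("category", "leafy")));
        picture.put("cap", Arrays.asList(
                new Description("color", "black"),
                new Description("action", "falling down from")));
        return picture;
    }

    public static void main(String[] args) {
        Set<String> onlyActions = Collections.singleton("action");
        System.out.println(filter(sample(), onlyActions).keySet()); // prints [boy, cap]
    }
}
```

With the filter set to "action", the tree disappears entirely because none of its descriptions match, which mirrors the elimination of non-matching objects described above.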

View

When analyzing the creation of picture descriptions we found that a picture can be understood from very different points of view, each of which implies different objects and different descriptions of them.

So we defined two types of view:

Grammar

From the above problem description it is clear that graphical information has a very complex structure. A suitable means for handling such a complex structure is a grammar. We use a formal grammar in the sense of the theory of formal languages, where a grammar is defined as a quadruple G=(N,T,R,S) (nonterminals, terminals, rewriting rules and start symbol).
This grammar was developed with the idea of generating a universal description language for representing any objects, their behavior and relations.
The description of an object is defined by types (categories), which allows intelligent filtering.

picture ::= view+
view ::= object*
object ::= description*
description ::= description*

Picture defines the name of the picture. View defines the type of view. Object defines each object in the picture. Description describes information about an object. Each element (picture, view, ...) has several attributes.

picture

Example:
picture(idpicture="pictA" name="Woman and boy")

view

Example:
view(idview="viewAA" type="common" language="english")

object

Example:
object(idobject="objAA" name="house")
object(idobject="objAB" name="boy")

description

Example:
description(iddescription="descAB" type="action" value="running to" obj="objAF")
description(iddescription="descAE" type="color" value="black")

Implementation

Grammar for descriptions and object libraries

Both description documents and object libraries will be created following the same grammar (see below).
When implementing the grammar we can either define our own format or use an established international standard.
When defining the structure of the description document we wanted to use a common method known to a large group of people. This is why we chose XML (Extensible Markup Language) for defining the structure of the document. The main advantage of the XML approach (compared with other markup languages such as HTML) is that XML is a language for defining languages. This means that we first define the grammar of our language for describing pictures (DTD files) and then we can create XML documents that follow this grammar.
We chose the programming language for the development of BIS very carefully. The main reasons why we chose Java are:

Grammar definition in XML

<!ELEMENT picture (view+)>
<!ATTLIST picture
idpicture ID #REQUIRED
name CDATA #IMPLIED>

<!ELEMENT view (object+)>
<!ATTLIST view
idview ID #REQUIRED
type (common|special) #REQUIRED
name CDATA #IMPLIED
language CDATA #REQUIRED>

<!ELEMENT object (description*)>
<!ATTLIST object
idobject ID #REQUIRED
name CDATA #REQUIRED>

<!ELEMENT description (description*)>
<!ATTLIST description
iddescription ID #REQUIRED
type CDATA #REQUIRED
value CDATA #REQUIRED
obj IDREF #IMPLIED>

You can see what a picture description in an XML file looks like in Appendix 1 - Picture description examples.
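
A description document can be checked against this DTD with a validating parser. The Java sketch below uses the standard JAXP API (javax.xml.parsers) and embeds the DTD as an internal DOCTYPE subset. The class name DtdValidationSketch, the method isValid and the shortened sample document are assumptions made for illustration, not part of BIS.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Illustrative only: validates a picture description against the DTD above. */
final class DtdValidationSketch {

    /** The DTD from this chapter, wrapped in an internal DOCTYPE subset. */
    static final String DOCTYPE =
          "<!DOCTYPE picture [\n"
        + "<!ELEMENT picture (view+)>\n"
        + "<!ATTLIST picture idpicture ID #REQUIRED name CDATA #IMPLIED>\n"
        + "<!ELEMENT view (object+)>\n"
        + "<!ATTLIST view idview ID #REQUIRED type (common|special) #REQUIRED\n"
        + "  name CDATA #IMPLIED language CDATA #REQUIRED>\n"
        + "<!ELEMENT object (description*)>\n"
        + "<!ATTLIST object idobject ID #REQUIRED name CDATA #REQUIRED>\n"
        + "<!ELEMENT description (description*)>\n"
        + "<!ATTLIST description iddescription ID #REQUIRED type CDATA #REQUIRED\n"
        + "  value CDATA #REQUIRED obj IDREF #IMPLIED>\n"
        + "]>\n";

    /** A shortened, hypothetical sample document that follows the DTD. */
    static final String SAMPLE =
          "<picture idpicture=\"pictA\" name=\"Woman and boy\">"
        + "<view idview=\"viewAA\" type=\"common\" language=\"english\">"
        + "<object idobject=\"objAB\" name=\"boy\">"
        + "<description iddescription=\"descAB\" type=\"action\""
        + " value=\"running to\" obj=\"objAF\"/>"
        + "</object>"
        + "<object idobject=\"objAF\" name=\"door\"/>"
        + "</view></picture>";

    /** Returns true if the document body is well-formed and valid against the DTD. */
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
            return true;
        } catch (SAXException e) {
            return false; // validity or well-formedness error
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(SAMPLE));
        System.out.println(isValid("<picture idpicture=\"pictB\"/>")); // no view element
    }
}
```

Because idobject and iddescription are declared as ID and obj as IDREF, a validating parser also checks that identifiers are unique and that every obj reference points at an existing identifier.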

Basic library of methods

We have to design a library of objects that implements the basic methods for creating and browsing through a picture description.
All objects (picture, view, object, description) will be stored in a hierarchical tree corresponding to the grammar.
The main functionality of this library is:

When programming the basic library of methods around the defined grammar we chose object-oriented programming tools. This is why the objects are prepared for future extension and reuse. The object approach was very useful when coordinating three bachelor theses together.

The functionality of this library is shown in Appendix 2 - Library of methods.
A detailed description of this library is in an HTML document (http://cs.felk.cvut.cz/~xmikovec/BIS/BisLib/JavaDoc/tree.html).
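
As a rough idea of what browsing code built on such a tree might look like, the Java sketch below reads a description document with the standard DOM API and resolves each obj reference to the name of the referenced object. It does not reproduce the BIS library: the class DescriptionReaderSketch, the method relations and the shortened sample document are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Illustrative only: not the BIS library, just a DOM-based reading sketch. */
final class DescriptionReaderSketch {

    /** A shortened, hypothetical description document. */
    static final String SAMPLE =
          "<picture idpicture=\"pictA\" name=\"Woman and boy\">"
        + "<view idview=\"viewAA\" type=\"common\" language=\"english\">"
        + "<object idobject=\"objAA\" name=\"house\"/>"
        + "<object idobject=\"objAB\" name=\"boy\">"
        + "<description iddescription=\"descAB\" type=\"action\""
        + " value=\"running to\" obj=\"objAF\"/>"
        + "</object>"
        + "<object idobject=\"objAF\" name=\"door\">"
        + "<description iddescription=\"descAI\" type=\"hierarchical\""
        + " value=\"is in\" obj=\"objAA\"/>"
        + "</object>"
        + "</view></picture>";

    /** Lists every relation as "subject value target", in document order. */
    static List<String> relations(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a readable description", e);
        }
        NodeList objects = doc.getElementsByTagName("object");

        // First pass: remember the name behind every object identifier.
        Map<String, String> names = new HashMap<>();
        for (int i = 0; i < objects.getLength(); i++) {
            Element object = (Element) objects.item(i);
            names.put(object.getAttribute("idobject"), object.getAttribute("name"));
        }

        // Second pass: turn descriptions that reference another object into sentences.
        List<String> out = new ArrayList<>();
        for (int i = 0; i < objects.getLength(); i++) {
            Element object = (Element) objects.item(i);
            NodeList descriptions = object.getElementsByTagName("description");
            for (int j = 0; j < descriptions.getLength(); j++) {
                Element d = (Element) descriptions.item(j);
                if (d.hasAttribute("obj")) {
                    out.add(object.getAttribute("name") + " " + d.getAttribute("value")
                            + " " + names.get(d.getAttribute("obj")));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        relations(SAMPLE).forEach(System.out::println);
    }
}
```

The two passes are needed because a description may reference an object that appears later in the document, as the boy's reference to the door does here.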

Description Creator

Application Design

Figure 2: BIS Creator Interface

Example

The following instructions show how to describe a picture.

Figure 3: Creation of description (step by step)

Description Browser

Application Design

Figure 4a: BIS Browser Interface

Figure 4b: Set Filter dialog window

The browser workspace is divided into three main regions.

Filtration of attributes

When a description is loaded, all attributes are enabled by default. All attributes may be switched off using the menu Attributes -> Hide Attributes or switched on again using the menu Attributes -> Show Attributes. Both functions are reachable via the shortcuts F6 and F5.
The current item is displayed in the form:
item_name ( att1 att2 ... ) when attribute display is on, or
item_name ( ) when attribute display is off.

More precise filtering is available under the menu Attributes -> Set Filter (F4) (see Figure 4b).
There are two list boxes in the left half of this dialog. They list the attribute types available for each object type. If an item is in the right list box, it is included in the types displayed; if it is in the left list box, it is excluded. The types of attributes differ according to the object type. The object type is selected via a drop-down menu.
The right half of the dialog window also consists of two list boxes. These hold all attribute values available in the current description. So we can include or exclude individual values to be displayed even if we do not want the whole type to be visible.
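
The display rule described in this section can be sketched in Java as follows. The sketch assumes that an attribute is shown when attribute display is on and either its type or its value is included. All names in it (AttributeDisplaySketch, Attribute, render, capAttributes) are hypothetical and do not come from the BIS browser.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Illustrative only: a guess at the display rule, not the BIS browser code. */
final class AttributeDisplaySketch {

    /** An attribute of the current item: its type and its value. */
    static final class Attribute {
        final String type;
        final String value;

        Attribute(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    /**
     * Renders an item as "item_name ( att1 att2 )".
     * An attribute is shown when its type or its value is in the included sets.
     * With attribute display off, the result is always "item_name ( )".
     */
    static String render(String itemName, List<Attribute> attributes, boolean showAttributes,
                         Set<String> includedTypes, Set<String> includedValues) {
        StringBuilder line = new StringBuilder(itemName).append(" ( ");
        if (showAttributes) {
            for (Attribute a : attributes) {
                if (includedTypes.contains(a.type) || includedValues.contains(a.value)) {
                    line.append(a.value).append(' ');
                }
            }
        }
        return line.append(')').toString();
    }

    /** Attributes of the cap from the Woman and Boy example. */
    static List<Attribute> capAttributes() {
        return Arrays.asList(
                new Attribute("color", "black"),
                new Attribute("action", "falling down from"));
    }

    public static void main(String[] args) {
        Set<String> none = Collections.emptySet();
        Set<String> colors = Collections.singleton("color");
        System.out.println(render("cap", capAttributes(), true, colors, none));  // cap ( black )
        System.out.println(render("cap", capAttributes(), false, colors, none)); // cap ( )
    }
}
```

Passing an empty set of types together with the single value "falling down from" shows only that value, which corresponds to including one value without making its whole type visible.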

Navigating in the views

The first thing we usually want to learn is the list of views in the description. A simple picture usually consists of only one view, a more complicated one of several views. Using the menu View -> List or the function key F7 we can quickly reach the view level of the hierarchy and then browse through it using the up and down arrow keys.

The next view may be easily reached through the menu View -> Next or the shortcut F8.

Conclusion

Our approach is based on the universal object-oriented paradigm, which allows us to define one general methodology for describing any picture.
Describing a picture as separate objects with descriptions (using categories) helps to quickly define a uniform description of the picture.
Using the tools for information filtering, the browsing user can go through the picture very quickly and get its information message.
The XML architecture of the grammar and description documents keeps this methodology and implementation open to future modifications and extensions.
We have removed all the disadvantages described in the chapter Solutions in the world.

Advantages of this approach

Disadvantages

Future strategies

In this chapter we mention several future strategies that we want to realize in the next school year 1998/1999 as a diploma thesis.

Reading tools of any structured information

Develop tools for "reading" any information (picture, text, formulas, ...) in a structured, hierarchical way. These tools will make graphical UIs much more accessible for blind people. They will improve orientation on the screen and the understandability of the information displayed.

UI without visual contact

Prepare a UI with no visual interface, that is, with no visual presentation and feedback. This UI must be object-oriented and structured: there must be a definition of objects, behavior and relations. For this UI we want to design a grammar and use the universal reading tools for navigating through it.

Authors

Zdenek Mikovec, xmikovec@fel.cvut.cz, http://cs.felk.cvut.cz/~xmikovec/bis

Martin Klima, xklima@hwlab.felk.cvut.cz

Dusan Pavlica, D.Pavlica@sh.cvut.cz

Appendix 1 - Picture description examples

Example - Woman and boy

Figure 5

Figure 6: Woman and Boy - Common view

Figure 7: Woman and Boy - Special view

XML description

<picture idpicture="pictA" name="Woman and boy">

<view idview="viewAA" type="common" language="english">
<object idobject="objAA" name="house">
<description iddescription="descAA" type="group" value="groupA"/>
</object>
<object idobject="objAB" name="boy">
<description iddescription="descAB" type="action" value="running to" obj="objAF"/>
<description iddescription="descAC" type="group" value="groupB"/>
</object>
<object idobject="objAC" name="tree">
<description iddescription="descAD" type="category" value="leafy"/>
</object>
<object idobject="objAD" name="cap">
<description iddescription="descAE" type="color" value="black"/>
<description iddescription="descAF" type="action" value="falling down from" obj="objAI"/>
<description iddescription="descAG" type="group" value="groupB"/>
</object>
<object idobject="objAE" name="window">
<description iddescription="descAH" type="hierarchical" value="is in" obj="objAA"/>
</object>
<object idobject="objAF" name="door">
<description iddescription="descAI" type="hierarchical" value="is in" obj="objAA"/>
</object>
<object idobject="objAG" name="window">
<description iddescription="descAJ" type="group of" value="6"/>
<description iddescription="descAK" type="hierarchical" value="is in" obj="objAA"/>
</object>
<object idobject="objAH" name="roof">
<description iddescription="descAL" type="color" value="gray"/>
<description iddescription="descAM" type="hierarchical" value="is in" obj="objAA"/>
</object>
<object idobject="objAI" name="head">
<description iddescription="descAN" type="hierarchical" value="is in" obj="objAB"/>
</object>
<object idobject="objAJ" name="body">
<description iddescription="descAO" type="hierarchical" value="is in" obj="objAB"/>
</object>
<object idobject="objAK" name="treetop">
<description iddescription="descAP" type="color" value="green"/>
<description iddescription="descAQ" type="hierarchical" value="is in" obj="objAC"/>
</object>
<object idobject="objAL" name="trunk">
<description iddescription="descAR" type="color" value="brown"/>
<description iddescription="descAS" type="hierarchical" value="is in" obj="objAC"/>
</object>
<object idobject="objAM" name="mother">
<description iddescription="descAT" type="hierarchical" value="is in" obj="objAE"/>
</object>
<object idobject="objAN" name="chimney">
<description iddescription="descAU" type="group of" value="2"/>
<description iddescription="descAV" type="hierarchical" value="is in" obj="objAH"/>
</object>
<object idobject="objAO" name="face">
<description iddescription="descAW" type="color" value="pink"/>
<description iddescription="descAX" type="hierarchical" value="is in" obj="objAM"/>
</object>
<object idobject="objAP" name="body">
<description iddescription="descAY" type="color" value="red"/>
<description iddescription="descAZ" type="hierarchical" value="is in" obj="objAM"/>
</object>
<object idobject="objAQ" name="hand">
<description iddescription="descAAA" type="color" value="pink"/>
<description iddescription="descAAB" type="hierarchical" value="is in" obj="objAM"/>
<description iddescription="descAAC" type="action" value="waving to" obj="objAB"/>
</object>
</view>

</picture>

Appendix 2 - Library of methods

Figure 8