Language Technologies: Key to Preserving One's Own Identity

by Miro Romih, Director

With Slovenia's integration into Europe and the globalization of the world communication network, language technologies are becoming more important every day. Large text corpora, electronic dictionaries, translation support programs, systems for speech recognition and synthesis, and programs for detecting grammatical errors in texts are all flourishing as computers grow faster and more powerful. Some years ago, Amebis decided to become involved in this field and to keep pace with developments worldwide by cooperating with other groups in Slovenia and abroad.
Amebis d.o.o. was established in mid-1991. Its first major project was the BesAna (word analysis) program, the first Slovene grammar checker, which detected errors in Slovene texts written on a computer. The program spread relatively quickly and also became an important bridge between the company and its users. The need for a smaller, simpler version soon became apparent. This home version, called micro-BesAna (micro word analysis), detected only spelling errors. Both micro-BesAna and BesAna were DOS programs. At that time, the market share of Windows 3.1 was growing rapidly. The first serious word processors appeared for Windows, and with them came the need for a spell checker in a graphical environment. In 1993, BesAna for Windows was created. Since then, we have developed a new version of BesAna for every new version of Word or WordPerfect. Each version is adapted to its editor, and the dictionary is constantly being expanded.

Photos: Peter Holozan, Matej Pivec

At the same time, we began to cooperate with DZS d.d., the largest dictionary publisher in Slovenia. Various tools were needed for preparing data by computer and for checking errors in dictionaries, lexicons and encyclopedias. Our collection of tools gradually grew in size and quality, and it made it possible for us to handle large language databases quickly and efficiently.
In addition to preparing printed dictionaries by computer, we began to consider electronic versions of dictionaries. Since no suitable tool for fast searching of large databases on a PC was available, we decided to build our own system, ASP (Amebis database). Into ASP we put all our experience together with our users' wishes and needs. A single program, the ASP viewer, which initially ran under DOS and Windows, combined simplicity, speed of use and a high degree of data compression, which was very important in the era of installation from floppy disks. A special feature of the ASP system was that different types of databases (dictionaries, registers, catalogues, business relation databases, texts) could all be stored in the same ASP format, even though their structures differ greatly. As a result, a single program reads all these collections in exactly the same way.
To spread the program among users, we had to offer interesting collections. The first language collection in ASP format was the Large German-Slovene dictionary from the DZS d.d. publishing company, published in mid-1994. The user response was very favourable, which gave fresh impetus to the development of the ASP system. We also prepared a few smaller, useful demonstration collections (postal codes, irregular English verbs...). Given the good sales of the Large German-Slovene dictionary, we decided, together with DZS d.d., to publish other dictionaries in electronic form as well. The next dictionary in ASP format, with a similar appearance, was the Slovene-English dictionary. At the end of 1996 and the beginning of 1997, three more electronic dictionaries were published: the Dictionary of Slovene literary language, the Great English-Slovene dictionary and the Great Slovene-German dictionary. We are also planning new electronic dictionary editions, which will also be available on CD-ROM with the new 32-bit ASP viewer.
During this period, we have completed a number of other projects based on the ASP system. One particularly attractive product was the CD-ROM Region lexicon of Slovenia, the result of cooperation between the Geographic institute ZRC SAZU, DZS d.d. and Amebis d.o.o. Unlike our other collections, its data included pictures and photographs, which made the CD-ROM even more appealing.

Photos: Iztok Grilc, Ales Veluscek

Alongside the ASP system, we also continued developing language modules. For Microsoft, we developed various language modules for the Slovenian version of Microsoft Office: a spelling checker, a hyphenation module and a thesaurus for the Slovene market. A grammar module is still under development. For the Serbian-speaking market, we developed, together with our Serbian partner, a speller and a hyphenation module built into the Serbian version of Microsoft Office. We have also made a Slovenian speller for Corel and its WordPerfect editor.
In developing language modules, above all the grammar module, it became clear that we needed a large number of sentences, so we started collecting texts and building our own corpus. The structure and functions of the corpus are based on our own development system, ABIS, which provides not only storage and fast data searching but also the automatic execution of defined operations on texts. We also take part in other corpus projects, both domestic and international (FIDA, Copernicus MULTEXT-EAST).
We are also developing acoustic interfaces. Since two very strong groups are already working on speech recognition, we have concentrated on generating the human voice. Primarily for blind and visually impaired users, we are building a speech generation module for Windows, which will be integrated into two specific applications.
We have recently turned to a new hot topic: the Internet. Among other things, we have built an Internet presentation of the Chamber of Craft of Slovenia using an Internet version of ASP, which lets users search large databases such as the OZS Craft register very quickly.
In the future, we plan to continue developing language tools, especially in new fields such as computer-assisted translation. We will also continue to devote attention to Internet applications, the introduction of language modules, new ASP collections (dictionaries, lexicons, books) and multimedia applications.