www.expresscomputeronline.com WEEKLY INSIGHT FOR TECHNOLOGY PROFESSIONALS
23 April 2007  

Application

Deciphering the Vedas

A thorough understanding of ancient works is the key to unlocking the secrets of the past, and this is where Artificial Intelligence comes into play. By Varun Aggarwal

The Vedas are among the oldest religious books in the world. They tell us a lot about our ancient culture and civilisation. Unfortunately, few people make the effort to understand what they mean. Courses in Sanskrit and Vedic studies are available in India; however, they have few takers. The subject is more popular in the West, where an Indian researcher is tapping the power of computing to unearth the knowledge in these ancient texts.

The Mystic Vedas


Prabhu Ram Raghunathan, an Indian-born research engineer, software developer, roboticist, technology entrepreneur and writer/historian from Carnegie Mellon University, Pennsylvania, is trying to use Artificial Intelligence to answer real-world questions, such as the chronology of the Vedic people. Different verses and chapters of the Rig Veda are supposed to have been composed at different times. He feels, “The study of the Vedas is quite interpretive, as words can have multiple meanings. Further, Sanskrit has a number of compound words, which need to be split according to Sandhi and Samasa rules, and according to the correct number and case. Sandhis are rules for fusing words while retaining the meanings of both original words, as in ‘boatman’. Samasas are rules for forming compound words where a new word is formed. So, the Vedas are presented in Samhita, or fused, form and in Padapatha, or split-up, form, which can be interpreted. Due to all these variations, a concordance study is valuable. We can profitably apply ontological and inferential techniques to analyse the Vedas.”
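The kind of sandhi splitting described above can be sketched as a rule-based search: undo each fusion rule at each position and keep only splits where both halves are attested words. The rules and lexicon below are toy illustrations in Harvard-Kyoto transliteration, not Raghunathan's actual system; real sandhi resolution needs a full morphological lexicon and many more rules.

```python
# Toy rule-based sandhi splitting (hypothetical rules and vocabulary).
# Each rule maps a fused character back to the pair of sounds it replaced.
SANDHI_RULES = [
    ("A", ("a", "a")),   # a + a -> A (long a), in Harvard-Kyoto notation
    ("e", ("a", "i")),   # a + i -> e
    ("o", ("a", "u")),   # a + u -> o
]

def candidate_splits(word, lexicon):
    """Return plausible (left, right) splits of a fused word.

    For each position and each rule, undo the fusion and keep the split
    only if both halves are attested in the lexicon.
    """
    results = []
    for i, ch in enumerate(word):
        for fused, (l_end, r_start) in SANDHI_RULES:
            if ch == fused:
                left = word[:i] + l_end
                right = r_start + word[i + 1:]
                if left in lexicon and right in lexicon:
                    results.append((left, right))
    return results

# Tiny hypothetical lexicon: deva + indra fuse into devendra (a + i -> e).
lexicon = {"deva", "indra", "agni"}
print(candidate_splits("devendra", lexicon))  # [('deva', 'indra')]
```

Ambiguity shows up naturally in this scheme: a fused word can yield several lexicon-attested splits, which is exactly why the interpretive, concordance-driven approach Raghunathan describes is needed.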

If you consider the major philosophies of Hinduism (Upanishadic and later), they all discuss the concept of the “Brahman,” the Supreme Being or Godhead. Yet there is hardly a mention of Brahman in the Vedas. So questions arise: when did these concepts emerge, and which views emerged at which times? Currently, such questions are answered by comparative linguists, philologists and the like.

Raghunathan’s work is focused on trying to answer such questions on a statistical basis, using Machine Learning, aided by computational linguistics and natural language processing techniques, knowledge engineering and information retrieval.

Tools and techniques
Raghunathan has had to develop a number of tools and techniques for his work.

  • A regular expression-based Sanskrit parser for certain common constructs
  • A transliteration software package that lets one convert between different encodings and conventions for writing Sanskrit in Latin script, such as Harvard-Kyoto and ITRANS. It also parses and converts text to Unicode.
  • Transliteration correction software, which fixes punctuation and errors committed by people digitising Sanskrit text (this requires extensive parsing ability)
  • A set of databases, built partly by hand and partly automatically, to provide a reliable corpus that can be analysed
  • A minimalist grammar parsing system, based on direct grammar rules and heuristics as found in, say, a high school textbook
  • Several XML-based conversion, processing and analysis systems
  • A distributed Sanskrit dictionary lookup and concordance system that can compare words, concepts and concordances across Vedic texts
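To give a flavour of the transliteration tooling mentioned above, here is a minimal sketch of a Harvard-Kyoto to IAST converter using longest-match substitution. The mapping table covers only a handful of common correspondences and is not Raghunathan's package; a complete converter must handle many more conventions and edge cases.

```python
# Minimal Harvard-Kyoto -> IAST transliteration sketch (partial mapping only).
HK_TO_IAST = {
    "A": "ā", "I": "ī", "U": "ū",      # long vowels
    "R": "ṛ", "RR": "ṝ",               # vocalic r
    "M": "ṃ", "H": "ḥ",                # anusvara, visarga
    "G": "ṅ", "J": "ñ",                # nasals
    "T": "ṭ", "D": "ḍ", "N": "ṇ",      # retroflexes
    "z": "ś", "S": "ṣ",                # sibilants
}

def hk_to_iast(text):
    """Convert Harvard-Kyoto text to IAST, trying longest keys first."""
    out, i = [], 0
    keys = sorted(HK_TO_IAST, key=len, reverse=True)  # e.g. "RR" before "R"
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                out.append(HK_TO_IAST[k])
                i += len(k)
                break
        else:                      # no key matched: copy the character as-is
            out.append(text[i])
            i += 1
    return "".join(out)

print(hk_to_iast("zivaH"))   # śivaḥ
print(hk_to_iast("vedAH"))   # vedāḥ
```

Harvard-Kyoto is case-sensitive (capital letters carry distinct meanings), which is why transliteration correction tools need real parsing ability: a casual digitiser's stray capitalisation changes the text.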

A work in progress

Raghunathan started working on new techniques and systems, such as analytical and ontological tools for Sanskrit analysis, in 2000. An ontology can be thought of as something along the lines of the Google directory: it gives information about information. This has a lot of value in machine processing, in speeding up and compacting queries, in aiding comparative study and so on.

He is currently trying to build a comprehensive corpus of all available recensions and classical commentaries of the Rig Veda. “With the parts of the Rig Veda that I have assembled so far, I have been developing techniques and conducting analytical studies, part by part. My hope is to eventually complete the research and provide a complete, substantive software system, necessary new algorithms and the requisite corpora, so that the work can be self-sustaining and will not need to be done only by hand after extensive instruction.”

Such a repository, with analytical techniques grounded in complete statistics on a scientific basis rather than rules of thumb or pure experience, together with a software system, will help researchers conduct deep comparative analysis and close reading across languages and different eras of Sanskrit.

Food for thought
The Vedas are the ancient Hindu scriptures and are among the oldest religious books of the world. They are four in number—the Rig, Yajur, Sama and Atharva Vedas, listed in order of antiquity. The first three are cognates and were chanted during rituals. The Rig Veda is the primary Vedic text and consists of 10 books and a few thousand verses. It was meant to be recited by the Hotar priests during the fire sacrifice. The Yajur Veda was assigned to the Adhvaryu priests who conducted the sacrifices, and the Sama Veda to the Udgatri priests who “sang” the Sama verses during the preparation of the holy Soma juice. The Yajur and Sama Vedas repeat a lot of Rig Vedic content, with special syllables added in some cases. Each Veda could have several recensions. Apart from Hindu theology, study of the Vedas is necessary to understand ancient Indian history, as the Vedas are the closest we have to historical texts of those times, and they are studied using myriad approaches: philology, etymology, comparative religion, Sanskritology, Indology and more.

The commonest version of the Sama Veda consists of 1,875 verses, arranged in four parts, which are further divided into books and chapters. The chapters are then broken into decades, each of which could consist of 10 verses. To each decade is assigned a God, or a collection of Gods, to whom the verses are addressed, either as a salutation or as a prayer: Varuna the Water God, for instance, or the Visvedevas, all of the Gods together. Most verses are addressed to Indra, the thunderbolt-wielding Rain God and King of the Gods, and to Agni, the Fire God, who is considered the conveyor of the sacrificial offerings to the Devas (the Gods). A number of the verses are, naturally, in praise of Soma. Decades also identify the metre in which they were composed and the Rishi, or sage, who composed them.
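The hierarchy just described (part, book, chapter, decade, with each decade carrying a deity, a metre and a Rishi) maps naturally onto a simple record type for building a machine-readable corpus. The sketch below is an illustration of such a layout; the field values in the example are placeholders, not verified attributions.

```python
# Hypothetical record layout for indexing Sama Veda verses by position
# (part -> book -> chapter -> decade) plus the metadata each decade carries.
from dataclasses import dataclass, field

@dataclass
class Decade:
    part: int
    book: int
    chapter: int
    number: int
    deity: str        # God or collection of Gods the decade is addressed to
    metre: str        # metre in which the verses were composed
    rishi: str        # sage credited with composing the verses
    verses: list = field(default_factory=list)

# Illustrative entry only; deity/metre/rishi here are placeholder values.
d = Decade(part=1, book=1, chapter=1, number=1,
           deity="Agni", metre="Gayatri", rishi="(unattributed)")
d.verses.append("agna A yAhi vItaye ...")  # verse text in Harvard-Kyoto, truncated
print(d.deity, len(d.verses))
```

Once verses are held in structured records like this, queries such as "all decades addressed to Indra in the Gayatri metre" become trivial filters, which is the groundwork the analytical tools build on.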

The trouble with ontology

While most common ontology creation systems, such as Protégé and WebOnto, tend to support manual creation with a set of editor, language processor, visualisation, maintenance and exchange tools, several new systems have attempted to automate the creation process. The ontologies they generate have to be evaluated and corrected by a human expert, in an iterative process, before they can be deployed; in this sense, they are semi-automatic. Ontology learning systems attempt to parallel the workflow of a manual creation environment. Typically, the first stage is to process the available natural language corpus with a text-processing module that performs tasks such as stemming, part-of-speech tagging and parsing, generating an intermediate corpus. Ontological information is then extracted with the use of known lexical constructs and a domain lexicon: first the main concepts in the domain, and subsequently their interrelationships. The first-pass ontology so generated is then pruned, evaluated and refined iteratively by a domain expert. Apart from extraction from scratch, ontologies can also be automatically refined or reverse engineered; the additional stages involve importing existing ontologies as a starting point, exporting the generated ontology into an existing higher ontology, and merging related ontologies. Most such environments support common ontology exchange languages such as RDF Schema, OIL and OWL. Nor is natural language text always the starting point: ontology learning systems can also start from specialised data repositories and schemas.
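The pipeline described above can be sketched end to end: text processing, concept extraction, relation extraction, then expert pruning. Every function body below is a deliberately simplified stand-in (frequency counting for lexicon lookup, co-occurrence for relation learning), not a real ontology learning system.

```python
# Schematic ontology-learning pipeline mirroring the stages described above:
# text processing -> concept extraction -> relation extraction -> expert pruning.
import re
from collections import Counter

STOP = {"the", "and"}  # tiny stopword list for the toy corpus

def text_process(corpus):
    """Tokenise, lowercase and drop stopwords (stand-in for stemming/tagging)."""
    return [[t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOP]
            for doc in corpus]

def extract_concepts(token_docs, min_count=2):
    """Main domain concepts = frequent terms (stand-in for lexicon lookup)."""
    counts = Counter(t for doc in token_docs for t in doc)
    return {t for t, c in counts.items() if c >= min_count}

def extract_relations(token_docs, concepts):
    """Relate concepts that co-occur in the same document."""
    rels = set()
    for doc in token_docs:
        present = sorted(concepts & set(doc))
        rels.update((a, b) for a in present for b in present if a < b)
    return rels

def prune(relations, reject):
    """The human-expert stage: drop relations flagged as wrong."""
    return relations - reject

corpus = ["Agni carries the offering",
          "Indra and Agni receive the offering",
          "Indra wields the thunderbolt"]
docs = text_process(corpus)
concepts = extract_concepts(docs)
relations = prune(extract_relations(docs, concepts), reject=set())
print(concepts, relations)
```

The crucial point the sidebar makes survives even in this toy: the machine proposes concepts and relations, but the `prune` step, the expert in the loop, is what makes the result deployable.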

The underlying learning algorithms and natural language processing facilities can vary greatly, as can the role played by syntactic and semantic patterns. Some systems rely on term frequency metrics (TF, TF-IDF and the like) and domain-specific heuristics for identifying syntactic patterns; ASIUM, an earlier system, generates domain taxonomies through hierarchical clustering.
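The term frequency metrics mentioned here are worth a concrete look. TF-IDF scores a term highly when it is frequent in one document but rare across the corpus, which makes it a cheap first filter for candidate domain terms. A minimal sketch, using a made-up toy corpus:

```python
# TF-IDF for ranking candidate domain terms: terms frequent in one document
# but rare across the corpus score highest. Toy corpus, minimal formula.
import math

def tfidf(term, doc_tokens, all_docs):
    """Score one term in one document against the whole (tokenised) corpus."""
    tf = doc_tokens.count(term) / len(doc_tokens)        # term frequency
    df = sum(1 for d in all_docs if term in d)           # document frequency
    idf = math.log(len(all_docs) / df)                   # inverse doc frequency
    return tf * idf

docs = [["soma", "juice", "soma"], ["fire", "offering"], ["fire", "juice"]]
# "soma" occurs only in the first document, so it outscores "juice" there.
print(tfidf("soma", docs[0], docs), tfidf("juice", docs[0], docs))
```

Real systems use smoothed variants of this formula, but the ranking intuition, distinctive terms over common ones, is the same.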

Although ontology learning by itself is a nascent area, the actual techniques used derive from computational linguistics, machine learning, information retrieval etc. However, ontology learning methods noted above have been highly domain specific, as they rely on domain heuristics.


The case for using ontology on the Vedas

The typical uses would be indexing, annotating, reasoning, improving parsing models on other texts, and semantic disambiguation. Given the number of recensions and interpretations, it becomes useful to ontologise them prior to analysis. This facilitates intelligent retrieval, for instance in the form of concordances, as well as inferences and revisions to interpretations through disambiguation. We can also reconcile post-Rig Vedic works with the Rig Veda and understand the role of each in the Indological domain. Since it is commonly held that the hymns of the Rig Veda were extant for centuries and were redacted as text over a long period, during which the beliefs, attributes and dramatis personae of the Vedic hymns changed, developing a formal structure such as a domain ontology could give us insights into the chronology and concept evolution therein.
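The "intelligent retrieval in the form of concordances" mentioned above amounts to an inverted index over multiple texts: for any word form, list every text and verse in which it occurs. A minimal sketch follows; the sample verses are genuine opening words of the Rig and Sama Vedas in Harvard-Kyoto transliteration, but the verse identifiers are illustrative.

```python
# Sketch of a cross-text concordance: map each word form to every
# (text, verse) location where it occurs. Case is preserved because
# Harvard-Kyoto transliteration is case-sensitive.
from collections import defaultdict

def build_concordance(texts):
    """texts: {text_name: {verse_id: verse_string}} -> {word: [(text, verse)]}."""
    index = defaultdict(list)
    for name, verses in texts.items():
        for vid, verse in verses.items():
            for word in verse.split():
                index[word].append((name, vid))
    return index

texts = {
    "Rig Veda":  {"1.1.1": "agnim ILe purohitam"},
    "Sama Veda": {"1.1.1": "agna A yAhi vItaye"},
}
conc = build_concordance(texts)
print(conc["agnim"])   # every location of the exact form 'agnim'
```

Note that this indexes exact surface forms only; comparing *concepts* across texts, as the distributed concordance system aims to, additionally requires the sandhi splitting and morphological analysis discussed earlier, so that inflected forms of the same word are retrieved together.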

“I am developing, or ‘learning’ or ‘deducting’, an ontology or structure of knowledge from a bunch of Sanskrit documents. Constructing any good ontology by hand is itself difficult. Since the domain is vast, we need to construct it automatically, and this is even tougher. To deduct or ‘learn’ an ontology, I am trying to deduce the relationships between the concepts, their structures and so on, by machine-analysing the Rig Veda, both in text form and translation form. Since the grammar is very difficult, I try to solve the grammar problem by using raw statistical learning techniques, which has never before been done for Sanskrit. It has been used mainly on English. But why do it? If we can solve this problem for something like Sanskrit, we can then adapt the same techniques to a number of current problems where learning languages with inconsistent or incomplete grammar is necessary, such as discovering and providing Web services on the Semantic Web,” says Raghunathan.
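One way to illustrate the "raw statistical learning" idea is pointwise mutual information (PMI): concept pairs that co-occur in verses more often than chance would predict suggest a relationship, with no grammar required. The verses below are fabricated two- and three-word fragments chosen purely to make the statistics visible; this is an illustration of the general technique, not Raghunathan's algorithm.

```python
# PMI over verse co-occurrence: score concept pairs by how much more often
# they appear together than independence would predict.
import math
from collections import Counter
from itertools import combinations

def pmi_pairs(verses):
    """Return {(word_a, word_b): PMI score} over whitespace-tokenised verses."""
    n = len(verses)
    word_count, pair_count = Counter(), Counter()
    for verse in verses:
        words = set(verse.split())            # presence per verse, not frequency
        word_count.update(words)
        pair_count.update(combinations(sorted(words), 2))
    return {
        (a, b): math.log(c * n / (word_count[a] * word_count[b]))
        for (a, b), c in pair_count.items()
    }

# Toy corpus: 'indra' always appears with 'vajra', 'agni' with 'havis'.
verses = ["indra vajra", "indra vajra soma", "agni havis", "agni havis soma"]
scores = pmi_pairs(verses)
print(max(scores, key=scores.get))
```

The inseparable pairs score log 2 while incidental pairings like (indra, soma) score zero, so the statistics alone recover the associations a reader would notice, which is exactly the appeal for a language whose grammar is hard to parse automatically.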

Further, this work in an ancient domain is highly relevant and extensible to modern domains, including the provision of digital assistant services over a broadband telephony network. Suppose we wished to provide a Semantic Web application to broadband telephony users in a network. The Web services available to a user through a digital assistant agent (scheduling engagements, automatic email response, task allocation and prioritisation, and so on) would need to be discovered, added to a services list, managed and supported over the Internet, by processing multilingual and multimodal descriptions and maintaining them in a domain services ontology. There is a close parallel to the earlier domain, and we face similar challenges: our multimodal natural language descriptions could come from voice interfaces and multimedia repositories, including pictures and oral advertisements, as well as from common Web service description formats such as OWL-S and XML, and may not be reliably or homogeneously phrased. “Some of our current work is on such a project, where we find the need to ontologise domain service information without the benefit of the implicit restrictions, to facilitate other tasks,” adds Raghunathan.

 


Copyright 2001: Indian Express Newspapers (Mumbai) Limited (Mumbai, India). All rights reserved throughout the world. This entire site is compiled in Mumbai by the Business Publications Division (BPD) of the Indian Express Newspapers (Mumbai) Limited. Site managed by BPD.