Friday 17 December 2021

A Few Major Errors in the Transliteration of Indian words into English

Shrikant G. Talageri


There is a difference between "translation" and "transliteration": the former is a word in one language converted into its equivalent meaning in another language, the latter is a word in one language written in the alphabet or orthography of another language. Thus the Hindi word  "घर"  is translated into English as "house" and transliterated into English (or more correctly into the Roman alphabet) as "ghar".

I am not trying to underestimate the knowledge of my readers in general by pointing this out; I am just making this clear at the very beginning so that those of my readers who do not know the difference do not raise irrelevant questions or objections based on lack of knowledge of this distinction.

I am compelled to write this article because many Indians labor under blatantly and militantly wrong ideas of transliteration of Indian words into English, and some have in the past even argued vehemently with me on this point in the comments section of my blogs or song-videos, so that I feel it necessary to clarify the issue in one place. I hope what I am writing will be read and understood in an honest and rational spirit by my readers — since this is a rational and logical issue — and not made into a personal issue or an issue of chauvinistic dispute. Basically, this article is a plea for a rational consensus on transliteration into English.

It should first be made clear that every language has sound distinctions peculiar to itself. In some cases, the written language may be following older outdated spelling conventions (e.g. Burmese) or conventions based on the pronunciation of words in an ancestral or model language (e.g. Sanskrit), and so the spoken words may be pronounced differently from the written forms if the actual individual letters are taken into consideration. Thus the Bengali word  "সুন্দর"  (Devanagari equivalent "सुंदर",  Roman "sundar") is actually pronounced, I think, as "shundor". So it would be as correct to transliterate the word as "shundor" (I personally would back this) as per the pronunciation, as to transliterate it as "sundar" as per the conventional Sanskrit-based spelling.

In this article I am not talking about the difference between these two variant transliterations.

I am talking about actual conventions popularly in use, in transliterating Indian words into English (Roman), which blatantly violate both conventional Indian spellings as well as the actual pronunciations; and, while showing a very hazy or  lazy understanding of phonetic values, also result in unnecessary or avoidable fallacies in transliteration and unnecessary or avoidable confusion within sets of two different phonetic sounds.

 

I will discuss this under two heads:

A. South Indian mis-transliterations.

B. Other Indian mis-transliterations.

 

A. South Indian mis-transliterations

Before some South Indian reader takes up arms against me and accuses me of being "anti"-South India, let me clarify that (although both my father and therefore I myself were born and brought up in Mumbai) my paternal native place is Sagar (Shimoga district) and my maternal native place is Mangalore, both in Karnataka. As a Chitrapur Saraswat (see my blog articles on the Chitrapur Saraswat Community and on the Konkani language), both Kannada and Marathi languages and cultures are part of my identity. My mother was born and brought up in Mangalore, studied in Kannada and spoke Tulu very well, and was a passionate lover and singer of Kannada songs (I have — it is one of the biggest regrets of my life that she was not alive to see it — uploaded two Kannada kritis sung by her long ago in an old video cassette, Tāraka Bindige and Innū Daya Bārade, on my youtube channel). I am well-versed in all the four South Indian alphabets, and know the numbers 1-100 in all of them (see my blog article on India's Unique Place in the World of Numbers and Numerals), can speak Kannada reasonably well (as far as I could learn it from my mother and practice it during my rare visits to Sagar), and had made separate attempts in my college-days to learn Tamil and Telugu from Teach-yourself books (and gave up only because honestly I was not acquainted at that time with any Tamil or Telugu speaker with whom I could actually try to speak the language). I have the greatest respect for Tamil and the rich culture of Tamilnadu, and even have sympathies (in spite of my strong antipathy towards "Dravidianist" separatist ideologies and cranks) with attempts to "purify" Tamil vocabulary. And among my favorite often-listened-to videos are videos of the mass singing of Tyagaraja Kritis at the annual Tyagaraja Aradhana Festival at Tiruvaiyaru. I also love Bharata-Natyam, Kuchipudi and Yakshagana dance music.

So I beg that no-one should take what I am writing below in any hostile or wrong sense; though, of course, ultimately no-one can really stop those grimly determined to take anything in a hostile or wrong sense from doing so.

 

To come to the point: Indian languages generally have the following eight consonantal sounds: त, थ, द, ध, ट, ठ, ड, ढ.

These eight consonantal sounds are transliterated into phonetic Roman as follows: ta, tha, da, dha, ṭa, ṭha, ḍa, ḍha.

 

Now it so happens that in normal or common usage, phonetic symbols (with diacritical marks, such as the "dot" below the last four Roman transliterations above) are unknown or not in common use or difficult to type. So people outside the South generally transliterate the eight sounds in common or pedestrian usage as follows: ta, tha, da, dha, ta, tha, da, dha. That is, both त and ट are transliterated as ta, etc. This is an inevitable shortcoming necessitated by the shortage of correct symbols in the Roman alphabet: the Roman alphabet does not have separate letters for the two sounds, so the same letter is used.

In more scholarly works (especially on the internet), where phonetic letters with diacritical marks are not easily available, people skillfully make use of the distinction between Lowercase and Capital letters in the Roman alphabet to write the eight sounds as follows: ta, tha, da, dha, Ta, Tha, Da, Dha.

 

Unfortunately, the South has developed a system of transliteration which confuses everything: the eight sounds are represented as follows: tha, tha, dha, dha, ta, tha, da, dha. Here, three distinct sounds त, थ and ठ are represented by the transliteration tha; and three distinct sounds द, ध and ढ are represented by the transliteration dha. This gross state of affairs is not necessitated by the shortage of correct symbols in the Roman alphabet but results from a hazy or lazy understanding of phonetic values and symbols.
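The difference between the conventions described so far can be checked mechanically. The following is only a minimal sketch: the scheme names ("diacritics", "capitals", "plain", "southern") and the function name are labels invented here for illustration, and the Devanagari letters are assumed to correspond, in order, to the phonetic Roman list ta, tha, da, dha, ṭa, ṭha, ḍa, ḍha.

```python
from collections import defaultdict

# The eight dental and retroflex stops, in the order used in the text
# (assumed Devanagari equivalents of ta, tha, da, dha, ṭa, ṭha, ḍa, ḍha).
SOUNDS = ["त", "थ", "द", "ध", "ट", "ठ", "ड", "ढ"]

# Scheme names are hypothetical labels; the spellings are those quoted in the text.
SCHEMES = {
    "diacritics": ["ta", "tha", "da", "dha", "ṭa", "ṭha", "ḍa", "ḍha"],
    "capitals": ["ta", "tha", "da", "dha", "Ta", "Tha", "Da", "Dha"],
    "plain": ["ta", "tha", "da", "dha", "ta", "tha", "da", "dha"],
    "southern": ["tha", "tha", "dha", "dha", "ta", "tha", "da", "dha"],
}


def collisions(spellings):
    """Group sounds by Roman spelling; keep only spellings shared by 2+ sounds."""
    groups = defaultdict(list)
    for sound, spelling in zip(SOUNDS, spellings):
        groups[spelling].append(sound)
    return {s: g for s, g in groups.items() if len(g) > 1}


for name, spellings in SCHEMES.items():
    print(name, collisions(spellings))
```

Under these assumptions the "diacritics" and "capitals" schemes produce no shared spellings, the "plain" scheme merges the sounds in four pairs, and the "southern" scheme merges them in two groups of three (tha and dha), which is the confusion described above.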

 

I find the fact that my own name, Shrikant Talageri (I myself pronounce the initial syllable as Sri instead of Shri, so the pronunciation or spelling of the first part is not the question) is written by so many people as Shri-/Shree-/Sri-/Sree-kanth Thalageri or even Shri-/Shree-/Sri-/Sree-kantha Thalageri  rather amusing: if  कान्त  is "kanth/kantha", how would you transliterate the word कंठ?

But I am shocked rather than amused when I find so many people in the South (and Hindu people in the South) pronouncing the word Hindu as हिंडू =  Hiṇḍu, simply because the convention of transliterating the word with a "d" rather than a "dh" is difficult to shake off, and so they automatically mispronounce the correctly transliterated word — the name of their own religious identity — as Hiṇḍu as per their idea of the way the Roman letter "d" should be pronounced instead of as Hindu, which is how they actually spell the word even in their four own South Indian alphabets where no-one writes it as ḍu.

I first noticed this trend at a Seminar organized in Bangalore at the Mythic Society premises by N.S. Rajaram in 2001, where the banners written in the Kannada alphabet clearly and correctly read  ಹಿಂದೂ, but many speakers (Kannada and Konkani, and probably others) firmly pronounced the word as Hiṇḍu. After that I have noticed this fallacy many times.

 

I will not bother to argue the issue with anyone who has not understood what the problem is. I will just point out the historical reason why this convention of transliteration arose very logically and reasonably in the first place. It arose from the special features of the Tamil alphabet (and language):

The Tamil alphabet has for example the following four letters:  ,  ,  ,  .

In most other Indian languages there is the need to distinguish between unaspirated unvoiced sounds, aspirated unvoiced sounds, unaspirated voiced sounds and aspirated voiced sounds (e.g.  ,,,  etc. respectively) and so the alphabets of these languages have separate letters for all four types of sounds.

However, in Tamil, the language does not need to distinguish between these four types. So one Tamil letter suffices for all four types of sounds:

=  ,  ,  ,  .

=  ,  ,  ,  .

=  ,  ,  ,  .

=  ,  ,  ,  .

 

Now the special feature of Tamil phonetics is that Tamil has no aspirate sounds at all (i.e. consonants with "h"), so there is no need to distinguish aspirated and unaspirated sounds in the Tamil alphabet.

So in effect:

=  ,  .

=  ,  .

 

Again, the occurrence of the voiced and unvoiced sounds follows a fixed pattern: thus for example ப at the beginning of a word is always unvoiced (i.e. "pa") and in the middle of a word it is always voiced (i.e. "ba"). But, when doubled in the middle of a word, it is again always unvoiced ("ppa"). So there is really no need to distinguish voiced and unvoiced sounds with separate letters in the Tamil alphabet: the sound is immediately clear from its position in the word (though, as in all languages, there may be some dialects which may differ slightly). The Tamil alphabet is therefore perfectly designed to write the Tamil language without a surfeit of non-required letters.
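The positional rule just described can be written as a small function. This is a rough sketch, not a full account of Tamil phonology: it implements only the three cases stated above (word-initial, doubled, and otherwise medial), it assumes the letter in question is ப (U+0BAA), and the function name and parameters are invented for illustration.

```python
PULLI = "\u0bcd"  # Tamil virama: marks a consonant with no following vowel


def stop_values(word, letter="\u0baa", unvoiced="p", voiced="b"):
    """Return the Roman value of each occurrence of `letter` in `word`.

    Only the three positional cases from the text are handled:
    word-initial -> unvoiced, doubled -> unvoiced, otherwise -> voiced.
    """
    values = []
    for i, ch in enumerate(word):
        if ch != letter:
            continue
        # First half of a doubled consonant: letter + pulli + letter.
        starts_double = word[i + 1:i + 3] == PULLI + letter
        # Second half of a doubled consonant.
        ends_double = i >= 2 and word[i - 2:i] == letter + PULLI
        if i == 0 or starts_double or ends_double:
            values.append(unvoiced)
        else:
            values.append(voiced)
    return values


print(stop_values("படம்"))   # ['p']       word-initial
print(stop_values("கபம்"))   # ['b']       single, inside the word
print(stop_values("அப்பா"))  # ['p', 'p']  doubled inside the word
```

The same function could be called with another letter and another pair of Roman values, but any case beyond the three above (for example consonant clusters or dialect differences) is outside what this sketch covers.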

 

So to transliterate Tamil words into the English (Roman) alphabet, in respect of the eight sounds discussed earlier, eight Roman equivalents were not required: four were enough, and the letter "h" for the sound "h" had no place anywhere.

So the transliterators made use of the redundant Roman letter "h" combined with the Roman letters "t" and "d" to distinguish on the one hand between त and ट, and on the other between द and ड, without using diacritical marks, as follows:

त, द, ट, ड were transliterated as tha, dha, ta, da respectively.

This system of transliteration was quite all right and adequate for Tamil because of its limited sound system. But the other South Indian languages do distinguish between unaspirated and aspirated sounds and do actually require the use of the Roman letter "h" to transliterate aspirated consonants. So this Tamil transliteration convention was utterly unsuitable for transliterating words from other South Indian languages into the Roman alphabet.

Nevertheless somehow this system was wrongly adopted for all South Indian languages, and then further misapplied in transliterating even North Indian and Sanskrit words and names!

[A word on Malayalam: Malayalam has its own peculiarities, and in fact there is also the same trend as in Tamil to fudge the dividing line between voiced and unvoiced sounds and aspirated and unaspirated sounds: and in fact it is more complicated in Malayalam because the Malayalam alphabet has separate letters for all the four types of consonants but does not maintain these distinctions in actual pronunciation].

 

I do not of course expect that a practice which has so completely and universally become the accepted convention among writers of a whole major part of India can actually be fully corrected in practice very easily: reading the legacy of more than a century of South Indian writings containing words already transliterated using this convention would be a problem.

And no-one must treat this article as a criticism of those who use this wrong system. If they are people who have been using this system of transliteration for as long as they can remember, then clearly it has become so much of a habit that to read the letter "t" as त is very difficult and only ट seems natural.

But at least, people who still continue to adopt or follow this wrong system of transliterating Indian words into the Roman alphabet should not go to the extent of insisting that this practice is the right one, and that it should be adopted for transliterating all Indian words by everyone else.

 

B. Other Indian mis-transliterations

Every language has sets of letters which can confuse speakers of other languages. Thus the distinction between the English letters "v" and "w", to most non-English speakers, would be almost non-existent, although there is indeed a difference in pronunciation of the two letters. Similarly, not all people (even Indians) who use Indian alphabets which distinguish between and fully understand the actual difference in their pronunciations. Likewise, not all Indians who know Urdu will know, or be able to explain clearly, the difference in the actual pronunciations of the Arabic letters ذ,ز, ض, ظ, (zaal, ze, zaad, zoy) as used in Urdu.

In reality, in the mouths of most speakers, some sounds which are distinguished by different letters for historical reasons (mostly because they are pronounced differently in the language from which words containing those letters are borrowed) are actually pronounced similarly because of the sound system of the receiving language. Thus, in Bengali, the three letters স sa, শ śa, and ষ ṣa (equivalent to Devanagari स, श, and ष) are all pronounced sha (except in certain compound sounds like sta where s is pronounced). And at the same time, since the language does not distinguish between va and ba, both have one common symbol ব (pronounced ba).

 

At this point I must mention one apparent distinction in sounds found in (certain dialects of?) Hindi which I have never been able to understand: it is the letter ḍa written with a dot below it in Hindi. Thus the word baḍā (big) which should be बडा  is actually written  बड़ा. On various occasions I have tried to find out the actual difference in pronunciation between the ḍa without the dot below it and the ḍa with the dot below it; and people educated in the best Lucknow Hindi, and who insist that there is a distinct difference between the two, have tried to explain it to me, but I have never quite grasped the point. The only thing I have noticed is that in the colonial period, Englishmen seemed to have often transliterated and pronounced the ḍa with the dot below it as ra.

 

But all these were mere asides: this article is not about the different sounds present or absent in different languages: it is about the errors in transliterating Indian sounds into the Roman alphabet. I will not go deeply into the subject, only deal with one particular error made by Marathi speakers when transliterating Marathi words into Roman (and this, again, only because some Marathi speakers have in the past argued with me on this point in the comment sections of my Marathi-song uploads on youtube).

Indian languages generally distinguish between voiced (ग, ज, ड, द, ब) and unvoiced (क, च, ट, त, प) sounds.

In respect of the palatal sounds, most Indian languages have the sounds ca (च) and ja (ज). However, some languages like Marathi, Konkani, Telugu, Kashmiri, Nepali, and some dialects of Kannada, among the major Indian languages, also have what may be called softer versions of these palatals, more properly called alveolar affricates: tsa, dza.

In my articles on Konkani/Marathi, I always represent these alveolar affricate sounds by the letters ça and za respectively.

Generally these languages do not have separate letters in their alphabets for the palatals and the alveolar affricates: certainly Marathi does not. Thus the letter च suffices for both ca and tsa, and the letter ज suffices for both ja and dza.

Now when transliterating the Marathi words with these sounds into Roman, how does one transliterate the four sounds on a general level (e.g. in a youtube video title) without using diacritical marks?

1. The ja is of course easy and direct: it is transliterated with the Roman letter j.

2. The ca could also be transliterated with the Roman letter c. However, while this is perfectly possible in a scholarly article (where in fact one would even use diacritical marks, or at least capital letters to indicate different sounds), the general Roman transliteration of this sound is cha, with the aspirated form छ written as chha.

3. The sound tsa also should be transliterated with the Roman letters ts. However, in general writing, no-one would be familiar with this combination as a representative of a Marathi sound, so perforce it is transliterated (by me, though not very happily) with the same spelling cha as the sound above. This can be excused by the fact that there is no other alternative (single) letter in plain Roman which could be used here.

4. However, it is in respect of the fourth sound that the dispute arises. The sound dza should be transliterated with the Roman letters dz. However, apart from the fact that in general writing no-one would be familiar with this combination either as a representative of a Marathi sound, here we do have an alternative (single) letter in plain Roman which could be used: z.

 

So the question here is: should the alveolar affricate Marathi sound be transliterated as j or as z? I firmly use the letter z (rather than the letter j) because it is closer to the sound dz (and in fact many Marathi speakers do pronounce the dza as za, but no-one, except a non-Marathi speaker unfamiliar with the language, would pronounce it as ja). And yet I have had many Marathi viewers of my youtube Marathi-song uploads correcting me and telling me I should use ja instead of za.
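One way to compare the two options for the fourth sound is to list which phonetic values end up sharing a plain-Roman spelling under each. The sketch below is illustrative only: the dictionary names and the function name are invented here, and the spellings are the ones discussed in the four points above.

```python
from collections import defaultdict

# Phonetic value -> plain-Roman spelling. Both tables are illustrative labels.
Z_CONVENTION = {"ca": "cha", "tsa": "cha", "ja": "ja", "dza": "za"}
J_CONVENTION = {"ca": "cha", "tsa": "cha", "ja": "ja", "dza": "ja"}


def shared_spellings(convention):
    """Return the spellings that stand for more than one phonetic value."""
    groups = defaultdict(list)
    for sound, spelling in convention.items():
        groups[spelling].append(sound)
    return {s: g for s, g in groups.items() if len(g) > 1}


print(shared_spellings(Z_CONVENTION))  # {'cha': ['ca', 'tsa']}
print(shared_spellings(J_CONVENTION))  # {'cha': ['ca', 'tsa'], 'ja': ['ja', 'dza']}
```

With z, only the unavoidable cha overlap remains; with j, the spelling ja additionally has to stand for two different sounds.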

Thus, the word जा for "go" is common to both Hindi and Marathi: but the Hindi pronunciation is jā and the Marathi pronunciation is dzā, alternatively zā. I urge my Marathi readers to ask themselves which sounds more correct (or more wrong): jā or zā.

 

I started writing this article enthusiastically, but now I am wondering whether it was at all necessary for me to write it. In any case, as I have completed the article, I think I will upload it anyway!

         

 

12 comments:

  1. Is the 'ça' sound in Marathi a borrowing from Konkani? Also, this letter apparently exists in Portuguese, too, evident in the way that the Tupi-Guarani word 'asai' became the Portuguese 'açai' (name of an Amazonian plant). So is this something that they borrowed from Konkani, or is it a very ancient PIE connection? Just curious to know.

    Replies
    1. I would like to believe that Konkani brought this sound from the northernmost areas, and that the other three languages borrowed it from Konkani. But that is complicated by the fact that the sound is found in the eastern dialects of Marathi as well, and in the whole of Telugu, but only in certain northeastern dialects of Kannada: all of which were really outside the sphere of Konkani influence of any kind. But it is not found in the coastal dialects of Kannada closest to Konkani; in fact the southernmost Cochin dialect of GSB Konkani has lost the tsa/dza sounds: thus "gadzra godzu" is pronounced as "gajra gojju".

      So probably it is a coincidence that the sound is found individually in both the ancient and modern areas of Konkani?

  2. Sir, I have noticed two major things which can direct us towards knowing the common link between Vedic Sanskrit, North Indian languages and South Indian languages. We all know mountains and rivers were an integral part of human settlement. So the earliest settlements in India were mostly near rivers and mountains, whether north or south. The Tamil word for river is nati, in Telugu it's nadi and in Vedic Sanskrit it's nadi. Now for mountain: Tamil ~ Malai (can be a compressed form of Himalay); in Telugu and Sanskrit it's parvatam. So from here can we deduce the crucial fact that either Rigvedic Sanskrit adopted the words for river and mountain from Proto-Dravidian, or Proto-Tamil adopted the words from Sanskrit as Sanskrit was already present in India? If we consider Tamil and Sanskrit as independently developed languages then the words for river and mountain would have been different. But that is not the case, as both have similar words. Is it possible that the Indus-Saraswati basin was already a multi-ethnic, multilingual region? If it is so then we have to take Dr. Neeraj Rai's (molecular biologist) word seriously. In his previous paper he already proved that Iranian and ASI mixing started in around 12000 BCE... And in his next paper he is indicating that R1A1 was an Indian-origin haplogroup only, which later migrated to the eastern Europe region from the Indus Valley site. According to Neeraj Rai's version, steppe mixing has only happened in India after 600 BCE. Sir, what's your opinion on this? Neeraj Rai recently has shared his views on genetics on the channels Vaad and Jaipur Dialogues in the presence of Abhijit Chavda.

  3. As a Kannada and Sanketi speaker, I completely agree with you on the need for consensus on a convention for transliterating the 8 sounds you mentioned (t, th, T, Th, d, dh, D, Dh). I disagree on a few points:
    1. I'm not convinced that these incorrect transliterations originated in Tamil and carried to the other South Indian languages. Each language probably independently developed these incorrect transliterations. I say this because there are some of these transliterations which are exclusive to Tamil. The use for th for d and zh for L come to mind. For example, Prasath (Prasad), Govinth (Govind) and Mozhi (Mouli - Yes there's a u missing) are all exclusive to Tamil.
    2. While I don't dismiss the role of the incorrect transliterations in the mispronunciation of the word "Hindu" among South Indians, I think it probably has more to do with imitation of westerners (who consistently pronounce the word as hinDu).
    3. I would argue that 2 separate systems for general transliterations and name transliterations is probably more reasonable. I completely agree with the capitalized form transliteration for general words/sounds. This system doesn't work for names though as the first letter is always capitalized and you're not sure which sound the first letter is without familiarity with Indian names. There aren't a lot of Indian names that have the aspirated (mahaapraaNa) sounds in them. So it is reasonable in my opinion to have th for t sound and t for T sound. The example you gave Shrikaanta vs ShrikanTha is very apt for the point you're making, but I'd say it is quite uncommon. Of course, this point assumes that we keep the current system of naming where the first letter is capitalized and the rest of the letters are not (to keep it consistent with rest of the world). Additionally, I think it is reasonable to use d for d sound and dh for dh sound. I'm not sure we have a lot of D or Dh sounds in Indian names (in fact I can't think of any at the moment), but I agree it is hard to make a distinction in such cases.

    Replies
    1. I am sorry but there are innumerable names all over India with D and Dh sounds: surely you must have heard the famous Karnataka surname Padukone which most (non-Karnataka) people pronounce as "पादुकोने" instead of "पडुकोण"? [Except for the last "e" sound because of the unnecessary "e" used in the English spelling, the rest is because of the general problems of transliterating into English].

      The spelling of words like Prasath and Govinth is because of the ambiguity and confusion in the mainly Tamil-Malayalam speakers' minds about the difference between voiced and unvoiced sounds.

      I have been studying since my school days about different languages and alphabets of the world. I must thank you for bringing to my notice again (by mentioning it) the Sanketi speech form. It seems to have very little detailed literature on it. Similarly, a paying-guest in our house in my childhood, named Sharma, had told me that there was a separate Havyaka dialect of Tulu where (I still remember the single example he gave me) the Tulu word boḍci ("don't want" = Kannada bēḍa ) was bōtri. Speakers should produce studies of such distinctive dialects.

  4. Also, about the letter "ज्ञ": how should it be transliterated in Roman letters? Marathi-speaking persons, I have observed, tend to pronounce and transliterate it as Dnya, whereas in North India it is Jna or Gna... so is it Yadnya or Yagna or Yajna?

    Replies
    1. About words in general, it should be at the discretion of the transliterator, because it is the same Devanagari letter which is being pronounced in three different ways in Sanskrit, Hindi and Marathi.

      But take a specific case for example: in Marathi, as you point out, it would be "Sant Dnyaneshwar" and in Hindi "Sant Gyaneshwar". Sanskrit would be transliterated as "Jñāneśvar" or (without diacritics) "Jnaneshwar". I would go with the first Marathi transliteration because the name is of a Marathi saint, but non-Marathi people could be excused for pronouncing and therefore transliterating it in the Hindi or Sanskrit way if that is the pronunciation of that letter that they are used to.

    2. Jña is the proper transliteration. In Telugu script the word is written జ "ja" plus ఞ "ña", thus "జ్ఞ" "jña". The interesting thing is that in Avestan it is yazna. It seems the Vedic people borrowed the word from Iranians. If it were an original Vedic word it would be yagna with a "g".

      The same can be said about "aja" meaning goat. The root is "aj". In Avestan goat is "aza". The Vedic sanskrit word should be "aga".

    3. Consequently "Aja" as in goat is a Later Rig Vedic word I think.


  5. Talageri, after pondering on phonology, I came to think about the Mitanni. I found a paper claiming the Indo-Iranian names are speculation; here it is:
    https://www.journals.uchicago.edu/doi/pdf/10.1086/369831
    Please give me your thoughts.

    More importantly, how are letters/sounds like "f", "z", "q", "w" or the Tamil "zh" written in Indian scripts other than Tamil or Malayalam, like Telugu or Devanagari? I would like to know.

    Replies
    1. Thank you for sending me the paper. But remember, this nonsense should be addressed not by me but by all western scholars, including Hock, Witzel, etc. and a hundred years of western scholars who have absolutely upheld the Indo-Iranian connection. Note also that one of the names is Indrota, found nowhere else outside one reference in the Rigveda except among the Mitanni names.

      If anything, this shows that the western scholars, while pretending not to, have taken very serious note of my work, and have now silently started setting themselves to the task of trying to discredit the rock-solid Mitanni Indo-Aryan identity. Note how, after 200 years of identifying the Vedic Sarasvati with the Ghaggar Hakra, these "scholars" have now started denying it after the satellite imagery and geological dating of the river. Note how Johanna Nichols was forced to publicly perform a Stalinistic Confession to deny her own thesis of a Bactria-Margiana locus of IE dispersals. Western academia (at least in history) is proving to be the most crooked racket in existence today.

      We really cannot set rules as to how the speakers of a language will depict in their own alphabet a sound foreign to their language. Marathi speakers for some reason transliterate English "v" as "व्ह" with a very prominent "h" sound, and many even pronounce the Hindi "मुझे" as "muzhe" instead of "mujhe". Obviously, no-one (outside Tamil-Malayalam) will have a separate letter for the Tamil "zh" sound.

  6. Side note on the Austroasiatic loans in the Rigveda: Austroasiatic people entered India (likely bringing rice, a word whose etymology scholars derive from Austroasiatic) around 4000 years ago, and this rather constrains the date of the Rigveda.
