Sunday, 16 February 2025

The Statistical Logic Behind the Analysis of the Chronological Gulf Between the Old Rigveda and the New Rigveda

 


 Shrikant G. Talageri

 

I have written and uploaded many articles that are the fruit of my detailed research, most of them on the subject of the “Aryan” or AIT-vs.-OIT debate, especially, but not only, in the context of the data in the Rigveda, and I am proud of most of them. Apart from politically loaded articles (mostly on the subject of Hindutva ideology or BJP politics), I have also written on fundamentally different subjects such as musical scales (as well as Hindi and Marathi songs in Jhap Taal and Roopak Taal, and film music in general) and the history of numbers and numerals. And, of course, I have written on my first subject of research and study, the Konkani language and my own Chitrapur Saraswat community, apart from other stray subjects.

One article in particular that I am extremely proud of is “FINAL VERSION OF THE CHRONOLOGICAL GULF BETWEEN THE OLD RIGVEDA AND THE NEW RIGVEDA: [WITH THE HYMN-AND-VERSE WISE LIST OF NEW WORDS AND OTHER NEW ELEMENTS IN THE RIGVEDA]”, since it was the culmination of a long, arduous and taxing study, involving plenty of paperwork and frequent and repetitive cross-referencing and correcting (and re-correcting). I am also proud of it because I had been searching everywhere for a list of this kind for decades, without success: the only truly great and inspiring work that I came across (though it pertains mainly to Book 8 of the Rigveda) was that of a great American Indologist (not an AIT-sceptic) of the late 19th century, and I felt compelled to pay tribute to him in the above article as follows: “let me, at this point, start out by first paying homage to Edward W. Hopkins (1857-1932), in many ways the great pioneer of analysis of the internal chronology of the Rigveda”. Finally it was left to me to take up and complete the task myself:

https://talageri.blogspot.com/2022/08/final-version-of-chronological-gulf.html

 

My above article is my particular pride because it is full of rich, complete and statistics-based data covering the whole of the Rigveda, and it will be impossible for anyone to disprove it (unless, like the fake new Gospel “discovered” in Irving Wallace’s book The Word, which is so quickly lapped up by the entire politics-ridden and vested-interest-controlled Christian world that it is declared to be genuine, someone “discovers” a new version of the Rigveda, different from the present one, with completely different contents and data, and this new text is accepted as being older and more genuine than the only Rigveda known so far). I must repeat what I have said many times elsewhere: this chronological study of the Rigvedic vocabulary will be of prime importance in the study of Indo-Aryan, Indo-Iranian and Indo-European history.

I would love to see anyone genuinely taking up the challenge, and hence I was very happy when a person (whom I have never met, but who had been in irregular and friendly email contact with me for a few years) wrote to me expressing strong doubts about the statistical method he felt I had used in my analysis. I will refer in detail to this email dialogue, but I will not name this person, now or later, unless he himself comes forward for reasons of his own, since it was a purely personal exchange and unfortunately ended in a not-so-friendly manner. I am writing this article to show how little understood, and how much misunderstood, the principles of statistical textual analysis generally are when it comes to studying the evidence of the data in the Rigveda:

I. The Statistical Arguments.

II. Can the New Words Be Sifted Out?

 

I. The Statistical Arguments

There were a total of five mails from his side, and I will first only quote his purely statistical arguments (given in his first three of these five mails):

Mail 1. 4-2-2025:  

“the usage of a word by itself is dependent more on the context rather than whether it is new or old.  If the themes of the New Rigveda are different from those of the Old Rigveda, you would expect different words to be used.  That would not necessarily make those words themselves 'new,' requiring explanation in terms of linguistic evolution/borrowing”.


Mail 2. 4-2-2025:

Take the Old and New Rigvedas.  Together, the old books contain 3241 verses, and the new books contain 7311 verses.  

There are large enough samples.  But there are more than twice as many verses in the new books as in the old books.  Unequal sample sizes have consequences.

If we assume that each verse contains, on average, a certain number of words, the chances of a randomly chosen word appearing in the new books but not in the old books is correspondingly greater, simply because of the greater number of verses.   The problem now is to prove that, given any word occurring in the new Rigveda but not in the old Rigveda, it is a genuinely 'new' word, and not a 'false positive' due to a combination of the following 

(a) Larger sample size.

(b) Different context/theme.

(c) Different choice of synonyms to ensure adherence to poetic metre.    

Note that the above three conditions are not mutually exclusive either.  

This is where things can go either way, and distinguishing a true positive from a false positive becomes problematic

 

Mail 3. 4-2-2025:

This is not about the sample sizes per se but about unequal sample sizes being compared.  The samples are large enough, running into 3000+ and 6000+ verses.  But, you simply cannot compare a randomly chosen sample of 3000 men with randomly chosen 6000 women if you want to understand average height associations with sex.  You have to compare 6000 to 6000 or 3000 to 3000 - comparable numbers.  In a sample of 6000, you are more likely to come across some 6ft people as well as some 3ft people - extreme values - as compared to a smaller sample, that could skew the average one way or the other.  In such comparisons, one asks:  'How representative is my sample of the entire population?'  Thus far, about unequal sample size.

In a strict sense, the word samples we have from the old and new books are both biased samples.  Their purpose being primarily liturgical, they don't represent the 'actual' word distribution of the Vedic language any more than Latin hymn books represent the entirety of Latin vocabulary and word frequencies.  And, the practical problem is that there is no Rigvedic era dictionary of liturgical and general terms, unlike the case for classical Sanskrit.

 

It will be seen that his main argument is that “unequal sizes” of the “samples” skew the statistical result:

Take the Old and New Rigvedas.  Together, the old books contain 3241 verses, and the new books contain 7311 verses.

There are large enough samples.  But there are more than twice as many verses in the new books as in the old books.  Unequal sample sizes have consequences.

This is not about the sample sizes per se but about unequal sample sizes being compared.  The samples are large enough, running into 3000+ and 6000+ verses.  But, you simply cannot compare a randomly chosen sample of 3000 men with randomly chosen 6000 women if you want to understand average height associations with sex.  You have to compare 6000 to 6000 or 3000 to 3000 - comparable numbers.  In a sample of 6000, you are more likely to come across some 6ft people as well as some 3ft people - extreme values - as compared to a smaller sample, that could skew the average one way or the other.  In such comparisons, one asks:  'How representative is my sample of the entire population?'  Thus far, about unequal sample size.
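
The sample-size point in the mails quoted above can be made concrete with a small sketch. It assumes, purely for illustration, that a word turns up in any given verse independently with a fixed probability q. This independence model, the function names and the values of q are assumptions of the sketch and are not stated anywhere in the mails; only the verse counts (3241 and 7311) come from the text.

```python
# Illustrative only: assumes a word occurs in each verse independently with
# probability q (a hypothetical model, not taken from the mails).

OLD_VERSES = 3241   # verses in the old books, as quoted in Mail 2
NEW_VERSES = 7311   # verses in the new books, as quoted in Mail 2


def p_absent(q: float, verses: int) -> float:
    """Probability that the word occurs in none of `verses` verses."""
    return (1.0 - q) ** verses


def p_only_in_new(q: float) -> float:
    """Probability the word is absent from the old books but present in the new."""
    return p_absent(q, OLD_VERSES) * (1.0 - p_absent(q, NEW_VERSES))


def p_only_in_old(q: float) -> float:
    """Probability the word is absent from the new books but present in the old."""
    return p_absent(q, NEW_VERSES) * (1.0 - p_absent(q, OLD_VERSES))


if __name__ == "__main__":
    for q in (1e-5, 1e-4, 1e-3):  # hypothetical per-verse probabilities
        print(f"q={q:g}: only-in-new={p_only_in_new(q):.4f}, "
              f"only-in-old={p_only_in_old(q):.4f}")
```

Under this model a rare word is more likely to occur only in the larger group than only in the smaller one, which is the effect the mails describe. The model is silent on how such words would be spread across individual hymns.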

 

Here are the main and very basic flaws in his arguments:

1. He describes the Rigvedic data as “randomly chosen samples”, and gives, as an example to be compared, a hypothetical case to show that “you simply cannot compare a randomly chosen sample of 3000 men with randomly chosen 6000 women”.

But the Rigvedic data compared does not consist of “randomly chosen samples”: it consists of the total database of:

a) The full 280 Hymns, 2368 verses in the Old Rigveda Books 2,3,4,6,7;

b) The full 62 Hymns, 873 verses in the Redacted Hymns in the Old Books 2,3,4,6,7;

c) The full 686 Hymns, 7311 verses in the New Rigveda Books 1,5,8,9,10.

Whereas in the hypothetical case he cites, there are indeed randomly chosen samples of 3000 men (out of millions and millions) and 6000 women (out of millions and millions).
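
As a quick arithmetic check on the three groups listed above, the counts can simply be added up. The dictionary layout and variable names below are illustrative; the numbers are the ones given in the list.

```python
# Hymn and verse counts for the three groups, as listed in the text.
GROUPS = {
    "Old Rigveda (Books 2,3,4,6,7)": {"hymns": 280, "verses": 2368},
    "Redacted Hymns (in Books 2,3,4,6,7)": {"hymns": 62, "verses": 873},
    "New Rigveda (Books 1,5,8,9,10)": {"hymns": 686, "verses": 7311},
}

total_hymns = sum(g["hymns"] for g in GROUPS.values())
total_verses = sum(g["verses"] for g in GROUPS.values())

# Verses in the old books as a whole (Old Rigveda hymns plus Redacted Hymns).
old_books_verses = 2368 + 873

print(f"Total hymns:  {total_hymns}")        # 1028
print(f"Total verses: {total_verses}")       # 10552
print(f"Old books verses: {old_books_verses}")  # 3241
```

The old-books figure of 3241 verses matches the number quoted in the second mail, and the three groups together add up to 1028 hymns, so nothing is being sampled: every hymn falls into exactly one group.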

2. He tells us (in his hypothetical case) “In a sample of 6000, you are more likely to come across some 6ft people as well as some 3ft people - extreme values - as compared to a smaller sample, that could skew the average one way or the other.”

But how does this apply to the Rigvedic data given by us? Out of 280 hymns in the Old Rigveda, all 280 out of the 280 are lacking in new words, meters and composer names; while out of 62 Redacted Hymns, only 1 out of the 62 is lacking in new words, meters and composer names, and out of 686 hymns in the New Rigveda, only 2 out of the 686 are lacking in new words, meters and composer names.

This could have been compared to his “men and women” case only if more than 5900 out of the 6000 women were 6 feet or 3 feet tall, while 0 out of the 3000 men were 6 feet or 3 feet tall, which, obviously, he is not claiming is the case!

This also applies to his argument that the absence of these words from the Old Rigveda could be due to “different choice of synonyms to ensure adherence to poetic metre” or “different context/theme”: how would poetic meter or theme dictate that these new words should appear (sometimes in numbers as high as 56 or 71 words in a single hymn) in 61 out of 62 Redacted Hymns and in 684 out of 686 New Rigveda hymns, but in 0 out of 280 Old Rigveda hymns?
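
The hymn counts in this point can be turned into shares per group. This is a minimal sketch using only the figures stated above; the helper name `share_with_new_elements` is illustrative.

```python
# (group name, total hymns, hymns lacking new words, meters and composer names)
# Figures as stated in the text.
GROUPS = [
    ("Old Rigveda", 280, 280),
    ("Redacted Hymns", 62, 1),
    ("New Rigveda", 686, 2),
]


def share_with_new_elements(total: int, lacking: int) -> float:
    """Fraction of hymns in a group that contain at least one new element."""
    return (total - lacking) / total


for name, total, lacking in GROUPS:
    share = share_with_new_elements(total, lacking)
    print(f"{name}: {total - lacking} of {total} hymns ({share:.1%})")
```

This prints 0 of 280 (0.0%) for the Old Rigveda, 61 of 62 (98.4%) for the Redacted Hymns and 684 of 686 (99.7%) for the New Rigveda.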

3. Fundamentally, his case compares apples and oranges, i.e. “men and women”, whereas our case compares hymns with hymns. We do not get different figures because we are comparing two different kinds of object; we get different figures when comparing the same kind of object: hymns. They are different objects only in the sense that they are hymns in the Old Rigveda and hymns in the Redacted Hymns/New Rigveda respectively. But then that is what my article was all about (and that is in fact what he is arguing against): that words in the New Rigveda are a different object from words in the Old Rigveda!

He also, for some reason (perhaps he notices the above flaw in his argument), tries to tell us that the full corpus of words in the Rigveda does not represent the full corpus of words in the Rigvedic language of that time as a whole:

The data are obviously fragmentary, and we may never get corroborative proof for all words, as we don't have any dictionary of the Vedic language.  The Rigvedic language lives only in the Rigveda, and its words are only a subset of the words ever used in the Vedic language.

In a strict sense, the word samples we have from the old and new books are both biased samples.  Their purpose being primarily liturgical, they don't represent the 'actual' word distribution of the Vedic language any more than Latin hymn books represent the entirety of Latin vocabulary and word frequencies.  And, the practical problem is that there is no Rigvedic era dictionary of liturgical and general terms, unlike the case for classical Sanskrit.

But here again he fails to realize that, whether the words represent the entire Vedic language or only a liturgical “subset” of the Vedic language, the words in all three groups (Old Rigveda, Redacted Hymns and New Rigveda) do not represent apples, oranges and bananas: all of them represent the same object, liturgical hymns.

If the same principle is applied to all ancient texts (including those in the Latin language), all chrono-linguistic studies of the ancient vocabulary of any language will have to be declared extremely flawed (if not null and void) on the ground that the available texts may not represent the entire corpus of words in the language concerned.

In this particular case of the Rigveda, it may be noted, the comparison is not only between the vocabulary of two distinct parts of the text (Old Rigveda and New Rigveda), distinct as per the testimony of a long line of Indologists from Oldenberg through Witzel to Proferes, even though the entire text has nevertheless wrongly been treated till date as a single monolithic text for the purpose of chrono-linguistic analysis; it is also a comparison with the vocabulary of the available texts of the Avesta and the Mitanni inscriptions. Did the composers or writers of all these different sources enter into a common conspiracy to stonewall the same items of extant vocabulary, or to use the same items of vocabulary, in their writings?

So it is clear that the identification of the words so profusely present in the New Rigveda, but so totally absent in the Old Rigveda, as New Words cannot be dismissed, nor can their value as chronological evidence be undervalued, in this cavalier manner.

  

II. Can the New Words Be Sifted Out?

In the second mail, he also wrote:

The problem now is to prove that, given any word occurring in the new Rigveda but not in the old Rigveda, it is a genuinely 'new' word, and not a 'false positive' due to a combination of the following 

(a) Larger sample size.

(b) Different context/theme.

(c) Different choice of synonyms to ensure adherence to poetic metre.    

Note that the above three conditions are not mutually exclusive either.  

This is where things can go either way, and distinguishing a true positive from a false positive becomes problematic, unless you have other lines of corroborating evidence, as in (1) & (2) above.

In the third mail, he wrote:

The data are obviously fragmentary, and we may never get corroborative proof for all words, as we don't have any dictionary of the Vedic language.  The Rigvedic language lives only in the Rigveda, and its words are only a subset of the words ever used in the Vedic language.  Given the nature of the problem, it is not about whether a given word is or is not a new word.  The words you identified can be grouped into candidates about which certainty about 'newness' varies from the most reliable to the least reliable.

I have seen the data you presented, and I was wondering how certain one could be about the reliability of individual data points based on first principles. No one is saying that all the entries in your list are all wrong or all right.  That is an extreme position I never took - that would be just like saying that the Puranas are either all history or all myth.

Given these unknowns (and unknowables), all I am saying is that some of your identifications are bound to be true positives beyond any reasonable doubt, others very reliable, some others less so in various grades ending in some that are false positives.  False discoveries are statistically inevitable in such cases, and it helps not to take an 'all or none' position on the entire dataset you have undoubtedly assembled after much painstaking study.  Rather, it is a resource for further detailed study of its individual data points in terms of their reliability (some already proven beyond doubt).

 

In short, what he suggests is that the words in my list of new words may not all be new with the same degree of certainty: there could be many different degrees of certainty, ranging from what he calls “true positives” to “false positives”. Therefore, he suggests that I must examine the “individual data points” (i.e. each word individually) to check whether each is really new or only seems to be so because it is present only in the New Rigveda, and that the new words in my list “can be grouped into candidates about which certainty about 'newness' varies from the most reliable to the least reliable”, rather than “take an 'all or none' position on the entire dataset you [i.e. I] have undoubtedly assembled after much painstaking study”.

I think this is simply splitting hairs, refusing to accept the data available, and setting impossible tasks based on hypothetical assumptions (seeing that he also accepts or admits that “we may never get corroborative proof for all words, as we don't have any dictionary of the Vedic language.  The Rigvedic language lives only in the Rigveda” and that “the practical problem is that there is no Rigvedic era dictionary of liturgical and general terms, unlike the case for classical Sanskrit.”).

 

To begin with, the division of the Rigveda into three groups of hymns (Old Rigveda, Redacted Hymns and New Rigveda) is practically set out by a long line of Indologists from Oldenberg through Witzel to Proferes (see my main article above on the “chronological gulf”). And all the new words listed by me are words which only appear in the Redacted Hymns and the New Rigveda but are completely missing in the Old Rigveda.

I have done my work in meticulous detail. If anyone feels the need to check out each word in the list, invent completely new criteria to determine its “degree” of “certainty” for “newness”, and then sift out words from the list as not being new, it is a senseless task which I at least see absolutely no need to undertake myself, but I welcome sincere efforts by anyone else who wants to undertake that task!

It must be kept in mind that even if anyone undertakes this task, and invents arbitrary and whimsical criteria to sift out words from the list as not being new, all they will manage to achieve “after much painstaking study” is to slightly reduce the number of new words in the list to their own satisfaction. But the end result will be the same: all these words, with some reductions in numbers, will still be only in the Redacted Hymns and New Rigveda but not in the Old Rigveda.
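
The point about sifting can be sketched with sets. The tokens below are placeholders, not actual Rigvedic words, and the `sift` helper and the rejected entries are hypothetical. The sketch only shows that removing entries from a list that shares nothing with the Old Rigveda vocabulary leaves a shorter list that still shares nothing with it.

```python
# Placeholder tokens only: "x1".."x3" and "w1".."w5" are hypothetical stand-ins,
# not actual Rigvedic words.
old_vocabulary = {"x1", "x2", "x3"}              # words attested in the Old Rigveda
new_word_list = {"w1", "w2", "w3", "w4", "w5"}   # listed words, none in the Old Rigveda


def sift(words: set, rejected: set) -> set:
    """Return the word list with the rejected entries removed."""
    return words - rejected


# Suppose a reviewer rejects two entries on criteria of their own choosing.
reduced = sift(new_word_list, {"w2", "w5"})

print(sorted(reduced))                      # the shorter list
print(reduced <= new_word_list)             # True: still a subset of the original list
print(reduced.isdisjoint(old_vocabulary))   # True: still absent from the Old Rigveda
```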

Therefore, I will only conclude with my original remark that this chronological study of the Rigvedic vocabulary will be of prime importance in the study of Indo-Aryan, Indo-Iranian and Indo-European history.   


6 comments:

  1. Talageri Sir,

    There are two people on the internet who have criticized your work on their blogs:
    Benjamin Lloyd and a DXRSam.
    Could you please rebut their arguments?

    Replies
    1. I went through the article by Benjamin Lloyd, and it is so outdated and half-baked that I will not waste time answering it. After my article on the Chronological Gulf between the Old Rigveda and the New Rigveda, anyone who casts doubts on the internal chronology of the Rigveda should think of better ways of doing it rather than picking out stray sentences from my second book from the year 2000 and demonstrating his inability to understand English.

      I didn't get the identity of the second person, DXRSam. Why don't you just tell me the exact criticisms made by him which you feel need to be replied to?

  2. Hi Talageri sir,

    DXRSam article is titled: "Shattering S.Talageris' " The Elephant and Proto-Indo-European Homeland".
    He claims to have refuted your above article.
    His article can be found here:
    https://lingetc.wordpress.com

    Replies
    1. I have already replied to this man in detail. After which he made another desperate attempt to save his reputation, so I followed it up with a second article. Now even many of his supporters accept that he is beaten. My first article:

      https://talageri.blogspot.com/2022/08/indian-fauna-elephants-foxes-and-ait.html

      The second:

      https://talageri.blogspot.com/2022/11/the-shatterer-again-on-leopards-rather.html

    2. In one of your books, you made an interesting point about some Mesopotamian words being found in the Rig Veda, for example "Bekanata". Any idea how many Mesopotamian words might be found in the Rig Veda? This would be further solid evidence in favour of the equation Harappans = Vedic Aryans. Since the Harappans had trade contacts with Mesopotamia, it would explain how these words entered the Vedic language. The AIT would therefore have a hard time trying to account for this.

  3. Thank you Talageri sir.
    Have you responded to Benjamin Lloyd's claims? I think it would be a good idea just for the record so that there is less maneuvering space for these AIT types.
