The Statistical Logic Behind the Analysis of the Chronological Gulf Between the Old Rigveda and the New Rigveda
Shrikant G. Talageri
I have written and uploaded many
articles which are the fruit of my detailed research – most of them on the subject of the “Aryan”
or AIT-vs.-OIT debate, especially, but not only,
in the context of the data in the Rigveda – and I am proud of most of them. Apart from
politically loaded articles (mostly on the subject of Hindutva Ideology
or BJP politics), I have also written on fundamentally different
subjects like Musical Scales (as well as on Hindi and
Marathi songs in Jhap taal and Roopak Taal and on
Film Music in general), and on the history of Numbers and Numerals.
And of course on my first subject of research and study: the Konkani
Language and my own Chitrapur Saraswat community, apart from
other stray subjects.
One article in particular I am extremely
proud of is my article: “FINAL
VERSION OF THE CHRONOLOGICAL GULF BETWEEN THE OLD RIGVEDA AND THE NEW RIGVEDA: [WITH THE
HYMN-AND-VERSE WISE LIST OF NEW WORDS AND OTHER NEW ELEMENTS IN THE RIGVEDA]”,
since this was the culmination of a long, arduous and taxing study, involving
plenty of paperwork and frequent and repetitive cross-referencing and
correcting (and re-correcting). It was also because I had been searching
everywhere for a list of this kind for decades, without success: the only
very great and inspiring work (but only pertaining mainly to Book 8
of the Rigveda) that I came across was the work of a great American
Indologist (not an AIT-sceptic) of the late 19th century, and
I felt compelled to pay tribute to him in my above article as follows: “let me, at this point, start
out by first paying homage to Edward W. Hopkins (1857-1932), in many ways the
great pioneer of analysis of the internal chronology of the Rigveda”.
Finally it was left to me to take up and complete the task myself:
https://talageri.blogspot.com/2022/08/final-version-of-chronological-gulf.html
My above article is my particular
pride because it is full of rich, complete and statistics-based
data covering the whole of the Rigveda, and it will be impossible
for anyone to disprove it (unless, like the fake new Gospel “discovered”
in Irving Wallace’s book The Word, which is so
quickly lapped up by the entire politics-ridden and vested-interest-controlled Christian
world that it is declared to be genuine, someone “discovers”
a new version of the
Rigveda, different from the present one, which has completely
different contents and data, and this new text
is accepted as being older and more genuine
than the only Rigveda known so far). I must repeat what I have
said many times elsewhere: this chronological study of the Rigvedic vocabulary
will be of prime importance in the study of Indo-Aryan, Indo-Iranian
and Indo-European history.
I would have loved to see
anyone genuinely take up the challenge, and hence was very happy when a
person (whom I have never met, but who had been in irregular and friendly email
contact with me for a few years) wrote a mail to me expressing strong doubts
about the statistical method he felt I had used in my analysis. I will refer in
detail to this email dialogue (but I will not name this person, now or after
this, unless he himself comes out on his own and for whatever reasons of his
own, since it was a purely personal mail, and ended unfortunately in a not-so-friendly
manner). I am writing this article to show how the principles of
statistical textual analysis are little understood and much misunderstood in
general in studying the evidence of the data in the Rigveda:
I. The Statistical Arguments.
II. Can the New Words Be Sifted Out?
I. The Statistical Arguments
There were a total of five mails
from his side, and I will first quote only his purely statistical
arguments (given in the first three of these five mails):
Mail 1. 4-2-2025:
“the
usage of a word by itself is dependent more on the context rather than
whether it is new or old. If the themes of the New Rigveda are different
from those of the Old Rigveda, you would expect different words to be
used. That would not necessarily make those words themselves 'new,'
requiring explanation in terms of linguistic evolution/borrowing”.
Mail 2. 4-2-2025:
“Take
the Old and New Rigvedas. Together, the old books contain 3241 verses,
and the new books contain 7311 verses.
There
are large enough samples. But there are more than twice as many
verses in the new books as in the old books. Unequal sample sizes
have consequences.
If
we assume that each verse contains, on average, a certain number of words, the
chances of a randomly chosen word appearing in the new books but not in the old
books is correspondingly greater, simply because of the greater number of
verses. The problem now is to prove that, given any word
occurring in the new Rigveda but not in the old Rigveda, it is a genuinely
'new' word, and not a 'false positive' due to a combination of the
following
(a) Larger sample size.
(b) Different context/theme.
(c) Different choice of synonyms to ensure adherence to poetic metre.
Note that the above three conditions are not mutually exclusive either.
This is where things
can go either way, and distinguishing a true positive from a false
positive becomes problematic”
Mail 3. 4-2-2025:
“This is not about
the sample sizes per se but about unequal sample sizes being
compared. The samples are large enough, running into
3000+ and 6000+ verses. But, you simply cannot compare a randomly chosen
sample of 3000 men with randomly chosen 6000 women if you want to understand
average height associations with sex. You have to compare 6000 to 6000 or
3000 to 3000 - comparable numbers. In a sample of 6000, you are more
likely to come across some 6ft people as well as some 3ft people -
extreme values - as compared to a smaller sample, that could skew the average
one way or the other. In such comparisons, one asks: 'How
representative is my sample of the entire population?' Thus far,
about unequal sample size.
In a strict
sense, the word samples we have from the old and new books
are both biased samples. Their purpose being primarily
liturgical, they don't represent the 'actual' word distribution of the
Vedic language any more than Latin hymn books represent the entirety of
Latin vocabulary and word frequencies. And, the practical problem is that
there is no Rigvedic era dictionary of liturgical and general terms,
unlike the case for classical Sanskrit.”
It will be seen that his main
argument is that “unequal sizes” of the “samples” skew
the statistical result:
“Take
the Old and New Rigvedas. Together, the old books contain 3241 verses,
and the new books contain 7311 verses.
There
are large enough samples. But there are more than twice as many
verses in the new books as in the old books. Unequal sample sizes
have consequences.”
“This
is not about the sample sizes per se but about unequal sample
sizes being compared. The samples are large
enough, running into 3000+ and 6000+ verses. But, you simply cannot
compare a randomly chosen sample of 3000 men with randomly chosen 6000 women if
you want to understand average height associations with sex. You have to
compare 6000 to 6000 or 3000 to 3000 - comparable numbers. In a sample of
6000, you are more likely to come across some 6ft people as
well as some 3ft people - extreme values - as compared to a smaller sample,
that could skew the average one way or the other. In such comparisons,
one asks: 'How representative is my sample of the entire
population?' Thus far, about unequal sample size.”
Here are the
main and very basic flaws in his arguments:
1. He
describes the Rigvedic data as “randomly chosen samples”, and gives, as
an example to be compared, a hypothetical case to show that “you
simply cannot compare a randomly chosen
sample of 3000 men with randomly chosen 6000 women”.
But the Rigvedic data compared
does not consist of “randomly chosen samples”: it consists of the total
database of:
a) The full 280
Hymns, 2368 verses in the Old Rigveda Books 2,3,4,6,7;
b) The full 62
Hymns, 873 verses in the Redacted Hymns in the Old Books 2,3,4,6,7;
c) The full 686
Hymns, 7311 verses in the New Rigveda Books 1,5,8,9,10.
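As a quick arithmetic check, the three groups listed above can be tallied as follows. This is only an illustrative sketch in Python: the name `groups` and its keys are labels chosen for this example, while the counts are the ones given above.

```python
# Hymn and verse counts for the three groups, as listed above.
# The variable names are illustrative labels, not part of the original study.
groups = {
    "Old Rigveda (Books 2,3,4,6,7)": {"hymns": 280, "verses": 2368},
    "Redacted Hymns (Books 2,3,4,6,7)": {"hymns": 62, "verses": 873},
    "New Rigveda (Books 1,5,8,9,10)": {"hymns": 686, "verses": 7311},
}

total_hymns = sum(g["hymns"] for g in groups.values())
total_verses = sum(g["verses"] for g in groups.values())

# Verses in the old books as a whole (Old Rigveda + Redacted Hymns):
# this is the figure of 3241 quoted in the second mail.
old_book_verses = 2368 + 873

print(total_hymns, total_verses, old_book_verses)  # 1028 10552 3241
```

The point of the tally is simply that the three groups together account for every hymn counted here, so nothing is being "sampled" from a larger pool.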
Whereas in the hypothetical case
he cites, there are indeed randomly chosen samples of 3000
men (out of millions and millions)
and 6000 women (out of millions and millions).
2. He tells us (in
his hypothetical case) “In a sample of 6000, you are more likely to
come across some 6ft people as well as some 3ft people - extreme values - as
compared to a smaller sample, that could skew the average one way or the other.”
But how does this
apply to the Rigvedic data given by us? Out of 280 hymns
in the Old Rigveda, all 280 out of the 280
are lacking in new words, meters and composer names;
while out of 62 Redacted Hymns, only 1 out of the
62 is lacking in new words, meters and composer
names, and out of 686 hymns in the New Rigveda,
only 2 out of the 686 are lacking in new words, meters
and composer names.
This could have been
compared to his “men and women” case only if more than
5900 out of 6000 women were of 6 feet or 3 feet, while 0
out of 3000 men were of 6 feet or 3 feet, which, obviously, he
is not claiming is the case!
This also applies to
his argument that the absence of these new words from the Old Rigveda
could be because of “different
choice of synonyms to ensure adherence to poetic metre” or “different
context/theme”: how would poetic meter or theme dictate
that these new words should appear (and sometimes in numbers as
high as 56 or 71 words in even a single hymn) in 61 out of 62 Redacted Hymns
and in 684 out of 686 New Rigveda hymns, but in 0 out of 280 Old
Rigveda hymns?
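The hymn-level figures just cited can be put side by side in a short sketch. The proportions come directly from the counts above. The last step is an added illustration and not part of the original study: it assumes, purely for the sake of argument, that the 283 hymns lacking new elements were scattered at random among all 1028 hymns, with each hymn treated as one unit regardless of its length, and asks how likely it would then be for all 280 Old Rigveda hymns to fall among them.

```python
from math import comb, log10

# (hymns lacking new elements, total hymns) per group, as stated above.
counts = {
    "Old Rigveda": (280, 280),
    "Redacted Hymns": (1, 62),
    "New Rigveda": (2, 686),
}

for name, (lacking, total) in counts.items():
    with_new = total - lacking
    print(f"{name}: {with_new}/{total} hymns with new elements "
          f"({100 * with_new / total:.1f}%)")

# Illustrative assumption (NOT from the original study): hymns lacking new
# elements are scattered at random among all hymns, one hymn = one unit.
total_hymns = sum(total for _, total in counts.values())
total_lacking = sum(lacking for lacking, _ in counts.values())
old_hymns = counts["Old Rigveda"][1]

# Hypergeometric probability that all Old Rigveda hymns land among the
# hymns lacking new elements: C(283, 280) / C(1028, 280), on a log10 scale.
log10_p = (log10(comb(total_lacking, old_hymns))
           - log10(comb(total_hymns, old_hymns)))
print(f"log10 of that probability: {log10_p:.0f}")
```

Under that assumed random scattering the probability is vanishingly small, which is one way of stating that the difference in group sizes alone would not be expected to produce a 0-out-of-280 versus 684-out-of-686 split.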
3. Fundamentally, his
case is comparing apples and oranges: i.e. “men and women”:
our case compares hymns and hymns. We don’t get different figures
because we are comparing two different objects;
we get different figures when we are comparing the same objects:
hymns. But yes, they are different objects
only in the sense that they are hymns in the Old Rigveda
and hymns in the Redacted-Hymns/New-Rigveda
respectively. But then that is what my article was all about (and that is
in fact what he is arguing against): that words in the New Rigveda
are a different object from words in the Old
Rigveda!
He also, for some
reason (perhaps he notices this above flaw in his argument) tries to tell us
that the full corpus of the words in the Rigveda does not represent the full
corpus of words in the Rigvedic language of that
time as a whole:
“The data are
obviously fragmentary, and we may never get corroborative proof
for all words, as we don't have any dictionary of the Vedic
language. The Rigvedic language lives only in the Rigveda, and its words
are only a subset of the words ever used in the Vedic language.”
“In a strict
sense, the word samples we have from the old and new books
are both biased samples. Their purpose being primarily
liturgical, they don't represent the 'actual' word distribution of the
Vedic language any more than Latin hymn books represent the entirety of
Latin vocabulary and word frequencies. And, the practical problem is that
there is no Rigvedic era dictionary of liturgical and general terms,
unlike the case for classical Sanskrit.”
But then again, here,
he fails to realize that, whether the words represent the entire Vedic
language or only a liturgical “subset” of the Vedic language,
nevertheless the words in all three groups
(Old Rigveda, Redacted Hymns and New Rigveda)
still do not represent apples, oranges and bananas: all of
them still represent the same object, liturgical hymns.
If the same principle
is applied to all ancient texts (including those in the Latin language),
all chrono-linguistic studies of the ancient vocabulary of any
language will have to be declared extremely flawed
(if not null and void) on the ground
that the available texts may not represent the entire corpus of words in the
language concerned.
In this particular
case of the Rigveda, it may be noted, the comparison is not only between
the vocabulary of two distinct parts of the text (Old Rigveda and
New Rigveda), which are distinct as per the testimony of a long line of
Indologists from Oldenberg through Witzel to Proferes, even though the
entire text has nevertheless wrongly been treated as a single monolithic text
for the purpose of chrono-linguistic analysis till date; it is also with the
vocabulary of the available texts of the Avesta and the Mitanni inscriptions.
Did the composers or writers of all these different sources enter into a common
conspiracy to stonewall the same items of extant
vocabulary, or to use the same items of vocabulary, in their
writings?
So it is clear that
the identification of the words so profusely present in the New
Rigveda, but so totally absent in the Old Rigveda,
as New Words cannot be dismissed, nor can their value as chronological
evidence be undervalued, in this cavalier manner.
II. Can the New Words Be Sifted Out?
In the second mail, he also
wrote:
“The
problem now is to prove that, given any word occurring in the new
Rigveda but not in the old Rigveda, it is a genuinely 'new' word,
and not a 'false positive' due to a combination of the following
(a) Larger sample size.
(b) Different context/theme.
(c) Different choice of synonyms to ensure adherence to poetic metre.
Note that the above three conditions are not mutually exclusive either.
This is where things
can go either way, and distinguishing a true positive from a false
positive becomes problematic, unless you have other lines of
corroborating evidence, as in (1) & (2) above.”
In the third mail, he wrote:
“The
data are obviously fragmentary, and we may never get corroborative proof
for all words, as we don't have any dictionary of the Vedic
language. The Rigvedic language lives only in the Rigveda, and its words
are only a subset of the words ever used in the Vedic language. Given the
nature of the problem, it is not about whether a given word is or is
not a new word. The words you identified can be grouped into
candidates about which certainty about 'newness' varies from the most reliable
to the least reliable.
I have seen the data
you presented, and I was wondering how certain one could be about the
reliability of individual data points based on first principles. No one is
saying that all the entries in your list are all wrong or all
right. That is an extreme position I never took - that would be just
like saying that the Puranas are either all history or all
myth.
Given these unknowns
(and unknowables), all I am saying is that some of your identifications are
bound to be true positives beyond any reasonable doubt, others very reliable,
some others less so in various grades ending in some that are false
positives. False discoveries are statistically inevitable in such
cases, and it helps not to take an 'all or none' position on the
entire dataset you have undoubtedly assembled after much painstaking study. Rather,
it is a resource for further detailed study of its individual data
points in terms of their reliability (some already proven beyond
doubt).”
In short, what he
suggests is that all the words in my list of new
words may not be new with the same degree of certainty: that
there could be many different degrees of certainty for the words actually being
new, ranging from what he calls “true positives” to “false
positives”. Therefore, he suggests that I must examine the “individual
data points” (i.e. each word individually) to check whether each one is
really new or only seems to be so because of its presence only in the New
Rigveda, and that the new words in my list “can be grouped into candidates
about which certainty about 'newness' varies from the most reliable to the
least reliable”, rather than “take an 'all or none' position on the entire
dataset you [i.e. I] have undoubtedly assembled after much painstaking study”.
I think this is
simply splitting hairs, refusing to accept the data available, and setting
impossible tasks based on hypothetical assumptions (seeing that he also accepts
or admits that “we may never get corroborative proof for all words,
as we don't have any dictionary of the Vedic language. The Rigvedic
language lives only in the Rigveda” and that “the practical problem
is that there is no Rigvedic era dictionary of liturgical and general
terms, unlike the case for classical Sanskrit.”).
To begin with, the
division of the Rigveda into three groups of hymns
(Old Rigveda, Redacted Hymns and New Rigveda)
is practically set out by a long line of Indologists from Oldenberg
through Witzel to Proferes (see my main article above on the “chronological
gulf”). And all the new words listed by me are words which
only appear in the Redacted Hymns and the New Rigveda
but are completely missing in the Old Rigveda.
I have done my work
in meticulous detail. If anyone feels the need to check each word
in the list, invent completely new criteria to determine
its “degree” of “certainty” for “newness”, and then sift
out words from the list as not being new, that is a
senseless task which I, at least, see absolutely no need to undertake myself;
but I welcome sincere efforts by anyone else who wants to undertake it!
It must be kept in mind
that even if anyone undertakes this task, and invents arbitrary and whimsical
criteria to sift out words from the list as not being new,
all they will manage to achieve “after much painstaking study” is to
slightly reduce the number of new words in the list to
their own satisfaction. But the end result will be the same: all these words,
with some reductions in numbers, will still be only in the Redacted
Hymns and New Rigveda but not in the Old Rigveda.
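The logic of this point can be shown with a toy sketch. Everything in it is hypothetical: the tokens are placeholders and not real Rigvedic words, and `sift` is a name invented for this illustration. It only shows that removing entries from a list of words absent from the Old Rigveda leaves a shorter list whose entries are still absent from the Old Rigveda.

```python
# Placeholder tokens only: these are NOT real Rigvedic words.
old_rigveda_vocab = {"w1", "w2", "w3"}
new_word_list = {"n1", "n2", "n3", "n4", "n5"}  # none occurs in the Old Rigveda


def sift(words, rejected):
    """Return the list after removing words a critic rejects as 'not new'."""
    return words - rejected


# Hypothetical outcome of someone's sifting exercise.
remaining = sift(new_word_list, rejected={"n2", "n5"})

# Whatever is rejected, the survivors are still absent from the Old Rigveda.
print(len(remaining), remaining.isdisjoint(old_rigveda_vocab))  # 3 True
```

Whatever criteria are used to choose the rejected words, the check on the last line cannot come out otherwise, since a subset of a list that shares nothing with the Old Rigveda vocabulary also shares nothing with it.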
Therefore, I will
only conclude with my original remark that this chronological study
of the Rigvedic vocabulary will be of prime importance in the
study of Indo-Aryan, Indo-Iranian and Indo-European
history.