White Paper: How Ancient “Para-” Languages Illuminate Language Development and Relationships

Case studies: para-Munda (South Asia) and Para-Mongolic (Inner Asia)

Abstract

Scholars sometimes attach the prefix “para-” to ancient, poorly attested (or entirely unattested) languages that are inferred from substratal effects or fragmentary records. A “para-” language is typologically or historically close to a known family but is not a direct ancestor of its attested members. Two of the most instructive examples are para-Munda, a hypothesized substratum in northern India showing Austroasiatic-like features, and Para-Mongolic, a sister branch to Mongolic that includes extinct languages such as Khitan and Tuyuhun. Studying these “para-” layers improves our reconstructions of language prehistory, refines family trees, exposes contact-driven change, and anchors linguistic hypotheses in archaeology and textual history. This white paper synthesizes the evidentiary bases, analytical methods, and theoretical payoffs of “para-language” research, with concrete implications for historical linguistics, typology, and areal studies.

1) What “para-languages” are—and why they matter

In historical linguistics, “para-X” denotes languages that are sister to (not descendants of) a known family X, or that represent substratal systems typologically close to X without being demonstrably X itself. Evidence typically comes from (i) substrate vocabulary in early texts of another language, (ii) place-names and ethnonyms, (iii) loanword phonology and morphology, and (iv) occasional deciphered fragments or bilingual glosses.

Para-Munda refers to an inferred substrate in early Indo-Aryan (Vedic) with Austroasiatic-like traits, especially certain prefixing patterns and semantic domains (flora, fauna, agriculture, crafts, music/dance, domestic life, religion). The proposal is associated with Kuiper and Witzel, who argued that parts of the northwest/north Indian lexicon and onomastics reflect a non-Indo-Aryan layer akin to but not identical with Munda.

Para-Mongolic is a proposed extinct sister branch of Mongolic comprising languages such as Khitan and Tuyuhun; it helps separate what belongs to Proto-Mongolic from traits shared more broadly across a higher node.

Understanding these layers clarifies how today’s families formed, where innovations originated, and how contact reshaped phonology, morphology, and lexicon—often in ways invisible if we only compare surviving languages.

2) Evidentiary foundations

2.1 Para-Munda in South Asia

Key observations from Vedic and later Old Indo-Aryan corpora include:

Lexical clusters: vocabulary in everyday life domains (plants/animals, farming, tools, household, personal adornment, music/dance, some religious items) that is atypical of inherited Indo-European stock.

Structural hints: prefixal morphology in certain Vedic items and unusual consonant clusters, both suggestive of a non-IA source.

Geography: Witzel traces different substrate zones (e.g., a Kubhā-Vipāś layer in the northwest; an older “language X” in the Gangetic plain), implying multiple pre-Indo-Aryan systems later overwritten by Indo-Aryan expansion.

These data are consistent with broader surveys (e.g., Masica’s The Indo-Aryan Languages), which recognize non-Indo-Aryan strata alongside Dravidian and Munda, while cautioning about attribution. 

2.2 Para-Mongolic in Inner Asia

For Inner Asia, evidence is richer thanks to epigraphy and Chinese transcriptions:

Khitan (Liao dynasty) and Tuyuhun are best treated as Para-Mongolic, not direct daughters of Proto-Mongolic. This helps separate shared retentions from independent innovations within Mongolic vs. its sisters.

Comparative work (Janhunen; Vovin) and systematic corpora (Shimunek’s historical–comparative study of “Serbi-Mongolic”) refine segmental correspondences, numerals, and pronominal systems across Mongolic and its sisters.

3) Methods: how “para-languages” are detected and modeled

Loanword stratigraphy in early texts: Identify clusters of non-etymologizable vocabulary with shared phonotactics or morphology; map them to semantic fields typical of substrate borrowing (agriculture, tools, local ecology).

Onomastic archaeology: Systematically analyze personal and place-names in early sources to reveal phonological shapes alien to the recipient language.

Comparative reconstruction beyond the family: Build correspondences among extinct or fragmentarily attested varieties (e.g., Khitan, Tuyuhun) to establish a higher-order node (Para-Mongolic) and distinguish inherited vs. contact features.

Contact diagnostics: Look for typological “fingerprints” of contact (e.g., introduction of retroflexes into Indo-Aryan; prefixing tendencies; unusual consonant clusters).

Interdisciplinary triangulation: Align linguistic inferences with archaeological horizons (e.g., the Bactria-Margiana Archaeological Complex, BMAC, for one Indo-Aryan contact layer), historical geography, and population movements; revise hypotheses as new epigraphic or computational findings arrive.
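The loanword-stratigraphy and contact-diagnostic steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Lemma` record, the `diagnostics` and `candidates` helpers, the `min_hits` threshold, and the toy forms are all hypothetical and not taken from any published dataset. Retroflexion is used here as one convenient surface cue; on its own it is not diagnostic of borrowing.

```python
import unicodedata
from dataclasses import dataclass

# IAST retroflex consonants (t, d, n, s with dot below), written as escapes
# so the set is unaffected by how this file happens to be encoded.
RETROFLEX = set("\u1e6d\u1e0d\u1e47\u1e63")
# Semantic fields the text names as typical of substrate borrowing.
SUBSTRATE_FIELDS = {"agriculture", "tools", "local ecology"}


@dataclass(frozen=True)
class Lemma:
    form: str            # transliterated headword (hypothetical)
    field: str           # annotated semantic field
    has_etymology: bool  # True if an inherited etymology is accepted


def diagnostics(lemma: Lemma) -> list[str]:
    """Return the names of the independent diagnostics this lemma satisfies."""
    form = unicodedata.normalize("NFC", lemma.form)
    hits = []
    if not lemma.has_etymology:
        hits.append("no_etymology")
    if lemma.field in SUBSTRATE_FIELDS:
        hits.append("substrate_field")
    if any(ch in RETROFLEX for ch in form):
        hits.append("retroflex")
    return hits


def candidates(lemmas: list[Lemma], min_hits: int = 2) -> list[tuple[str, list[str]]]:
    """Keep unetymologized lemmas that satisfy at least `min_hits` diagnostics."""
    out = []
    for lemma in lemmas:
        hits = diagnostics(lemma)
        if "no_etymology" in hits and len(hits) >= min_hits:
            out.append((lemma.form, hits))
    return out


# Invented placeholder forms, not attested words and not etymological claims.
toy = [
    Lemma("xa\u1e6do", "tools", False),
    Lemma("xeri", "kinship", True),
    Lemma("xulu", "kinship", False),
]
flagged = candidates(toy)
```

Requiring more than one diagnostic per item is a deliberate choice: it keeps a single anomaly from being read as substrate evidence, which is the circularity risk that any such screen has to guard against.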

4) What these cases teach us about language development

4.1 Contact can restructure phonology and morphology

The Vedic evidence suggests that substrate interaction can introduce or strengthen features like retroflexion, alter morphological patterns, and seed specialized lexicon. Such changes are later treated as “typical” of the family even though they are historically secondary.

4.2 Family trees need sister branches, not only ancestors/descendants

Para-Mongolic demonstrates that what looks like “Mongolic” unity may mask a larger clade with extinct sisters. Recognizing a Para-Mongolic tier guards against misreconstructing Proto-Mongolic: features that are actually shared at a higher node, or that spread by contact, can be identified as such rather than mistaken for Mongolic-specific innovations.
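The underlying bookkeeping is simple set logic, sketched below. The grouping of Khitan and Tuyuhun follows the text; Middle Mongol and Khalkha stand in for the Mongolic side; the feature names and their attestations are hypothetical placeholders, and the `classify` helper is an assumption made for illustration, not an established procedure.

```python
MONGOLIC = frozenset({"Middle Mongol", "Khalkha"})
PARA_MONGOLIC = frozenset({"Khitan", "Tuyuhun"})


def classify(attested_in: set[str]) -> str:
    """Label a feature by which side(s) of the split attest it."""
    in_mongolic = bool(attested_in & MONGOLIC)
    in_para = bool(attested_in & PARA_MONGOLIC)
    if in_mongolic and in_para:
        # Cannot be a Mongolic-internal innovation; needs a further test
        # to decide between inheritance from a higher node and contact.
        return "shared: higher-node retention or contact"
    if in_mongolic:
        return "Mongolic only: candidate internal innovation"
    if in_para:
        return "Para-Mongolic only"
    return "unattested"


# Hypothetical attestations, for illustration only.
toy_features = {
    "feature_A": {"Middle Mongol", "Khalkha", "Khitan"},
    "feature_B": {"Middle Mongol", "Khalkha"},
    "feature_C": {"Tuyuhun"},
}
labels = {name: classify(langs) for name, langs in toy_features.items()}
```

With fragmentary corpora, absence from the Para-Mongolic side may only mean the feature is undocumented there, so the "Mongolic only" label is best read as a candidate rather than a verdict.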

4.3 Areal layers can be stacked and diachronically staggered

Witzel’s mapping of multiple South Asian substrates implies serial layering: distinct pre-Indo-Aryan systems (e.g., the Gangetic “language X” and the northwestern, Munda-like Kubhā-Vipāś layer) were later overlaid by Indo-Aryan. Sequence matters for explaining why some innovations cluster geographically or textually.

4.4 Semantics of borrowing reflect ecology and economy

Substrate vocabularies concentrate in domains where incoming groups adopt local knowledge—crop varieties, tools, and environmental terms—while core grammatical vocabulary resists borrowing. This pattern recurs from Vedic India to Inner Asia. 

5) Analytical payoffs beyond the case studies

Sharper subgrouping: By separating family-internal innovations from sister-branch retentions or contact effects, we get cleaner phylogenies (e.g., within Mongolic).

Better reconstructions: Reassigning “odd” items to substrate layers prevents corrupting proto-language reconstructions with loans (e.g., Vedic agricultural terms).

Areal typology: South Asia’s classic “Sprachbund” features can be calibrated historically when we identify which strata contributed what, and when.

Historical geography: Onomastic and lexical distributions act as proxies for vanished populations and movements, complementing archaeology.

6) Cautions and limits

Attribution risk: Some scholars contest the para-Munda label or particular etymologies; not every non-IE item is Austroasiatic, and some belong to other lost systems. Competing models (e.g., re-labelling the northwest substrate as Kubhā-Vipāś) show terminology is in flux and should track the evidence.

Circularity: Assuming a substrate to explain anomalies can become self-confirming; analyses must rely on replicable correspondences and independent diagnostics.

Patchy documentation: For Para-Mongolic, decipherment and interpretation of Khitan and Tuyuhun still evolve; conclusions should be framed as provisional and updated as corpora grow.

7) Recommended workflow for “para-language” research

Corpus preparation: Build lemmatized corpora of early recipient texts (e.g., Ṛgvedic, Atharvavedic) and align them with epigraphic datasets where possible. Annotate uncertain etymologies and semantic fields.

Phonotactic and morphological profiling: Detect clusters with atypical phonotactics (e.g., complex clusters, retroflex–dental alternations) and morphological patterns (e.g., prefixing).

Comparative triangulation: For Inner Asia, compare Khitan/Tuyuhun material with Mongolic, Turkic, and Tungusic data and with Chinese transcriptions; test whether similarities reflect inheritance (Para-Mongolic node) or contact.

Geolinguistic mapping: Overlay lexical/onomastic findings on archaeological culture maps (e.g., BMAC for some Indo-Iranian interactions) and historical routes to propose movement/interaction scenarios.

Iterative hypothesis testing: Treat labels (“para-Munda”, “Para-Mongolic”) as models, not facts; publish etymology lists with confidence scores and invite falsification.
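One possible shape for a published etymology list with confidence scores is sketched below. Everything in it is an assumption made for illustration: the `EtymologyProposal` record, the four diagnostic names, the equal weighting, and the placeholder lemmas. A real project would choose and justify its own diagnostics and weights.

```python
import csv
import io
from dataclasses import dataclass, field

# Hypothetical diagnostics; each should be assessable independently.
DIAGNOSTICS = (
    "no_inherited_etymology",
    "atypical_phonotactics",
    "substrate_semantic_field",
    "onomastic_parallel",
)


@dataclass
class EtymologyProposal:
    lemma: str
    proposed_layer: str                      # e.g. "para-Munda"
    satisfied: set[str] = field(default_factory=set)

    def confidence(self) -> float:
        """Share of diagnostics satisfied (equal weights, an assumption)."""
        unknown = self.satisfied - set(DIAGNOSTICS)
        if unknown:
            raise ValueError(f"unknown diagnostics: {sorted(unknown)}")
        return len(self.satisfied) / len(DIAGNOSTICS)


def to_csv(proposals: list[EtymologyProposal]) -> str:
    """Serialize proposals, highest confidence first, for publication."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["lemma", "proposed_layer", "confidence", "diagnostics"])
    for p in sorted(proposals, key=lambda p: p.confidence(), reverse=True):
        writer.writerow(
            [p.lemma, p.proposed_layer, f"{p.confidence():.2f}",
             ";".join(sorted(p.satisfied))]
        )
    return buf.getvalue()


# Placeholder entries, not real etymological claims.
proposals = [
    EtymologyProposal("lemma_2", "para-Munda", {"no_inherited_etymology"}),
    EtymologyProposal(
        "lemma_1", "para-Munda",
        {"no_inherited_etymology", "atypical_phonotactics",
         "substrate_semantic_field"},
    ),
]
table = to_csv(proposals)
```

Listing the satisfied diagnostics next to each score lets a reader contest an individual judgment without rejecting the whole list, which is what inviting falsification requires in practice.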

8) Implications for broader linguistics and allied fields

For typology: Para-layers show how contact can package features (e.g., retroflexion, prefixing) into stable areal complexes without shared ancestry, which is crucial for avoiding false genealogical inferences.

For computational phylogenetics: Feature matrices should encode known substrate and sister-branch signals, or models will overfit family-internal trees.

For epigraphy and historical studies: Identifying Para-Mongolic clarifies medieval Inner Asian polities’ language ecology (e.g., Liao, Tuyuhun realms), refining readings of titles, ethnonyms, and administrative vocabulary.

For South Asian prehistory: Distinguishing multiple substrate layers helps reinterpret migration narratives, settlement sequences, and the emergence of the South Asian linguistic area.
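The point about feature matrices can be illustrated with a toy calculation. The sketch below assumes a binary feature matrix and a set of features flagged as contact-induced, then compares pairwise distances with and without those features. The languages, the feature values, and the `distance` helper are hypothetical; masking is only one simple way to encode a substrate signal, and real phylogenetic models handle this in more sophisticated ways.

```python
# Hypothetical binary feature matrix: 1 = feature present, 0 = absent.
FEATURES = ["retroflexion", "prefixing", "feat_3", "feat_4"]
MATRIX = {
    "lang_A": {"retroflexion": 1, "prefixing": 1, "feat_3": 1, "feat_4": 0},
    "lang_B": {"retroflexion": 0, "prefixing": 0, "feat_3": 1, "feat_4": 0},
    "lang_C": {"retroflexion": 1, "prefixing": 1, "feat_3": 0, "feat_4": 0},
}
# Features an analyst has flagged as areal or substrate-driven (assumed).
CONTACT_FLAGGED = frozenset({"retroflexion", "prefixing"})


def distance(x: str, y: str, masked: frozenset = frozenset()) -> float:
    """Normalized Hamming distance over the features that are not masked."""
    used = [f for f in FEATURES if f not in masked]
    if not used:
        raise ValueError("all features are masked")
    a, b = MATRIX[x], MATRIX[y]
    return sum(a[f] != b[f] for f in used) / len(used)


pairs = [("lang_A", "lang_B"), ("lang_A", "lang_C")]
raw = {pair: distance(*pair) for pair in pairs}
cleaned = {pair: distance(*pair, masked=CONTACT_FLAGGED) for pair in pairs}
```

In this toy matrix `lang_A` looks closer to `lang_C` on the raw features but closer to `lang_B` once the flagged features are set aside, which is the kind of reversal that unmarked areal features can produce.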

9) Conclusions

Ancient “para-” languages are not mere footnotes; they are structural beams in the architecture of language history. Para-Munda sharpens our view of how Indo-Aryan was reshaped by older South Asian systems, while Para-Mongolic reframes the Mongolic family within a broader sisterhood and prevents mis-reconstruction of its proto-state. Methodologically, the payoff is twofold: cleaner genealogies and richer, testable narratives of contact and change. As epigraphic corpora expand and computational methods mature, these para-layers will become even more central to credible accounts of how languages develop and relate.

Key references (open-access where possible)

Witzel, Michael. “Substrate Languages in Old Indo-Aryan (Ṛgvedic, Middle and Late Vedic).” EJVS 5(1). (Foundational discussion of para-Munda/other substrates and their lexical domains.)

Witzel, Michael. “Autochthonous Aryans? The Evidence from Old Indian and Iranian Texts.” EJVS 7(3). (On substrate zones including Kubhā-Vipāś and cautions about ethnicity vs. language.)

Masica, Colin P. The Indo-Aryan Languages. (Survey with treatment of non-IA strata and South Asian areal phenomena.)

Hölzl, Andreas. “New Evidence on Para-Mongolic Numerals.” SUSA/JSFOu (2018). (Comparative evidence for Para-Mongolic and diffusion into neighboring families.)

Shimunek, Andrew. Languages of Ancient Southern Mongolia and North China. (Historical-comparative study of Serbi/Para-Mongolic corpora and their relation to Mongolic.)

Rykin, Pavel. “Mongolic Historical and Comparative Linguistics.” Central Asiatic Journal 64 (2021). (Overview situating Mongolic vs. Para-Mongolic debates.)
