The Global Lexicostatistical Database: News and updates



 

10.30.2013. Two updates:

1) A new database in the North Caucasian section: Tsezic, for now with only one language (Hunzib), soon to be expanded with more. Compiled and annotated by A. Kassian on the basis of both recent and older lexicographic material.

2) The Krongo-Kadugli database has been expanded with wordlists for the closely related Kadugli (proper) and Miri languages. Compiled and annotated by G. Starostin on the basis of field data published by Thilo Schadeberg and other authors.

 

10.19.2013. A wordlist for Kenuzi Nubian has been added to the Nubian database; this exhausts the list of all the languages in the Nile-Nubian subgroup. Compiled by G. Starostin.

 

10.13.2013. Two updates:

1) The Athapaskan database has been expanded with a new wordlist for the Tanacross language, compiled by A. Kassian.

2) The Sino-Tibetan section of the site has been expanded with a new database that contains five wordlists for three subdialects of Northern Tujia (Tasha Tujia, Duogu Tujia, Dianfang Tujia) and two subdialects of Southern Tujia (Boluo Tujia, Tanxi Tujia), based on fieldwork published in Chinese and European sources. Compiled by G. Starostin.

 

09.20.2013. Two updates:

1) Two more wordlists added to the Hmong database, for the Chuanqiandian Hmong and Diandongbei Hmong dialects, spoken in China. Compiled and annotated by G. Starostin on the basis of comparative Hmong-Mien lexical data, published in 1987.

2) Three wordlists added for different varieties of Miwok (Bodega Miwok, Central Sierra Miwok, Southern Sierra Miwok), based mostly on dictionaries and grammatical descriptions from the 1960s-1970s. Compiled and annotated by M. Zhivlov.

 

09.12.2013. A wordlist for the Khinalug language, compiled by A. Kassian based on F. Ganieva's dictionary and older sources, has been uploaded to the North Caucasian section of the site in its own database.

 

08.14.2013. Two updates:

1) A wordlist for the Katcha language (Krongo-Kadugli group, of disputable affiliation) has been added to the African section. Compiled and annotated by G. Starostin based on the published fieldwork of Thilo Schadeberg and Roland Stevenson.

2) A wordlist for the Klon language (Bring dialect), based on a variety of new and old sources, has been added by A. Kassian to the former Alor (now Alor-Pantar) database in the small New Guinean section of the site.

 

07.31.2013. Six new wordlists for different varieties of the Pomoan languages (Kashaya; Northern, Central, Northeastern, Southeastern, and Southern) have been added to the Pomo database by M. Zhivlov. The annotated lists are mostly based on Robert L. Oswalt's publications, with some additional sources also considered.

 

07.28.2013. It's been long in the making, but it's finally here: a reconstructed Swadesh wordlist for Proto-Yeniseian, compiled, annotated, and explained in detail by G. Starostin, based primarily on S. A. Starostin's Proto-Yeniseian reconstruction, but with numerous modifications through additional phonetic, semantic, and distributional analysis of cognate forms.

 

07.25.2013. The Nubian database is expanded with a list for Dongolawi (Dongolese) Nubian, culled by G. Starostin from Charles Armbruster's classic dictionary and cross-referenced with G. von Massenbach's earlier data.

 

06.10.2013. Large update:

1) A wordlist for the Dogrib language, compiled by A. Kassian, has been added to the Athapaskan database.

2) A wordlist for the Plains Miwok language, compiled by M. Zhivlov, has been added to the Miwok database.

3) The Hmong database has been expanded with a new wordlist for Qiandong Hmong, compiled by G. Starostin.

4) The Dravidian language family makes its first appearance in the GLD with a wordlist for Brahui, compiled by G. Starostin based on Denis Bray's classic dictionary.

 

05.11.2013. Two updates:

1) The Athapaskan database has been expanded with new wordlists for Central and Mentasta Ahtena dialects, compiled by A. Kassian on the basis of James Kari's dictionaries and additional sources.

2) The Benue-Congo section of the site now has a «Bantu-S» database with a wordlist for the Xhosa language, re-edited by G. Starostin from an older version by Ye. Chekmeneva. More Southern Bantu wordlists to be expected within the year.

 

04.28.2013. A wordlist for the moribund Konkow language has been added to the Maidu database. Compiled by M. Zhivlov.

 

04.17.2013. The Burushaski database has been completed with a wordlist for Hunza Burushaski (the third Burushaski dialect, Nagar, is not differentiated from Hunza on a lexicostatistical basis). Compiled by G. Starostin, based on H. Berger's data. Some inaccuracies in the Yasin wordlist corrected as well.

 

04.08.2013. A wordlist for modern Nobiin (= Fadidja-Mahas) has been added to the Nubian database, compiled by G. Starostin on the basis of a comparison of several recent and older sources.

 

03.28.2013. A wordlist for the nearly extinct Washo language isolate (sometimes tentatively grouped with Hokan, but such an affiliation is highly questionable) has been added to the American section of the site. Compiled by M. Zhivlov, based on William H. Jacobsen's research.

 

03.27.2013. The recent conference on «Comparative-Historical Linguistics of the XXIst Century», held at RSUH, Moscow, on March 20-22, featured presentations from all the major contributors to the GLD project as well as numerous other specialists in the field. The program, materials, and even videos of the conference can be found on the «Meetings» page of the «Tower of Babel» site.

 

03.26.2013. The Athapaskan database (renamed from the former Pacific Coast Athapaskan database) has been expanded to include wordlists for four different dialects of the Tanaina language, based on dictionaries by J. Kari and other sources. Compiled by A. Kassian.

 

02.24.2013. Two more wordlists added to the Ekoid database, for the closely related Ekparabong and Balep dialects (extracted from D. Crabb's comparative wordlist).

 

02.13.2013. The first list for a «Nilo-Saharan» language uploaded today: Old Nubian, with 75 Swadesh items extracted from Gerald M. Browne's dictionary, opens the brand new Nubian database. Compiled by G. Starostin.

 

01.29.2013. Two updates:

1) The Hokan section of the site has been expanded with a database for the extinct Yana group, containing the wordlists for Northern Yana, Central Yana, and Yahi dialects, documented by E. Sapir in the early 20th century. Compiled by M. Zhivlov.

2) The Germanic database has been expanded with a wordlist for Old Norse, compiled by G. Starostin based on Cleasby's dictionary and cross-checked against earlier lists.

 

01.03.2013. A wordlist uploaded for the Yasin dialect of the Burushaski isolate, based on H. Berger's published materials (compiled by G. Starostin).

 

12.19.2012. Two new lists uploaded today:

1) A wordlist for the extinct Shasta language (of the small and completely extinct Shastan group), hypothetically belonging to the Hokan family; based on a selection of sources dating mostly to the 1950s / 1960s. Compiled by M. Zhivlov.

2) A wordlist for the click language Hadza, an isolate of Tanzania traditionally assigned to the "Khoisan" macrofamily, but without any sufficient basis. Compiled by G. Starostin mostly on the basis of relatively recent fieldwork by B. Sands, but adding data from numerous older sources as well. With the addition of this wordlist, all of the "Khoisan" languages / dialects for which sufficient amounts of data have been attested are now properly represented in the GLD, without a single exception.

 

12.14.2012. A wordlist for the isolated language Sandawe (possibly distantly related to the Central Khoisan family) has been uploaded, based on recent fieldwork publications, with comparative data added from several earlier sources.

 

12.12.2012. The Pacific Coast Athapaskan database has been expanded with a wordlist for Taldash Galice, an extinct dialect for which data were collected by H. Hoijer and H. Landar from the last living speaker in the 1960s-1970s.

 

12.04.2012. A small list for the extinct Kwadi language uploaded in the Central Khoisan section. Unfortunately, only a little over 50% of the entries could be filled in due to the extreme scarceness of data; nevertheless, the list was still included due to the importance of this link for Khoisan studies.

 

12.02.2012. Four lists altogether uploaded on this day — all of them, incidentally, on languages that are no longer living:

1) The Pacific Coast Athapaskan database has been expanded with a wordlist for the extinct Kato, based mainly on P. E. Goddard's fieldwork.

2) A new database on the Chimariko isolate, with data mainly taken from E. Sapir's field notes, added to the Hokan section.

3) The Yeniseian database is finally completed (except for the proto-wordlist) with lists for the long-extinct Arin and Pumpokol (the latter with some significant gaps due to scarceness of data), constructed from available XVIIIth century sources.

 

11.17.2012. Two updates:

1) The Pacific Coast Athapaskan database has been expanded with a wordlist for Mattole, based primarily on Li Fang-kuei's description from the 1930s.

2) The Kalahari Khoe database has been expanded with a wordlist for the Hiechware language, based on S. S. Dornan's old description. This completes the database as far as all attested dialects suitable for lexicostatistical analysis are concerned.

 

11.14.2012. The Lezgian database has finally been capped off with a wordlist for Proto-Lezgian, reconstructed based on GLD standards by A. Kassian, with extensive notes justifying the details.

 

11.07.2012. Two updates:

1) The Yeniseian database is expanded with a wordlist for the long-extinct Kott, compiled mostly from M. Castrén's data, originally published in 1858, with the addition of materials from even earlier sources.

2) The Yuman database has been expanded with wordlists for Yavapai and Jamul Tiipay, compiled from several recent sources on these languages.

 

10.11.2012. Large update to the former West Khoe database, now retitled "Kalahari Khoe" and including seven more wordlists on minor Khoe languages of Botswana: Cara, ǀXaise, Danisi, Ts'ixa, Deti, Kua, Tsua. All the data have been extracted from publications based on fieldwork carried out by R. Vossen in the 1980s.

 

09.26.2012. Another wordlist added to the Ekoid Bantu database: this time, for the Ekajuk language, with limited information on dialectal variety.

 

09.25.2012. The Cocopa list has been added to the Yuman database (Hokan family).

 

08.21.2012. Three more wordlists added to the West Khoe database, for the ǂHaba, ǀGui, and ǁGana languages of Botswana.

 

08.09.2012. First two wordlists added to the Coast Salish database: Upriver Halkomelem and Island Halkomelem Salish, based on recent comprehensive dictionaries for these dialects and personal communication with the authors. Both wordlists have been compiled by Elena Barreiro, our newest contributor; more Salish data are expected in the near future.

 

08.03.2012. After a month-long break, finally the next update: a wordlist for the extinct Yugh dialect (closely related to Ket), based on H. Werner's and earlier sources and including comparative notes on Common Ket-Yugh. A few mistakes corrected in the proper Ket section of the database as well.

 

06.30.2012. Last couple of updates for June:

1) A wordlist for the Lezgi language (Gyune dialect), along with comparative notes on numerous other Lezgi dialects based on a variety of sources;

2) A wordlist for the Naro language (West Khoe subgroup), based on two dictionaries and R. Vossen's comparative notes.

 

06.21.2012. Another Hmong-Mien update: a list for Hmong Daw (White Hmong) has been added, based on E. Heimbach's detailed dictionary of this language.

 

06.09.2012. A new list added for the Tol (Eastern Jicaque) language (Jicaquean group, possibly of the Hokan family).

 

06.07.2012. Another update in the Lezgian group database: this time, with two wordlists for two different dialects of the Tabasaran language (Northern and Southern), based on a variety of old and recent sources.

New feature: The «Language Comparison» option on the main page of the site («Lists for specific languages») has been significantly upgraded. It is now possible not only to view any two different wordlists for any two languages side by side, but also to highlight phonetically similar forms between them (similarity is determined based on the same algorithm as the «objectively generated tree of lexical similarity», see here for details). This is particularly useful for determining the quantitative differences between the numbers of accidental look-alikes on lexicostatistical lists and the average numbers of true cognates that still preserve archaic phonological shapes.

 

06.01.2012. Another Khoisan update: new list uploaded for the Kxoe (Khwe) language (Central Khoisan family), based on a recent dictionary by Christa Kilian-Hatz and older works by O. Köhler.

 

05.26.2012. A new list added for the Highland Oaxaca Chontal language (Tequistlatecan group, possibly of the Hokan family).

 

05.12.2012. Two new lists added to the Ekoid database: Nkum and Nnam.

 

05.06.2012. The Lezgian group database has been expanded with five wordlists for five different dialects of the Aghul language (Keren, Koshan, Gequn, Fite, and Aghul proper), based on a variety of old and recent sources.

 

05.02.2012. Uploaded a list for the extinct Khoekhoe language !Ora, drawn from two short vocabularies published in 1920 and 1930. This completes the Khoekhoe database, since not enough data are available on the remaining extinct members of the group to perform proper lexicostatistics.

 

04.29.2012. A list for the Gothic language, compiled with ample references not only to dictionaries, but to the existing text corpus as well (Ulfilas' Bible), initiates the new database for Germanic languages.

 

04.10.2012. A big day for updates: [1] The Lezgian group database has been expanded with three wordlists for three different dialects of Rutul (Mukhad, Ixrek, and Luchek), based, as usual, on a mix of old and recent sources. A few updates to other Lezgian wordlists have also been made.

[2] A list for Maidu (Maiduan group, Penutian family) has been compiled, based on F. Shipley's dictionary (1963).

[3] The Khoekhoe lexicostatistical database is initiated with a wordlist for Nama (Khoekhoegowab), compiled from the recent highly informative dictionary by W. Haacke & E. Eiseb (with references to the older Krönlein-Rust dictionary as well).

[4] Finally, the Sinitic database has been expanded with a list for Standard Chinese (= Pǔtōnghuà or Standard Mandarin). With the aid of the accompanying comments, it is now possible to trace, in detail, the evolution of the Chinese basic lexicon from Early Zhou (XI-VIIIth centuries BC) to modern times by following the database.

 

03.26.2012. Two updates: [1] The Taa database (Peripheral Khoisan family) is completed (except for the proto-list) with a wordlist for Nǀuǁen (Nǀusan), extracted, like the list for Kakia, from D. F. Bleek's semi-reliable materials. It is the third and last dialect of Taa for which enough data exist to make it suitable for lexicostatistical purposes.

[2] The Hmong-Mien languages make their first appearance on the GLD site with a list for Xiangxi Hmong, based on cross-examination of data from one general and one comparative lexicographical source. New wordlists for other varieties of Hmong may be expected before the year is out.

 

03.13.2012. The Taa database (Peripheral Khoisan family) is expanded with a wordlist for Kakia (Masarwa), extracted from D. F. Bleek's materials: not a very reliable source, but the only one that exists for this presumably extinct dialect.

 

03.05.2012. Two new lists added to the Ekoid database: Nselle and Nta (dialects of the Nde-Nselle-Nta cluster; lexicostatistics based on available data shows practically no lexical discrepancies between the three).

New feature: The "Build a tree" procedure on the site now includes the option "Show lexicostatistical matrix", which yields all cognacy percentages between the languages in the database in the form of a standard table (which can be easily copy-pasted into a document).
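To illustrate what such a matrix contains, here is a minimal sketch in Python. It is not the site's actual code: the toy languages, the cognate-class numbers, and the function names are all hypothetical, and the rule of skipping meanings that are missing in either list is an assumption. Each wordlist maps a Swadesh meaning to a cognate-class ID, and the percentage for a pair of languages is the share of jointly attested meanings whose classes coincide.

```python
from itertools import combinations

# Hypothetical toy data: a cognate-class ID per Swadesh meaning.
# None marks a meaning with no attested form in that language.
WORDLISTS = {
    "LangA": {"eye": 1, "fire": 1, "water": 1, "stone": 1},
    "LangB": {"eye": 1, "fire": 2, "water": 1, "stone": None},
    "LangC": {"eye": 2, "fire": 2, "water": 1, "stone": 1},
}


def cognacy_percentage(a, b):
    """Percentage of jointly attested meanings with matching cognate classes."""
    shared = [m for m in a if m in b and a[m] is not None and b[m] is not None]
    if not shared:
        return None  # no comparable items at all
    matches = sum(1 for m in shared if a[m] == b[m])
    return 100.0 * matches / len(shared)


def cognacy_matrix(wordlists):
    """Return sorted language names and a symmetric dict-of-dicts of percentages."""
    names = sorted(wordlists)
    matrix = {n: {n: 100.0} for n in names}
    for x, y in combinations(names, 2):
        pct = cognacy_percentage(wordlists[x], wordlists[y])
        matrix[x][y] = matrix[y][x] = pct
    return names, matrix


def format_matrix(names, matrix):
    """Render the matrix as tab-separated text that pastes cleanly into a document."""
    rows = ["\t".join([""] + names)]
    for n in names:
        cells = ["-" if matrix[n][m] is None else f"{matrix[n][m]:.0f}" for m in names]
        rows.append("\t".join([n] + cells))
    return "\n".join(rows)


if __name__ == "__main__":
    names, matrix = cognacy_matrix(WORDLISTS)
    print(format_matrix(names, matrix))
```

Tab-separated output is used here only because it pastes easily into a spreadsheet or document table; the actual format produced by the site may differ.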

 

02.29.2012. The Lezgian group database has been expanded with three wordlists for three different dialects of the Tsakhur language (Mishlesh, Mikik, and Gelmets), based on a variety of old and recent sources. A few updates to the Budukh wordlist as well.

 

02.28.2012. The database for Taa (one of the two branches of South Khoisan, along with !Kwi) is initiated with a wordlist for !Xóõ, the only surviving member of this branch, extracted from the extensive dictionary by Anthony Traill and properly annotated.

 

02.23.2012. Two new wordlists added to the West Caucasian database: one for the «literary» Abzhuwa dialect of Abkhaz and a different one for the Bzyb dialect, although no lexical discrepancies in the 100-wordlist have been elicited (there are, however, significant phonetic differences between the two dialects).

 

02.21.2012. A new Hokan list uploaded, this time, for the moribund (if not already extinct) Eastern Pomo language of the Pomo group, based on data published by Sally McLendon.

 

02.13.2012. The !Kwi group (Peripheral Khoisan family) database has been expanded with a wordlist for the extinct language ǀHaasi (based on one single known recording, made by R. Story in 1937).

 

02.05.2012. New wordlist edited and uploaded for the Ket language, the only survivor of the Yeniseian family, based on data from G. Werner's dictionaries, with references to M. Castrén's earlier data from the XIXth century.

 

01.19.2012. The !Kwi group (Peripheral Khoisan family) database has been expanded with a wordlist for the extinct language ǀʼAuni (based on data collected by D. F. Bleek in 1911 and 1936).

 

01.17.2012. The Lezgian group database has been expanded with a wordlist for the Budukh language.

 

01.13.2012. A new wordlist edited and uploaded for the extinct Abipon (Guaicuruan group, presumably Mataco-Guaicuruan family), based on two vocabularies compiled in the late XVIIIth century.

 

01.07.2012. The !Kwi group (Peripheral Khoisan family) database has been expanded with a wordlist for the extinct language ǁXegwi (based on data mostly recorded in the 1950s).

 

01.02.2012. Two new databases added: (1) A wordlist for Seri (Seri group, presumably Hokan family); (2) Five new wordlists for the Lezgian group (North Caucasian family): Udi (Nidzh and Vartashen dialects), along with additional notes on Common Udi; Archi; Kryts (two dialects - Kryts «proper» and Alyk).

 

12.31.2011. We now have a Facebook group for The Global Lexicostatistical Database. Please join for quicker updates!

 

12.31.2011. The Sinitic group (Sino-Tibetan family) wordlists have been expanded by a list constructed for Late Middle Chinese on the basis of the (semi-)vernacular document, The Record of Linji (≈ IX-X centuries A.D.).

 

11.10.2011. The Ekoid group (Benue-Congo family) database has been expanded with lists from two more languages: Efutop and Nde.

 

10.31.2011. A list for the ancient extinct Hurrian language (Hurro-Urartian group and family) has been uploaded (unfortunately, only 66 out of 110 Swadesh meanings are recoverable from known sources).

 

10.24.2011. Lists for Abé and Abidji (Agneby group, Kwa family) have been uploaded in all formats.

 

10.20.2011. After more than a year in the making, the GLD finally goes public, with 67 different annotated Swadesh wordlists and 2 reconstructed proto-wordlists from 29 language groups of Eurasia, Africa, America, and Papua. New updates coming soon!

 


 

     © 2011 George Starostin (site design, data input coordination)
    © 2011 Phil Krylov (programming, technical support)