A new study from Wits University challenges the idea that all South-Eastern-Bantu speaking groups are a single genetic group.
The South-Eastern-Bantu (SEB) language family includes isiZulu, isiXhosa, siSwati, Xitsonga, Tshivenda, Sepedi, Sesotho and Setswana. Almost 80% of South Africans speak one of these as their first language. Their origins can be traced to farmers from West-Central Africa whose descendants spread southwards over the past two millennia, finally reaching Southern Africa.
Since then, settlement, population movements and interaction with Khoe and San communities, as well as with other SEB speakers, ultimately resulted in distinct Southern African languages such as isiZulu, isiXhosa and Sesotho. But despite these linguistic differences, these groups of people are treated as one group in genetic studies.
Genetic disease studies rely on understanding the genetic diversity of a population. If two genetically distinct populations are treated as one, errors could occur when finding disease genes, especially for complex diseases like hypertension and diabetes.
Dr Dhriti Sengupta and Dr Ananyo Choudhury in the Sydney Brenner Institute for Molecular Bioscience (SBIMB) at Wits University were joint lead authors of the paper published in Nature Communications.
“South Eastern Bantu-speakers have a clear linguistic division – they speak more than nine distinct languages – and their geography is clear: some of the groups are found more frequently in the north, some in central, and some in southern Africa. Yet despite these characteristics, the SEB groups have so far been treated as a single genetic entity,” said Dr Choudhury.
The study found that SEB speaking groups are too different to be treated as a single genetic unit.
“So if you are treating say, Tsonga and Xhosa, as the same population – as was often done until now – you might get a completely wrong gene implicated for a disease,” said Dr Sengupta.
The study aimed to find out whether the SEB speakers are indeed a single genetic entity or if they have enough genetic differences to be grouped into smaller units.
Genetic data from more than 5000 participants speaking eight different southern African languages were generated and analysed. Participants were recruited from research sites in Gauteng, Mpumalanga, and Limpopo province.
The study detected major variations in the genetic contribution from the Khoe and San to SEB speaking groups; some groups have received a lot of genetic influx from Khoe and San people, while others have had very little genetic exchange with these groups.
This variation ranged on average from about 2% in Tsonga to more than 20% in Xhosa and Tswana, suggesting that SEB speaking groups are too different to be treated as a single genetic unit.
“The study showed that there could be substantial errors in disease gene discovery and disease risk estimation if the differences between South-Eastern-Bantu speaking groups are not taken into consideration,” said Dr Sengupta.
The genetic data also show major differences in the history of these groups over the last 1000 years, with genetic exchanges occurring at different points in time. These genetic differences are distinctive enough to affect the outcomes of biomedical genetic research.
Dr Sengupta cautioned that ethnolinguistic identities are complex and that broad conclusions should not be extrapolated from the findings regarding genetic differences.
“Although genetic data showed differences [separation] between groups, there was also a substantial amount of overlap [similarity]. So while findings regarding differences could have huge value from a research perspective, they should not be generalised,” she said.
A common approach to identifying whether a genetic variant causes or predisposes to a disease is to compare the occurrence of many genetic variants in individuals with the disease (e.g. hypertension or diabetes) against their occurrence in healthy individuals. If the frequency of a variant differs between the two sets, the variant is assumed to be possibly linked to the disease.
“However, this approach depends entirely on the underlying assumption that the two groups consist of genetically similar individuals. One of the major highlights of our study is the observation that Bantu-speakers from two geographic regions – or two ethnolinguistic groups – cannot be treated as if they are the same when it comes to disease genetic studies,” said Dr Choudhury.
Future studies, especially those testing a small number of variants, need to be more nuanced and have balanced ethnolinguistic and geographic representation, he said.
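The pooling problem described above can be illustrated with a minimal Python sketch. All counts below are invented for illustration and are not taken from the study; "group_A" and "group_B" are hypothetical populations, and counting variant carriers per person is a simplification of how real association studies count alleles. In each group the variant is equally common in cases and controls, so it has no link to the disease, but the groups differ in how common the variant is and in how many cases and controls they contribute.

```python
from dataclasses import dataclass


@dataclass
class GroupCounts:
    """Carrier counts for one (hypothetical) population group."""
    name: str
    cases_with_variant: int
    cases_total: int
    controls_with_variant: int
    controls_total: int


def pooled_difference(groups):
    """Case minus control carrier frequency when all groups are lumped together."""
    cases_freq = (sum(g.cases_with_variant for g in groups)
                  / sum(g.cases_total for g in groups))
    controls_freq = (sum(g.controls_with_variant for g in groups)
                     / sum(g.controls_total for g in groups))
    return cases_freq - controls_freq


def per_group_differences(groups):
    """Case minus control carrier frequency computed separately in each group."""
    return {
        g.name: g.cases_with_variant / g.cases_total
        - g.controls_with_variant / g.controls_total
        for g in groups
    }


# Invented numbers: the variant is carried by 10% of group_A and 40% of
# group_B, in cases and controls alike. group_A supplies most of the cases,
# group_B most of the controls.
groups = [
    GroupCounts("group_A", cases_with_variant=80, cases_total=800,
                controls_with_variant=20, controls_total=200),
    GroupCounts("group_B", cases_with_variant=80, cases_total=200,
                controls_with_variant=320, controls_total=800),
]

pooled = pooled_difference(groups)
by_group = per_group_differences(groups)

print(f"pooled difference: {pooled:+.2f}")
for name, diff in by_group.items():
    print(f"{name} difference: {diff:+.2f}")
```

With these made-up counts, the pooled comparison shows a carrier frequency difference of about 0.18 between cases and controls, while the difference inside each group is zero. The apparent signal comes entirely from mixing two groups with different variant frequencies, which is the kind of error the researchers warn about.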
Professor Michèle Ramsay, director of the SBIMB and corresponding author of the study, says: “The in-depth analysis of several large African genetic datasets has just begun. We look forward to mining these datasets to provide new insights into key population histories and the genetics of complex diseases in Africa.”
Source: Wits University
Journal reference: Sengupta, D., et al. (2021) Genetic substructure and complex demographic history of South African Bantu speakers. Nature Communications. doi.org/10.1038/s41467-021-22207-y.