Abstract

The coordinated development of ethnic languages has always been an important task in the construction of ethnic culture. Rapid social and cultural change has affected how ethnic groups view their languages, and the endangerment of ethnic languages has become increasingly serious. To alleviate this problem, the coordinated development of languages in multiethnic areas urgently needs to be analyzed. However, current analyses of language development mostly rely on traditional fieldwork and manual calculation, which lowers efficiency and hinders in-depth research. This paper studies the coordinated development of languages in multiethnic areas using big data algorithms. Based on an understanding of the factors affecting language development, cluster analysis is used to study language development in a multiethnic area. The experimental results show that, on formal occasions, bilingual and multilingual communication account for 27.01% and 20.11% of residents in this area, respectively. Compared with the field investigation, these data are highly accurate. Applying big data algorithms to the analysis of coordinated language development can effectively eliminate interfering factors, support accurate judgments based on the most relevant information and data, and inform decisions and suggestions.

1. Introduction

With the rapid improvement of the economy and of people’s material living standards, social development is accelerating. Modern civilization has changed profoundly, and the development of traditional ethnic cultures has been affected. In areas where multiple ethnic groups and languages coexist, uneven cultural development has long been a problem. For a multiethnic country, however, the coordinated development of ethnic languages is an effective guarantee of social and cultural prosperity. Only by properly handling the relationship among the languages of all ethnic groups can the knowledge and appeal carried by these languages continue to be passed on, so that minority languages attract wide attention and avoid decline. At this stage, most research on the coordinated development of languages in multiethnic areas uses traditional methods and proceeds from theory or from the languages themselves. Although this approach has considerable academic value, traditional methods cannot fully uncover the laws of language development or its many influencing factors when faced with a long history and a complex, changing language environment. In addition, research efficiency matters greatly given the urgency of ethnic language endangerment. It is therefore necessary to study the coordinated development of languages in multiethnic areas with the support of science and technology.

Scientific and technological strength is central to today’s development. As science and technology have advanced, big data algorithms have been continuously developed and have shown their value in many fields, such as aerospace, communication, logistics, finance, and computer control. They promote the transformation and upgrading of various industries and suit the characteristics of the intelligent era, changing established ways of production and life and making daily life more convenient. Applying them to research on the coordinated development of languages in multiethnic areas can improve the quality and effectiveness of the analysis and provide reliable technical support for the sustainable development of ethnic languages and cultures.

This paper uses big data algorithms to analyze a multiethnic area. The results show that more than 50% of the members of the three ethnic groups in each age group have a language ability of normal or above. In informal situations, the proportions of bilingual and multilingual communication are 18.97% and 13.22%, respectively; on formal occasions, they are 27.01% and 20.11%. It generally takes 2 to 3 years to learn another ethnic language to the normal or proficient level, indicating that the ethnic language environment in the region is relatively good. However, realizing the coordinated development of languages will still take some time.

In recent years, many scholars have studied the development of ethnic languages. Donskikh argued that the coordination of national languages was an absolute prerequisite for the emergence of scientific national languages, and that the development of national languages not only greatly increased the number of middle-class readers but also promoted the prosperity of national literature [1]. Owojecho explored how poor implementation of language policy initiatives in multiethnic countries harms the coordinated development and survival of ethnic languages, and revealed the unhealthy attitudes of many groups towards the sustainability of ethnic languages [2]. Fatykhova et al. discussed the popularization and development of ethnic languages and ethnic media through a quantitative analysis of Tatar-language mass media in Russia and the Republic of Tatarstan [3]. Iren and Rose studied the influence of multiethnic language development on school curricula, collecting data through questionnaires and interviews to understand language preferences in multiethnic areas [4]. Thanks to this body of work, the study of ethnic languages has become a relatively mature field of social and cultural analysis. As science and technology continue to develop, the level of analysis of coordinated language development in multiethnic areas has also improved, and the demand for ethnic language analysis aided by big data algorithms has increased significantly.

To gain an in-depth understanding of big data algorithms, this paper reviews related application research. Tawalbeh et al. discussed the application value of big data analytics in networked healthcare. They described a mobile cloud computing infrastructure for healthcare big data applications and concluded that big data algorithms can be used to design networked healthcare systems [5]. Bo et al. presented a case study of an intelligent pipeline monitoring system based on fiber optic sensors and big data algorithms for detecting events that threaten pipeline safety, and built a working prototype to evaluate event detection performance experimentally [6]. Zhang proposed an efficient big data analysis method for high-speed train control systems based on a fuzzy RDF model and uncertain reasoning; the experimental results showed that the method had good efficiency and scalability [7]. Ranjan et al. held that big data algorithms offer superior analytical capability and usability, can access different data sources, and make it possible to reach informed decisions through big data analysis [8]. These studies analyze big data algorithms well, but given how quickly the field has developed, earlier work falls short in analysis accuracy and breadth. Few studies combine big data algorithms with the development of ethnic languages, so applying them to the coordinated development of languages in multiethnic areas is urgent.

3. Coordinated Development of Languages in Multiethnic Areas

3.1. Language Coordinated Development Environment in Multiethnic Areas and Influencing Factors

The complex diversity of natural habitats and human backgrounds inevitably attracts ethnic groups from different cultural systems, each with its own cultural heritage, to settle in the same region [9]. This diversity is a dynamic factor that reflects varied living conditions and the diversified development and use of natural resources, and it is also an important constraint on the harmonious development of interethnic relations and language ecology. In today’s multiethnic areas, resource uses such as farming, forestry, animal husbandry, and hunting coexist, and secondary industries such as industry and mining, transportation, and hydropower have a solid foundation. This is the result of the coordinated efforts of all ethnic groups and of the harmonious development of ethnic, group, and livelihood relations. It is also the foundation for sustainable planning in multiethnic areas and reflects the practical need of all ethnic groups for common survival and common prosperity.

In multiethnic areas, language use varies greatly among residents for many reasons. The differences appear not only in everyday speech but also on more formal occasions, such as at work and in study. As a result, speakers of different languages differ in how they combine languages when communicating, which is the bilingualism or multilingualism common among ethnic groups. The relationship between language groups is not symmetrical: in most cases, bilingualism or multilingualism is nonreciprocal. For example, in Dafang County, where many ethnic groups live together, residents from the relatively small Lao, Bouyei, and Cai populations generally speak Chinese while using all or part of their own language, and some also speak Yi and Miao. Many Han, Yi, and Miao residents, by contrast, have little or no proficiency in those minority languages. Over a long history, these differences in the use of ethnic cultures and languages, and in the ways multiple languages are combined in communication, have gradually given the development of ethnic languages a hierarchical character.

These differences in mixed multilingual communication in ethnic areas are closely related to the hierarchical characteristics of ethnic languages in terms of social and cultural functions [10]. On the basis of these hierarchical characteristics, each ethnic group has formed its own language ecology. Previous studies usually classify the language ecology of multiethnic groups by settlement scale and by the proportion of the population using the ethnic language, giving three main categories, as shown in Table 1.

The linguistic ecological distribution and characteristics of these three situational regions are summarized as follows.

3.1.1. Relatively Inhabited Areas of a Single Ethnic Group

In settlement areas where a single ethnic group is relatively concentrated, the language of that group plays an important mediating role in communication and interaction among residents, and at certain times it can even be the only medium of communication. Chinese, as the most common and widely used language, certainly has great value in communication. Overall, however, the ethnic language still dominates in smaller and relatively concentrated multiethnic areas.

In these areas, the minority population holds an absolute majority, and the living pattern combines large-scale mixed habitation with small concentrated settlements. The relative concentration of an ethnic minority is an important condition for preserving its mother tongue. As a result, almost everyone who grew up in the settlement area can use and understand the language of their own ethnic group proficiently. Even outsiders who live here come to understand the language of the ethnic group to some extent because the environment and daily communication require it, and they may even use it to interact with others in social situations. For example, the Han and Bai people in Bandi, Datun, and other places are fluent in the local Yi and Miao languages to varying degrees, and some ethnic minorities in some areas have even given up their own languages and switched to Yi and Miao.

3.1.2. Multiethnic Scattered Areas

In some areas where many ethnic groups live, people’s understanding and use of their own ethnic language is weaker than in areas where a single ethnic group is relatively concentrated, but most people are still fairly familiar with it. In some relatively remote and less developed villages, there are also people who can communicate only in their own language. In early childhood, the ethnic language is the main communication tool. As children grow up and go through schooling, most of them are exposed to Chinese to some degree while learning the ethnic language, so they become proficient at mixing languages in daily life and in study.

3.1.3. Multiethnic Areas

In multiethnic areas, some people already use Chinese as their main language of communication and almost never use the language of their own ethnic group. In some residential areas where cultural integration is more advanced, even older residents are unfamiliar with their own language and neither know nor use it. The reasons fall roughly into two categories. The first is that these residents belonged to the Han or another ethnic group long before settling and changed their ethnic affiliation afterwards in order to adapt to local life. The second, and more common, is that they are members of the ethnic group but, because of cultural integration and economic development, gradually adopted Chinese as their medium of communication. After a long period, they stopped using their own ethnic language and became unable to understand or use it. It is not the case that most elderly residents do not use or understand the ethnic language at all. Most of them can still understand it or use it for the most basic communication, but this situation is already enough to judge the ethnic language as endangered.

Under the influence of economic and social factors, the language use of many residents in some multiethnic areas has changed. The ethnic language is rarely used in many informal situations and is no longer used even in raising children. The scope of communication in Chinese has gradually expanded, and the number of ethnic language users has gradually declined. In some villages, ethnic language endangerment began to appear as early as the 1950s and has continued to this day, making them areas where the ethnic language is seriously endangered.

These endangered areas often communicated bilingually in the early stage and gradually shifted to Chinese alone in the later stage. For example, in the representative endangered area of the Red Fengluo ethnic group, the language of this group has been steadily eroded by the impact of the times. It now survives only in the memory of a very small number of elderly people, who can recall only the most basic words or sentences and hardly ever use their native language in daily communication. In other endangered areas where the ethnic language is somewhat better preserved, some elderly people still use their native language with their peers, but only for simple communication and not for complex expression.

Through some research, it can be concluded that the laws of language selection and language use in multiethnic areas are as follows:

(1) Differences in conversation content affect language choice and use. In ethnic-inhabited areas, it is more convenient for people of all ethnic groups to express themselves in Chinese dialects because Chinese dialects cover a wide range of concepts, so Chinese dialects have gradually become the preferred language of communication [11]. Even in villages where the native language of an ethnic minority is dominant, Chinese dialects are used relatively often in private conversations with relatives, friends, and children of the same ethnic group because of the content of the conversation. For example, when people talk about national events or about what happened in their own or nearby villages, some expressions are difficult to render in the native language. Speakers use the native language for the words and sentences it can express and use the Chinese dialect for the rest. Therefore, the frequency of use of Chinese dialects has increased significantly.

(2) The language level of the conversation partner affects the language choice of both parties. Compared with the use of Chinese dialects in communication with most other ethnic groups, the correlation between the language choices of residents in multiethnic settlements and the language level of their conversation partners is more obvious. Because members of their own ethnic group, such as friends, children, and parents, have strong mother tongue ability, residents use the native language significantly more often when communicating with them. When communicating with other ethnic groups, 80% of people who are accustomed to using Chinese dialects choose to communicate in a Chinese dialect.

(3) The mastery of a language is closely related to the production and living environment in which it is used.

Each ethnic group has its own cultural background and historical development, which leads to differences in how it views language. When facing the impact and challenge of outside languages and cultures, groups make different choices even if they live in the same settlement area. It follows that different ethnic groups living in the same area also differ in the speed and form of their cultural development. This results from the combined effect of many factors, among which the cultural and historical differences among the ethnic groups are considered the most important. The real environment and other subjective factors also matter. For example, however the times develop, the attitudes of people of all ethnic groups towards their own language and culture remain an important factor affecting the coordinated development of ethnic languages.

3.2. Big Data Algorithm

The analysis of big data algorithms must start with a definition. Big data algorithm analysis is carried out on a large amount of sample information. The data are rich but not redundant, come from extensive and reliable sources, and contain rules and associations. This paper mainly introduces two types of analysis within this class of algorithms: cluster analysis and association analysis.

3.2.1. Cluster Analysis

Cluster analysis defines and classifies each type of information contained in the sample environment of big data [12]. The classified data are then processed and analyzed, and are visualized or transformed according to a given standard. As shown in Figure 1, data with the same attributes can be extracted and grouped in a single pass, forming a scientific basis for decision making. Cluster analysis can also be combined with association algorithms across categories to uncover, as far as possible, complex internal rules or relationships in the information and then convert it to a higher level [13].

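Its criterion function is expressed as [14]

$$E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - m_i \rVert^2, \tag{1}$$

where $E$ is the sum of the squared deviations of all objects in the dataset, $x$ is a data object, $k$ is the number of clusters, and $m_i$ is the mean of cluster $C_i$.

The Euclidean distance formula is expressed as [15]

$$d(x, y) = \sqrt{\sum_{j=1}^{p} (x_j - y_j)^2}, \tag{2}$$

where $x_j$ and $y_j$ are the values of data objects $x$ and $y$ on the $j$-th of $p$ attributes.

The commonly used similarity coefficients are the included-angle cosine and the correlation coefficient, which are given by formulas (3) and (4) [16]:

$$\cos\theta_{xy} = \frac{\sum_{j=1}^{p} x_j y_j}{\sqrt{\sum_{j=1}^{p} x_j^2}\,\sqrt{\sum_{j=1}^{p} y_j^2}}, \tag{3}$$

$$r_{xy} = \frac{\sum_{j=1}^{p} (x_j - \bar{x})(y_j - \bar{y})}{\sqrt{\sum_{j=1}^{p} (x_j - \bar{x})^2}\,\sqrt{\sum_{j=1}^{p} (y_j - \bar{y})^2}}, \tag{4}$$

where $\bar{x}$ and $\bar{y}$ are the means of the attribute values of $x$ and $y$.

The error sum of squares criterion is the most commonly used criterion function, which is expressed as [17]

$$J_c = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - m_i \rVert^2. \tag{5}$$

In formula (5), $m_i$ is the mean of cluster $C_i$, as shown in formula (6) [18]:

$$m_i = \frac{1}{n_i} \sum_{x \in C_i} x, \tag{6}$$

where $n_i$ is the number of data objects in $C_i$.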
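The three measures named above (Euclidean distance, included-angle cosine, and correlation coefficient) can be illustrated with a short sketch. It assumes the standard textbook definitions and NumPy; the function names are illustrative and do not come from the paper.

```python
import numpy as np


def euclidean_distance(x, y):
    """Straight-line distance between two attribute vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))


def cosine_similarity(x, y):
    """Cosine of the angle between two attribute vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


def correlation_coefficient(x, y):
    """Pearson correlation: the cosine of the mean-centered vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc)))
```

A small distance means two objects are close, whereas for the two similarity coefficients a value near 1 means the objects point in the same direction or vary together.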

In the execution step, $k$ objects are first randomly selected from the dataset as the initial cluster centers $c_1, c_2, \ldots, c_k$, and the dataset is preliminarily partitioned. The partition rule is to calculate the distance from each object to each cluster center and assign the object to the class with the closest center. Finally, the center of each cluster is recalculated as

$$c_i = \frac{1}{n_i} \sum_{x \in C_i} x, \tag{7}$$

where $n_i$ is the number of data objects in class $C_i$.

The algorithm flowchart is shown in Figure 2.
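The execution steps described above (random initial centers, assignment to the nearest center, and recomputation of each center as the cluster mean) can be sketched as follows. This is a minimal illustration of standard K-means with NumPy, not the authors' implementation; the function names, the `seed` parameter, and the stopping rule are assumptions.

```python
import numpy as np


def _assign(data, centers):
    """Index of the nearest center for every object."""
    distances = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return distances.argmin(axis=1)


def k_means(data, k, max_iter=100, seed=0):
    """Basic K-means; returns (labels, centers, error sum of squares)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct objects at random as the initial centers.
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign every object to the class with the closest center.
        labels = _assign(data, centers)
        # Step 3: recompute each center as the mean of its objects
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            data[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers no longer move
        centers = new_centers
    labels = _assign(data, centers)
    sse = float(((data - centers[labels]) ** 2).sum())
    return labels, centers, sse
```

The returned `sse` is the error sum of squares criterion, so runs with different initial centers can be compared by keeping the one with the smallest value.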

In the determination of feature weights, the weight assigned to the $j$-th dimension of the data is expressed as [19]

$$w_j = \frac{c_j}{\sum_{l=1}^{p} c_l}, \tag{8}$$

where $p$ is the number of attributes and $c_j$ is the basis for measuring the importance of the $j$-th attribute, namely the ratio of the inter-class distance to the intra-class distance of the attribute:

$$c_j = \frac{D_j}{d_j}. \tag{9}$$

The sum of the intra-class distances of all clusters on the $j$-th attribute is $d_j$, where $m_{ij}$ is the mean of cluster $C_i$ on the $j$-th attribute [20]:

$$d_j = \sum_{i=1}^{k} \sum_{x \in C_i} (x_j - m_{ij})^2. \tag{10}$$

The sum of the inter-class distances of all clusters on the $j$-th attribute is $D_j$, where $m_{ij}$ is the mean of cluster $C_i$ on the $j$-th attribute [21]:

$$D_j = \sum_{i=1}^{k-1} \sum_{l=i+1}^{k} (m_{ij} - m_{lj})^2. \tag{11}$$

The weighted Euclidean distance is obtained by modifying the Euclidean distance formula, as shown in the following formula [22]:

$$d_w(x, y) = \sqrt{\sum_{j=1}^{p} w_j (x_j - y_j)^2}. \tag{12}$$

The ratio of the intra-class distance to the inter-class distance is used as the criterion function for clustering, as shown in the following formula:

$$J(k) = \frac{d_{\mathrm{in}}(k)}{d_{\mathrm{out}}(k)}, \tag{13}$$

where $d_{\mathrm{in}}(k)$ and $d_{\mathrm{out}}(k)$ are the intra-class and inter-class distances of the dataset when it is divided into $k$ clusters.

The smaller the value of $J$ is, the better the clustering effect will be. One way to compare the quality of clusterings is to compare the values of the criterion function. The goal is to find the clustering result that minimizes $J$, which is the optimal one.
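The idea of weighting each attribute by the ratio of its inter-class distance to its intra-class distance can be sketched as below. Several details are assumptions made for illustration only: intra-class distance is taken as the sum of squared deviations from the cluster mean, inter-class distance as the sum of squared differences between pairs of cluster means, and the weights are normalized to sum to 1. The function names are hypothetical.

```python
import numpy as np


def feature_weights(data, labels, eps=1e-12):
    """Per-attribute weights from the inter-class / intra-class ratio."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    means = np.array([data[labels == c].mean(axis=0) for c in ids])
    # Intra-class distance per attribute (assumed: squared deviations).
    intra = np.zeros(data.shape[1])
    for pos, c in enumerate(ids):
        intra += ((data[labels == c] - means[pos]) ** 2).sum(axis=0)
    # Inter-class distance per attribute (assumed: pairs of cluster means).
    inter = np.zeros(data.shape[1])
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            inter += (means[a] - means[b]) ** 2
    importance = inter / np.maximum(intra, eps)  # eps avoids division by zero
    return importance / importance.sum()


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance between two objects."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum(np.asarray(weights) * (x - y) ** 2)))
```

Under these assumptions, an attribute that separates the clusters well receives a large weight and dominates the weighted distance, while an attribute that varies only within clusters is weighted towards zero.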

The intra-class distance of the dataset is 1, as shown in Figure 3. The minimum value of the average distance between any object in a cluster and all other objects in the same cluster is taken as the intra-class distance of the cluster [23, 24]. The intra-class distance of the entire sample data is the maximum of the intra-class distances of all clusters [25]:

$$d_{\mathrm{in}}(k) = \max_{1 \le i \le k} \; \min_{x \in C_i} \; \frac{1}{n_i - 1} \sum_{y \in C_i,\, y \ne x} d(x, y), \tag{14}$$

where $d(x, y)$ is the distance between objects $x$ and $y$ and $n_i$ is the number of objects in cluster $C_i$.

It can be seen from the criterion function that the smaller the intra-class distance is, the smaller $d_{\mathrm{in}}(k)$ is and the smaller the value of $J$ will be. If the largest intra-class distance among all clusters meets the requirement of intra-class tightness, then the other clusters must also meet it.

The inter-class distance of the dataset is illustrated in Figure 4. It is taken as the distance between the two nearest data objects belonging to different clusters; then [26]

$$d_{\mathrm{out}}(k) = \min_{i \ne l} \; \min_{x \in C_i,\, y \in C_l} d(x, y). \tag{15}$$

According to the criterion function, the larger $d_{\mathrm{out}}(k)$ is, the smaller the value of $J$ is. If the minimum distance between any two classes satisfies the separation requirement, the other classes must also satisfy it.
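The criterion described above can be computed directly from a labeled dataset. The sketch below follows the verbal definitions in the text: the intra-class distance of a cluster is the smallest average distance from one of its objects to the others, the dataset value is the largest of these, and the inter-class distance is the smallest distance between objects in different clusters. The plain Euclidean distance and the function name are assumptions.

```python
import numpy as np


def criterion_j(data, labels):
    """Ratio of dataset intra-class distance to inter-class distance.

    Requires at least two clusters.
    """
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    intra_per_cluster = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if len(idx) < 2:
            intra_per_cluster.append(0.0)  # a singleton has no spread
            continue
        sub = dist[np.ix_(idx, idx)]
        # Average distance from each object to the others in its cluster.
        avg_to_others = sub.sum(axis=1) / (len(idx) - 1)
        intra_per_cluster.append(float(avg_to_others.min()))
    intra = max(intra_per_cluster)
    # Smallest distance between two objects from different clusters.
    different = labels[:, None] != labels[None, :]
    inter = float(dist[different].min())
    return intra / inter
```

A compact, well-separated partition yields a small ratio, so candidate partitions can be ranked by this value.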

The high-density set is built from the density of each sample point, which is expressed as

$$\mathrm{Dens}(x) = \bigl|\{\, y \in X : d_w(x, y) \le r \,\}\bigr|. \tag{16}$$

In formula (16), $\mathrm{Dens}(x)$ represents the density at the sample point $x$, $X$ is the entire dataset, and $d_w$ is the weighted Euclidean distance. That is, the density at $x$ is the number of sample points in the entire dataset whose weighted Euclidean distance to $x$ is less than or equal to the specified radius $r$. The value of $r$ is related to the average distance between pairs of data objects in the dataset.

Suppose that the high-density set contains $n_H$ objects. The points in the high-density area that are farthest apart are selected as the initial center points, and the data point with the highest density in the set is selected as the first cluster center.

After the first initial cluster center is selected, the second cluster center is the data object farthest from the first cluster center in the high-density point set. Then, the third cluster center is picked, as shown in Figure 5.
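The selection of initial centers from the high-density set can be sketched as follows. The text does not state how the high-density set is thresholded or how centers after the second are chosen, so two assumptions are made here and marked in the code: a point is treated as high-density when its density is at least the mean density, and each later center is the high-density point farthest from its nearest already chosen center. The function name and parameters are illustrative.

```python
import numpy as np


def initial_centers(data, k, radius, weights=None):
    """Pick k initial centers from the high-density points."""
    data = np.asarray(data, dtype=float)
    if weights is None:
        weights = np.ones(data.shape[1])  # plain Euclidean distance
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((weights * diff ** 2).sum(axis=2))
    # Density: number of points within the radius (the point itself counts).
    density = (dist <= radius).sum(axis=1)
    # Assumption: high-density means density at or above the mean density.
    high = np.flatnonzero(density >= density.mean())
    if len(high) < k:
        raise ValueError("high-density set is smaller than k")
    # First center: the point with the highest density.
    chosen = [int(high[density[high].argmax()])]
    while len(chosen) < k:
        remaining = [int(i) for i in high if int(i) not in chosen]
        # Assumption: take the point farthest from its nearest chosen center.
        nearest = dist[np.ix_(remaining, chosen)].min(axis=1)
        chosen.append(remaining[int(nearest.argmax())])
    return data[chosen]
```

Because isolated points have low density, they are excluded from the candidate set, which keeps outliers from being picked as initial centers.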

The feature weight value of the target data object is initialized as [27]

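According to the principle of hierarchical agglomerative clustering, the two clusters with the closest inter-class distance are merged, and the centroid of the newly formed cluster is the average of all objects in the two clusters, which is expressed as [28]

$$m_{ab} = \frac{1}{n_a + n_b} \sum_{x \in C_a \cup C_b} x = \frac{n_a m_a + n_b m_b}{n_a + n_b}, \tag{18}$$

where $C_a$ and $C_b$ are the two merged clusters, $n_a$ and $n_b$ are their numbers of objects, and $m_a$ and $m_b$ are their means.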

If the value of the criterion function after a merge is not less than its value before the merge, the number of clusters at this time is taken as the optimal number of clusters, and the corresponding division is the optimal clustering result.
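One merge step of the agglomerative procedure described above can be sketched as follows. The inter-class distance is taken as the distance between the two nearest objects of different clusters, as defined earlier in the text; the Euclidean distance and the function name are assumptions for illustration.

```python
import numpy as np


def merge_closest(data, labels):
    """Merge the two closest clusters; return (new_labels, new_centroid)."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels).copy()
    ids = np.unique(labels)
    if len(ids) < 2:
        raise ValueError("need at least two clusters to merge")
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    best = None
    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            # Inter-class distance: nearest pair of objects from a and b.
            gap = dist[np.ix_(labels == a, labels == b)].min()
            if best is None or gap < best[0]:
                best = (gap, a, b)
    _, a, b = best
    labels[labels == b] = a
    # Centroid of the merged cluster: mean of all objects from both clusters.
    centroid = data[labels == a].mean(axis=0)
    return labels, centroid
```

Repeating this step while recording the criterion value after each merge gives the sequence from which the stopping point is chosen.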

3.2.2. Association Analysis

Association analysis examines data from different angles and jointly evaluates seemingly unrelated data or information in order to find the associations among them. As shown in Figure 6, by associating information of different types and levels, the clustered data and the different categories can be related more closely. This gives data analysts a reliable source of reference data and saves time otherwise spent on complicated data analysis.

4. Big Data Algorithm Experiment

This paper uses cluster analysis from the big data algorithms to analyze coordinated language development in a multiethnic area. The residents of this area belong to three ethnic groups: Miao, Dai, and Tujia. The languages of the three groups are relatively intact, and their population shares are relatively even. There were 54 Miao people, 61 Dai people, and 59 Tujia people, 174 people in total. The language ability of these 174 people was divided into four levels (proficient, normal, slightly understood, and poor) according to the listening and speaking abilities most commonly used in daily communication, as shown in Table 2. The experiment used the algorithm to carry out statistics and analysis on the characteristics of residents’ language use and on multilingual usage, as shown in Figures 7 and 8.

It can be seen from Table 2 that there were 49 Miao people with normal or above language ability, accounting for 90.74%. There were 56 Dai people with normal or above language ability, accounting for 91.80%. There were 54 Tujia people with normal or above language ability, accounting for 91.53%. This indicates that the language ability of multiethnic people living in the same settlement is generally high, which supports the validity of the experiment in this paper.
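The percentages above follow directly from the group sizes given in the text, as the following short check shows. The counts are taken from the paper; the helper name `share` is only illustrative.

```python
def share(part, total):
    """Return part/total as a percentage rounded to two decimals."""
    return round(100 * part / total, 2)


# ethnic group -> (people at the normal level or above, group size)
groups = {"Miao": (49, 54), "Dai": (56, 61), "Tujia": (54, 59)}

shares = {name: share(n, size) for name, (n, size) in groups.items()}
sample_size = sum(size for _, size in groups.values())

for name, value in shares.items():
    print(f"{name}: {value:.2f}%")
```

The three group sizes also add up to the stated sample of 174 people.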

4.1. Analysis of Language Use Characteristics

The language use characteristic analysis experiment in this paper was mainly based on the language use characteristics of different age groups (5–20 years old, 21–40 years old, 40–59 years old, and 60 years old and above) and different educational levels (illiterate, elementary to high school, college, and undergraduate and above) of people in multiethnic areas. The distribution of the number of experimental samples is shown in Tables 3 and 4, and the analysis results are shown in Figures 7 and 9.

It can be seen from Figure 7 that more than 50% of the Miao, Dai, and Tujia people in each age group can use the language at the normal level or above, and the ability improves to some extent with age. On the whole, ethnic languages are well retained throughout the settlement, which is one reason why language skills are good in all age groups. The ability has not declined among the younger groups either, which shows that whether an ethnic language can develop harmoniously depends on the overall language environment.

From Table 4 and Figure 9, it can be seen that the education level of people in the settlement area is relatively high. A large part of the illiterate group consists of children who have not yet reached school age and are therefore included in this category; the rest are elderly residents who received no formal education because of historical and economic conditions. However, there is no direct correlation between language proficiency and educational attainment. Between the groups with the lowest and highest educational levels, the difference in the proportion of people with ethnic language ability at or above the normal level is not very large. Among the least educated members of the three ethnic groups, the proportions who can use the ethnic language normally or proficiently are 50%, 67%, and 60%, respectively, while among the most educated they are 50%, 33%, and 40%. The latter proportions are slightly lower because more educated people have used Chinese for a long time in their studies, so their ability to use their own language is relatively weaker.

4.2. Multilingual Usage

The analysis of multilingual usage mainly focuses on the usage of ethnic languages in the inhabited areas (national language, Chinese, bilingual, and multilingual) and the time it takes to learn other ethnic languages (less than half a year, half a year, one year, two years, and three years). The analysis results are shown in Figures 8 and 10.

From Figure 10, it can be seen that ethnic languages in the inhabited areas of the three ethnic groups are developing in a relatively coordinated way. The proportions of bilingual and multilingual communication in informal situations are 18.97% and 13.22%, respectively, and on formal occasions they are 27.01% and 20.11%. The proportions of ethnic language use on formal and informal occasions are 8.62% and 36.21%, respectively, and the corresponding proportions of Chinese use are 44.25% and 31.61%. In relative terms, Chinese is used heavily and ethnic languages are rarely used on formal occasions. This shows that, to reduce the endangerment of ethnic languages and promote the balanced development of the various languages, the use of ethnic languages needs to be promoted more widely. Even on some formal occasions, such as in schools, bilingual or multilingual modes can be used to communicate.
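As a consistency check, the reported percentages can be converted back into head counts for the 174 participants. The sketch assumes that the 44.25% and 31.61% figures refer to Chinese on formal and informal occasions, which is the reading under which each occasion sums to the full sample; the variable names are illustrative.

```python
TOTAL = 174  # sample size stated in the experiment

# Reported shares (percent) by occasion and communication mode.
reported = {
    "formal": {"ethnic": 8.62, "chinese": 44.25,
               "bilingual": 27.01, "multilingual": 20.11},
    "informal": {"ethnic": 36.21, "chinese": 31.61,
                 "bilingual": 18.97, "multilingual": 13.22},
}


def head_counts(percentages, total=TOTAL):
    """Convert percentage shares to the nearest whole number of people."""
    return {mode: round(p * total / 100) for mode, p in percentages.items()}


counts = {occasion: head_counts(p) for occasion, p in reported.items()}

for occasion, by_mode in counts.items():
    print(occasion, by_mode, "sum =", sum(by_mode.values()))
```

Under this reading, the four modes account for all 174 participants on both formal and informal occasions.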

It can be seen from Figure 8 that learning other ethnic languages takes a long time and that relatively few people reach the normal or proficient level. Influenced by the settlement environment, however, people of the three ethnic groups can reach a level of normal or above in the languages of other ethnic groups within two to three years, which shows that the coordinated development of languages in multiethnic areas still needs some time.

5. Conclusions

The ability of ethnic languages and cultures to blend and coexist is the basis for the harmonious development of society. This paper studies the coordinated development of languages in multiethnic settlements through big data analysis algorithms and, based on the data analysis, offers development suggestions. The coordinated development of ethnic languages still requires effort and change. With the continuous improvement of science and technology, the construction of language and culture is expected to keep improving.

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The author declares that there are no conflicts of interest.