AVERAGE WORD LENGTH AND TEXT REDUNDANCY VARIABILITY: FRENCH TEXTS CASE STUDY

Keywords: text entropy, text redundancy, word length, information capacity, quantitative linguistics

Abstract

This study examines the correlation between redundancy and average word length in French texts. The correlation is evaluated by analysing entropy, redundancy, and average word length in literary, scientific, and publicistic texts. The variability of text redundancy is found to correlate well with the variability of average word length, provided that the average word length of an individual text is calculated without the words belonging to the exponential tail of the entropy curve. It is therefore proposed to distinguish two average word lengths for a text: the average length of words belonging to the exponentially decaying tail of the entropy curve, and the average length of words not belonging to that tail.
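The three quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the procedure used in the study, which works with an entropy curve and its exponential tail. It assumes instead the simplest textbook definitions: first-order Shannon entropy over single letters, redundancy R = 1 - H / H_max with H_max = log2 of an alphabet size supplied by the caller, and words taken as maximal runs of letters. The function names and the `alphabet_size` parameter are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def letter_entropy(text: str) -> float:
    """First-order Shannon entropy, in bits per letter, of the letters in text."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    total = len(letters)
    counts = Counter(letters)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redundancy(text: str, alphabet_size: int) -> float:
    """Redundancy R = 1 - H / H_max, assuming H_max = log2(alphabet_size)."""
    return 1.0 - letter_entropy(text) / math.log2(alphabet_size)


def average_word_length(text: str) -> float:
    """Mean number of letters per word; a word is a maximal run of letters."""
    words = re.findall(r"[^\W\d_]+", text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


if __name__ == "__main__":
    sample = "Le chat dort sur le canapé"
    # alphabet_size is an assumption: accented letters are counted as separate
    # symbols here, so the caller must pick a size consistent with that choice.
    print(letter_entropy(sample))
    print(redundancy(sample, alphabet_size=42))
    print(average_word_length(sample))
```

Comparing `redundancy` and `average_word_length` across several texts gives a rough picture of how the two vary together. Separating words inside and outside the exponential tail, as the abstract proposes, would need the entropy curve itself, which this sketch does not compute.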

Published: 2020-10-05
How to Cite
Marinashvili, M. (2020). AVERAGE WORD LENGTH AND TEXT REDUNDANCY VARIABILITY: FRENCH TEXTS CASE STUDY. Scientific Journal of Polonia University, 38(1-2), 67-75. https://doi.org/10.23856/3849