STATISTICAL ANALYSIS OF COLLOCATIONS OF THE CONCEPT JOY IN R. IVANYCHUK’S TEXT CORPUS

The paper reviews scholarly work on the importance of corpus and quantitative methods, the problem of connectivity, and approaches to the study of collocations. It examines collocations of the emotion JOY in the writer's Text Corpus by means of the statistical methods of modern linguistics. From the standpoint of the language system, the collocations described appear in various structural-semantic forms in the author's idiolect. The statistical study, in turn, yields a list of collocations ranked by absolute and relative frequency and by association measures such as T-score and MI-score.


Introduction
The language of fiction reflects not only the author's linguistic competence and preference for certain language constructions and words over others, but also features of the national language (Kulchystkyi, 2017). Quantitative analysis is used to study an author's style in order to avoid the methodological errors that often arise from a researcher's subjectivity in choosing examples to support a proposed hypothesis.

Corpus and statistical approaches in linguistic research
The development of text corpora makes the linguistic processing of large text databases more efficient, and a text corpus offers wide opportunities for studying the language system. A corpus is "a collection of pieces of language text in electronic form" (Sinclair, 2004: 19), while text is "natural language used for communication, whether it is realized in speech or in writing" (Biber & Conrad, 2009: 5). Corpus linguistic research strongly supports the idea that language variation is systematic and can be described using empirical, quantitative methods. A text corpus is used for statistical analysis and hypothesis testing, for checking occurrences, and for validating linguistic rules. It is processed electronically with text-analysis tools and yields useful statistical information such as the number of word types, frequencies, and co-occurrences (Biber & Conrad, 2009).
Statistical methods are an important and reliable tool for analysing linguistic data in modern linguistics. Quantitative methods ensure the reliability of results and make it possible to reveal properties of language units and text structure that could not be established without statistical study. The fact that language itself is a complex system subject to the laws of statistics demonstrates the need for statistical methods in linguistics (Perebyinis, 1967).

Collocation as a unit of the language system
The object of our study is the collocation in R. Ivanychuk's Text Corpus. The term 'collocation' is defined in different ways: it is sometimes used as a synonym of 'word combination' and sometimes denotes a special type of set phrase. In corpus linguistics, a collocation is a combination of words that occur together in a text more often than would be expected by chance; in other words, collocations are understood as statistically determined set phrases. S. Evert suggests the following definition: "A collocation is a word combination whose semantic and syntactic properties can't be fully predicted on the basis of information about its constituents and which therefore should be added to the dictionary (lexicon)" (Evert, 2004: 17). Text corpora and the tools of corpus linguistics make it possible to identify and expand the stock of set phrases of various types and to describe the peculiarities of their use (Zakharov, 2015).
It is known that "the language system is probabilistic, and frequency in a text is an illustration of grammatical probability" (Halliday, 1991: 31). This suggests that words in speech are subject to the grammatical rules of the language and are not used arbitrarily in the flow of speech.
One of the main ways of working with corpus data to study collocations is the concordance: lines from the Text Corpus that show a word in its context. Concordance lines are a source of information about the usage patterns of a word (the node) and its connections with other words (collocates).
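The idea of a concordance can be illustrated with a minimal keyword-in-context sketch. It is not the tool used in this study; the function name `concordance`, the `window` parameter and the English sample sentence are assumptions introduced only for illustration.

```python
def concordance(tokens, node, window=3):
    """Return (left context, node, right context) for every hit of the node."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Toy sentence standing in for corpus text (not taken from the corpus).
sample = "she felt a quiet joy and then a sudden joy filled the room".split()
kwic = concordance(sample, "joy", window=2)

for left, node, right in kwic:
    # Right-align the left context so that the node forms a column.
    print(f"{left:>12} [{node}] {right}")
```

Each returned line places the node between its left and right neighbours, which is the view from which candidate collocates are read off.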

Statistics for studying collocations in a Text Corpus
In computational linguistics the term 'collocation' is defined as a "statistically stable word combination" (Khokhlova, 2010: 8). The most basic corpus-based statistics are the absolute frequency and the relative frequency of a phenomenon. The absolute frequency (number of occurrences) is the number of times an item appears; summed over all word types, the absolute frequencies equal the total number of running words (tokens) in the Text Corpus. The relative frequency, that is, the absolute frequency divided by the size of the corpus, is an estimate of the probability of the phenomenon in the language.
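The two basic statistics can be sketched in a few lines of code. This is an illustration on an invented toy token list, not the procedure used in the study; the function name `frequencies` is an assumption.

```python
from collections import Counter


def frequencies(tokens):
    """Return absolute counts and relative frequencies (count / corpus size)."""
    counts = Counter(tokens)
    total = len(tokens)
    relative = {word: count / total for word, count in counts.items()}
    return counts, relative


# Toy token list (hypothetical, not corpus data).
sample = "joy and sorrow and joy and hope".split()
abs_freq, rel_freq = frequencies(sample)

print(abs_freq["joy"], rel_freq["joy"])          # count and share of "joy"
print(sum(abs_freq.values()), len(abs_freq))     # running words vs distinct words
```

In this toy list the absolute frequencies sum to the number of running words (7), whereas the number of distinct words is 4, and the relative frequencies sum to 1.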
In addition, collocations are studied by means of mathematical criteria, namely statistical association measures, which are based on probability theory and mathematical statistics. Association measures are mathematical formulas that determine the strength of association between two or more words from their occurrences and co-occurrences in a text corpus. T-score tends to extract the most frequent collocations, whereas MI-score brings out low-frequency multiword terms and proper names. These measures play an important role in the automatic extraction of collocations.
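The formulas themselves are not given in the text. One widely used formulation, assumed here rather than taken from the source, writes f(n,c) for the co-occurrence frequency of node and collocate, f(n) and f(c) for their individual frequencies, and N for the corpus size in tokens:

```latex
\[
\mathrm{MI} = \log_2 \frac{f(n,c)\, N}{f(n)\, f(c)},
\qquad
T = \frac{f(n,c) - \dfrac{f(n)\, f(c)}{N}}{\sqrt{f(n,c)}}
\]
```

Under these definitions the T-score grows with the raw co-occurrence count and therefore favours frequent pairs, while MI is the logarithm of the ratio of observed to expected co-occurrence, so a rare pair whose members seldom occur apart can receive a high value.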
Lexical association measures are applied to the occurrence and context statistics of a key word (node) extracted from the corpus for all collocation candidates, and they yield an association score for each candidate. At the top of the resulting list are the word combinations assumed to be most strongly associated and, consequently, the most probable collocation candidates. The frequency of joint occurrence of the key word (node) and its collocate is taken into consideration (Zakharov, 2015).
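The scoring and ranking step can be sketched as follows. The formulas are a commonly used formulation of MI and T-score and are an assumption, since the text does not state them; the function names and all counts are hypothetical and do not come from the corpus.

```python
import math


def mi_score(f_nc, f_n, f_c, n_tokens):
    """Mutual information: log2 of observed over expected co-occurrence."""
    return math.log2(f_nc * n_tokens / (f_n * f_c))


def t_score(f_nc, f_n, f_c, n_tokens):
    """T-score: observed minus expected, scaled by sqrt of observed."""
    expected = f_n * f_c / n_tokens
    return (f_nc - expected) / math.sqrt(f_nc)


def rank_candidates(node_freq, n_tokens, candidates, measure):
    """Score every candidate and sort from strongest to weakest association."""
    scored = [
        (collocate, measure(f_nc, node_freq, f_c, n_tokens))
        for collocate, (f_nc, f_c) in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical figures: corpus size, node frequency, and for each
# candidate (co-occurrence frequency, collocate frequency).
N_TOKENS = 1_000_000
NODE_FREQ = 200
candidates = {
    "with": (40, 10_000),   # frequent function word
    "immense": (4, 20),     # rare adjective
    "feel": (10, 500),
}

by_mi = rank_candidates(NODE_FREQ, N_TOKENS, candidates, mi_score)
by_t = rank_candidates(NODE_FREQ, N_TOKENS, candidates, t_score)
print([c for c, _ in by_mi])
print([c for c, _ in by_t])
```

With these invented counts the two measures order the candidates in opposite ways: MI puts the rare adjective first, while T-score puts the frequent function word first.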
Statistical methods applied to a Text Corpus make it possible to obtain reliable data on the compatibility of lexical units, to study lexical units in context, and to obtain data on the frequency of words, lemmas, grammatical categories and co-occurrences of lexical units, as well as on the peculiarities of their compatibility. In addition, search results can be ranked by different parameters, and threshold values can be set so that meaningful information is obtained (Khokhlova, 2010: 66). The co-occurrence frequency is related to the frequencies of the individual components of the collocation.

Statistical research of collocations with the concept JOY in R. Ivanychuk's Text Corpus
Research into an author's vocabulary makes it possible to describe the lexical arsenal of the writer's idiolect and to identify his texts among others. Words denoting emotions are among the key factors in understanding the author's language and personality.
Word frequency analysis provides valuable information about the specificity of an idiolect. The study of collocations provides important information about the peculiarities of an author's style, yet collocations in fiction remain insufficiently studied in Ukrainian linguistics.
In this study, word combinations with the concept JOY are extracted and described by means of the Ukrainian corpus 'GRAK' and the Collocation tool of the NoSketch Engine system; the association measures T-score and MI-score are used to study the collocations.
The collocations in R. Ivanychuk's idiolect were described and analyzed by means of absolute and relative frequency and the association measures T-score and MI-score. The following minimum frequencies were applied: ≥ 2 for MI-score and ≥ 12 for T-score. In R. Ivanychuk's Text Corpus the absolute frequency of the lemma JOY (the node) is 278 occurrences, and its relative frequency is 2.15 × 10⁻⁴.
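The frequency thresholds and the reported frequencies can be illustrated with a short sketch. Only the thresholds (2 and 12) and the figures 278 and 2.15 × 10⁻⁴ come from the text; the co-occurrence counts and collocates are invented, the names `MIN_FREQ` and `apply_threshold` are hypothetical, and the corpus-size estimate assumes that relative frequency is absolute frequency divided by corpus size in tokens.

```python
# Minimum co-occurrence frequency per measure, as stated in the text.
MIN_FREQ = {"mi": 2, "t": 12}


def apply_threshold(cooc_counts, measure):
    """Keep only the candidates that reach the minimum frequency."""
    limit = MIN_FREQ[measure]
    return {c: f for c, f in cooc_counts.items() if f >= limit}


# Hypothetical co-occurrence counts with the node (not corpus data).
cooc = {"with": 40, "great": 12, "quiet": 3, "odd": 1}
for_mi = apply_threshold(cooc, "mi")
for_t = apply_threshold(cooc, "t")
print(sorted(for_mi), sorted(for_t))

# Rough corpus size implied by the reported figures, under the stated
# assumption; the relative frequency is rounded, so this is approximate.
implied_tokens = round(278 / 2.15e-4)
print(implied_tokens)
```

A lower threshold for MI-score keeps rare but strongly associated pairs in play, while the higher threshold for T-score restricts the list to combinations frequent enough for that measure to be informative. Under the stated assumption, the reported figures imply a corpus of roughly 1.3 million tokens.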
Collocations extracted by MI-score are the less frequent combinations; they are individual set phrases that illustrate the author's idiolect and can serve as indicators for attributing texts to the writer: затеплилася радість ('joy began to glow'), підленька радість ('mean little joy'), скритна радість ('secretive joy'), незмірна радість ('immeasurable joy'), притлумлювати радість ('to suppress joy'), etc.

Conclusions and suggestions
The author's idiolect was studied from the standpoint of the language system (lexical and semantic structure) and by means of statistical parameters. The results show that the high-frequency collocations are Prep + JOY / JOY + Prep and Conj + JOY / JOY + Conj; Adj + JOY collocations display numerous attributes; and the verbal and nominal collocations V + JOY / JOY + V / V + Prep + JOY / JOY + N / N + JOY show a wide variety of collocates. Nominal and verbal constructions can have literal or figurative meaning, and somatic metaphors are found among the metaphors.