Theory and Methods of Teaching and Upbringing (by Areas and Levels of Education) | Мир педагогики и психологии, No. 6 (35), June 2019

UDC 378

Publication date: 30.06.2019


Approaches to Assessment and Testing in Foreign Language Teaching at a Non-Language University

Kuznetsova Tatiana Nikolayevna
Lecturer of English, Department of Foreign Languages, Financial University under the Government of the Russian Federation, Moscow, Russia, TaNKuznetsova@fa.ru

Abstract: Understanding of assessment and testing and their implications for learning is important both to those involved in test design and to those who use testing in educational contexts. This article reviews different types of assessment and testing methods with regard to the decisions they help to make and the objectives of foreign language teaching at a non-language university. The article also deals with the practical application of these methods and examines criteria for developing reliable and valid tests. Much attention is given to formative and summative assessment as the most common assessment tools in an educational context and to the importance of their relevance to second language learning objectives and course content. The article looks at selected response, constructed response, and personal response as generally accepted testing formats and considers their practical classroom application.
Keywords: assessment, testing, validity, proficiency testing, formative testing, summative testing, washback, communicative competence, target language, achievement tests, multiple choice tests, matching tests

Teaching foreign languages at a non-language university involves using different types of assessment and testing methods, which largely depend on the teaching objectives and the students’ stage of learning. There are various approaches to assessment, with most researchers in the field agreeing that it is beneficial for monitoring learners’ progress and achievement in reaching the learning objectives.

Understanding of testing and its implications for learning is important both to those involved in test design and to those who use testing in educational contexts.

Mihai [13, p.24] views assessment as a “combination of all formal and informal judgments that occur inside and outside a classroom”.

In Russia nowadays, alongside the traditional assessment system, testing is becoming increasingly accepted due to the objective character of its results and their interpretation, meeting the demand for unbiased assessment of learners’ progress and achievement.

In an educational setting, given the institutional constraints and varying levels of learners’ proficiency, achievement tests are widely accepted as being based on the material covered and directly related to the curriculum. They are typically used for formative and summative assessment.

The development and improvement of assessment and measurement tools have been a subject of heated discussion in foreign language pedagogy since the 1980s. These issues have been addressed by such renowned methodologists as Alderson et al., Bachman, Brown, Carr, Hughes, McNamara, and Weir [1, 2, 8, 11, 14].

Approaches to assessment and testing can be evaluated from different perspectives.

Depending on the purpose of tests and the decisions they help to make, Brown [5, pp. 390-391] defines proficiency, diagnostic, placement, achievement, and aptitude tests as curriculum-related testing types.

At a non-language university, these types can be fitted into the curriculum at different stages of learning and teaching.

Proficiency testing involves assessing the overall language ability and level of competence across the four skills (listening, speaking, reading, writing) and is not related to any curriculum. However, it can be used when decisions have to be made about whether a learner’s level of proficiency is sufficient for receiving instruction in the target language, be it at a university in their home country or abroad. Proficiency testing is becoming increasingly important at Russian non-language universities as more and more of them introduce instruction of many disciplines in the target language and due to their growing involvement in academic mobility programmes.

Placement testing at a non-language university makes it possible to measure learners’ ability with the aim of streaming them into groups of the appropriate level of competence. The main purpose here, as Brown [ibid] indicates, is to allocate them to groups which are “neither too easy nor too difficult, but appropriately challenging”.

Diagnostic tests, as the term implies, are used to diagnose specific areas of language. They identify learners’ weaknesses which should be addressed in further instruction. As Carr [8, p.7] remarks, they can also be used after placement tests to reconfirm learners’ streaming to appropriate levels.

Achievement tests are designed to show the learners’ progress within a certain language programme and are based on the material covered. They are normally administered at the end of the course of study, show how well the course objectives have been met, and measure and grade the performance of individual learners.

Aptitude tests help to predict a learner’s success before they are exposed to the second language.

Given the institutional constraints of a non-language university, placement and achievement tests are more common as being relevant to the teaching/learning situation. Achievement tests are typically used for formative and summative assessment.

Baranovskaya and Shaforostova [3, p.32] stress the importance of feedback as an essential part of both formative and summative assessment. This feedback informs the learners of their strengths and weaknesses, their grasp of topic areas and skills, thus guiding them in further learning and making teaching more responsive by adjusting teaching methods to learners’ needs.

Formative and summative assessment differ in the time of test administration and the objectives of testing. Mihai [13, p.26] concludes that formative assessment focuses on the process of learning while summative assessment measures the learning outcomes.

Mihai [13, p.28] stresses the importance of basing both types of testing on “content area standards”, which strongly links “assessment with instruction” and makes teaching beneficial for the students. Carr [8, p.12] echoes him by saying that formative testing “shapes” the language programme while summative testing assesses a particular learner’s performance on this programme.

Formative assessment is obviously the most common form of assessment that occurs in the language classroom on a regular basis. It is concerned with the process of learning, i.e. how much students have learned and how much they still have to learn. It can take different shapes, such as informal observation, question-and-answer sessions, classroom discussions, interviews, portfolios, projects, presentations, etc.

Formative assessment provides ongoing feedback on the learners’ problem areas and allows the teacher to address them promptly by modifying the learning activities and making changes in their teaching techniques. As it occurs during the learning period, it involves assessing learners’ performance on a regular basis and creates an environment for meeting students’ individual learning needs. Therefore, as Carr [ibid] points out, “formative assessments are used to shape and form what is being taught”. Moreover, learners become better aware of the assessment criteria and, while working on eliminating their language gaps, increase their self-efficacy. This makes their learning activities more meaningful and increases their intrinsic motivation.

In a non-language university context, formative assessment occurs regularly throughout the semester. It can take different shapes: using grammar tests and vocabulary dictations, assessing the four language skills through classroom observation, assigning specific communicative tasks to certain students or groups of students within the same class, etc. All these allow the teacher to assess the learners’ progress over a definite period of time and work on eliminating their possible language gaps. These assessments are normally not graded.

Conversely, summative assessment measures the success of a language programme in the academic area and typically occurs at the end of a module, course strand or semester. It focuses on the learning outcomes, i.e. how the learners did on a language programme, measuring the level of acquired competences and skills. The learners’ performance in summative assessment is graded and might inform further educational decisions.

Summative assessment typically includes tasks which enable the teacher to check learners’ achievement across the language systems and the four language skills as defined by the CEFR [9] and are based on the material covered. When language systems are checked, testing is appropriate as learners’ performance on the test can be promptly measured and communicated to students. The four skills can also be assessed by testing.

Summative assessment produces more objective results when it is administered on an integrated basis and includes tasks that cover all language systems and communication skills. The results inform the teacher of the learners’ problem areas that have to be addressed and allow the teacher to align the programme to better meet their needs. By evaluating the learners’ performance on the test, the teacher can make recommendations to individual learners on how to overcome their language gaps. On the other hand, consistently high test scores of certain learners allow the teacher to assign them more complex and challenging tasks, thus making the learning process more flexible and individual.

Mihai [13, p.28] concludes that both formative and summative assessment “increase in effectiveness” when they are based on the material covered. In an educational setting, these methods “strongly link assessment with instruction”.

From the perspective of evaluation and interpretation of test results, approaches to testing can be viewed as norm-referenced vs. criterion-referenced, objective vs. subjective, and discrete-point vs. integrative, as well as communicative testing and performance testing.

In norm-referenced testing, the results of an individual test-taker are compared to the other test-takers’ performance. The major limitation here is that the examiners get information on how well test-takers performed with regard to other test-takers, but not, as Carr [8, p.10] observes, “in absolute way”. This means that it is not their overall ability and competence in language skills that are measured, but their performance relative to other test-takers.

Conversely, criterion-referenced tests measure test-takers’ knowledge and skills “in absolute way” in that they are related to certain educational standards, not to the level of other test-takers. Such assessment establishes the relationship between learners’ performance and the body of knowledge to be mastered [13, p.26].

Another dividing line is between objective and subjective testing.
As the names imply, objective tests are those that can be scored objectively, as most selected-response questions have a single correct answer, while subjective tests “involve human judgment”, mostly in assessing writing and speaking [8, p.12]. The former are becoming increasingly popular in computerized assessment, while the latter deal with extended-response questions such as essays, reports, presentations, etc. Subjective tests can generate more than one correct answer and more ways of formulating it. The distinction between objective and subjective testing is getting somewhat blurred as recent developments in testing methodology have contributed to the growing objectivity of “subjective” testing. Improving rating scales and assessment criteria to provide objective scoring and grading is one of the main concerns of educational authorities and examination boards.
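To illustrate what “scored objectively” means in practice, the following minimal sketch compares a test-taker’s selected-response answers with an answer key, so the score follows mechanically and involves no human judgment. The items, key and responses are hypothetical and serve only as an illustration.

    # Minimal sketch of objective scoring for a selected-response test:
    # each item has exactly one correct option, so the score is derived
    # mechanically from an answer key. All data below are hypothetical.
    answer_key = {1: "B", 2: "D", 3: "A", 4: "C", 5: "B"}
    responses  = {1: "B", 2: "C", 3: "A", 4: "C", 5: "D"}  # one test-taker's answers

    correct = sum(1 for item, key in answer_key.items() if responses.get(item) == key)
    print(f"Correct: {correct}/{len(answer_key)} ({correct / len(answer_key):.0%})")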

Discrete-point vs. integrative testing is the next “important distinction between test types” [8, p.26].

Discrete-point testing is based on the assumption that a language system consists of independent elements that can be tested separately. According to Hughes [11, p.19], discrete-point tests are designed to check specific language items. Such tests are constructed with the aim of checking language systems and skills item by item. A large number of points can be tested, such as grammar patterns, vocabulary items, topic areas, etc. Such formats as true/false statements, multiple choice tasks, gap-fill exercises, etc. can be used. As such tests are designed to test specific language areas, they are practical for testing precisely the language material covered. Another advantage is that such tests can yield high reliability by testing a large number of discrete items, and they are practical in that the test results can be checked quickly and communicated to learners. However, this method of testing fails to evaluate learners’ overall competence, as each test question tests only one language point. As Carr points out [8, p.15], all the advantages of discrete-point testing come “at the price of authentic language use”. Discrete-point tests fail to evaluate what a learner can really “do with the language” [ibid].

Thus, the drawbacks of this testing approach prompted the development of integrative testing, which tests the target language in context and provides a good opportunity to assess the learners’ competence, since they have to incorporate all their knowledge and skills to perform a certain task. In contrast to testing separate language items, integrative testing puts them back together. However, this method is not devoid of certain drawbacks either. The main concern with this approach seems to be evaluating the learners’ performance objectively, as such tests are more difficult to score. Carr [8, p.16] suggests creating clear scoring rubrics and instructing raters in how to apply them as a way of coping with this challenge.

The limitation of discrete-point and integrative testing, namely that they only measure test-takers’ competence rather than their performance, caused the need for communicative language testing [14, p.17]. By the mid-1980s, the methodology of language testing had begun to focus on designing communicative language tests. This means that the need for communicative language testing had been perceived, and much research in this field has been done since then.

The purpose of communicative language testing is to provide the teacher with information about the learners’ ability to interact in the target language in specific contexts. Consequently, communicative language tests are designed to measure learners’ ability to communicate in real-life situations. They test the four language skills of listening, speaking, reading, and writing, and are developed on the basis of communicative competence. Canale and Swain [7, p.4] break communicative competence into the following four areas: linguistic competence (knowledge of linguistic forms), sociolinguistic competence (the ability to use language appropriately in contexts), discourse competence (coherence and cohesion), and strategic competence (knowledge of verbal and non-verbal communicative strategies).

Apart from testing the above-mentioned communicative competence, this testing approach also makes it possible to measure learners’ ability to use the target language in authentic situations. Under this approach, success in learning/teaching a second language is achieved only when a learner can communicate in the target language by being exposed to authentic listening, speaking, reading and writing. Therefore, test designers have to prepare tasks that are close to real-life situations outside the classroom.

Brown [5, p.43] identifies five requirements to be followed when constructing a reliable and valid communicative test: meaningful communication, authentic situation, unpredictable language input, creative language output, integrated language skills.

Communication is meaningful to students when it draws on their personal experience, thus contributing to natural language use. Making use of authentic situations can increase the likelihood that meaningful communication will occur.

Authentic situations offer students the opportunity to encounter and use the target language receptively and productively in the real world and demonstrate their language ability and competence.

In communicative assessment, unpredictable language input occurs because it is normally not possible to predict what speakers will say. Creative language output occurs because, as in authentic situations, language output largely depends on a speaker’s input.

Integrated language skills mean that a communicative test elicits the learners’ use of language skills integratively, as is the case in real-life communication [4, p.21].

Weir [14, p. II] stresses that “to measure language proficiency ... account must now be taken of: where, when, how, with whom, and why language is to be used, and on what topics, and with what effect.” All these factors create challenges for test developers.

Brown [5, p. 11] views performance-based assessment as an effective tool for testing learners in the process of performing “actual or simulated real-world tasks”. Learners are exposed to authentic materials, such as newspaper or magazine articles, blogs, videos, films, radio, lectures, etc., and do problem-solving tasks on their basis. These can be simulations, role plays, case studies, discussions, etc. Teachers assess students by observing how they negotiate meaning and interact within groups. Learners are free to use whatever language they have at the task stage. This method of assessment is, however, time-consuming both for learners, as they need considerable time to prepare, and for teachers, who have to design very specific tasks and guide students at the various stages of their performance. In the context of foreign language teaching at a non-language university, performance-based tasks are primarily used in informal classroom assessment.

Fundamental principles for designing second language assessment include validity, reliability, practicality, and washback.

A test as a measuring tool is said to be valid when it “tests what it is supposed to test” [1, p.170] in that it measures a learner’s ability and competence in a particular area. Weir [14, p.12] considers a test to be valid if the test scores adequately reflect a learner’s ability and communicative competence. Most researchers in this field view validity as an integration of content, criterion and construct validity.

Content validity characterizes a test from the point of view of the content area being tested. A test is valid when [11, p.22] “its content constitutes a representative sample of the language skills, structures, etc.” This aspect of validity ensures the accuracy of testing in terms of its relevance to the test specification and test content.

Criterion-related validity correlates test results with a certain external criterion which determines the test’s validity. Construct validity is defined by a model of the quality being measured.

A test has construct validity [11, p.22] when it “measures just the ability it is supposed to measure”. For example, a reading test should only test reading competence; otherwise it may not be considered valid. Bachman [2, p.39] agrees by pointing out that a test has construct validity only when it “only reflects the area of language ability we want to measure” and hardly anything else.

Bachman [2, p.38] defines reliability as “consistency in measurement”. A test is considered reliable if its results are consistent, i.e. when it yields relatively comparable scores with learner groups of similar ability. Test reliability is measured by a correlation coefficient: the closer this coefficient is to 1, the higher the reliability of the test.
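As a minimal illustration of how such a coefficient can be obtained, the sketch below computes the Pearson correlation between two administrations of the same test to the same group of learners, a simple test-retest estimate of reliability; the score lists are invented for the example and do not come from the study described here.

    # Illustrative sketch: test-retest reliability estimated as the Pearson
    # correlation between two administrations of the same test.
    # The score lists below are hypothetical.
    def pearson(x, y):
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        var_x = sum((a - mean_x) ** 2 for a in x)
        var_y = sum((b - mean_y) ** 2 for b in y)
        return cov / (var_x * var_y) ** 0.5

    first_administration = [72, 85, 60, 91, 78, 66, 88, 70]   # hypothetical scores
    second_administration = [70, 88, 63, 90, 75, 68, 85, 72]  # same learners, retest

    r = pearson(first_administration, second_administration)
    print(f"Test-retest reliability: {r:.2f}")  # values close to 1 indicate high reliability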

Practicality refers to practical issues at all stages of test design, administration and assessment [6, p.19]. A test can be considered practical when it is not expensive or time-consuming and is relatively easy to administer and assess. Bachman [2, p.20] broadens the scope of practicality by viewing it as the relationship between the resources needed to produce and administer testing and the resources available: human and material resources, and time. In his opinion, a test is practical when its “design, development, and use do not require more resources than are available”. Consequently, practicality is unique to a specific testing situation.

Washback is another important principle in second language assessment. Hughes [11, p.1] looks at washback as the impact of testing on learning and teaching, which can be both negative and positive. Positive washback happens when tests are carefully designed and based on clear specifications. At a non-language university, positive washback occurs when tests check the material covered and fully correspond to the course objectives. The quality of assessment and measurement materials is extremely important at all stages of test development, and even more so in high-stakes situations such as end-of-course exams.

Language testing researchers identify three main testing techniques: selected response, constructed response and personal response.

Selected response tests ask test-takers to choose the correct answer from a certain number of options and include multiple choice, matching tasks and true/false statements. As learners do not produce any language themselves, this type of testing is generally accepted for measuring receptive skills such as listening and reading. It can also provide good feedback on learners’ ability in grammar and vocabulary. Practicality is the most obvious benefit of this testing technique, as such tests can be easily administered and scored.

Multiple choice tests are widely used by teachers in their teaching and testing practice. Their basic structure consists of a stem and a number of options to choose the correct answer from. This format is not so stressful for learners because it does not demand the kind of creative activity that open-ended or short-answer questions do. The downside here is that a learner who may do well, for example, on a grammar multiple choice test may fail “to produce the correct form when speaking or writing” [11, p.60]. Thus, performance on a multiple choice test puts the accuracy of measuring a learner’s ability in doubt. Another concern with multiple choice testing is the possibility of guessing, as we never know how many responses might be the result of guesswork. Therefore, this testing technique often falls under criticism because real-life language use is not multiple choice.

Matching requires students to match words, phrases, or sentences from one list to another. The advantage of this testing technique is its low guessing factor. Brown [5, p.197] points out that matching procedures are mostly used to test vocabulary. He considers a gap-fill vocabulary task a type of matching with a communicative touch, as words have to be put into a certain context. The disadvantage that comes with matching is that poorly constructed matching tasks, instead of testing comprehension, often resemble a “puzzle-solving process” [5, p.198].

True/false statements ask students to decide whether each statement is true or false. Test accuracy can be a factor, as there is a 50% chance of correct guessing. Most language testing researchers believe that this problem can be minimized provided that true/false questions are carefully constructed and a large number of true/false statements are incorporated into the test. True/false activities are mostly intended to check the learners’ reading and listening competences.
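The point about the number of items can be illustrated with a simple calculation: with pure guesswork the chance of answering any single true/false item correctly is 0.5, but the probability of reaching a given pass mark by guessing alone falls sharply as the number of items grows. The sketch below assumes a hypothetical pass mark of 70%.

    # Illustrative sketch: probability of passing a true/false test by pure guessing
    # (p = 0.5 per item) for different test lengths. The 70% pass mark is hypothetical.
    from math import ceil, comb

    def prob_pass_by_guessing(n_items, pass_mark=0.7):
        need = ceil(n_items * pass_mark)              # minimum correct answers to pass
        favourable = sum(comb(n_items, k) for k in range(need, n_items + 1))
        return favourable / 2 ** n_items

    for n in (5, 10, 20, 40):
        print(f"{n:2d} items: P(pass by guessing) = {prob_pass_by_guessing(n):.3f}")
    # The probability drops from roughly 0.19 with 5 items to below 0.01 with 40 items.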

Constructed response tests, conversely, are good tools to test learners’ productive skills such as speaking and writing. These can be short answers, interviews, and performance assessments. Such assessments are also appropriate for observing the interaction of productive and receptive skills, such as listening and speaking in an interview.

The short-answer format is an alternative to multiple choice questions, as the learners have to give their own responses. If test questions are properly constructed, the learner will, in all probability, produce short, well-structured answers. A variety of questions can be included in the test, providing more opportunities to assess the learner’s performance. However, this testing format involves writing answers and, therefore, may generate vocabulary, grammar, and spelling mistakes.

Besides, this testing format is very time-consuming for the teacher, as all the questions have to be carefully thought out and constructed in order to avoid ambiguity. It is also recommended to communicate the assessment criteria to the learners.

Interviews make it possible to assess learners’ proficiency in oral communication by asking certain questions and receiving answers. They enable the interviewer to assess the learner’s proficiency in oral communication as well as vocabulary range and grammatical accuracy. However, this method does not appear to be practical in terms of time. Provided there are no time constraints, this assessment technique can generate reliable results, although not completely devoid of a certain subjectivity.

Performance assessment tests communicative competence and language use in almost authentic situations and may include tasks such as role plays, simulations, interviews, group discussions, report writing, etc. Such assessment allows a teacher/examiner to observe learners’ performance in a real-life situation. However, it is time-consuming and requires well-thought-out evaluation and rating criteria to minimize, if not totally avoid, possible subjectivity and inconsistency among raters.

Personal response assessments require students to produce natural language to communicate what they wish to communicate. Thus, students’ responses are unique and allow the teacher to assess each learner individually in the course of teaching/learning. Conferences, portfolios and self- and peer assessment are the most common types. However, such assessments are time-consuming and quite subjective and, consequently, are not normally graded.

The potential of testing as an assessment and measurement tool, given its objective character and its positive effect on language teaching/learning, can only be realized when language teachers and test makers are fully aware of the principles of designing a valid test. Testing the four language skills in different situations, as Galaczi [10] points out, is crucial in increasing test accuracy.

Each method of assessment and testing has its own benefits and drawbacks. No single method can be considered definitive or capable of objectively testing all language systems and skills. The task of language teachers and test developers is to design testing and measuring materials that are based on the educational content, correspond both to the course objectives and to the methods of instruction and are, therefore, more likely to yield reliable results.


References

1. Alderson, C., Clapham, C., Wall, D. (1995) Language Test Construction and Evaluation. - Cambridge: Cambridge University Press
2. Bachman, L. (1990) Uses of Language Tests (pp. 58-60). In Fundamental Considerations in Language Testing. - Oxford: Oxford University Press
3. Baranovskaya, T., & Shaforostova, V. (2017). Assessment and Evaluation Techniques. Journal of Language and Education, 3(2), 30-38. https://doi.org/10.17323/2411-7390-2017-3-2-30-38
4. Brown, D. (2014) Principles of Language Learning and Teaching, 6th edition. – White Plains: Pearson Education, Inc.
5. Brown, D., Lee, H. (2015) Teaching by Principles: An Interactive Approach to Language Pedagogy, 4th edition. – White Plains: Pearson Education, Inc.
6. Brown, D. (2018) Language assessment: Principles and Classroom Practices, 3rd edition. – White Plains: Pearson Education, Inc
7. Canale, M., Swain, M. (1980) Theoretical Bases of Communicative Approaches to Second Language Teaching and Testing. – Applied Linguistics, 1(1), 1-47
8. Carr, N. (2011) Designing and Analysing Language Tests. – Oxford: Oxford University Press
9. Council of Europe: Common European Framework of Reference for Languages https://www.coe.int/en/web/common-european-framework-reference-languages retrieved 26/05/2019
10. Galaczi, E. Benefits of testing the four skills (reading, listening, writing and speaking), 27/06/2018 https://www.cambridgeenglish.org/blog/benefits-of-testing-the-four-skills/ retrieved 26/05/2019
11. Hughes, A. (2013) Testing for Language Teachers. - Cambridge: Cambridge University Press
12. McNamara, T. (2000) Language Study Series: Language Testing. - Oxford: Oxford University Press
13. Mihai, F. (2010) Assessing English Language Learners in the Content Areas: a research-into-practice guide for Educators. – Ann Arbor: University of Michigan Press
14. Weir, C. (2005) Language Testing and Validation – New York, NY: Palgrave Macmillan
