Testing Objectives and the Curriculum
Testing objectives for English language teaching are often problematic, particularly when there is no explicit curriculum. Without a clear description of the aims you are trying to achieve, it is not possible to know whether or not you have got there.
In the absence of clear curriculum objectives, testing becomes a hit-and-miss affair. Most often, tests are constructed from an unsystematic selection of some of the structures, words and topics which have been used in teaching throughout the previous weeks, months, or over the whole academic year. These items are then presented in the test as objects in themselves, with, possibly, little thought as to how and why they were dealt with in the classroom in the first place. Then there will be a reading passage and perhaps a listening test… a quick search through course books that you didn’t use for suitable material… and there is your exam.
Exam content is selected according to a feeling that, “Yes, they should be able to do this. It’s about their level” or “They did this in class. Let’s see if they can still do it” rather than any commonly accepted information gathering principles.
Effective testing objectives have to be based on the curriculum for a number of reasons. The main one, assuming that the main aim of testing is to gain a variety of information about how well the language training or education in your institution is going, is that tests need to be based on what you are trying to achieve. To do that, of course, you need a clear, public, agreed statement of what you are trying to achieve, why, and how you propose going about it.
In writing the Lise preparatory year exemption examination last year, we were in the position of not having a curriculum suitable for generating examination aims. As an interim measure, we decided to create an exam specification which could later be used as the basis for part of our English language curriculum. The specification would be based on what we had been doing in years 6, 7 and 8, but expressed in terms of objectives. We felt that this would not only give us a usable basis for an exam, but would also allow us to assess whether or not our implicit objectives as realised in the test were really what we wanted to aim at. In other words, we were working backwards: not going from aim to method, but instead writing down our methods and seeing what aim was implicit in them.
We divided our test into five sections, being the now traditional Use of English, Reading, Writing, Listening and Speaking. Most weighting was given to Use of English, as we felt this represented the reality of the teaching. We first created a trial exam, which was held in April and then a real exam, which was held towards the end of June. I will take each part in turn.
Use of English
These are all classic Cambridge First Certificate style question types. The skills and knowledge they test are formal and abstract. They involve recognition of formal categories in the language.
2. Example question….
In the trial exam, this was found to be a difficult question, and one which did not discriminate well. We felt that this might be due to the fact that the texts selected used lexical items that were too specialised (economics) and so in the real exam we used more general texts. The facility index did increase to a satisfactory level, but discrimination power was still somewhat unsatisfactory despite a small increase.
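The "facility index" and "discrimination power" referred to throughout are standard classical item statistics: the facility index is simply the proportion of candidates who answered an item correctly, and a common discrimination measure compares the item's facility in the top- and bottom-scoring groups. The sketch below is only an illustration of the general technique; the function names, the upper-lower 27% split and the example data are my own assumptions, not the analysis procedure actually used for this exam.

```python
# Illustrative classical item analysis: facility index and
# upper-lower group discrimination (assumed 27% split).

def facility_index(item_scores):
    """Proportion of candidates who answered the item correctly (0..1)."""
    return sum(item_scores) / len(item_scores)

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Difference in item facility between the top and bottom scoring groups.

    item_scores: 1/0 per candidate for this one item.
    total_scores: each candidate's whole-test score, in the same order.
    """
    ranked = sorted(range(len(total_scores)),
                    key=lambda i: total_scores[i], reverse=True)
    n = max(1, int(len(ranked) * fraction))
    upper = [item_scores[i] for i in ranked[:n]]   # strongest candidates
    lower = [item_scores[i] for i in ranked[-n:]]  # weakest candidates
    return sum(upper) / n - sum(lower) / n

# Hypothetical data: 10 candidates, one item
item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
totals = [95, 88, 82, 76, 70, 65, 60, 55, 40, 30]
print(facility_index(item))               # 0.5
print(discrimination_index(item, totals)) # 1.0 (top group all right, bottom all wrong)
```

An "easy but poorly discriminating" question of the kind described above would show a high facility index but a discrimination value near zero.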
3. Example question…
You are too young to drive.
This proved very difficult. In both the trial and the real exam, the analysis showed that the questions were on the whole sound, but that the skill had just not been acquired. After our analysis of the trial exam, we had decided to keep the test, but make the questions more achievable. Increases in item facility were, however, minimal, and were accompanied by a small decrease in discrimination power.
It is apparent in scanning the data that the questions in which the students did best were those where the keyword was a recognised ‘trigger’ for a particular structure type (such as “enough”, “unless”, “been”). In these cases, these words are habitually associated with the structures required. The most difficult questions used more general items as keywords (e.g. “can’t” instead of the trigger phrase “can’t have”). Students recognise the ‘trigger’ phrase, which calls to mind the structure to be used, rather than understanding the full meaning of the original sentence and then trying to re-express it in a different way. This suggests to me that the successfully completed keyword transformations in the exam were more the result of memorised knowledge about the structures rather than an ability to actually manipulate the structures of English.
I believe that this question type, and question 1 as well, requires a degree of abstracting ability that is not yet fully developed in 12 and 13 year olds; it does not reflect their level of cognitive development.
4. Example question…
We have learned lots of things in English last year.
This was an error correction question. Here, the skill tested was primarily one of recognition. The students had to look at a text in which each sentence possibly contained an extra word. They had to decide firstly whether or not a sentence was wrong, then isolate the extra word and report it.
In the trial exam, this question proved easy enough, but interestingly, did not discriminate very well. In the real exam, we replaced the text with individual sentences and also made sure that each sentence had an extra word instead of giving the students the extra burden of first deciding whether a sentence was right or wrong before correcting it. There was a small increase in facility, but most interestingly a very large increase in the discrimination power of the test. The better students seemed more certain about the task.
These observations in general suggest to me that in grade 8 students are still thrown by uncertainty, and while they are aware of correct and incorrect usage in English, they are not confident producers of it. It is also probably the case that these kinds of highly abstract questions are not an ideal vehicle for 12 and 13 year olds. It should be borne in mind that FCE and its question types are intended for older students. We are currently discussing whether to change our Use of English aims in grade 8, or to keep them but alter the way we try to achieve them. It is almost certain that the question types themselves will go through some serious changes.
Reading
Three definite and recognised reading subskills were tested here. Exam specifications were in terms of the kind of reading subskill that we felt should be in the syllabus. The test analyses show that we will have to somehow integrate a text-level element into the specifications. The text in the trial exam was carefully selected, and the results showed a reasonable discrimination power, despite the fact that the tasks were quite easy. The text in the real exam was much too easy, to the extent that we gained little worthwhile information regarding students’ reading ability.
Instead of just referring to subskills, I think it is necessary to refer also to the type of text to be read, and the type of situation the information is to be applied to. In other words, the range of text types and levels of difficulty together with a range of ‘real world’ task types needs to be specified.
In curriculum terms, we also need to think about the place of more complex reading skills, particularly inferential skills, amongst our aims.
Writing
We are very happy with our writing test. The analyses showed that both the trial and the real exams had an almost equal level of difficulty (which, in the context of our marking scheme, is more indicative of consistency in evaluation), and the discrimination power of this part of the test was very high.
We used the following band and scale descriptors, and standardisation procedures to increase the reliability of what is inevitably a subjective process; scores rely on marker judgements rather than objectively measurable features of the text. Standardisation procedures involved all markers marking a representative sample of the answers according to their interpretation of the descriptors, and then discussing and agreeing on variations in marking for each one. In this way, a common interpretation was strengthened.
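One mechanical piece of such a standardisation procedure can be sketched in code: scripts where markers' band scores diverge beyond a tolerance are flagged, and it is exactly those scripts that are worth discussing until a common interpretation is agreed. This is a hypothetical illustration only; the band scale, threshold and function name are my assumptions, not the scheme described here.

```python
# Sketch: flag sample scripts where markers' band scores diverge,
# so they can be discussed during standardisation.
# Assumed: integer band scores, tolerance of 1 band.

def scripts_to_discuss(marks, max_spread=1):
    """marks: {script_id: [band score given by each marker]}.
    Returns the ids whose max-min spread exceeds max_spread."""
    return [sid for sid, scores in marks.items()
            if max(scores) - min(scores) > max_spread]

sample = {
    "A": [4, 4, 5],   # close agreement, within tolerance
    "B": [2, 4, 3],   # spread of 2 bands -> needs discussion
    "C": [5, 5, 5],   # perfect agreement
}
print(scripts_to_discuss(sample))  # ['B']
```

After discussion, the flagged scripts would be re-marked; repeating the cycle narrows the spread and strengthens the common interpretation of the descriptors.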
Our curriculum will have to specify a wider range of text types. There is more to write in this world than just letters!
Listening
As with the reading test, the real exam was much too easy. The analyses showed a reasonable item facility and acceptable discrimination on the trial exam, but a greatly increased facility and no discrimination on the real exam. This is almost certainly due to hasty selection of material; there is no doubt that we were more careful in the selection and discussion of the listening material in the trial exam. In the trial exam, material was chosen after lengthy discussion in the preparation commission. With pressures of other work, and of time, this intense discussion was absent when it came to the real exam.
It also suggests that the specifications for listening material above are not sufficient to give a clear picture of the kind of task required. Exam writers were not entirely clear as to how these specifications were supposed to translate into reality. Again, as with reading, I feel that we need to mention situation type and text level, as well. There was a certain amount of ‘Oh, I think they will be able to do this’ in the selection of the task.
Speaking
Our speaking tasks and methods of evaluation are heavily based on the Cambridge PET speaking exams. The exam analyses showed that item facility and discrimination power were consistent over the two exams. The discrimination power, indeed, was very high. All in all, we feel that our oral tests gave an accurate and trustworthy result.
One reason for this is, I feel, the care that was taken in setting up the aims of the test and in standardising evaluation. The Lise administration had to be convinced that our testing was going to work, as the approach we took was different from the traditional one-by-one interview. We therefore had to specify very clearly the categories of evaluation:
These categories were expanded into precise speaking descriptors, which as you will see are firmly based on the PET criteria, an examination the students are familiar with.
Part of the consistency also arises from the careful preparation for evaluation. As with the writing section of the exam, markers went through a standardisation process, which involved watching a video of students performing tasks similar to the exam tasks, and then evaluating them according to the criteria. Standardisation revealed differences of opinion (as it should!) and also allowed these differences to be reconciled.
The objectives were very clear in the markers’ minds, and make an admirable basis for curriculum objectives, too.
Another contributor to this consistency was standardisation of task. Students went into the speaking test in pairs, and discussed with an interlocutor. The first task, after some general introductory “nerve settling” exchanges, was a fairly controlled pair work activity, where students might be asked to discuss which of a limited set of options would be the most suitable for a particular purpose, or to order lists according to importance. The issue here was to keep the task achievable, and not to rely too much on student inspiration and creativity… we are, after all, testing oral ability, not creativity! The second task was to discuss a picture with the interlocutor on a one-to-one basis.
The interlocutor plays no part in the evaluation. The two judges for each pair sit behind the students taking the test, but are able to make eye contact with the interlocutor when necessary (as when, for example, they feel they need more from one of the students in order to make a judgement).
Before discussing the relationship with the curriculum in more depth, two immediate points spring to mind. First, the only area of the exam which actually improved between the trial and the real exam was the grammar section. This section was subject to careful and systematic development as a result of the analysis of the trial exam. This shows me how useful a careful process of studying and interpreting the simple statistical data derived from an item analysis can be.
Secondly, testing objectives seem to be most effective when they create a clear image of what is to be achieved in the teacher’s mind. This is, in my opinion, what lies behind the success of the productive skills tests. The testing aim and evaluation methods were very clear in the examiners’ minds, and they were therefore confident in the execution of these tests.
I believe that the relatively disappointing results of the receptive skills tests stem from a lack of co-ordination and mutual agreement on the nature of receptive skills, and the absence of a clear group perception of the aims of their receptive skills work on the part of the English language departments in the school. I do not mean by this that our teachers are ignorant of what receptive skills teaching is all about (they know it very well!), but simply that we do not have a clear, practical and agreed set of aims for the school as a whole.
Currently, at METU College, we are in the middle of an extremely large curriculum renewal and development programme. This programme covers every level from grade 6 and every department, not just English. There is much being discussed and debated, and this is all very much ‘work in progress’. The basic model we are following is hierarchical…
…with the inspiration for each level being the level above. The school mission statement outlines the reason for the school’s existence, what it believes about education and how it proposes educating its students. Similarly, subject philosophies are statements of what the various departments believe, what they propose to do and why. General aims are wide-ranging, general and inspirational, while the specific aims are more worldly and measurable. The ‘Agreed essential learning activities’ are those activities which are considered essential to the achievement of the curriculum. And unit plans are, of course, descriptions of what is to be done in each unit of work.
The higher levels of this hierarchy are inspirational rather than concrete and measurable. They are a statement on our part of what we want our students to be like, to know and to be able to do by the end of their education with us. They guide us in setting up and describing our more concrete, mundane and measurable objectives. Our testing objectives are firmly related to these aims and objectives at all levels: specified in unit plans; present in the specifications of benchmark examinations such as the Lise prep exemption exam; and embedded in procedures established to monitor and evaluate the curriculum as a whole.
During this process, many questions have been raised as to why we took this approach. In particular, the following have been asked:
Here are some answers:
A curriculum is primarily a statement of purpose: a statement of aims and how you propose achieving them. Everybody, at every level in any activity, needs their own aims. This is why programmes such as that produced by the Ministry of Education are not enough, and why our teachers need to be deeply involved in the process of development. A teacher implementing a programme which is not HER programme is not a teacher. She is a robot. A successful programme requires all involved to make the aims their own by contributing to their formulation. Only in this way will all teachers involved in a programme fully understand it, fully agree with it and be fully able to implement it. This is something the Ministry accepts, by the way. In conversations with members of the Board of Education and other senior members of the Ministry, I have realised that the Ministry is now looking on its programmes as guiding frameworks rather than prescriptive toolkits.
The most obvious example of a ‘ready made’ curriculum is a coursebook. And interestingly, the difference between a successful and an unsuccessful coursebook in a school depends not so much on the quality of the book as on the degree to which the teachers and the students like the book. In other words, the degree to which they feel it is helping them achieve their aims. Coursebooks which are really quite dreadful (in my opinion, of course!) such as the old ‘Access to English’ series, or coursebooks which are methodologically suspect, such as the original ‘Streamline’ series, have all been used with success simply because they were liked. Where coursebooks fail, it is normally because they don’t fit in with the teacher’s own aims. And, in the same way that coursebooks cannot provide a teacher’s own aims, neither can any other imported ready made curriculum.
In a similar way, I feel it is a mistake to look for aims within ELT or applied linguistics. Aims, goals and objectives are nothing more or less than a characterisation and specification of what people involved in an activity want to achieve. So, people and schools can have aims. School subjects do not have aims. ‘ELT’, as a subject, does not have aims. It just has content, categories and parameters. In a very real sense, to say ‘The aim of ELT is x, y and z’ is a meaningless statement. Better to say ‘We are teaching English to our students in order to help them ……”.
Applied linguistics is a field which applies discoveries and theories from linguistics to a number of practical fields, including language teaching. It has much to say and should be taken very seriously, because we have learned an immense amount about what we know and don’t know about language teaching over the past century. But Applied Linguistics can only give us useful advice about the journey. It cannot be a source of aims and objectives, though it can help us characterise and specify them. Applied Linguistics can inform the process of curriculum development, but it cannot guide it. It can NOT tell us where to go and why we are doing it. We, as teachers, have to make that decision for ourselves.
The testing objectives we drew up during the process of developing the exemption examination have done a great deal to help us understand where we are going with our teaching. They have helped us to realise that good teaching objectives, like good testing objectives, have to be concrete and real enough to produce a rich, vivid picture in our minds of what we are trying to achieve. Not too abstract. We are currently producing a profile of the ‘ideal’ 8th grade student, from which we will establish some of our curriculum objectives. We feel that such a profile will help us establish curriculum and testing objectives that are real, applicable, laden with meaning and practical.