Stages and Activities in Language Test Development

Source: 当代学术论坛
  Abstract: Test development is the entire process of creating and using a test. The process is organized into three stages: design, operationalization, and administration. While test development is generally linear, with development progressing from one stage to the next, the process is also an iterative one, in which the decisions that are made and activities that are completed at any stage may lead us to reconsider and revise decisions and repeat activities that have been performed at another stage. Organizing test development in this way helps us monitor the usefulness of the test throughout the development process and produce a useful test.
  Keywords: stages, activities, language, test, development
  
  1. Introduction
  
  Test development is the entire process of creating and using a test, beginning with its initial conceptualization and design, and culminating in one or more archived tests and the results of their use. The amount of time and effort we put into developing language tests will, of course, vary depending upon the situation. At one extreme, with low-stakes tests, the processes might be quite informal, as might be the case if one teacher were preparing a short test to be used as one of a series of weekly quizzes to assign grades. At the other extreme, with high-stakes tests, the processes might be highly complex, perhaps involving extensive trialing and revision, as well as coordinating the efforts of a large test development team. This might be necessary if a test were to be used to make important decisions affecting a large number of people. We would point out that although the amount of time and effort that goes into test development may vary, depending on the use for which the test is intended, the qualities of usefulness need to be carefully considered, and this consideration should not be sacrificed in either low-stakes or high-stakes situations.
  We organize test development conceptually into three stages: design, operationalization, and administration. We say “conceptually” because the test development process is not strictly sequential in its implementation. In practice, although test development is generally linear, with development progressing from one stage to the next, the process is also an iterative one, in which the decisions that are made and the activities completed at one stage may lead us to reconsider and revise decisions, and repeat activities, that have been done at another stage. While there are many ways to organize the test development process, we have discovered over the years that this type of organization gives a better chance of monitoring the usefulness of the test throughout the development process and hence producing a useful test.
  
  2. Stages and activities in language test development
  
  Stage 1: Design
  In the design stage we describe in detail the components of the test design that will enable us to ensure that performance on the test tasks will correspond as closely as possible to language use, and that the test scores will be maximally useful for their intended purposes. Design is in general a linear process, but some activities are iterative, that is, they will need to be repeated a number of times. For example, certain parts of the process, such as considering qualities of usefulness and resource allocation and management, are recurrent and will need to be revisited throughout the process.
  The product of the design stage is a design statement, which is a document that includes the following components:
  1. a description of the purpose(s) of the test,
  2. a description of the TLU (target language use) domain and task types,
  3. a description of the test takers for whom the test is intended,
  4. a definition of the construct(s) to be measured,
  5. a plan for evaluating the qualities of usefulness, and
  6. an inventory of required and available resources and a plan for their allocation and management.
  The purpose of this document is to provide us with a principled basis for developing test tasks, a blueprint, and tests. It is important to prepare this document carefully, for this enables us to monitor the subsequent stages of development.
  There are six activities involved in the design stage, corresponding to the six components of the design statement, as indicated above. These are described briefly below.
  Describing the purpose(s) of the test
  This activity makes explicit the specific uses for which the test is intended. It involves clearly stating the specific inferences about language ability or capacity for language use we intend to make on the basis of test results, and any specific decisions which will be based upon these inferences. The resulting statement of purpose provides a basis for considering the potential impact of test use.
  Identifying and describing tasks in the TLU domain
  This activity makes explicit the tasks in the TLU domain to which we want our inferences about language ability to generalize, and describes TLU task types in terms of distinctive characteristics. It provides a set of detailed descriptions of the TLU task types that will be the basis for developing actual test tasks. These descriptions also provide a means for considering the potential authenticity and interactiveness of test tasks.
  Describing the characteristics of the language users/test takers
  This activity makes explicit the nature of the population of potential test takers for whom the test is being designed. The resulting description provides another basis for considering the potential impact of test use.
  Defining the construct to be measured
  This activity makes explicit the precise nature of the ability we want to measure, by defining it abstractly. The product of this activity is a theoretical definition of the construct, which provides the basis for considering and investigating the construct validity of the interpretations we make of test scores. This theoretical definition also provides a basis for the development, in the operationalization stage, of test tasks. In language testing, our theoretical construct definitions can be derived from a theory of language ability, a syllabus specification, or both.
  Developing a plan for evaluating the qualities of usefulness
  The plan for evaluating usefulness includes activities that are part of every stage of the test development process. A plan for assessing the qualities of usefulness will include an initial consideration of the appropriate balance among the six qualities of usefulness and setting minimum acceptable levels for each, and a checklist of questions that we will ask about each test task we develop. Assessing usefulness in pretesting and administering will include collecting feedback. This will deal with a range of information, both quantitative, such as test scores and scores on individual test tasks, and qualitative, such as observers’ descriptions and verbal self-reports from students on the test taking process. Finally, the plan will include procedures for analyzing the information we have collected. This will include procedures such as the descriptive analysis of test scores, estimates of reliability, and appropriate analyses of the qualitative data.
  Identifying resources and developing a plan for their allocation and management
  This activity makes explicit the resources (human, material, time) that will be required and that will be available for various activities during test development, and provides a plan for how to allocate and manage them throughout the development process. This activity further provides a basis for considering the potential practicality of the test, and for monitoring this throughout the test development process.
  Stage 2: Operationalization
  Operationalization involves developing test task specifications for the types of test tasks to be included in the test, and a blueprint that describes how test tasks will be organized to form actual tests. Operationalization also involves developing and writing the actual test tasks, writing instructions, and specifying the procedures for scoring the test. By specifying the conditions under which language use will be elicited and the method for scoring responses to these tasks, we are providing the operational definition of the construct.
  Developing test tasks and a blueprint
  In developing test tasks, we begin with the descriptions of the TLU task types provided in the design statement, and modify these, again taking into consideration the qualities of usefulness, to produce test task specifications. These comprise a detailed description of the relevant task characteristics, and provide the basis for writing actual test tasks. We would note that the particular task characteristics that are included and the order in which they are arranged in the test task specifications are likely to vary somewhat from one testing situation to another.
  A blueprint consists of characteristics pertaining to the structure, or overall organization, of the test, along with test task specifications for each task type to be included in the test. The blueprint differs from the design statement primarily in terms of the narrowness of the focus and the amount of detail included. A design statement describes the general parameters for the design of a test, including its purpose, the TLU domain for which it is designed, the individuals who will be taking the test, what the test is intended to measure, and so forth. A blueprint, on the other hand, describes how actual test tasks are to be constructed, and how these tasks are to be arranged to form the test.
  Writing instructions
  Writing instructions involves describing fully and explicitly the structure of the test, the nature of the tasks the test takers will be presented with, and how they are expected to respond. Some instructions are very general and apply to the test as a whole. Other instructions are closely linked with specific test tasks.
  Specifying the scoring method
  Specifying the scoring method involves two steps:
  1. defining the criteria by which the quality of the test takers’ responses will be evaluated and
  2. determining the procedures that will be followed to arrive at a score.
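  The two steps above can be illustrated with a minimal sketch. It assumes an analytic rating scale in which each criterion is rated by several raters, the ratings are averaged per criterion, and the averages are summed. The criterion names, the 0 to 4 scale, and the function name score_response are hypothetical and chosen only for illustration; other scoring procedures are equally possible.

```python
from statistics import mean

# Step 1: the criteria by which responses are evaluated.
# Names and scale maxima are hypothetical examples.
CRITERIA = {
    "grammatical accuracy": 4,
    "vocabulary range": 4,
    "task completion": 4,
}


def score_response(ratings_by_rater):
    """Step 2: one possible procedure for arriving at a score.

    ratings_by_rater is a list with one dict per rater, each mapping
    every criterion name to a rating between 0 and that criterion's maximum.
    Returns (per-criterion averages, total score).
    """
    if not ratings_by_rater:
        raise ValueError("at least one rater is required")
    for ratings in ratings_by_rater:
        if set(ratings) != set(CRITERIA):
            raise ValueError("each rater must rate every criterion exactly once")
        for name, value in ratings.items():
            if not 0 <= value <= CRITERIA[name]:
                raise ValueError(f"rating out of range for {name!r}")
    # Average each criterion across raters, then sum the averages.
    per_criterion = {
        name: mean(r[name] for r in ratings_by_rater) for name in CRITERIA
    }
    return per_criterion, sum(per_criterion.values())


if __name__ == "__main__":
    rater_a = {"grammatical accuracy": 3, "vocabulary range": 2, "task completion": 4}
    rater_b = {"grammatical accuracy": 4, "vocabulary range": 2, "task completion": 3}
    print(score_response([rater_a, rater_b]))
```

  Keeping the criteria separate from the procedure mirrors the two steps: the criteria can be revised after try-out without changing how ratings are combined.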
  Stage 3: Test administration
  The test administration stage of test development involves giving the test to a group of individuals, collecting information, and analyzing this information, for two purposes:
  1. assessing the usefulness of the test, and
  2. making the inferences or decisions for which the test is intended.
  Administration typically takes place in two phases: try-out and operational testing.
  Try-out involves administering the test for the purpose of collecting information about the usefulness of the test itself, and for the improvement of the test and testing procedures. The revisions made on the basis of feedback obtained from a try-out might be fairly local, and might consist of minor editing. Alternatively, the analysis of the results of the try-out might indicate that a more global revision is required, perhaps involving returning to the design stage and rethinking some of the components in the design statement. In major testing efforts, tests or test tasks are almost always tried out before they are actually used. In classroom testing, try-outs are often omitted, although we strongly recommend giving the test to selected students or fellow teachers in advance, since this can provide the test developer with information that can be useful in improving the test and test tasks before operational test use.
  Operational test use involves administering the test primarily in order to accomplish the specified use/purpose of the test, but also for collecting information about test usefulness. In all cases of test development, we administer and score the test and then analyze the results, as appropriate to the demands of the situation.
  Procedures for administering tests and collecting feedback
  Administering a test involves preparing the testing environment, collecting test materials, training examiners, and actually giving the test. Administrative procedures need to be developed for use in both try-out and operational test use. Collecting feedback involves obtaining qualitative and quantitative information on usefulness from test takers and test users. Feedback is collected first during try-outs and later during operational test use.
  Procedures for analyzing test scores
  Describing test scores: using descriptive statistics to characterize the quantitative characteristics of test scores.
  Reporting test scores: using statistical procedures for determining how to report test scores most effectively both to test takers and other test users.
  Item analysis: using various statistical procedures for analyzing and improving the quality of individual test tasks, or items.
  Estimating reliability of test scores: using a number of statistical procedures for estimating the consistency of test scores across different specific conditions of test use.
  Investigating the validity of test use: applying a number of logical considerations and empirical procedures, both quantitative and qualitative, for investigating the validity of inferences made from test scores under specific conditions of test use.
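  Three of the quantitative procedures listed above can be sketched in a few lines. The sketch assumes dichotomously scored items (1 for correct, 0 for incorrect), uses item facility (proportion correct) as one common item statistic, and uses Cronbach's alpha as one common internal-consistency estimate of reliability. These particular statistics, the function names, and the sample data are illustrative assumptions, not procedures prescribed by the text.

```python
from statistics import mean, pstdev, pvariance


def describe(scores):
    """Descriptive statistics for a list of total test scores."""
    return {
        "n": len(scores),
        "mean": mean(scores),
        "sd": pstdev(scores),  # population standard deviation
        "min": min(scores),
        "max": max(scores),
    }


def item_facility(responses):
    """Proportion of test takers answering each item correctly.

    responses holds one row per test taker; each row is a list of 0/1 item scores.
    """
    n_takers = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_takers for i in range(n_items)]


def cronbach_alpha(responses):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical try-out data: five test takers, four items.
    responses = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    totals = [sum(row) for row in responses]
    print(describe(totals))
    print(item_facility(responses))
    print(round(cronbach_alpha(responses), 3))
```

  Population variances are used throughout; because alpha depends only on the ratio of item variances to total variance, using sample variances consistently would give the same value.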
  Archiving
  Archiving involves building up a large pool, or bank, of test tasks so as to facilitate the development of subsequent tests. Archiving makes it possible to make the test more adaptable or appropriate to specific kinds of test takers. Typically, archiving procedures are designed to allow easy retrieval of tasks and of important information about each task. Archiving also helps maintain test security. Finally, archiving procedures may be used to facilitate the selection of tasks with particular characteristics.
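  A task bank of this kind can be sketched as a small in-memory store that keeps information about each task and retrieves tasks by their characteristics. The class names ArchivedTask and TaskBank, the recorded fields (task type, construct, facility, a security flag), and the sample entries are all hypothetical; a real archive would record whatever task characteristics the blueprint specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivedTask:
    task_id: str
    task_type: str       # e.g. "gap-fill" (hypothetical label)
    construct: str       # the ability the task is intended to measure
    facility: float      # proportion correct in earlier administrations
    secure: bool = True  # False once the task has been released publicly


class TaskBank:
    """A minimal task archive supporting retrieval by characteristics."""

    def __init__(self):
        self._tasks = {}

    def add(self, task):
        if task.task_id in self._tasks:
            raise ValueError(f"duplicate task id: {task.task_id}")
        self._tasks[task.task_id] = task

    def select(self, *, task_type=None, construct=None,
               min_facility=0.0, max_facility=1.0, secure_only=True):
        """Return matching tasks, sorted by task id."""
        matches = []
        for task in self._tasks.values():
            if task_type is not None and task.task_type != task_type:
                continue
            if construct is not None and task.construct != construct:
                continue
            if not min_facility <= task.facility <= max_facility:
                continue
            if secure_only and not task.secure:
                continue
            matches.append(task)
        return sorted(matches, key=lambda t: t.task_id)


if __name__ == "__main__":
    bank = TaskBank()
    bank.add(ArchivedTask("T1", "gap-fill", "vocabulary", 0.45))
    bank.add(ArchivedTask("T2", "gap-fill", "vocabulary", 0.85))
    bank.add(ArchivedTask("T3", "short essay", "writing", 0.55, secure=False))
    bank.add(ArchivedTask("T4", "gap-fill", "grammar", 0.50))
    chosen = bank.select(task_type="gap-fill", min_facility=0.3, max_facility=0.7)
    print([t.task_id for t in chosen])
```

  Excluding released tasks by default reflects the security role of archiving: a task that test takers may already have seen is only returned when the caller asks for it explicitly.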
  
  3. Conclusion
  
  This specific set of procedures is for developing useful language tests. Whatever the situation might be, we strongly believe that careful planning of the test development process in all language testing situations is crucial, for three reasons. First, and most importantly, we believe that careful planning provides the best means for assuring that the test will be useful for its intended purpose. Second, careful planning tends to increase accountability: the ability to say what was done and why. As teachers we must expect that test users (students, parents, and administrators) will be interested in the quality of our tests. Careful planning should make it easier to provide evidence that the test was prepared carefully and with forethought. Third, we favor careful planning because it increases the amount of satisfaction we experience. When we have a plan to do something that we value, and complete it, we feel rewarded. The more careful the plan (the more individual steps it contains) the more opportunities we create to feel rewarded. The less careful the plan, the fewer the rewards. At the extreme—no plan at all except the completion of the test—there is only one reward: the completed test.
  
  References:
  [1] Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice. Oxford, U.K.: Oxford University Press.