Appropriate Instruments for Educational and Political Constituencies

Jeff Palmer
Apr 16, 2024

The standards and responsibilities in this section describe test development activities necessary to produce psychometrically and legally defensible high stakes tests efficiently and to a high standard of quality. These activities may be completed by either the state or the vendor. If responsibility for any portion of an activity is assigned to the vendor, the RFP (Request for Proposal) and resulting contract should so provide. If the state decides to retain responsibility for the activity, the state may seek advice from the vendor but should clearly indicate that expectation in the RFP and resulting contract.

For each activity or portion of an activity assigned to a vendor, the RFP and resulting contract should describe in detail what is expected of the vendor, any special conditions or limitations, and the compensation to be paid. If a state requests changes or delegates additional responsibilities to the vendor after the contract has been signed, the state will likely need to renegotiate the price.

A policy for quality assurance for all test instruments shall be developed and implemented. The policy should include specific activities, timelines, and locus of responsibility for evaluating the quality of each assessment instrument, including but not limited to, item quality, graphics quality, print quality, forms quality, equating and scaling accuracy, and quality of ancillary materials (e.g., measuring instruments, lab equipment). It shall be described in the proposal and incorporated into the contract.

In test development, quality assurance is a split responsibility between the state and the vendor. The vendor maintains primary responsibility for the quality of the work it produces, but the state ensures that the vendor’s product matches the state’s intent. The vendor must have detailed quality control procedures developed prior to contract initiation and be prepared to staff and implement them efficiently and effectively. The vendor should also clearly articulate the expectations for interaction with state staff during this process so the state can adequately schedule and staff required reviews and signoffs to avoid unnecessary delays. All deliverables should be thoroughly checked by knowledgeable staff before being sent in draft form to the state. Following established quality control procedures will decrease the likelihood of errors and increase the vendor’s reputation for delivering a quality product. The vendor may consider seeking outside review and/or certification of its quality control procedures to enhance their usefulness and credibility. It is also advisable for the vendor to formally document all reviews and signoffs and to have multiple checks where feasible and appropriate.

The state shall develop and implement a quality assurance policy that serves to monitor the vendor. This policy may incorporate techniques such as signing off on all page proofs, checking final copies of test booklets before distribution to schools, and sampling score reports to ensure typographic quality.

It is important at all stages of development to check work that has been completed to ensure quality, accuracy, adherence to professional standards, and satisfaction of all program policies, administrative rules, and legislative statutes. When tests will be used to make high-stakes decisions, multiple checks and rechecks are imperative to detect and correct any errors. While vendors maintain primary responsibility for the quality of their work, vendor checks do not entirely replace the need for quality reviews by states, especially in high-stakes situations. Such quality reviews should be systematic, including proper training for staff and adequate enforcement mechanisms, and must be implemented in an effective and timely way.

Equating and scaling accuracy are especially important with additional uses of data that depend on sound vertical scaling and annual equating of forms to measure Adequate Yearly Progress. It is especially important that equating and scaling studies be conducted on the appropriate populations to which the data will be generalized. In most cases, this means within-state samples for the equating and scaling studies. If states are using publishers’ off-the-shelf standardized tests intact, it is appropriate to use the publisher’s national equating and scaling groups. If states are augmenting publishers’ off-the-shelf tests with state-specific items, thus creating new scales that measure state standards more accurately, it is important that the samples for the equating and scaling studies be taken from inside the state, or from a source that can be shown to be sufficiently similar for statistical purposes.
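To make the point about sample choice concrete, here is a minimal sketch of linear (mean-sigma) equating, one common textbook method and not necessarily the one a given vendor would use. It assumes a random-groups design, and every score list and function name in it is hypothetical, invented only for illustration.

```python
from statistics import mean, pstdev


def linear_equating(new_form_scores, reference_form_scores):
    """Build a function that places new-form raw scores on the reference-form scale.

    Matches the mean and standard deviation of the two score distributions,
    which assumes the two groups of examinees are randomly equivalent.
    """
    mu_new, sd_new = mean(new_form_scores), pstdev(new_form_scores)
    mu_ref, sd_ref = mean(reference_form_scores), pstdev(reference_form_scores)
    if sd_new == 0:
        raise ValueError("new-form scores have no spread; cannot equate")
    slope = sd_ref / sd_new

    def to_reference_scale(raw_score):
        return mu_ref + slope * (raw_score - mu_new)

    return to_reference_scale


# Hypothetical raw scores from two randomly equivalent in-state samples.
in_state_new_form = [10, 12, 14, 16, 18]
in_state_reference_form = [12, 15, 18, 21, 24]

# Hypothetical out-of-state sample that took the new form.
out_of_state_new_form = [14, 16, 18, 20, 22]

equate_in_state = linear_equating(in_state_new_form, in_state_reference_form)
equate_out_of_state = linear_equating(out_of_state_new_form, in_state_reference_form)

# The same raw score of 16 converts differently depending on the sample used.
print(round(equate_in_state(16), 2))
print(round(equate_out_of_state(16), 2))
```

With these made-up numbers, a raw score of 16 lands at roughly 21 on the reference scale when the in-state sample supplies the statistics, and at roughly 15 when the out-of-state sample does. The conversion is only as trustworthy as the match between the equating sample and the population being reported on.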

If requested by states, vendors shall be prepared to explain why the test is an appropriate instrument for its intended uses to educational and political constituencies, the press or in court. In cases where such activities are extensive, provisions defining compensation should be included in the contract.

The state has primary responsibility for defending its programs and decisions but can be aided substantially by vendor expertise and knowledge. Vendors may be most knowledgeable about their products and may have consultants who can provide a national perspective on the issues of concern to various constituencies. The vendor may also have key information required for defending testing activities in a legal forum. All such involvement should occur only when specifically requested by authorized state staff. Costs for such vendor activities should be explicitly included in the RFP and contract or should be dealt with in a separate addendum as the need arises. Vendors should have adequate and appropriate staff to handle such activities.

The state retains primary responsibility for responding to questions regarding the testing system and instruments. The state must be prepared to answer such questions from its legislature, the public, and/or the media. To this end, states must employ some staff with substantive knowledge of psychometric issues and a familiarity with the particular state testing system.

States may request assistance from developers in explaining tests to important political and educational constituencies, such as state legislators, the press, education organizations, and parent groups. In cases of legal challenge, states may need assistance in court explaining how the test development process satisfied all necessary psychometric and legal standards. Contracts should indicate whether, or under which circumstances, such assistance would require additional compensation.

State agencies must also be proactive when it comes to communicating with the legislature. Longstanding relationships with legislators and legislative staffs will enable the agency to help shape policy that is sound and easily justifiable, rather than having to react to legislation that is thrust upon them.

The following documents shall be provided by vendors to states: a) an annual management plan, including a schedule for all tasks required to carry out the plan; b) test development and construction specifications; c) written status reports at regular, agreed-upon intervals as provided for in the contract. In between, updates should be provided as needed through a medium determined by the state (e.g., via phone or email). Depending on the complexity of the program and contract specifications, monthly or quarterly planning meetings with the vendor should be held to discuss current progress, upcoming tasks, problems, and mid-course corrections.

Regular communication between state and vendor staff is a key component of a successful testing program. Detailed documentation of program activities is also important for creating a historical record, informing policymakers, and providing supporting evidence in the event of a legal challenge. Communication and documentation should not be left to chance but rather should be part of the project plan from the outset.

Statewide assessments are typically moving targets in the early years of development because priorities change, laws change, the learning curve for state staff is steep, and timelines may be ambitious. Therefore, regular communication between agency staff and vendor staff is essential as the details of the program are worked out. Planning meetings are useful for problem-solving and consensus building while written reports provide useful documentation of decisions made and work completed.

The state has responsibility for devising its own plan for test development, which should include those areas that the vendor cannot feasibly control. These may include the timely identification and convening of educator review panels; item selection or review; and collecting and providing all necessary information to the vendor for the development of supplementary test materials (e.g., test aids and administrative manuals).

Whether or not a state chooses to delegate test development activities to a vendor, there are some key functions that state staff must perform. For example, the state is in the best position to seek nominations and to constitute educator committees for such functions as reviewing items, selecting items, recommending passing standards, or scoring open-ended responses. States are also responsible for setting testing dates and procedures and for communicating this information to the vendor in time for inclusion in test administration manuals and other related testing materials. These decisions must be communicated to the vendor with sufficient lead time, as defined in the contract, to allow for typesetting, proofing, printing, and distribution of these materials to the schools. Assessments developed for state testing programs must be consistent with all relevant professional standards, in particular the Standards for Educational and Psychological Testing.

The test Standards reflect a consensus among the membership of the sponsoring organizations and they cover major aspects of testing such as validity, reliability, setting passing standards, opportunity to learn, item development, bias reviews, equating, accommodations, English language learners (ELL), scoring, reporting, and documentation. Professional judgment is required in applying relevant standards and should reflect the goals of state policymakers to the maximum extent consistent with best professional practice.

The state should develop and implement a policy for monitoring the vendor’s work for consistency with all relevant professional standards. Options include the use of the state Technical Advisory Committee, employing external psychometric consultants hired directly by the state, or hiring an independent evaluator to provide periodic critical reviews of testing program activities.

While it is primarily the responsibility of the vendor to ensure that assessments developed for state testing programs are consistent with the test Standards, states also have a responsibility to be familiar with relevant standards and to institute procedures for systematically monitoring the degree to which state assessments are consistent with those standards. In particular, states are in the best position to investigate acceptable applications of the standards that are most consistent with the goals of state policymakers and to advocate for changes by the vendor when warranted. Closely monitoring consistency with the test Standards also allows the state to anticipate and correct any problems early in the process, and to be in a stronger position in the event of a legal challenge to the testing program.

A technical manual that includes all relevant psychometric information for each assessment in the testing program shall be developed. The technical manual shall be completed within 6 months of the first live test administration and shall be revised annually thereafter, with each revised edition available within 6 months of each successive test administration. The content and timeliness of the technical manual shall be described in the proposal and incorporated into the contract.

A technical manual is essential in documenting that each assessment instrument meets all professional standards for psychometric and legal defensibility. The technical manual also is an organized repository of psychometric information that should be available to users of test data, including researchers and the public. The technical manual should include, but is not limited to, information on: purpose, test blueprint, test development, validity, reliability, accommodations and testing ELLs, security, administration, scoring, equating and scaling, setting performance standards, opportunity to learn, reporting, and appropriate use and interpretation of test data. Appendices should include related materials such as relevant state statutes, administrative regulations, state standards, sample items, committee rating forms, state and district performance summaries by ethnic group, and other relevant information.

The state bears two responsibilities with respect to the technical manual: First, to provide the vendor with all information required for the creation of the technical manual, and second, to review and approve the final version of the technical manual.

It is important for the state and the vendor to work together on the Technical Manual. While the vendor takes major responsibility for the initial draft, the state has an important responsibility for furnishing necessary information to the vendor. For example, if the state has identified and coordinated educator review committees, the state must provide the vendor with a written description of the selection procedures and relevant information about the participants for inclusion in the appropriate chapter of the technical manual. The state should also undertake a thorough review of the draft technical manual with an eye to accuracy and usability by its intended audiences. The state should also provide the final signoff prior to publication and should assist the vendor in revising it as appropriate for subsequent testing cycles.

For high-stakes testing programs, a technical advisory committee (TAC) should be established to guide program activities. States may wish to establish additional committees for special purposes, as needed, such as for hand scoring or item development issues.

Guidance from both in-state and out-of-state experts on a TAC can provide troubleshooting assistance, a check on adherence to professional psychometric standards, support for psychometrically necessary but politically unpopular decisions, and information on research and options from other programs. TAC members can also be available for questions and comment as issues arise between regularly scheduled meetings. In addition to the permanent TAC, states may find it useful to further establish similar ad hoc advisory committees in areas that present special challenges or are otherwise of special interest to their assessment program, such as hand scoring for a state that makes extensive use of open-response items on its assessment.

Responsibility for meeting arrangements may be delegated to the vendor, provided that authority for appointment or removal of TAC members remains with the state. However, the state should realize the expertise that the vendor possesses in terms of creating the TAC. The vendor can, in many cases, arrange meetings with much more ease administratively than can states. In addition, the vendors are knowledgeable about the preeminent experts that would be available to serve on the TAC.

Tasks, timelines, and the party responsible for each should be clearly delineated in contract documents for each assessment to be developed. Provisions should be included in the contract establishing conditions under which the state may add, delete, or modify contract requirements through contract amendments or change orders.

To facilitate communication and coordination between agency and vendor staff, it is important for both parties to know ahead of time what tasks must be completed by what deadline and by whom, for the assessments being developed to be ready for implementation by the intended date. The tasks should be described in enough detail so that contract managers can monitor performance on a regular basis and take necessary steps to correct any deficiencies before major problems develop. The vendor should know who in the state is authorized to accept items, forms, policies, reports, etc. There needs to be a clear system of sign-offs and chain of command, including allowance for emergencies (e.g., what to do if the testing director is out sick during the most crucial time of approving proofs of test booklets, score reports, or rubrics).

New testing programs should expect some unanticipated delays and difficulties and should include specific contract provisions for handling such situations. The goal must be to ensure that all tests are sound and meet professional standards for psychometric and legal defensibility, even if that means revising timelines or task assignments to correct deficiencies. Unanticipated problems with item development, field testing, scoring, or analyses can jeopardize the defensibility of a test if extra time is not taken to repeat steps or take appropriate corrective action. In the long run, failure to do so may adversely affect the viability of the entire state testing program.

The contract should apportion the financial responsibility for such delays based on the actions of the parties, including but not limited to, whether the delay resulted from vendor nonperformance, the state failing to meet agreed upon deadlines, or the state having made changes.

Written policies should be developed and implemented for: (a) test security, (b) test specifications, (c) item reviews (including but not limited to: sensitivity, differential performance, opportunity to learn, and psychometric quality), (d) accommodations for students with disabilities, (e) testing English language learners, (f) maintenance of confidential testing information, and (g) appeals.

Written state policies serve the dual function of communicating procedures and ensuring that no critical steps are left out. They also ensure that important decisions, such as whether a reader or calculator will be provided to students with disabilities, are made early in the process when decisions about test purpose, use, and content can be modified accordingly. This is also an ideal time, in consultation with agency counsel and a representative of the state attorney general’s office, to ensure that proposed test instruments will satisfy all applicable federal and state laws. The state may request that the vendor provide consultation, but final policy decisions should be the responsibility of the state.

This is an area in which the state can both receive assistance from the vendor and provide assistance to the vendor. Vendors or independent consultants can help the state craft sound policies, based on their experience in the business. The state must be wary, however, of relying exclusively on advocacy groups to help it craft its policies.

The state can, and should, aid the vendors in the area of test security. To this end, the state should: monitor schools or districts; spot-check schools and districts to ensure that policies are being followed; communicate policy expectations to schools and districts; implement policy for the reporting and investigation of test security breaches; and work to enact rules and legislation that adequately deal with test security breaches. Sanctions for security breaches should include civil penalties such as the loss of license or credentials for teachers and administrators as well as criminal penalties for the most egregious cases. In addition, test scores in schools or districts where there is solid evidence of cheating should be invalidated; this is particularly true in accountability systems where consequences are tied to school performance.

Detailed policies and procedures for ensuring the security and integrity of each assessment instrument and the secure destruction of unneeded documents shall be developed and implemented. In addition, the vendor must bear some of the responsibility for preventing cheating and for catching cheaters. The vendor must follow state test security policy in situations where one exists.

A variety of procedures may be appropriate depending on the purpose(s) and use(s) of each assessment. Such procedures might include, but are not limited to, confidentiality agreements, signoff and storage requirements for test materials, numbering and sealing test booklets, procedures for returning test materials, directions for administration, training for test administrators, test preparation guidance, analyses for monitoring potential cheating and for investigating irregularities, ethics standards for educators, and sanctions for those who violate security policies. Contracts should also consider what is done with answer sheets at the end of a contract, how and at what cost a contractor will fulfill an agency request for answer sheets, the state’s legal responsibilities for records storage, and the state’s policies and laws related to parental access to such things as test booklets and answer sheets. Many states have found it useful to include test security violations and sanctions in legislation or administrative regulations.

In cases where the state retains substantial responsibility for test item drafting or review through educator committees, those assigned to such committees should reflect the goals and philosophy of state policymakers and be knowledgeable about scientific (i.e., evidence-based) research on curriculum and instruction. States may also wish to involve external content experts in reviewing test items.

States often retain substantial control over the initial drafting or, at least, review of test items through panels of local educators and curriculum staff or consultants. In some cases, states have discovered that such committees do not always reflect the academic goals or philosophy of policymakers, nor do they necessarily apply an accurate knowledge of rigorous research on curriculum and instruction. Given their critical function, education policymakers should ensure that such committees reflect their philosophy and possess a sound grasp of the best research on curriculum and teaching. Alternatively, states may wish to establish item review procedures to ensure that the work of item writing committees is reflective of policymakers’ academic philosophy as well as scientific research. Policymakers may also find it useful to involve external content experts with national perspectives and expertise to review items.

The vendor has an affirmative responsibility to communicate to the state when items or item types do not reflect the best educational research. In addition, vendors must make efforts to correct situations where inappropriate standards are likely to lead to assessments that do not reflect the best educational research.

Vendors’ national consultants can provide states with comprehensive information about appropriate use of item types and applications being used in other states. Vendors’ consultants also follow educational research closely and can provide states with guidance on the latest scientific research. Where states are able to revise their standards prior to assessment implementation, vendors can provide professional advice regarding options consistent with best practice.

Jeff C. Palmer is a teacher, success coach, trainer, Certified Master of Web Copywriting and founder of https://Ebookschoice.com. Jeff is a prolific writer, Senior Research Associate and Infopreneur having written many eBooks, articles and special reports.
