Testing And Assessment: An Employer's Guide To Good Practices
A document by the
U.S. Department of Labor
Employment and Training Administration
1999
Table of Contents
Foreword
Personnel Assessment
Understanding the Legal Context of Assessment-Employment Laws and Regulations with Implications for Assessment
Understanding Test Quality-Concepts of Reliability and Validity
Assessment Tools and Their Uses
How to Select Tests-Standards for Evaluating Tests
Administering Assessment Instruments
Using, Scoring, and Interpreting Assessment Instruments
Issues and Concerns with Assessment
A Review-Principles of Assessment
Appendix
Sources of Additional Information on Personnel Assessment
Glossary of Assessment Terms
Foreword
PURPOSE of the GUIDE
In today's competitive marketplace and complex legal environment, employers face the challenge of attracting, developing, and retaining the best employees. Michael Eisner, CEO of the Disney Corporation, recognized the impact of personnel decisions on a business's bottom line when he remarked, "My inventory goes home every night." This Guide is designed to help managers and human resource (HR) professionals choose assessment practices that fit their organizations' HR goals. It conveys the essential concepts of employment testing in easy-to-understand terms so that managers and HR professionals can:
- Evaluate and select assessment tools/procedures that maximize chances for getting the right fit between jobs and employees
- Administer and score assessment tools that are the most efficient and effective for their particular needs
- Interpret assessment results in an accurate manner
- Understand the professional and legal standards to be followed when conducting personnel assessment.
FORMAT of the GUIDE
This Guide is structured around a set of assessment principles and their applications. It is organized so that readers from a variety of backgrounds will find the material clear and useful.
- Each chapter covers a critical aspect of the assessment process. The issues involved in each aspect are outlined at the beginning of each chapter.
- Thirteen principles of assessment are explained in the Guide. The last chapter (Chapter 9) summarizes the main points of the principles, serving as a review of the material discussed in the Guide.
- Appendix A offers a list of resource materials for those interested in more information on a particular topic, and Appendix B is a glossary for quick clarification of terms and concepts.
The Guide is designed to provide accurate and important information regarding testing as part of a personnel assessment program. It gives general guidelines and must not be viewed as legal advice.
Acknowledgments
Testing and Assessment: An Employer's Guide to Good Practices (Guide) was produced and funded by the Skills Assessment and Analysis Program in the U.S. Department of Labor, Employment and Training Administration, Office of Policy and Research (OPR) under the direction of Gerard F. Fiala, Administrator. The Skills Assessment and Analysis Program is directed by Donna Dye, Personnel Research Psychologist, who provided technical direction and support for this Guide.
The Guide was prepared under Department of Labor Grants with the North Carolina Employment Security Commission, Southern Assessment Research and Development Center and National O*NET Consortium; the New York Department of Labor; and the Utah Department of Employment Security. The Guide was completed under the direction of David Rivkin. Mr. Rivkin also served as editor of the Guide. Authors of this Guide were Syed Saad, Gary W. Carter, Mark Rothenberg, and Enid Israelson. Grateful acknowledgment is made to Phil Lewis, Patrice Gilliam-Johnson, Jonathan Levine, and Brenda Dunn for their contribution. Thanks are also given to Ann Kump, Helen Tannenbaum, Don Kreger, Kristin Fiske, and Marilyn Silver whose valuable suggestions were very much appreciated. Grateful acknowledgment is also made to Suzan Chastain, Department of Labor, Office of the Solicitor, Division of Civil Rights, and Hilary R. Weinerand and Cynthia Misicka of the Equal Employment Opportunity Commission for consultant review and insights into the final preparation of this Guide.
CHAPTER 1
Personnel Assessment
Personnel assessment is a systematic approach to gathering information about individuals. This information is used to make employment or career-related decisions about applicants and employees.
Assessment is conducted for some specific purpose. For example, you, as an employer, may conduct personnel assessment to select employees for a job. Career counselors may conduct personnel assessment to provide career guidance to clients.
Chapter Highlights
1. Personnel assessment tools: tests and procedures
2. Relationship between the personnel assessment process and tests and procedures
3. What do tests measure?
4. Why do organizations conduct assessment?
5. Some situations in which an organization may benefit from testing
6. Importance of using tests in a purposeful manner
7. Limitations of personnel tests and procedures-fallibility of test scores.
Principles of Assessment Discussed
Use assessment tools in a purposeful manner.
Use the whole-person approach to assessment.
1. Personnel assessment tools: tests and procedures
Any test or procedure used to measure an individual's employment or career-related qualifications and interests can be considered a personnel assessment tool. There are many types of personnel assessment tools. These include traditional knowledge and ability tests, inventories, subjective procedures, and projective instruments. In this guide, the term test will be used as a generic term to refer to any instrument or procedure that samples behavior or performance.
Personnel assessment tools differ in
- Purpose, e.g., selection, placement, promotion, career counseling, or training
- What they are designed to measure, e.g., abilities, skills, work styles, work values, or vocational interests
- What they are designed to predict, e.g., job performance, managerial potential, career success, job satisfaction, or tenure
- Format, e.g., paper-and-pencil, work-sample, or computer simulation
- Level of standardization, objectivity, and quantifiability. Assessment tools and procedures vary greatly on these factors. For example, there are subjective evaluations of resumes, highly structured achievement tests, interviews having varying degrees of structure, and personality inventories with no specific right or wrong answers.
All assessment tools used to make employment decisions, regardless of their format, level of standardization, or objectivity, are subject to professional and legal standards. For example, both the evaluation of a resume and the use of a highly standardized achievement test must comply with applicable laws. Assessment tools used solely for career exploration or counseling are usually not held to the same legal standards.
2. Relationship between the personnel assessment process and tests and procedures
A personnel test or procedure provides only part of the picture about a person. By contrast, the personnel assessment process combines and evaluates all the information gathered about a person to make career or employment-related decisions. Figure 1 highlights the relationship between assessment tools and the personnel assessment process.
3. What do tests measure?
People differ on many psychological and physical characteristics. These characteristics are called constructs. For example, people skillful in verbal and mathematical reasoning are considered high on mental ability. Those who have little physical stamina and strength are labeled low on endurance and physical strength. The terms mental ability, endurance and physical strength are constructs. Constructs are used to identify personal characteristics and to sort people in terms of how much they possess of such characteristics.
Constructs cannot be seen or heard, but we can observe their effects on other variables. For example, we don't observe physical strength but we can observe people with great strength lifting heavy objects and people with limited strength attempting, but failing, to lift these objects. Such differences in characteristics among people have important implications in the employment context. Employees and applicants vary widely in their knowledge, skills, abilities, interests, work styles, and other characteristics. These differences systematically affect the way people perform or behave on the job.
Tests, inventories, and procedures are assessment tools that may be used to measure an individual's abilities, values, and personality traits. They are components of the assessment process.
- observations
- resume evaluations
- application blanks/questionnaires
- biodata inventories
- interviews
- work samples/performance tests
- achievement tests
- general ability tests
- specific ability tests
- physical ability tests
- personality inventories
- honesty/integrity inventories
- interest inventories
- work values inventories
- assessment centers
- drug tests
- medical tests
Assessment process
Systematic approach to combining and evaluating all the information gained from testing and using it to make career or employment-related decisions.
Figure 1. Relationship between assessment tools and the assessment process.
These differences in characteristics are not necessarily apparent by simply observing the employee or job applicant. Employment tests can be used to gather accurate information about job-relevant characteristics. This information helps assess the fit or match between people and jobs. To give an example, an applicant's score on a mechanical test reflects his or her mechanical ability as measured by the test. This score can be used to predict how well that applicant is likely to perform in a job that requires mechanical ability, as demonstrated through a professionally conducted job analysis. Tests can be used in this way to identify potentially good workers.
Some tests can be used to predict employee and applicant job performance. In testing terms, whatever the test is designed to predict is called the criterion. A criterion can be any measure of work behavior or any outcome that can be used as the standard for successful job performance. Some commonly used criteria are productivity, supervisory ratings of job performance, success in training, tenure, and absenteeism. For example, in measuring job performance, supervisory ratings could be the criterion predicted by a test of mechanical ability. How well a test predicts a criterion is one indication of the usefulness of the test.
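One common way to express how well a test predicts a criterion is the correlation between test scores and criterion scores. The sketch below is only an illustration under stated assumptions: the scores and supervisory ratings are invented, the helper name pearson_r is hypothetical, and a real validation study involves far more than a single correlation on a handful of people.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data, invented for illustration only:
# mechanical test scores (predictor) and supervisory ratings (criterion).
test_scores = [52, 61, 70, 75, 83, 90]
supervisor_ratings = [2.5, 3.0, 3.2, 3.1, 4.0, 4.3]

r = pearson_r(test_scores, supervisor_ratings)
print(f"correlation between test and criterion: {r:.2f}")
```

A value near 1 would mean that higher test scores go with higher ratings; a value near 0 would mean the test tells you little about the criterion.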
4. Why do organizations conduct assessment?
Organizations use assessment tools and procedures to help them perform the following human resource functions:
- Selection. Organizations want to be able to identify and hire the best people for the job and the organization in a fair and efficient manner. A properly developed assessment tool may provide a way to select successful sales people, concerned customer service representatives, and effective workers in many other occupations.
- Placement. Organizations also want to be able to assign people to the appropriate job level. For example, an organization may have several managerial positions, each having a different level of responsibility. Assessment may provide information that helps organizations achieve the best fit between employees and jobs.
- Training and development. Tests are used to find out whether employees have mastered training materials. They can help identify those applicants and employees who might benefit from either remedial or advanced training. Information gained from testing can be used to design or modify training programs. Test results also help individuals identify areas in which self-development activities would be useful.
- Promotion. Organizations may use tests to identify employees who possess managerial potential or higher level capabilities, so that these employees can be promoted to assume greater duties and responsibilities.
- Career exploration and guidance. Tests are sometimes used to help people make educational and vocational choices. Tests may provide information that helps individuals choose occupations in which they are likely to be successful and satisfied.
- Program evaluation. Tests may provide information that the organization can use to determine whether employees are benefiting from training and development programs.
5. Some situations in which an organization may benefit from testing
Some situations include the following:
- Current selection or placement procedures result in poor hiring decisions.
- Employee productivity is low.
- Employee errors have serious financial, health, or safety consequences.
- There is high employee turnover or absenteeism.
- Present assessment procedures do not meet current legal and professional standards.
6. Importance of using tests in a purposeful manner
Assessment instruments, like other tools, can be extremely helpful when used properly, but counter-productive when used inappropriately. Often inappropriate use stems from not having a clear understanding of what you want to measure and why you want to measure it. Having a clear understanding of the purpose of your assessment system is important in selecting the appropriate assessment tools to meet that purpose. This brings us to an important principle of assessment.
Principle of Assessment
Use assessment tools in a purposeful manner. It is critical to have a clear understanding of what needs to be measured and for what purpose.
Assessment strategies should be developed with a clear understanding of the knowledge, skills, abilities, characteristics, or personal traits you want to measure. It is also essential to have a clear idea of what each assessment tool you are considering using is designed to measure.
7. Limitations of personnel tests and procedures-fallibility of test scores
Professionally developed tests and procedures that are used as part of a planned assessment program may help you select and hire more qualified and productive employees. However, it is essential to understand that all assessment tools are subject to errors, both in measuring a characteristic, such as verbal ability, and in predicting performance criteria, such as success on the job. This is true for all tests and procedures, regardless of how objective or standardized they might be.
- Do not expect any test or procedure to measure a personal trait or ability with perfect accuracy for every single person.
- Do not expect any test or procedure to be completely accurate in predicting performance.
There will be cases in which a test score or procedure predicts that someone will be a good worker who, in fact, is not. There will also be cases in which an individual who receives a low score is rejected but would, in fact, have been a capable and good worker. Such errors in the assessment context are called selection errors. Selection errors cannot be completely avoided in any assessment program.
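The two kinds of selection error described above can be tallied in a short sketch. Everything here is an assumption made for illustration: the scores, the cutoff of 70, the function name classify, and above all the premise that later job performance is known for every person, which in practice is rarely true for rejected applicants.

```python
from collections import Counter


def classify(score, performed_well, cutoff):
    """Label one decision given a test score and the (assumed known) outcome."""
    passed = score >= cutoff
    if passed and performed_well:
        return "correct acceptance"
    if passed and not performed_well:
        return "false positive"   # predicted to be a good worker, but was not
    if not passed and performed_well:
        return "false negative"   # rejected, but would have been a good worker
    return "correct rejection"


# Hypothetical records, invented for illustration: (test score, performed well?)
records = [
    (82, True), (78, False), (65, True), (59, False),
    (91, True), (70, True), (55, False), (88, False),
]
cutoff = 70  # hypothetical passing score

counts = Counter(classify(score, ok, cutoff) for score, ok in records)
selection_errors = counts["false positive"] + counts["false negative"]
print(dict(counts))
print(f"selection errors: {selection_errors} of {len(records)}")
```

Moving the cutoff trades one kind of error for the other, which is one reason no single score should carry the whole decision.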
Why do organizations conduct testing despite these errors? The answer is that appropriate use of professionally developed assessment tools on average enables organizations to make more effective employment-related decisions than use of simple observations or random decision making.
Using a single test or procedure will provide you with a limited view of a person's employment or career-related qualifications. Moreover, you may reach a mistaken conclusion by giving too much weight to a single test result. On the other hand, using a variety of assessment tools enables you to get a more complete picture of the individual. The practice of using a variety of tests and procedures to more fully assess people is referred to as the whole-person approach to personnel assessment. This will help reduce the number of selection errors made and will boost the effectiveness of your decision making. This leads to an important principle of assessment.
Principle of Assessment
Do not rely too much on any one test to make decisions. Use the whole-person approach to assessment.
CHAPTER 2
Understanding the Legal Context of Assessment-Employment Laws and Regulations with Implications for Assessment
The number of laws and regulations governing the employment process has increased over the past four decades. Many of these laws and regulations have important implications for conducting employment assessment. This chapter discusses what you should do to make your practices consistent with legal, professional, and ethical standards.
Chapter Highlights
1. Title VII of the Civil Rights Act (CRA) of 1964, as amended in 1972; Tower Amendment to Title VII
2. Age Discrimination in Employment Act of 1967 (ADEA)
3. Equal Employment Opportunity Commission (EEOC) - 1972
4. Uniform Guidelines on Employee Selection Procedures - 1978; adverse or disparate impact, approaches to determine existence of adverse impact, four-fifths rule, job-relatedness, business necessity, biased assessment procedures
5. Title I of the Civil Rights Act (CRA) of 1991
6. Americans with Disabilities Act (ADA) - 1990
7. Record keeping of adverse impact and job-relatedness of tests
8. The Standards for Educational and Psychological Testing - 1985; The Principles for the Validation and Use of Personnel Selection Procedures - 1987
9. Relationship between federal, state, and local employment laws.
Principles of Assessment Discussed
Use only assessment instruments that are unbiased and fair to all groups.
The general purpose of employment laws and regulations is to prohibit unfair discrimination in employment and provide equal employment opportunity for all. Unfair discrimination occurs when employment decisions are based on race, sex, religion, ethnicity, age, or disability rather than on job-relevant knowledge, skills, abilities, and other characteristics. Employment practices that unfairly discriminate against people are called unlawful or discriminatory employment practices.
The summaries of the laws and regulations in this chapter focus on their impact on employment testing and assessment. Before you institute any policies based on these laws and regulations, read the specific laws carefully, and consult with your legal advisors regarding the implications for your particular assessment program.
1. Title VII of the Civil Rights Act (CRA) of 1964 (as amended in 1972); Tower Amendment to Title VII
Title VII is landmark legislation that prohibits unfair discrimination in all terms and conditions of employment based on race, color, religion, sex, or national origin. Other subsequent legislation, for example, ADEA and ADA, has added age and disability, respectively, to this list. Women and men, people age 40 and older, people with disabilities, and people belonging to a racial, religious, or ethnic group are protected under Title VII and other employment laws. Individuals in these categories are referred to as members of a protected group. The employment practices covered by this law include the following:
- recruitment
- transfer
- performance appraisal
- disciplinary action
- hiring
- training
- compensation
- termination
- job classification
- promotion
- union or other membership
- fringe benefits.
Employers having 15 or more employees, employment agencies, and labor unions are subject to this law.
The Tower Amendment to this act stipulates that professionally developed workplace tests can be used to make employment decisions. However, only instruments that do not discriminate against any protected group can be used. Use only tests developed by experts who have demonstrated qualifications in this area.
2. Age Discrimination in Employment Act of 1967 (ADEA)
This Act prohibits discrimination against employees or applicants age 40 or older in all aspects of the employment process. Individuals in this group must be provided equal employment opportunity; discrimination in testing and assessment is prohibited. If an older worker charges discrimination under the ADEA, the employer may defend the practice if it can be shown that the job requirement is a matter of business necessity. Employers must have documented support for the argument they use as a defense.
ADEA covers employers having 20 or more employees, labor unions, and employment agencies. Certain groups of employees are exempt from ADEA coverage, including public law enforcement personnel, such as police officers and firefighters. Uniformed military personnel also are exempt from ADEA coverage.
3. Equal Employment Opportunity Commission (EEOC)-1972
The EEOC is responsible for enforcing federal laws prohibiting employment discrimination, including Title VII, the Equal Pay Act (EPA), the ADEA, and the ADA. It receives, investigates, and processes charges of unlawful employment practices of employers filed by an individual, a group of individuals, or one of its commissioners. If the EEOC determines that there is "reasonable cause" that an unlawful employment practice has occurred, it is also authorized to sue on behalf of the charging individual(s) or itself. The EEOC participated in developing the Uniform Guidelines on Employee Selection Procedures.
4. Uniform Guidelines on Employee Selection Procedures-1978; adverse or disparate impact, approaches to determine existence of adverse impact, four-fifths rule, job-relatedness, business necessity, biased assessment procedures
In 1978, the EEOC and three other federal agencies, namely the Civil Service Commission (predecessor of the Office of Personnel Management) and the Labor and Justice Departments, jointly issued the Uniform Guidelines on Employee Selection Procedures. The Guidelines incorporate a set of principles governing the use of employee selection procedures according to applicable laws. They provide a framework for employers and other organizations for determining the proper use of tests and other selection procedures. The Guidelines are legally binding under a number of civil rights laws, including Executive Order 11246 and the Civil Rights Requirements of the National Job Training Partnership Act and the Wagner-Peyser Act. In reviewing the testing practices of organizations under Title VII, the courts generally give great importance to the Guidelines' technical standards for establishing the job-relatedness of tests. Also, federal and state agencies, including the EEOC, apply the Uniform Guidelines in enforcing Title VII and related laws.
The Guidelines cover all employers employing 15 or more employees, labor organizations, and employment agencies. They also cover contractors and subcontractors to the federal government and organizations receiving federal assistance. They apply to all tests, inventories and procedures used to make employment decisions. Employment decisions include hiring, promotion, referral, disciplinary action, termination, licensing, and certification. Training may be included as an employment decision if it leads to any of the actions listed above. The Guidelines have significant implications for personnel assessment.
One of the basic principles of the Uniform Guidelines is that it is unlawful to use a test or selection procedure that creates adverse impact, unless justified. Adverse impact occurs when there is a substantially different rate of selection in hiring, promotion, or other employment decisions that work to the disadvantage of members of a race, sex, or ethnic group.
Several approaches can be used to determine whether adverse impact has occurred. Statistical techniques may provide information regarding whether or not the use of a test results in adverse impact. Adverse impact is normally indicated when the selection rate for one group is less than 80% (4/5) that of another. This measure is commonly referred to as the four-fifths or 80% rule. However, variations in sample size may affect the interpretation of the calculation. For example, the 80% rule may not be accurate in detecting substantially different rates of selection in very large or small samples. When determining whether there is adverse impact in very large or small samples, more sensitive tests of statistical significance should be employed.
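The arithmetic behind the four-fifths rule can be sketched in a few lines. The applicant counts below are invented for illustration, and the function names are hypothetical. The Guide does not name a particular significance test, so the pooled two-proportion z-test shown here is only one possible choice, included as an assumption; it is not a substitute for professional or legal review.

```python
from math import erf, sqrt


def selection_rate(selected: int, applicants: int) -> float:
    """Proportion of a group's applicants who were selected."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    return selected / applicants


def impact_ratio(rate_a: float, rate_b: float) -> float:
    """Lower selection rate divided by the higher selection rate."""
    higher = max(rate_a, rate_b)
    if higher == 0:
        raise ValueError("no one was selected in either group")
    return min(rate_a, rate_b) / higher


def two_proportion_z(sel_a, n_a, sel_b, n_b):
    """Two-sided pooled z-test for a difference between two selection rates."""
    pooled = (sel_a + sel_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("both selection rates are 0% or both are 100%")
    z = (sel_a / n_a - sel_b / n_b) / se
    # Standard normal CDF via erf, then doubled tail area for a two-sided test.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Hypothetical counts, invented for illustration only.
rate_a = selection_rate(48, 80)   # group A: 48 of 80 selected
rate_b = selection_rate(12, 30)   # group B: 12 of 30 selected
ratio = impact_ratio(rate_a, rate_b)
flagged = ratio < 0.8             # four-fifths (80%) rule
z, p_value = two_proportion_z(48, 80, 12, 30)

print(f"impact ratio = {ratio:.2f}, below four-fifths: {flagged}")
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

With these made-up counts the impact ratio is about 0.67, which falls below 0.8, while the significance test on such small groups gives a p-value slightly above 0.05. The two indicators can disagree, which is why sample size matters when interpreting the rule.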
When there is no charge of adverse impact, the Guidelines do not require that you show the job-relatedness of your assessment procedures. However, you are strongly encouraged to use only job-related assessment tools.
If your assessment process results in adverse impact, you are required to eliminate it or justify its continued use. The Guidelines recommend the following actions when adverse impact occurs:
- Modify the assessment instrument or procedure causing adverse impact.
- Exclude the component procedure causing adverse impact from your assessment program.
- Use an alternative procedure that causes little or no adverse impact, assuming that the alternative procedure is substantially equally valid.
- Use the selection instrument that has adverse impact if the procedure is job related and valid for selecting better workers, and there is no equally effective procedure available that has less adverse impact.
Note that for the continued use of assessment instruments or procedures that cause adverse impact, courts have required justification by business necessity as well as validity for the specific use. The issue of business necessity is specifically addressed in Title I of the Civil Rights Act of 1991 (see next section).
An assessment procedure that causes adverse impact may continue to be used only if there is evidence that
- It is job-related for the position in question.
- Its continued use is justified by business necessity.
Demonstrating job-relatedness of a test is the same as establishing that the test may be validly used as desired. Chapter 3 discusses the concept of test validity and methods for establishing the validity or job-relatedness of a test.
Demonstrating the business necessity of using a particular assessment instrument involves showing that its use is essential to the safe and efficient operation of the business and there are no alternative procedures available that are substantially equally valid to achieve the business objectives with a lesser adverse impact.
Another issue of importance discussed in the Uniform Guidelines relates to test fairness. The Uniform Guidelines define biased or unfair assessment procedures as those assessment procedures on which one race, sex, or ethnic group characteristically obtains lower scores than members of another group and the differences in the scores are not reflected in differences in the job performance of members of the groups.
The meaning of scores on an unfair or biased assessment procedure will differ depending on the group membership of the person taking the test. Therefore, using biased tests can prevent employers from making equitable employment decisions. This leads to the next principle.
Principle of Assessment
Use only assessment instruments that are unbiased and fair to all groups.
Use of biased tools may result in unfair discrimination against members of the lower scoring groups. However, use of fair and unbiased tests can still result in adverse impact in some cases. If you are developing your own test or procedure, expert help may be advisable to make sure your procedure is fair to all relevant groups. If you are planning to purchase professionally developed assessment tools, first evaluate the fairness of those you are considering by reading the test manuals and consulting independent reviews.
5. Title I of the Civil Rights Act of 1991
Title I of the CRA of 1991 reaffirms the principles developed in Title VII of the CRA of 1964, but makes several significant changes.
As noted previously, the Act specifically requires demonstration of both the job-relatedness and business necessity of assessment instruments or procedures that cause adverse impact. The business necessity requirement, set forth in Title I of the CRA of 1991, is harder to satisfy in defending challenged practices than a business purpose test suggested by the Supreme Court earlier.
Another important provision relates to the use of group-based test score adjustments to maintain a representative work force. The Act prohibits score adjustments, the use of different cut-off scores for different groups of test takers, or alteration of employment-related test results based on the demographics of the test takers. Such practices, which are referred to as race norming or within-group norming, were used by some employers and government agencies to avoid adverse impact.
The Act also makes compensatory and punitive damages available as a remedy for claims of intentional discrimination under Title VII and the ADA.
6. Americans with Disabilities Act (ADA) - 1990
Under the ADA, qualified individuals with disabilities must be given equal opportunity in all aspects of employment. The law prohibits employers with 15 or more employees, labor unions, and employment agencies from discriminating against qualified individuals with disabilities. Prohibited discrimination includes failure to provide reasonable accommodation to persons with disabilities when doing so would not pose undue hardship.
A qualified individual with a disability is one who can perform the essential functions of a job, with or without reasonable accommodation.
- Disability is defined broadly to include any physical or mental impairment that substantially limits one or more of an individual's major life activities, such as caring for oneself, walking, talking, hearing, or seeing. Some common examples include visual, speech, and hearing disabilities; epilepsy; specific learning disabilities; cancer; serious mental illness; AIDS and HIV infection; alcoholism; and past drug addiction. Noteworthy among conditions not covered are current illegal use of drugs, sexual behavior disorders, compulsive gambling, kleptomania, and pyromania.
- Essential functions are the primary job duties that are fundamental, and not marginal to the job. Factors relevant to determining whether a function is essential include written job descriptions, the amount of time spent performing the function, the consequences of not requiring the function, and the work experiences of employees who hold the same or similar jobs.
- Reasonable accommodation is defined as a change in the job application and selection process, a change in the work environment or the manner in which the work is performed, that enables a qualified person with a disability to enjoy equal employment opportunities. Under this Act, qualified individuals with disabilities must be provided reasonable accommodation so they can perform the essential job functions, as long as this does not create undue hardship to the employer.
- Undue hardship is defined as significant difficulty or additional expense and is determined based on a number of factors. Some factors that are considered are the nature and net cost of the accommodation, the financial resources of the facility, the number employed at the facility, the effect on resources and operations, the overall financial resources of the entire organization, and the fiscal relationship of the facility with the organization. An accommodation that is possible for a large organization may pose an undue hardship for a small organization.
The ADA has major implications for your assessment practices.
- In general, it is the responsibility of the individual with a disability to inform you that an accommodation is needed. However, you may ask for advance notice of accommodations required, for the hiring process only, so that you may adjust your testing program or facilities appropriately. When the need for accommodation is not obvious, you may request reasonable documentation of the applicant's disability and functional limitations for which he or she needs an accommodation.
- Reasonable accommodation may involve making the test site accessible, or using an alternative assessment procedure. Administering employment tests to individuals with disabilities that require those individuals to use their impaired abilities is prohibited unless the tests are intended to measure one of these abilities. For example, under the ADA, when a test screens out one or more individuals with a disability, its use must be shown to be job-related for the position in question and justified by business necessity.
- One possible alternative procedure, if available, would be to use a form of the test that does not require use of the impaired ability. Another possibility is to use a procedure that compensates for the impaired ability, if appropriate. For example, allowing extra time to complete certain types of employment tests for someone with dyslexia or other learning disability, or providing a test with larger print or supplying a reader to a visually impaired individual where appropriate, would be considered reasonable accommodation.
- The ADA expressly prohibits making medical inquiries or administering medical examinations prior to making a job offer. Before making medical inquiries, or requiring medical exams, you must make an offer of employment to the applicant. You may make medical inquiries or require medical exams of an employee only when doing so is work-related and justified by business necessity. All medical information you obtain about your applicants and employees is strictly confidential and must be treated as such. Access to and use of this information is also greatly restricted. For a more detailed discussion of medical examinations see Chapter 4.
Your organization should develop a written policy on conducting testing and assessment of individuals with disabilities. This will help ensure compliance with the provisions of the ADA.
If you need assistance in complying with the ADA, there are several resources you may contact.
- The Job Accommodation Network: (800) 526-7234
- Industry-Labor Council on Employment and Disability: (516) 747-6323
- The American Foundation for the Blind: (202) 408-0200, (800) 232-5463
- The President's Committee on Employment of People with Disabilities: (202) 376-6200
- Disability and Business Technical Assistance Centers: (800) 949-4232.
7. Record keeping of adverse impact and job-relatedness of tests
The Uniform Guidelines and subsequent regulations require that all employers maintain a record of their employment-related activities, including statistics related to testing and adverse impact. Filing and record-keeping requirements for large employers (those with over 100 employees) are generally more extensive than those for employers with 100 or fewer employees. To learn more about the specific requirements, refer to EEOC regulations on record-keeping and reporting requirements under Title VII and the ADA (29 CFR part 1602), and the Uniform Guidelines.
8. The Standards for Educational and Psychological Testing (1985); The Principles for the Validation and Use of Personnel Selection Procedures (1987)
There are two resource guides published by major organizations in the testing field that will help you set up and maintain an assessment program. The principles and practices presented in these publications set the standards for professional conduct in all aspects of assessment.
- The Standards for Educational and Psychological Testing. This publication was developed jointly by the American Psychological Association (APA), the National Council on Measurement in Education (NCME), and the American Educational Research Association (AERA). The Standards are an authoritative and comprehensive source of information on how to develop, evaluate, and use tests and other assessment procedures in educational, employment, counseling, and clinical settings. Although developed as professional guidelines, they are consistent with applicable regulations and are frequently cited in litigation involving testing practices.
- The Principles for the Validation and Use of Personnel Selection Procedures. This publication was developed by the Society for Industrial and Organizational Psychology (SIOP). Like the Standards, the Principles are also an excellent guide to good practices in the choice, development, evaluation, and use of assessment tools. However, their main focus is on tools used in the personnel assessment context. The Principles explain their relationship to the Standards in the following way:
The Standards primarily address psychometric issues while the Principles primarily address the problems of making decisions in employee selection, placement, promotion, etc. The major concern of the Standards is general; the primary concern of the Principles is that performance on a test . . . is related to performance on a job or other measures of job success.
Compatibility of the Standards and the Principles with the Uniform Guidelines
The Uniform Guidelines were intended to be consistent with generally accepted professional standards for validating and evaluating standardized tests and other selection procedures. In this regard, the Guidelines specifically refer to the Standards.
You are strongly encouraged to become familiar with both the Standards and the Principles in addition to the Uniform Guidelines. Together, they can help you conduct personnel assessment in a manner consistent with legal and professional standards.
9. Relationship between federal, state, and local employment laws
Some states and localities have issued their own fair employment practices laws, and some have adopted the federal Uniform Guidelines. These state and local laws may be more stringent than corresponding federal laws. When there is a contradiction, federal laws and regulations override any contradictory provisions of corresponding state or local laws. You should become thoroughly familiar with your own state and local laws on employment and testing before you initiate and operate an assessment program.
CHAPTER 3
Understanding Test Quality-Concepts of Reliability and Validity
Test reliability and validity are two technical properties that indicate the quality and usefulness of a test. They are the two most important features to examine when evaluating whether a test is suitable for your use. This chapter provides a simplified explanation of these two complex ideas to help you understand the reliability and validity information reported in test manuals and reviews and apply it when evaluating a test.
Chapter Highlights
1. What makes a good test?
2. Test reliability
3. Interpretation of reliability information from test manuals and reviews
4. Types of reliability estimates
5. Standard error of measurement
6. Test validity
7. Methods for conducting validation studies
8. Using validity evidence from outside studies
9. How to interpret validity information from test manuals and independent reviews.
Principles of Assessment Discussed
Use only reliable assessment instruments and procedures.
Use only assessment procedures and instruments that have been demonstrated to be valid for the specific purpose for which they are being used.
Use assessment tools that are appropriate for the target population.
1. What makes a good test?
An employment test is considered "good" if the following can be said about it:
- The test measures what it claims to measure consistently or reliably. This means that if a person were to take the test again, the person would get a similar test score.
- The test measures what it claims to measure. For example, a test of mental ability does in fact measure mental ability, and not some other characteristic.
- The test is job-relevant. In other words, the test measures one or more characteristics that are important to the job.
- By using the test, more effective employment decisions can be made about individuals. For example, an arithmetic test may help you to select qualified workers for a job that requires knowledge of arithmetic operations.
The degree to which a test has these qualities is indicated by two technical properties: reliability and validity.
2. Test reliability
Reliability refers to how dependably or consistently a test measures a characteristic. If a person takes the test again, will he or she get a similar test score, or a much different score? A test that yields similar scores for a person who repeats the test is said to measure a characteristic reliably.
How do we account for an individual who does not get exactly the same test score every time he or she takes the test? Some possible reasons are the following:
- Test taker's temporary psychological or physical state. Test performance can be influenced by a person's psychological or physical state at the time of testing. For example, differing levels of anxiety, fatigue, or motivation may affect the applicant's test results.
- Environmental factors. Differences in the testing environment, such as room temperature, lighting, noise, or even the test administrator, can influence an individual's test performance.
- Test form. Many tests have more than one version or form. Items differ on each form, but each form is supposed to measure the same thing. Different forms of a test are known as parallel forms or alternate forms. These forms are designed to have similar measurement characteristics, but they contain different items. Because the forms are not exactly the same, a test taker might do better on one form than on another.
- Multiple raters. In certain tests, scoring is determined by a rater's judgments of the test taker's performance or responses. Differences in training, experience, and frame of reference among raters can produce different test scores for the test taker.
These factors are sources of chance or random measurement error in the assessment process. If there were no random errors of measurement, the individual would get the same test score, the individual's "true" score, each time. The degree to which test scores are unaffected by measurement errors is an indication of the reliability of the test.
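In classical measurement theory, the idea in the preceding paragraph is commonly written as shown below. This notation does not appear in the Guide itself and is offered only as a sketch: X, T, and E are symbols introduced here for the observed score, the "true" score, and random measurement error, and the formulation assumes that the error is unrelated to the true score.

```latex
X = T + E, \qquad
r_{XX} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

With no random error, the error variance is zero and the reliability equals 1.00; the larger the error variance relative to the variance of observed scores, the lower the reliability.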
Reliable assessment tools produce dependable, repeatable, and consistent information about people. In order to meaningfully interpret test scores and make useful employment or career-related decisions, you need reliable tools. This brings us to the next principle of assessment.
Principle of Assessment
Use only reliable assessment instruments and procedures. In other words, use only assessment tools that provide dependable and consistent information.
3. Interpretation of reliability information from test manuals and reviews
Test manuals and independent reviews of tests provide information on test reliability. The following discussion will help you interpret the reliability information about any test.
The reliability of a test is indicated by the reliability coefficient. It is denoted by the letter "r," and is expressed as a number ranging between 0 and 1.00, with r = 0 indicating no reliability, and r = 1.00 indicating perfect reliability. Do not expect to find a test with perfect reliability. Generally, you will see the reliability of a test as a decimal, for example, r = .80 or r = .93. The larger the reliability coefficient, the more repeatable or reliable the test scores. Table 1 serves as a general guideline for interpreting test reliability. However, do not select or reject a test solely based on the size of its reliability coefficient. To evaluate a test's reliability, you should consider the type of test, the type of reliability estimate reported, and the context in which the test will be used.
Table 1. General Guidelines for Interpreting Reliability Coefficients
| Reliability coefficient value | Interpretation |
| .90 and up | excellent |
| .80 - .89 | good |
| .70 - .79 | adequate |
| below .70 | may have limited applicability |
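The guideline in Table 1 can be sketched as a small lookup. This is an illustration only: the function name `interpret_reliability` is hypothetical, and treating each band as starting exactly at its lower bound (so that a value such as .895 falls in "good") is an assumption, since the table lists values to two decimals.

```python
def interpret_reliability(r: float) -> str:
    """Return the Table 1 guideline label for a reliability coefficient.

    Assumption: each band starts at its lower bound, so values that fall
    between the printed ranges (for example .895) go to the lower band.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability coefficient must be between 0 and 1.00")
    if r >= 0.90:
        return "excellent"
    if r >= 0.80:
        return "good"
    if r >= 0.70:
        return "adequate"
    return "may have limited applicability"


# The first two values are the examples given in the text; the others are arbitrary.
for r in (0.93, 0.80, 0.74, 0.55):
    print(f"r = {r:.2f}: {interpret_reliability(r)}")
```

As the text cautions, a label like this is only a starting point; the type of test, the type of reliability estimate, and the context of use still need to be weighed.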
4. Types of reliability estimates
There are several types of reliability estimates, each influenced by different sources of measurement error. Test developers have the responsibility of reporting the reliability estimates that are relevant for a particular test. Before deciding to use a test, read the test manual and any independent reviews to determine if its reliability is acceptable. The acceptable level of reliability will differ depending on the type of test and the reliability estimate used.
The discussion in Table 2 should help you develop some familiarity with the different kinds of reliability estimates reported in test manuals and reviews.
Table 2. Types of Reliability Estimates
Test-retest reliability indicates the repeatability of test scores with the passage of time. This estimate also reflects the stability of the characteristic or construct being measured by the test.
Some constructs are more stable than others. For example, an individual's reading ability is more stable over a particular period of time than that individual's anxiety level. Therefore, you would expect a higher test-retest reliability coefficient on a reading test than you would on a test that measures anxiety. For constructs that are expected to vary over time, an acceptable test-retest reliability coefficient may be lower than is suggested in Table 1.
Alternate or parallel form reliability indicates how consistent test scores are likely to be if a person takes two or more forms of a test.
A high parallel form reliability coefficient indicates that the different forms of the test are very similar which means that it makes virtually no difference which version of the test a person takes. On the other hand, a low parallel form reliability coefficient suggests that the different forms are probably not comparable; they may be measuring different things and therefore cannot be used interchangeably.
Inter-rater reliability indicates how consistent test scores are likely to be if the test is scored by two or more raters.
On some tests, raters evaluate responses to questions and determine the score. Differences in judgments among raters are likely to produce variations in test scores. A high inter-rater reliability coefficient indicates that the judgment process is stable and the resulting scores are reliable.
Inter-rater reliability coefficients are typically lower than other types of reliability estimates. However, it is possible to obtain higher levels of inter-rater reliabilities if raters are appropriately trained.
Internal consistency reliability indicates the extent to which items on a test measure the same thing.
A high internal consistency reliability coefficient for a test indicates that the items on the test are very similar to each other in content (homogeneous). It is important to note that the length of a test can affect internal consistency reliability. For example, a very lengthy test can spuriously inflate the reliability coefficient.
Tests that measure multiple characteristics are usually divided into distinct components. Manuals for such tests typically report a separate internal consistency reliability coefficient for each component in addition to one for the whole test.
Test manuals and reviews report several kinds of internal consistency reliability estimates. Each type of estimate is appropriate under certain circumstances. The test manual should explain why a particular estimate is reported.
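To make the internal consistency estimate in Table 2 more concrete, here is a minimal sketch of one widely used estimate, Cronbach's coefficient alpha. The Guide does not name a specific estimate, so the choice of alpha is an assumption made for illustration, and the function name and the item scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Estimate internal consistency with Cronbach's alpha.

    item_scores: one row per test taker, one column per test item.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    n_people = len(item_scores)
    n_items = len(item_scores[0])
    if n_people < 2 or n_items < 2:
        raise ValueError("need at least two test takers and two items")

    columns = list(zip(*item_scores))                  # scores grouped by item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings: 5 test takers answering 4 items on a 1-5 scale.
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
]
print(f"alpha = {cronbach_alpha(scores):.2f}")
```

The other estimates in Table 2 (test-retest, parallel form, and inter-rater) are typically reported as correlations between two sets of scores rather than computed from item-level data.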
5. Standard error of measurement
Test manuals report a statistic called the standard error of measurement (SEM). It gives the margin of error that you should expect in an individual test score because of imperfect reliability of the test. The SEM represents the degree of confidence that a person's "true" score lies within a particular range of scores. For example, an SEM of "2" indicates that a test taker's "true" score probably lies within 2 points in either direction of the score he or she receives on the test. This means that if an individual receives a 91 on the test, there is a good chance that the person's "true" score lies somewhere between 89 and 93.
The SEM is a useful measure of the accuracy of individual test scores. The smaller the SEM, the more accurate the measurements.
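The relationship between reliability and the SEM can be sketched as follows. The Guide does not give a formula; the one used here, SEM = SD x sqrt(1 - reliability), is the standard textbook relationship and is included as an assumption about how a manual's SEM would typically be derived. The function names and the standard deviation of 10 are hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM from the score standard deviation and the reliability coefficient."""
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1.00")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sem: float, n_sem: float = 1.0):
    """Range of scores within n_sem standard errors of the observed score."""
    return (observed - n_sem * sem, observed + n_sem * sem)


# Hypothetical test: scores with SD = 10 and reliability r = .96.
sem = standard_error_of_measurement(sd=10.0, reliability=0.96)
print(f"SEM is about {sem:.1f} points")

# The example from the text: an observed score of 91 with an SEM of 2.
low, high = score_band(observed=91, sem=2)
print(f"Observed 91, SEM 2: band {low:g} to {high:g}")
```

The second call reproduces the 89 to 93 range described above. Widening the band (for example, `n_sem=2`) trades precision for greater confidence that the "true" score falls inside it.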
When evaluating the reliability coefficients of a test, it is important to review the explanations provided in the manual for the following:
- Types of reliability used. The manual should indicate why a certain type of reliability coefficient was reported. The manual should also discuss sources of random measurement error that are relevant for the test.
- How reliability studies were conducted. The manual should indicate the conditions under which the data were obtained, such as the length of time that passed between administrations of a test in a test-retest reliability study. In general, reliabilities tend to drop as the time between test administrations increases.
- The characteristics of the sample group. The manual should indicate the important characteristics of the group used in gathering reliability information, such as education level, occupation, etc. This will allow you to compare the characteristics of the people you want to test with the sample group. If they are sufficiently similar, then the reported reliability estimates will probably hold true for your population as well.
For more information on reliability, consult the APA Standards, the SIOP Principles, or any major textbook on psychometrics or employment testing. Appendix A lists some possible sources.
6. Test validity
Validity is the most important issue in selecting a test. Validity refers to what characteristic the test measures and how well the test measures that characteristic.
- Validity tells you if the characteristic being measured by a test is related to job qualifications and requirements.
- Validity gives meaning to the test scores. Validity evidence indicates that there is linkage between test performance and job performance. It can tell you what you may conclude or predict about someone from his or her score on the test. If a test has been demonstrated to be a valid predictor of performance on a specific job, you can conclude that persons scoring high on the test are more likely to perform well on the job than persons who score low on the test, all else being equal.
- Validity also describes the degree to which you can make specific conclusions or predictions about people based on their test scores. In other words, it indicates the usefulness of the test.
It is important to understand the differences between reliability and validity. Validity will tell you how good a test is for a particular situation; reliability will tell you how trustworthy a score on that test will be. You cannot draw valid conclusions from a test score unless you are sure that the test is reliable. Even when a test is reliable, it may not be valid. You should be careful that any test you select is both reliable and valid for your situation.
A test's validity is established in reference to a specific purpose; the test may not be valid for different purposes. For example, the test you use to make valid predictions about someone's technical proficiency on the job may not be valid for predicting his or her leadership skills or absenteeism rate. This leads to the next principle of assessment.
Principle of Assessment
Use only assessment procedures and instruments that have been demonstrated to be valid for the specific purpose for which they are being used.
Similarly, a test's validity is established in reference to specific groups. These groups are called the reference groups. The test may not be valid for different groups. For example, a test designed to predict the performance of managers in situations requiring problem solving may not allow you to make valid or meaningful predictions about the performance of clerical employees. If, for example, the kind of problem-solving ability required for the two positions is different, or the reading level of the test is not suitable for clerical applicants, the test results may be valid for managers, but not for clerical employees.
Test developers have the responsibility of describing the reference groups used to develop the test. The manual should describe the groups for whom the test is valid, and the interpretation of scores for individuals belonging to each of these groups. You must determine if the test can be used appropriately with the particular type of people you want to test. This group of people is called your target population or target group.
Principle of Assessment
Use assessment tools that are appropriate for the target population.
Your target group and the reference group do not have to match on all factors; they must be sufficiently similar so that the test will yield meaningful scores for your group. For example, a writing ability test developed for use with college seniors may be appropriate for measuring the writing ability of white-collar professionals or managers, even though these groups do not have identical characteristics. In determining the appropriateness of a test for your target groups, consider factors such as occupation, reading level, cultural differences, and language barriers.
Recall that the Uniform Guidelines require assessment tools to have adequate supporting evidence for the conclusions you reach with them in the event adverse impact occurs. A valid personnel tool is one that measures an important characteristic of the job you are interested in. Use of valid tools will, on average, enable you to make better employment-related decisions. Both from business-efficiency and legal viewpoints, it is essential to only use tests that are valid for your intended use.
In order to be certain an employment test is useful and valid, evidence must be collected relating the test to a job. The process of establishing the job relatedness of a test is called validation.
7. Methods for conducting validation studies
The Uniform Guidelines discuss the following three methods of conducting validation studies. The Guidelines describe conditions under which each type of validation strategy is appropriate. They do not express a preference for any one strategy to demonstrate the job-relatedness of a test.
- Criterion-related validation requires demonstration of a correlation or other statistical relationship between test performance and job performance. In other words, individuals who score high on the test tend to perform better on the job than those who score low on the test. If the criterion is obtained at the same time the test is given, it is called concurrent validity; if the criterion is obtained at a later time, it is called predictive validity.
- Content-related validation requires a demonstration that the content of the test represents important job-related behaviors. In other words, test items should be relevant to and measure directly important requirements and qualifications for the job.
- Construct-related validation requires a demonstration that the test measures the construct or characteristic it claims to measure, and that this characteristic is important to successful performance on the job.
The three methods of validation (criterion-related, content, and construct) should be used to provide validation support depending on the situation. These three general methods often overlap, and, depending on the situation, one or more may be appropriate. French (1990) offers situational examples of when each method of validity may be applied.
First, as an example of criterion-related validity, take the position of millwright. Employees' scores (predictors) on a test designed to measure mechanical skill could be correlated with their performance in servicing machines (criterion) in the mill. If the correlation is high, it can be said that the test has a high degree of validation support, and its use as a selection tool would be appropriate.
Second, the content validation method may be used when you want to determine if there is a relationship between behaviors measured by a test and behaviors involved in the job. For example, a typing test would have high validation support for a secretarial position, assuming much typing is required each day. If, however, the job required only minimal typing, then the same test would have little content validity. Content validity does not apply to tests measuring learning ability or general problem-solving skills (French, 1990).
Finally, the third method is construct validity. This method often pertains to tests that may measure abstract traits of an applicant. For example, construct validity may be used when a bank desires to test its applicants for "numerical aptitude." In this case, an aptitude is not an observable behavior, but a concept created to explain possible future behaviors. To demonstrate that the test possesses construct validation support, ". . . the bank would need to show (1) that the test did indeed measure the desired trait and (2) that this trait corresponded to success on the job" (French, 1990, p. 260).
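The correlation described in the millwright example can be sketched as below. This is an illustration under stated assumptions, not a validation procedure: the Pearson correlation is used as the statistic, and the function name, test scores, and supervisor ratings are all hypothetical. A real criterion-related study would need a far larger sample and professional review.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for eight millwrights:
# mechanical skill test scores (predictor) and
# supervisor ratings of machine servicing performance (criterion).
test_scores = [52, 61, 47, 70, 58, 66, 43, 75]
job_ratings = [3.1, 3.4, 3.6, 3.9, 2.8, 4.2, 3.0, 3.7]

r = pearson_r(test_scores, job_ratings)
print(f"validity coefficient r = {r:.2f}")
```

If the ratings were collected at the same time as the test, the result would be concurrent validity evidence; if they were collected later, it would be predictive validity evidence.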
Professionally developed tests should come with reports on validity evidence, including detailed explanations of how validation studies were conducted. If you develop your own tests or procedures, you will need to conduct your own validation studies. As the test user, you have the ultimate responsibility for making sure that validity evidence exists for the conclusions you reach using the tests. This applies to all tests and procedures you use, whether they have been bought off-the-shelf, developed externally, or developed in-house.
Validity evidence is especially critical for tests that have adverse impact. When a test has adverse impact, the Uniform Guidelines require that validity evidence for that specific employment decision be provided.
The particular job for which a test is selected should be very similar to the job for which the test was originally developed. Determining the degree of similarity will require a job analysis. Job analysis is a systematic process used to identify the tasks, duties, responsibilities and working conditions associated with a job and the knowledge, skills, abilities, and other characteristics required to perform that job.
Job analysis information may be gathered by direct observation of people currently in the job, interviews with experienced supervisors and job incumbents, questionnaires, personnel and equipment records, and work manuals. In order to meet the requirements of the Uniform Guidelines, it is advisable that the job analysis be conducted by a qualified professional, for example, an industrial and organizational psychologist or other professional well trained in job analysis techniques. Job analysis information is central in deciding what to test for and which tests to use.
8. Using validity evidence from outside studies
Conducting your own validation study is expensive, and, in many cases, you may not have enough employees in a relevant job category to make it feasible to conduct a study. Therefore, you may find it advantageous to use professionally developed assessment tools and procedures for which documentation on validity already exists. However, care must be taken to make sure that validity evidence obtained for an "outside" test study can be suitably "transported" to your particular situation.
The Uniform Guidelines, the Standards, and the SIOP Principles state that evidence of transportability is required. Consider the following when using outside tests:
- Validity evidence. The validation procedures used in the studies must be consistent with accepted standards.
- Job similarity. A job analysis should be performed to verify that your job and the original job are substantially similar in terms of ability requirements and work behavior.
- Fairness evidence. Reports of test fairness from outside studies must be considered for each protected group that is part of your labor market. Where this information is not available for an otherwise qualified test, an internal study of test fairness should be conducted, if feasible.
- Other significant variables. These include the type of performance measures and standards used, the essential work activities performed, the similarity of your target group to the reference samples, as well as all other situational factors that might affect the applicability of the outside test for your use.
To ensure that the outside test you purchase or obtain meets professional and legal standards, you should consult with testing professionals. See Chapter 5 for information on locating consultants.
9. How to interpret validity information from test manuals and independent reviews
To determine if a particular test is valid for your intended use, consult the test manual and available independent reviews. (Chapter 5 offers sources for test reviews.) The information below can help you interpret the validity evidence reported in these publications.
- In evaluating validity information, it is important to determine whether the test can be used in the specific way you intended, and whether your target group is similar to the test reference group.
Test manuals and reviews should describe:
- Available validation evidence supporting use of the test for specific purposes. The manual should include a thorough description of the procedures used in the validation studies and the results of those studies.
- The possible valid uses of the test. The purposes for which the test can legitimately be used should be described, as well as the performance criteria that can validly be predicted.
- The sample group(s) on which the test was developed. For example, was the test developed on a sample of high school graduates, managers, or clerical workers? What was the racial, ethnic, age, and gender mix of the sample?
- The group(s) for which the test may be used.
- The criterion-related validity of a test is measured by the validity coefficient. It is reported as a number between 0 and 1.00 that indicates the magnitude of the relationship, "r," between the test and a measure of job performance (criterion). The larger the validity coefficient, the more confidence you can have in predictions made from the test scores. However, a single test can never fully predict job performance because success on the job depends on so many varied factors. Therefore, validity coefficients, unlike reliability coefficients, rarely exceed r = .40.
Table 3. General Guidelines for Interpreting Validity Coefficients
| Validity coefficient value | Interpretation |
| above .35 | very beneficial |
| .21 - .35 | likely to be useful |
| .11 - .20 | depends on circumstances |
| below .11 | unlikely to be useful |
As a general rule, the higher the validity coefficient, the more beneficial it is to use the test. Validity coefficients of r = .21 to r = .35 are typical for a single test. Validities for selection systems that use multiple tests will probably be higher because you are using different tools to measure or predict different aspects of performance, whereas a single test is more likely to measure or predict fewer aspects of total performance. Table 3 serves as a general guideline for interpreting test validity for a single test. Evaluating test validity is a sophisticated task, and you might require the services of a testing expert. In addition to the magnitude of the validity coefficient, you should also consider at a minimum the following factors:
- level of adverse impact associated with your assessment tool
- selection ratio (number of openings relative to the number of applicants)
- cost of a hiring error
- cost of the selection tool
- probability of hiring a qualified applicant based on chance alone.
Here are three scenarios illustrating why you should consider these factors, individually and in combination with one another, when evaluating validity coefficients:
- Scenario One
You are in the process of hiring applicants where you have a high selection ratio and are filling positions that do not require a great deal of skill. In this situation, you might be willing to accept a selection tool that has validity considered "likely to be useful" or even "depends on circumstances" because you need to fill the positions, you do not have many applicants to choose from, and the level of skill required is not that high.
Now, let's change the situation.
- Scenario Two
You are recruiting for jobs that require a high level of accuracy, and a mistake made by a worker could be dangerous and costly. With these additional factors, a slightly lower validity coefficient would probably not be acceptable to you because hiring an unqualified worker would be too much of a risk. In this case you would probably want to use a selection tool that reported validities considered to be "very beneficial" because a hiring error would be too costly to your company.
Here is another scenario that shows why you need to consider multiple factors when evaluating the validity of assessment tools.
- Scenario Three
A company you are working for is considering using a very costly selection system that results in fairly high levels of adverse impact. You decide to implement the selection tool because the assessment tools you found with lower adverse impact had substantially lower validity and were just as costly, and because making mistakes in hiring decisions would be too much of a risk for your company. Your company implements the assessment given the difficulty in hiring for the particular positions, the "very beneficial" validity of the assessment, and your failed attempts to find alternative instruments with less adverse impact. However, your company will continue efforts to find ways of reducing the adverse impact of the system.
Again, these examples demonstrate the complexity of evaluating the validity of assessments. Multiple factors need to be considered in most situations. You might want to seek the assistance of a testing expert (for example, an industrial/organizational psychologist) to evaluate the appropriateness of particular assessments for your employment situation.
When properly applied, the use of valid and reliable assessment instruments will help you make better decisions. Additionally, by using a variety of assessment tools as part of an assessment program, you can more fully assess the skills and capabilities of people, while reducing the effects of errors associated with any one tool on your decision making.
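To make the benefit of combining tools more concrete, here is a minimal Python sketch of the standard formula for the multiple correlation between a criterion and two predictors. It assumes the two tests are combined with optimal (least-squares) weights, which is an idealization; the function name and the example values are hypothetical and are not drawn from the Guide.

```python
import math


def multiple_r(r1, r2, r12):
    """Multiple correlation of a criterion with two optimally weighted tests.

    r1, r2: validity coefficient of each test used on its own.
    r12:    correlation between the two tests.
    """
    if abs(r12) >= 1:
        raise ValueError("the two tests must not be perfectly correlated")
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)


if __name__ == "__main__":
    # Hypothetical case: two tests, each with validity .30 on its own.
    for overlap in (0.0, 0.5):
        print(f"r12 = {overlap:.1f} -> combined validity {multiple_r(0.30, 0.30, overlap):.2f}")
```

Under these assumptions, two uncorrelated tests with validities of .30 each give a combined validity of about .42, while the same two tests correlated .50 with each other give about .35. The gain is largest when the tools measure different aspects of performance.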
CHAPTER 4
Assessment Tools and Their Uses
This chapter briefly describes different types of assessment tools and procedures that organizations commonly use to conduct personnel assessment. Included are techniques such as employment interviews and reference checks, as well as various types of professionally developed assessment instruments. This chapter also includes a discussion of the use of medical tests and drug and alcohol testing in the workplace. Table 4, which appears at the end of this chapter, contains a brief description of the advantages and disadvantages of different types of assessment instruments.
Chapter Highlights
1. Mental and physical ability tests
2. Achievement tests
3. Biodata inventories
4. Employment interviews
5. Personality inventories
6. Honesty and integrity measures
7. Education and experience requirements (including licensing and certification)
8. Recommendations and reference checks
9. Assessment centers
10. Medical examinations
11. Drug and alcohol tests
It takes a good deal of knowledge and judgment to properly use assessment tools to make effective employment-related decisions. Many assessment tools and procedures require specialized training, education, or experience to administer and interpret correctly. These requirements vary widely, depending on the specific instruments being used. Check with the test publisher to determine whether you and your staff meet these requirements. To ensure that test users have the necessary qualifications, some test publishers and distributors require proof of qualifications before they will release certain tests.
1. Mental and physical ability tests
When properly applied, ability tests are among the most useful and valid tools available for predicting success in jobs and training across a wide variety of occupations. Ability tests are most commonly used for entry-level jobs, and for applicants without professional training or advanced degrees. Mental ability tests are generally used to measure the ability to learn and perform particular job responsibilities.
Examples of some mental abilities are verbal, quantitative, and spatial abilities. Physical ability tests usually encompass abilities such as strength, endurance, and flexibility.
- General ability tests typically measure one or more broad mental abilities, such as verbal, mathematical, and reasoning skills. These skills are fundamental to success in many different kinds of jobs, especially where cognitive activities such as reading, computing, analyzing, or communicating are involved.
- Specific ability tests include measures of distinct physical and mental abilities, such as reaction time, written comprehension, mathematical reasoning, and mechanical ability, that are important for many jobs and occupations. For example, good mechanical ability may be important for success in auto mechanic and engineering jobs; physical endurance may be critical for fire fighting jobs.
Although mental ability tests are valid predictors of performance in many jobs, use of such tests to make employment decisions often results in adverse impact. For example, research suggests that mental ability tests adversely impact some racial minority groups and, if speed is also a component of the test, older workers may be adversely impacted. Similarly, use of physical ability tests often results in adverse impact against women and older persons. See Chapter 7 for strategies to minimize adverse impact in your assessment program.
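One common way to screen selection results for possible adverse impact is the "four-fifths" rule of thumb described in the Uniform Guidelines on Employee Selection Procedures: a group's selection rate that is less than 80 percent of the rate for the group with the highest rate is generally treated as an indication of adverse impact. The Python sketch below applies that rule of thumb. The function name, group labels, and counts are hypothetical, and a flagged result is only a prompt for closer review, not a legal determination.

```python
def adverse_impact_ratios(outcomes, threshold=0.8):
    """Compare each group's selection rate with the highest group's rate.

    outcomes: dict mapping group label -> (number selected, number of applicants).
    Returns a dict mapping group label -> (impact ratio, flagged), where
    flagged is True when the ratio falls below the threshold.
    """
    rates = {}
    for group, (selected, applicants) in outcomes.items():
        if applicants <= 0:
            raise ValueError(f"group {group!r} has no applicants")
        rates[group] = selected / applicants
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no one was selected, so impact ratios are undefined")
    return {
        group: (rate / highest, rate / highest < threshold)
        for group, rate in rates.items()
    }


if __name__ == "__main__":
    # Hypothetical counts: (selected, applicants) for two applicant groups.
    results = adverse_impact_ratios({"group_a": (30, 60), "group_b": (12, 40)})
    for group, (ratio, flagged) in results.items():
        print(f"{group}: impact ratio {ratio:.2f}, flagged={flagged}")
```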
2. Achievement tests
Achievement tests, also known as proficiency tests, are frequently used to measure an individual's current knowledge or skills that are important to a particular job. These tests generally fall into one of the following formats:
- Knowledge tests typically involve specific questions to determine how much the individual knows about particular job tasks and responsibilities. Traditionally they have been administered in a paper-and-pencil format, but computer administration is becoming more common. Licensing exams for accountants and psychologists are examples of knowledge tests. Knowledge tests tend to have relatively high validity.
- Work-sample or performance tests require the individual to actually demonstrate or perform one or more job tasks. These tests, by their makeup, generally show a high degree of job-relatedness. For example, an applicant for an office-machine repairman position may be asked to diagnose the problem with a malfunctioning machine. Test takers generally view these tests as fairer than other types of tests. Use of these tests often results in less adverse impact than mental ability tests and job knowledge tests. However, they can be expensive to develop and administer.
3. Biodata inventories
Biodata inventories are standardized questionnaires that gather job-relevant biographical information, such as amount and type of schooling, job experiences, and hobbies. They are generally used to predict job and training performance, tenure, and turnover. They capitalize on the well-proven notion that past behavior is a good predictor of future behavior.
Some individuals might provide inaccurate information on biodata inventories to portray themselves as being more qualified or experienced than they really are. Internal consistency checks can be used to detect whether there are discrepancies in the information reported. In addition, reference checks and resumes can be used to verify information.
4. Employment interviews
The employment interview is probably the most commonly used assessment tool. The interview can range from being totally unplanned, that is, unstructured, to carefully designed beforehand, that is, completely structured. The most structured interviews have characteristics such as standardized questions, trained interviewers, specific question order, controlled length of time, and a standardized response evaluation format. At the other end of the spectrum, a completely unstructured interview would probably be done "off the cuff," with untrained interviewers, random questions, and with no consideration of time. A structured interview that is based on an analysis of the job in question is generally a more valid predictor of job performance than an unstructured interview. Keep in mind that interviews may contain both structured and unstructured characteristics.
Regardless of the extent to which the interview is structured or unstructured, the skill of the interviewer can make a difference in the quality of the information gathered. A skillful, trained interviewer will be able to ask job-relevant follow-up questions to clarify and explore issues brought up during the interview.
It is unlawful to ask questions about medical conditions and disability before a conditional job offer. Even if the job applicant volunteers such information, you are not permitted to pursue inquiries about the nature of the medical condition or disability. Instead, refocus the interview so that emphasis is on the ability of the applicant to perform the job, not on the disability. In some limited circumstances, you may ask about the need for reasonable accommodation.
Where disability is concerned, the law requires that employers provide reasonable accommodations (meaning a modification or adjustment) to a job, the work environment or the way things are usually done so that qualified individuals with a disability are not excluded from jobs that they can perform. These legal requirements apply to all selection standards and procedures, including questions and rating systems used during the interview process.
Following a structured interview format can help interviewers avoid unlawful or inappropriate inquiries where medical conditions, disability, and age are concerned. For additional information on the ADA, see the EEOC Technical Assistance Manual on the Employment Provisions of the Americans with Disabilities Act and the EEOC ADA Enforcement Guidance: Preemployment Disability-Related Questions and Medical Examinations.
It is important to note that inquiries about race, ethnicity, or age generally are not expressly prohibited under the law, but usually serve no credible purpose in an interview. These types of questions are also closely scrutinized by organizations, including regulatory agencies, interested in protecting the civil rights of applicants.
5. Personality inventories
In addition to abilities, knowledge, and skills, job success also depends on an individual's personal characteristics. Personality inventories designed for use in employment contexts are used to evaluate such characteristics as motivation, conscientiousness, self-confidence, or how well an employee might get along with fellow workers. Research has shown that, in certain situations, use of personality tests with other assessment instruments can yield helpful predictions.
Some personality inventories have been developed to determine the psychological attributes of an individual for diagnostic and therapeutic purposes. These clinical tools are not specifically designed to measure job-related personality dimensions. These tests are used in only very limited employment situations, primarily with jobs where it is critical to have some idea about an applicant's state of mind, such as in the selection of law enforcement officers or nuclear power plant workers. This distinction between clinical and employment-oriented personality inventories can be confusing. Applicants asked to take personality tests may become concerned even though only employment-oriented personality inventories will be administered.
If a personality inventory or other assessment tool provides information that would lead to identifying a mental disorder or impairment, the tool is considered a medical exam under the ADA. The ADA permits medical examinations of applicants and employees only in limited circumstances.
There are a few additional concerns about personality tests. Since there are usually no right or wrong answers to the test items, test takers may provide socially desirable answers. However, sophisticated personality inventories often have "lie-scales" built in, which allow such response patterns to be detected. There is also a general perception that these tests ask personal questions that are only indirectly relevant to job performance. This may raise concern on the part of test takers that such tests are an invasion of privacy. Some of these concerns can be reduced by including personality tests only as one part of a broader assessment program.
6. Honesty and integrity measures
Honesty tests are a specific type of personality test. There has been an increase in the popularity of honesty and integrity measures since the Employee Polygraph Protection Act (1988) prohibited the use of polygraph tests by most private employers. Honesty and integrity measures may be broadly categorized into two types.
- Overt integrity tests gauge involvement in and attitudes toward theft and employee delinquency. Test items typically ask for opinions about frequency and extent of employee theft, leniency or severity of attitudes toward theft, and rationalizations of theft. They also include direct questions about admissions of, or dismissal for, theft or other unlawful activities.
- Personality-based measures typically contain disguised-purpose questions to gauge a number of personality traits. These traits are usually associated with a broad range of counterproductive employee behaviors, such as insubordination, excessive absenteeism, disciplinary problems, and substance abuse.
All the legitimate concerns and cautions of personality testing apply here. For instance, test takers may raise privacy concerns or question the relevance of these measures to job performance. If you choose to use an honesty test to select people for a particular job, you should document the business necessity of such a test. This would require a detailed job analysis, including an assessment of the consequences of hiring a dishonest individual. Make certain that your staff have the proper training and qualifications to administer and interpret integrity tests.
It is generally recommended that these tests be used only for pre-employment screening. Using the test with present employees could create serious morale problems. Using current employees' poor scores to make employment decisions may have legal repercussions when not substantiated by actual counterproductive behavior.
All honesty and integrity measures have appreciable prediction errors. To minimize prediction errors, thoroughly follow up on poor-scoring individuals with retesting, interviews, or reference checks. In general, integrity measures should not be used as the sole source of information for making employment decisions about individuals.
A number of states currently have statutes restricting the use of honesty and integrity measures. At least one state has an outright ban on their use. Consult regulations in your state that govern the use of honesty and integrity tests before using them.
7. Education and experience requirements (including licensing and certification)
Most jobs have some kind of education and experience requirements. For example, they may specify that only applicants with college degrees or equivalent training or experience will be considered. Such requirements are more common in technical, professional, and higher-level jobs. Certain licensing, certification, and education requirements are mandated by law, as in the case of truck drivers and physicians. This is done to verify minimum competence and to protect public safety.
Requirements for experience and education should be job-related. If the requirements you set result in adverse impact, you will have to demonstrate that they are job-related and justified by business necessity. However, in some cases job-relatedness might be difficult to demonstrate. For example, it is difficult to show that exactly 3 years of experience is necessary or demonstrate that a high school degree is required for a particular job.
8. Recommendations and reference checks
Recommendations and reference checks are often used to verify education, employment, and achievement records already provided by the applicant in some other form, such as during an interview or on a resume or application form. This is primarily done for professional and high-level jobs.
These verification procedures generally do not help separate potentially good workers from poor workers because they almost always result in positive reports. However, use of these measures may serve two important purposes:
- they provide an incentive to applicants to be more honest with the information they provide
- they safeguard against potential negligent hiring lawsuits.
9. Assessment centers
In the assessment center approach, candidates are generally assessed with a wide variety of instruments and procedures. These could include interviews, ability and personality measures, and a range of standardized management activities and problem-solving exercises. Typical of these activities and exercises are in-basket tests, leaderless group discussions, and role-play exercises. Assessment centers are most widely used for managerial and high level positions to assess managerial potential, promotability, problem-solving skills, and decision-making skills.
- In-basket tests ask the candidates to sort through a manager's "in-basket" of letters, memos, directives, and reports describing problems and scenarios. Candidates are asked to examine them, prioritize them, and respond appropriately with memos, action plans, and problem-solving strategies. Trained assessors then evaluate the candidates' responses.
- Leaderless group discussions are group exercises in which a group of candidates is asked to respond to various kinds of problems and scenarios, without a designated group leader. Candidates are evaluated on their behavior in the group discussions. This might include their teamwork skills, their interaction with others, or their leadership skills.
- In role-play exercises, candidates are asked to pretend that they already have the job and must interact with another employee to solve a problem. The other employee is usually a trained assessor. The exercise may involve providing a solution to a problem that the employee presents, or suggesting some course of action regarding a hypothetical situation. Candidates are evaluated on the behavior displayed, solutions provided, or advice given.
Assessors must be appropriately trained. Their skills and experience are essential to the quality of the evaluations they provide. Assessment centers apply the whole-person approach to personnel assessment. They can be very good predictors of job performance and behavior when the tests and procedures making up the assessment center are constructed and used appropriately.
It can be costly to set up an assessment center. Large companies may have their own assessment centers; mid-size and smaller firms sometimes send candidates to private consulting firms for evaluation.
10. Medical examinations
Medical examinations are used to determine if a person can safely and adequately perform a specific job. Medical exams may also be part of a procedure for maintaining comprehensive employee health and safety plans. In some limited circumstances, medical exams may be used for evaluating employee requests for reasonable accommodation for disabilities.
The Americans with Disabilities Act outlines when and in what manner medical exams can be used in employment-related situations. For additional information on the ADA, see Chapter 2 of the Guide, the EEOC Technical Assistance Manual on the Employment Provisions of the Americans with Disabilities Act, the EEOC ADA Enforcement Guidance: Preemployment Disability-Related Questions and Medical Examinations, and the EEOC Uniform Guidelines on Employee Selection Procedures. Some major points regarding medical exams are described below.
- Administering medical exams to job applicants or asking questions related to disability prior to making a job offer is prohibited.
- Once you make a job offer to an applicant, you may require a medical exam, as long as you require the exam of all persons entering the same job category. You may require a medical exam even if it bears no relevance to job performance. However, if you refuse to hire based on the results of the medical exam, the reasons for refusing to hire must be founded on issues of job-relevance and business necessity. In addition, you must demonstrate that no reasonable accommodation was available or possible without imposing undue hardship on your business.
- A medical exam may disqualify an individual who is deemed to be a direct threat to the health and safety of self or others. The EEOC has provided an explanation of what constitutes a direct threat. When an individual is rejected as a direct threat to health and safety:
  - the employer must be prepared to show a significant current risk of substantial harm (not a speculative or remote risk)
  - the specific risk must be identified
  - consideration of the risk must be based on objective medical or other factual evidence regarding the particular individual
  - even if a genuine significant risk of substantial harm exists, the employer must consider whether it can be eliminated or reduced below the level of a direct threat by reasonable accommodation.
- Stricter rules apply for medical exams or inquiries of current employees. Unlike the rules for applicants, these exams or inquiries must be justified based on job relevance and business necessity. The need for a medical exam may arise as a result of some problems with job performance or safety caused by a medical condition or it may be mandated by federal law for certain job categories.
- Your organization may conduct voluntary medical exams and inquiries of employees as part of an employee health program. However, the ADA imposes limitations on the use of this information. Medical records of all applicants and employees must be kept separate from all other personnel information.
If your organization uses medical information to make personnel decisions, you should develop a written policy on medical testing to ensure compliance with relevant federal, state, and local laws. For additional information on the ADA, see the EEOC Technical Assistance Manual on the Employment Provisions of the Americans with Disabilities Act, and the EEOC ADA Enforcement Guidance: Preemployment Disability-Related Questions and Medical Examinations.
11. Drug and alcohol tests
An employer may prohibit the use of alcohol and illegal drugs at the workplace and may require that employees not be under the influence of either while on the job. Some commonly reported negative work behaviors and outcomes associated with alcohol and drug abuse are industrial accidents, work-related injuries, excessive absenteeism or tardiness, and workplace violence.
Current use, possession, or distribution of illicit drugs does not qualify as a "disability" under the ADA. You may prohibit the use of such drugs at the workplace, and you may administer drug tests to applicants and employees alike. You may deny employment to an applicant and discipline or discharge an employee currently engaged in illegal drug use. However, you may not discriminate against a former drug addict who has successfully undergone rehabilitation and does not currently use illicit drugs.
If your organization is in the public sector, federal courts have generally upheld the use of random drug tests only when applied to safety-sensitive positions. This federal restriction does not apply if you are a private employer. However, state or local laws and collective bargaining agreements pertaining to drug testing may impose restrictions on your drug testing policy.
Some legal medications or even some foods can produce a positive reading on a drug screening test for an individual who, in fact, has not used illegal drugs. To minimize such errors, it is advisable to have a formal appeals process, and also provisions for retesting with a more sensitive drug test when necessary.
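A short calculation shows why a positive screening result alone can be misleading. The Python sketch below uses Bayes' rule to estimate the share of positive screens that reflect actual drug use. The function name and all input figures are hypothetical and chosen only for illustration; they are not accuracy data for any real test.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of positive screening results that are true positives.

    prevalence:  proportion of tested people who actually used drugs.
    sensitivity: proportion of users the test correctly flags.
    specificity: proportion of non-users the test correctly clears.
    """
    for name, value in (
        ("prevalence", prevalence),
        ("sensitivity", sensitivity),
        ("specificity", specificity),
    ):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    if true_positives + false_positives == 0:
        raise ValueError("no positive results are possible with these inputs")
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical figures: 5% of those tested use drugs; the screen is
    # 95% sensitive and 95% specific.
    print(f"{positive_predictive_value(0.05, 0.95, 0.95):.2f}")
```

With these hypothetical figures, only about half of the positive screens correspond to actual use, which is why a formal appeals process and confirmatory retesting are advisable.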
Under the ADA, a test for the illegal use of drugs is not considered a medical exam, but a test for alcohol use is. Therefore, you must follow the ADA rules on medical exams in deciding whether and when to administer an alcohol test to applicants or employees.
Alcoholism may qualify as a disability under the ADA, and hence an individual with this condition may be extended protection. However, organizations may discipline individuals who violate conduct or performance standards that are related to the job. Organizations also may discharge, or deny employment to, an individual whose use of alcohol impairs job performance or compromises safety to the extent that he or she can no longer be considered a "qualified individual with a disability."
If your organization uses drug or alcohol tests to make personnel decisions, you should develop a written policy governing such a program to ensure compliance with all relevant federal, state, and local laws. Most states require written consent of employees and applicants before drug or alcohol tests can be administered. Consult the ADA, the EEOC Technical Assistance Manual on the Employment Provisions of the Americans with Disabilities Act, the EEOC ADA Enforcement Guidance: Preemployment Disability-Related Questions and Medical Examinations, and the EEOC Uniform Guidelines on Employee Selection Procedures, as well as your state and local laws when developing a drug or alcohol testing program.
Table 4. Main Advantages and Disadvantages of Different Types of Assessment Instruments
| Type of assessment instrument | Advantages | Disadvantages |
| Mental ability tests | Are among the most useful predictors of performance across a wide variety of jobs; Are usually easy and inexpensive to administer | Use of ability tests can result in high levels of adverse impact; Physical ability tests can be costly to develop and administer |
| Achievement/proficiency tests | In general, job knowledge and work-sample tests have relatively high validity; Job knowledge tests are generally easy and inexpensive to administer; Work-sample tests usually result in less adverse impact than ability tests and written knowledge tests | Written job knowledge tests can result in adverse impact; Work-sample tests can be expensive to develop and administer |
| Biodata inventories | Easy and inexpensive to administer; Some validity evidence exists; May help to reduce adverse impact when used in conjunction with other tests and procedures | Privacy concerns may be an issue with some questions; Faking is a concern (information should be verified when possible) |
| Employment interviews | Structured interviews, based on job analyses, tend to be valid; May reduce adverse impact if used in conjunction with other tests | Unstructured interviews typically have poor validity; Skill of the interviewer is critical to the quality of interview (interviewer training can help) |
| Personality inventories | Usually do not result in adverse impact; Predictive validity evidence exists for some personality inventories in specific situations; May help to reduce adverse impact when used in conjunction with other tests and procedures; Easy and inexpensive to administer | Need to distinguish between clinical and employment-oriented personality inventories in terms of their purpose and use; Possibility of faking or providing socially desirable answers; Concern about invasion of privacy (use only as part of a broader assessment battery) |
| Honesty/integrity measures | Usually do not result in adverse impact; Have been shown to be valid in some cases; Easy and inexpensive to administer | Strong concerns about invasion of privacy (use only as part of a broader assessment battery); Possibility of faking or providing socially desirable answers; Test users may require special qualifications for administration and interpretation of test scores; Should not be used with current employees; Some states restrict use of honesty and integrity tests |
| Education and experience requirements | Can be useful for certain technical, professional, and higher level jobs to guard against gross mismatch or incompetence | In some cases, it is difficult to demonstrate job relatedness and business necessity of education and experience requirements |
| Recommendations and reference checks | Can be used to verify information previously provided by applicants; Can serve as protection against potential negligent hiring lawsuits; May encourage applicants to provide more accurate information | Reports are almost always positive; they do not typically help differentiate between good workers and poor workers |
| Assessment centers | Good predictors of job and training performance, managerial potential, and leadership ability; Apply the whole-person approach to personnel assessment | Can be expensive to develop and administer; Specialized training required for assessors; their skill is essential to the quality of assessment centers |
| Medical examinations | Can help ensure a safe work environment when use is consistent with relevant federal, state, and local laws | Cannot be administered prior to making a job offer. Restrictions apply to administering to applicants post offer or to current employees; There is a risk of violating applicable regulations (a written policy, consistent with all relevant laws, should be established to govern the entire medical testing program) |
| Drug and alcohol tests | Can help ensure a safe and favorable work environment when program is consistent with relevant federal, state, and local laws | An alcohol test is considered a medical exam and applicable law restricting medical examination in employment must be followed; There is a risk of violating applicable regulations (a written policy, consistent with all relevant laws, should be established to govern the entire drug or alcohol testing program) |
CHAPTER 5
How to Select Tests-Standards for Evaluating Tests
Previous chapters described a number of types of personnel tests and procedures and use of assessment tools to identify good workers and improve organizational performance. Technical and legal issues that have to be considered in using tests were also discussed. In this chapter, information and procedures for evaluating tests will be presented.
Chapter Highlights
1. Sources of information about tests
2. Standards for evaluating a test-information to consider to determine suitability of a test for your use
3. Checklist for evaluating a test.
Principle of Assessment
Use assessment instruments for which understandable and comprehensive documentation is available.
1. Sources of information about tests
Many assessment instruments are available for use in employment contexts. Sources that can help you determine which tests are appropriate for your situation are described below.
- Test manual. A test manual should provide clear and complete information about how the test was developed; its recommended uses and possible misuses; and evidence of reliability, validity, and fairness. The manual also should contain full instructions for test administration, scoring, and interpretation. In summary, a test manual should provide sufficient administrative and technical information to allow you to make an informed judgment as to whether the test is suitable for your use. You can order specimen test sets and test manuals from most test publishers. Test publishers and distributors vary in the amount and quality of information they provide in test manuals. The quality and comprehensiveness of the manual often reflect the adequacy of the research base behind the test. Do not mistake catalogs or pamphlets provided by test publishers and distributors for test manuals. Catalogs and pamphlets are marketing tools aimed at selling products. To get a balanced picture of the test, it is important to consult independently published critical test reviews in addition to test manuals.
- Mental Measurements Yearbook (MMY). The MMY is a major source of information about assessment tools. It consists of a continuing series of volumes. Each volume contains reviews of tests that are new or significantly revised since the publication of the previous volume. New volumes do not replace old ones; rather, they supplement them.
The MMY series covers nearly all commercially available psychological, educational, and vocational tests published for use with English-speaking people. There is a detailed review of each test by an expert in the field. A brief description of the test covering areas such as purpose, scoring, prices, and publisher is also provided.
The MMY is published by the Buros Institute of Mental Measurements. The Buros Institute also makes test reviews available through a computer database. This database is updated monthly via an on-line computer service. This service is administered by the Bibliographic Retrieval Services (BRS).
- Tests in Print (TIP). TIP is another Buros Institute publication. It is published every few years and lists virtually every test published in English that is available for purchase at that time. It includes the same basic information about a test that is included in the MMY, but it does not contain reviews. This publication is a good starting place for determining what tests are currently available.
- Test Critiques. This publication provides practical and straightforward test reviews. It consists of several volumes, published over a period of years. Each volume reviews a different selection of tests. The subject index at the back of the most recent volume directs the reader to the correct volume for each test review.
- Professional consultants. There are many employment testing experts who can help you evaluate and select tests for your intended use. They can help you design personnel assessment programs that are effective and comply with relevant laws.
If you are considering hiring a consultant, it is important to evaluate his or her qualifications and experience beforehand. Professionals working in this field generally have a Ph.D. in industrial/organizational psychology or a related field. Look for an individual with hands-on experience in the areas in which you need assistance. Consultants may be found in psychology or business departments at universities and colleges. Others serve as full-time consultants, either working independently, or as members of consulting organizations. Typically, professional consultants will hold memberships in APA, SIOP, or other professional organizations.
Reference libraries should contain the publications discussed above as well as others that will provide information about personnel tests and procedures. The Standards for Educational and Psychological Testing and the Principles for the Validation and Use of Personnel Selection Procedures can also help you evaluate a test in terms of its development and use. In addition, these publications indicate the kinds of information a good test manual should contain. Carefully evaluate the quality and the suitability of a test before deciding to use it. Avoid using tests for which only unclear or incomplete documentation is available, and tests that you are unable to thoroughly evaluate. This is the next principle of assessment.
Principle of Assessment
Use assessment instruments for which understandable and comprehensive documentation is available.
2. Standards for evaluating a test-information to consider to determine suitability of a test for your use
The following basic descriptive and technical information should be evaluated before you select a test for your use. In order to evaluate a test, you should obtain a copy of the test and test manual. Consult independent reviews of the test for professional opinions on the technical adequacy of the test and the suitability of the test for your purposes.
- General information
- Test description. As a starting point, obtain a full description of the test. You will need specific identifying information to order your specimen set and to look up independent reviews. The description of the test is the starting point for evaluating whether the test is suitable for your needs.
- Name of test. Make sure you have the accurate name of the test. (There are tests with similar names, and you want to look up reviews of the correct instrument.)
- Publication date. What is the date of publication? Is it the latest version? If the test is old, it is possible that the test content and norms for scoring and interpretation have become outdated.
- Publisher. Who is the test publisher? Sometimes test copyrights are transferred from one publisher to another. You may need to call the publisher for information or for determining the suitability of the test for your needs. Is the publisher cooperative in this regard? Does the publisher have staff available to assist you?
- Authors. Who developed the test? Try to determine the background of the authors. Typically, test developers hold a doctorate in industrial/organizational psychology, psychometrics, or a related field and are associated with professional organizations such as APA. Another desirable qualification is proven expertise in test research and construction.
- Forms. Is there more than one version of the test? Are they interchangeable? Are forms available for use with special groups, such as non-English speakers or persons with limited reading skills?
- Format. Is the test available in paper-and-pencil and/or computer format? Is it meant to be administered to one person at a time, or can it be administered in a group setting?
- Administration time. How long does it take to administer?
- Costs. What are the costs to administer and score the test? This may vary depending on the version used, and whether scoring is by hand, computer, or by the test publisher.
- Staff requirements. What training and background do staff need to administer, score, and interpret the test? Do you have suitable staff available now or do you need to train and/or hire staff?
- Purpose, nature, and applicability of the test
- Test purpose. What aspects of job performance do you need to measure? What characteristics does the test measure? Does the manual contain a coherent description of these characteristics? Is there a match between what the developer says the test measures and what you intend to measure? The test you select for your assessment should relate directly to one or more important aspects of the job. A job analysis will help you identify the tasks involved in the job, and the knowledge, skills, abilities, and other characteristics required for successful performance.
- Similarity of reference group to target group. The test manual will describe the characteristics of the reference group that was used to develop the test. How similar are your test takers, the target group, to the reference group? Consider such factors as age, gender, racial and ethnic composition, education, occupation, and cultural background. Do any factors suggest that the test may not be appropriate for your group? In general, the closer your group matches the characteristics of the reference group, the more confidence you will have that the test will yield meaningful scores for your group.
- Similarity of norm group to target group. In some cases, the test manual will refer to a norm group. A norm group is the sample of the relevant population on whom the scoring procedures and score interpretation guidelines are based. In such cases, the norm group is the same as the reference group. If your target group differs from the norm group in important ways, then the test cannot be meaningfully used in your situation. For further discussion of norm groups, see Chapter 7.
- Technical information
- Test reliability. Examine the test manual to determine whether the test has an acceptable level of reliability before deciding to use it. See Chapter 3 for a discussion of how to interpret reliability information. A good test manual should provide detailed information on the types of reliabilities reported, how reliability studies were conducted, and the size and nature of the sample used to develop the reliability coefficients. Independent reviews also should be consulted.
- Test validity. Determine whether the test may be validly used in the way you intend. Check the validity coefficients in the relevant validity studies. Usually, the higher the validity coefficient, the more useful the test will be in predicting job success. See Chapter 3 for a discussion of how to interpret validity information. A good test manual will contain clear and complete information on the valid uses of the test, including how validation studies were conducted, and the size and characteristics of the validation samples. Independent test reviews will let you know whether the sample size was sufficient, whether statistical procedures were appropriate, and whether the test meets professional standards.
- Test fairness. Select tests developed to be as fair as possible to test takers of different racial, ethnic, gender, and age groups. See Chapter 7 for a discussion of test fairness. Read the manual and independent reviews of the test to evaluate its fairness to these groups. To secure acceptance by all test takers, the test should also appear to be fair. The test items should not reflect racial, cultural, or gender stereotypes, or overemphasize one culture over another. The rules for test administration and scoring should be clear and uniform. Does the manual indicate any modifications that are possible and may be needed to test individuals with disabilities?
- Potential for adverse impact. The manual and independent reviews should help you to evaluate whether the test you are considering has the potential for causing adverse impact. As discussed earlier, mental and physical ability tests have the potential for causing substantial adverse impact. However, they can be an important part of your assessment program. If these tests are used in combination with other employment tests and procedures, you will be able to obtain a better picture of an individual's job potential and reduce the effect of average score differences between groups on one test.
- Practical evaluation
- Test tryout. It is often useful to try the test in your own organizational setting by asking employees of your organization to take the test and by taking the test yourself. Do not compute test scores for these employees unless you take steps to ensure that results are anonymous. By trying the test out, you will gain a better appreciation of the administration procedures, including the suitability of the administration manual, test booklet, answer sheets and scoring procedures, the actual time needed, and the adequacy of the planned staffing arrangements. The reactions of your employees to the test may give you additional insight into the effect the test will have on candidates.
- Cost-effectiveness. Are there less costly tests or assessment procedures that can help you achieve your assessment goals? If possible, weigh the potential gain in job performance against the cost of using the test. Some test publishers and test reviews include an expectancy chart or table that you can consult to predict the expected level of performance of an individual based on his or her test score. However, make sure your target group is comparable to the reference group on which the expectancy chart was developed.
- Independent reviews. Is the information provided by the test manual consistent with independent reviews of the test? If there is more than one review, do they agree or disagree with each other? Information from independent reviews will prove most useful in evaluating a test.
- Overall practical evaluation. This involves evaluating the overall suitability of the test for your specific circumstances. Does the test appear easy to use or is it unsettling? Does it appear fair and appropriate for your target groups? How clear are instructions for administration, scoring, and interpretation? Are special equipment or facilities needed? Is the staff qualified to administer the test and interpret results or would extensive training be required?
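To make the validity coefficient and the expectancy chart discussed above more concrete, the following Python sketch shows one way they could be computed from an organization's own records. It is an illustration only, under stated assumptions: the test scores, performance ratings, score bands, and the rule for calling an employee "successful" are all invented for the example, and the function names pearson_r and expectancy_table are hypothetical rather than taken from any test manual. A sample this small would not support real conclusions.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)

def expectancy_table(scores, successful, band_floors):
    """Share of people judged successful within each score band.

    band_floors is an ascending list of the lowest score in each band;
    a score belongs to the highest band whose floor it reaches.
    """
    counts = {floor: [0, 0] for floor in band_floors}  # floor -> [successes, total]
    for score, ok in zip(scores, successful):
        eligible = [floor for floor in band_floors if score >= floor]
        if not eligible:
            continue  # score falls below the lowest band; skip it
        band = max(eligible)
        counts[band][1] += 1
        if ok:
            counts[band][0] += 1
    return {floor: (hits / total if total else None)
            for floor, (hits, total) in counts.items()}

# Hypothetical data: test scores and supervisor ratings (1 to 5) for 8 employees.
test_scores = [10, 12, 15, 18, 20, 22, 25, 28]
performance = [2, 3, 2, 3, 4, 3, 5, 4]

# Assumption for this example: a rating of 3 or higher counts as "successful".
successful = [rating >= 3 for rating in performance]

validity_r = pearson_r(test_scores, performance)
table = expectancy_table(test_scores, successful, band_floors=[0, 15, 25])

print(f"validity coefficient r = {validity_r:.2f}")
for floor, share in table.items():
    label = "n/a" if share is None else f"{share:.0%}"
    print(f"scores from {floor} up: {label} rated successful")
```

A table like this is only as trustworthy as the match between the people in it and your target group, which is the same caution that applies to a publisher's expectancy chart.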
3. Checklist for evaluating a test
It is helpful to have an organized method for choosing the right test for your needs. A checklist can help you in this process. Your checklist should summarize the kinds of information discussed above. For example, is the test valid for your intended purpose? Is it reliable and fair? Is it cost-effective? Is the instrument likely to be viewed as fair and valid by the test takers? Also consider the ease or difficulty of administration, scoring, and interpretation given available resources. A sample checklist that you may find useful appears below. Completing a checklist for each test you are considering will help you compare them more easily.
CHECKLIST FOR EVALUATING A TEST
- Characteristic to be measured by test (skill, ability, personality trait)
- Job/training characteristic to be assessed
- Candidate population (education, or experience level, other background)
- Test Characteristics
- Test name:
- Version:
- Type: (paper-and-pencil, computer)
- Alternate forms available:
- Scoring method: (hand-scored, machine-scored)
- Technical considerations
- Reliability: r =
- Validity: r =
- Reference/norm group:
- Test fairness evidence
- Adverse impact evidence
- Applicability (indicate any special group)
- Administration considerations
- Administration time:
- Materials needed (include start-up costs, operational and scoring cost):
- Costs:
- Facilities needed:
- Staffing requirements
- Training requirements
- Other considerations (consider clarity, comprehensiveness, utility)
- Test manual
- Supporting documents from the publisher
- Publisher assistance
- Independent reviews
- Overall evaluation
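If you keep your evaluations electronically, the checklist above can be recorded in a simple structured form so that gaps are easy to spot. The Python sketch below is one possible layout, not a prescribed format: the class name EvaluationChecklist, its field names, the helper missing_items, and the example test are all hypothetical and chosen only to mirror the checklist headings.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class EvaluationChecklist:
    """One record per test under consideration; None means 'not yet filled in'."""
    test_name: str
    version: Optional[str] = None
    test_type: Optional[str] = None            # e.g. "paper-and-pencil" or "computer"
    alternate_forms: Optional[bool] = None
    scoring_method: Optional[str] = None       # e.g. "hand-scored" or "machine-scored"
    reliability_r: Optional[float] = None
    validity_r: Optional[float] = None
    reference_group: Optional[str] = None
    fairness_evidence: Optional[str] = None
    adverse_impact_evidence: Optional[str] = None
    administration_minutes: Optional[int] = None
    cost_notes: Optional[str] = None
    staffing_notes: Optional[str] = None
    independent_reviews: Optional[str] = None
    overall_evaluation: Optional[str] = None

def missing_items(checklist):
    """Names of checklist items that have not been filled in yet."""
    return [f.name for f in fields(checklist)
            if getattr(checklist, f.name) is None]

# Hypothetical entry: a partly completed checklist for an invented test.
draft = EvaluationChecklist(
    test_name="Hypothetical Clerical Skills Test",
    test_type="paper-and-pencil",
    reliability_r=0.85,
)

print("Still to be completed:", ", ".join(missing_items(draft)))
```

Listing the unfilled items for each candidate test is a quick way to see which tests you cannot yet thoroughly evaluate.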
CHAPTER 6
Administering Assessment Instruments
Proper administration of assessment instruments is essential to obtaining valid or meaningful scores for your test takers. This chapter discusses how to administer assessment instruments so that you can be certain that the results will be valid and fair.
Chapter Highlights
1. Training and qualifications of administration staff
2. Following instructions and guidelines stated in the test manual
3. Ensuring suitable and uniform assessment conditions
4. How much help to offer test takers
5. Test anxiety
6. Alternative assessment methods for special cases
7. Providing reasonable accommodation in the assessment process to people with disabilities
8. Administering computer-based tests
9. Obtaining informed consent of test takers and a waiver of liability claims
10. Maintaining assessment instrument security
11. Maintaining confidentiality of assessment results
12. Testing unionized employees
Principles of Assessment Discussed
Ensure that administration staff are properly trained.
Ensure that testing conditions are suitable for all test takers.
Provide reasonable accommodation in the assessment process for people with disabilities.
Maintain assessment instrument security.
Maintain confidentiality of assessment results.
1. Training and qualifications of administration staff
The qualifications and training required for a test administrator will depend on the nature and complexity of the test. The more complex the test administration procedures, the more training an administrator will need. However, even simple-to-administer tests need trained staff to ensure valid results. Administrators should be given ample time to learn their responsibilities before they administer a test to applicants. Your staff may need professional training in test administration, which some test publishers offer.
Only those staff who can administer the test in a professional and satisfactory manner should be assigned test administration duties. Test administrators should be well organized and observant, speak well, and be able to deal comfortably with people. They should also be trained to handle special situations with sensitivity. For example, they should know how to respond to a test taker's request for an accommodation and be able to calm down those who may become overly anxious about taking a test. This leads to our next principle of assessment.
Principle of Assessment
Ensure that administration staff are properly trained.
2. Following instructions and guidelines stated in the test manual
Staff should be thoroughly familiar with the testing procedures before administering the test. They should carefully follow all standardized administration and scoring procedures as outlined in the test manual. Test manuals will indicate the test materials that are needed, the order of presentation, and the instructions that must be read verbatim. They will also indicate whether there are time limits, and, if so, what those time limits are. Any special instructions noted by the test manual should be observed. This includes meeting the requirements for specific equipment or facilities. Alterations can invalidate results.
3. Ensuring suitable and uniform assessment conditions
Various extraneous influences may affect the reliability and validity of an assessment procedure. To maintain the integrity of results, you and your staff should make sure that adverse conditions are minimized.
- Choose a suitable testing location. Obtain a roo