WU SPSS Descriptive and Inferential Analyses Data Analysis Plan Paper

Throughout this course, you have practiced various skills that will allow you to identify, procure, and manipulate biosurveillance and secondary data. As addressed in previous sections, public health information needs are constantly growing, and the statistical analysis of data is just one step in this process. Decisions based on this information rely not only on the accuracy of your analysis but also on how clearly it is organized and presented.

This week for your Scholar-Practitioner Project you will conduct descriptive and inferential analyses using your selected data set, your prepared database from Week 8, and SPSS.

To prepare:

  • Review this week’s Learning Resources

Submit an interpretation of your statistical analysis based on your selected data set, your prepared database, and SPSS. Make sure to perform the following tasks for each of your research questions separately:

  • Provide interpretation for descriptive statistical analyses based on your SPSS output.
  • Summarize the numerical results with descriptive analysis tables or graphs, including your interpretation.
  • Provide interpretation of your inferential statistical analyses using SPSS outputs.
  • Summarize the numerical results with inferential analysis tables or graphs, including your interpretation.
  • Provide a full answer and interpretation for each of your research questions.
  • Follow APA guidelines.

REQUIRED READINGS

Kamin, L. F. (2010). Using a five-step procedure for inferential statistical analyses. The American Biology Teacher, 72(3), 186–188.

 

Maiti, T. (2005). Tutorials in biostatistics, vol. 1: Statistical methods in clinical studies / tutorials in biostatistics, vol. 2: Statistical modelling of complex medical data. Journal of the American Statistical Association, 100(472), 1468–1468.

 

Marshall, G., & Jonker, L. (2010a). A concise guide to descriptive statistics. Synergy, 22–25.

 

Marshall, G., & Jonker, L. (2010b). A concise guide to inferential statistics. Synergy, 20–24.

 

McHugh, M. L. (2003a). Descriptive statistics, part I: Level of measurement. Journal for Specialists in Pediatric Nursing, 8(1), 35–37.

 

McHugh, M. L. (2003b). Descriptive statistics, part II: Most commonly used descriptive statistics. Journal for Specialists in Pediatric Nursing, 8(3), 111–116.

 

Silva-Ayçaguer, L. C., Suárez-Gil, P., & Fernández-Somoano, A. (2010). The null hypothesis significance test in health sciences research (1995–2006): Statistical analysis and interpretation. BMC Medical Research Methodology, 10(1), 44.

 

Thabane, L., & Akhtar-Danesh, N. (2008). Guidelines for reporting descriptive statistics in health research. Nurse Researcher, 15(2), 72–81.

 

Wolverton, M. L. (2009). Research design, hypothesis testing, and sampling. The Appraisal Journal, 77(4), 370–382.

OPTIONAL RESOURCES

Bingenheimer, J. B., & Raudenbush, S. W. (2004). Statistical and substantive inferences in public health: Issues in the application of multilevel models. Annual Review of Public Health, 25, 53–77.

Diez-Roux, A. (2000). Multilevel analysis in public health research. Annual Review of Public Health, 21, 171–192.
Note: Retrieved from the Walden Library databases.

Forthofer, R. N., Lee, E. S., & Hernandez, M. (2007). Biostatistics: A guide to design, analysis, and discovery. Amsterdam, Netherlands: Elsevier Academic Press.

  • Chapter 3, “Descriptive Methods”
  • Review: Chapters 8–15

Gruber, F. A. (1999). Tutorial: Survival analysis—A statistic for clinical, efficacy, and theoretical applications. Journal of Speech, Language, and Hearing Research, 42(2), 432–447.
Note: Retrieved from the Walden Library databases.

Lee, E. T., & Go, O. T. (1997). Survival analysis in public health research. Annual Review of Public Health, 18, 105–134.

Levy, P. S., & Stolte, K. (2000). Statistical methods in public health and epidemiology: A look at the recent past and projections for the next decade. Statistical Methods in Medical Research, 9(1), 41–55.

Peace, K. E., Parrillo, A. V., & Hardy, C. J. (2008). Assessing the validity of statistical inferences in public health research: An evidence-based ‘best practices’ approach. Journal of the Georgia Public Health Association, 1(1), 10–23. Retrieved from http://faculty.mercer.edu/thomas_bm/documents/jgph…


A concise guide to… descriptive statistics
Marshall, Gill; Jonker, Leon. Synergy; Oct 2010; ProQuest Central, pg. 22.
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

Descriptive statistics, part I: Level of measurement
McHugh, Mary L. Journal for Specialists in Pediatric Nursing; Jan-Mar 2003; 8, 1; ProQuest Central, pg. 35.

Research Design, Hypothesis Testing, and Sampling
by Marvin L. Wolverton, PhD, MAI
The Appraisal Journal, Fall 2009, p. 370

Abstract

Statistical applications are an essential element of the practicing appraiser’s tool kit. The extent to which statistics are employed in appraisal practice depends on a number of circumstances, such as the scope of work, the statistical competence of the analyst, and the availability of data. This article discusses essential theory necessary for competence in statistical inference, that is, reaching a conclusion about an unknown characteristic of a population based on sample data. This article explains how to design and implement inferential statistical research and examines the topics of hypothesis construction, reliability and validity of research, sampling, setting significance levels, and sample size.

The material in this article originally was published as chapter 6 in Marvin L. Wolverton, An Introduction to Statistics for Appraisers (Chicago: Appraisal Institute, 2009).

Research Design and Hypothesis Testing

What Is the Question?

At its most elementary level, the application of inferential statistics boils down to answering questions. For example, we might ask, “Does the theory of diminishing marginal utility hold for this property type in this market area?” Or, “How does this particular rental market react to proximity to public transportation?” Or, “What, if any, is the influence of nearby street noise on home prices in the subject subdivision?” Ultimately, the question may be as fundamental as, “What is my opinion of market value for this property, and can the credibility of my opinion be bolstered by the application of inferential methods?”

Research questions like these may represent the entire scope of a valuation services assignment, or they may be small but important aspects of a more comprehensive study. In either case, the effective application of inferential methods requires a clear understanding of the relevant questions, explicit or implicit formulation of testable hypotheses, appropriate data, credible analysis, and valid interpretation of the analytical results. In much the same way that the appraisal process serves as a systematic and organized way to design a work plan consistent with the scope of a specific assignment, statistical analysis research design provides a road map for moving from research question to insight.

From Research Question to Testable Hypotheses

Hypothesis testing relies on a principle often referred to as “Popperian falsification.” Early twentieth-century philosopher Karl Popper held that inferential statistics cannot prove anything with absolute certainty. However, inferential methods can cast doubt on the veracity of an assertion of “truth.” When sufficient doubt can be raised, an assertion of truth can be “falsified,” at least to some degree. The degree of certainty associated with labeling a statement as false is related to “statistical significance” or merely “significance” in the language of statistics.

The process of forming and testing a hypothesis (i.e., a theory) is as follows:

  1. Determine an appropriate expected outcome based on theory and experience. This is generally referred to in inferential statistics as a “research hypothesis.”
  2. Formulate a pair of testable hypotheses related to the research hypothesis: a “null hypothesis” and an “alternative (research) hypothesis.” The testable hypotheses must be mutually exclusive and collectively exhaustive. The hypothesis testing goal is to falsify or reject the statement of truth implied by the null hypothesis, leaving the research hypothesis as the only reasonable alternative.
  3. Formulate a conclusion that falsifies (or fails to falsify) the null hypothesis.

Hypothesis Testing in the Real World

While the three-step process of forming and testing a hypothesis is easy to outline, it is generally much more difficult to apply. Let’s consider a simple example and think about the complications that may arise.

Consider the effect of street noise on housing prices. This simple residential valuation issue illustrates the complications encountered in formulating and testing real world hypotheses. An appropriate research hypothesis could be a general statement like, “Exposure to street noise affects home price.” Or, if the analyst makes a more specific supposition concerning the direction of the effect, the research hypothesis could be, “Exposure to street noise reduces home price.” Depending on the scope of the assignment, the appropriate research question could be much more specific than this.
More refinement might be required, resulting in research hypotheses such as “Exposure to street noise in excess of ‘X decibels’ above the ambient noise level reduces home price” or “Exposure to street noise reduces home price, but the size of the reduction decreases with distance from an abutting street and becomes negligible at ‘X feet’ from the abutting street.” Based on these simple examples, it should be apparent that the testable hypotheses must be customized to the underlying research question.

For simplicity’s sake, assume that the appropriate research hypothesis is “Exposure to street noise reduces home price in the subject property’s market area.” This statement becomes the alternative hypothesis, that is, the hypothesis you believe the data will support. Remember that the null hypothesis and the alternative hypothesis are mutually exclusive and collectively exhaustive. The null hypothesis (the statement of truth you are attempting to falsify) would therefore be “Exposure to street noise either increases home price or has no effect on home price in the subject property’s market area.” These two statements are mutually exclusive (only one of them can be true), and they are collectively exhaustive (home price must go up, down, or stay the same).

In summary, the relevant hypotheses for this example are

  • Research hypothesis: Exposure to street noise reduces home price in the subject property’s market area.
  • Testable hypotheses:
      * Null hypothesis (H0): The street noise price effect is ≥ 0.
      * Alternative hypothesis (Ha): The street noise price effect is < 0.

Once these hypotheses have been formulated, a research plan must be devised that allows the analyst to credibly test the veracity of the null hypothesis. If the null hypothesis can be falsified with sufficient certainty, then the analyst can conclude that the alternative hypothesis is likely to be true.
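
A directional pair of hypotheses like H0 (effect ≥ 0) and Ha (effect < 0) can be tested in many ways. The following is a minimal sketch rather than the article's method: it assumes a one-sided permutation test on the difference in mean price between noise-exposed and unexposed homes. The prices, the function name, and the choice of test are all illustrative assumptions.

```python
import random
from statistics import mean


def one_sided_permutation_p(exposed, unexposed, n_perm=5000, seed=0):
    """Estimate P(diff <= observed) under H0 by shuffling group labels.

    diff = mean(exposed) - mean(unexposed). A small p-value casts doubt on
    H0 (effect >= 0) in favor of Ha (effect < 0).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(exposed) - mean(unexposed)
    pooled = list(exposed) + list(unexposed)
    k = len(exposed)
    at_least_as_low = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if diff <= observed:
            at_least_as_low += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_low + 1) / (n_perm + 1)


# Hypothetical sale prices in thousands of dollars (not data from the article).
noisy = [241, 236, 250, 229, 244, 238]
quiet = [255, 262, 249, 258, 266, 251]

effect, p_value = one_sided_permutation_p(noisy, quiet)
print(f"observed effect: {effect:.1f}, one-sided p: {p_value:.4f}")
```

A p-value below the chosen significance level would falsify H0 "to some degree" in the sense described above; a larger p-value would mean only that the test failed to falsify H0, not that H0 is true.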
It is important to recognize, however, that inferential statistical methods are not intended as means of supporting illogical, unreasonable, or atheoretical suppositions. The research and alternative hypothesis statements should be well reasoned and logical, keeping in mind that inferential methods are designed to support valid research hypotheses.

Validity and Reliability

Tests of the veracity of the null hypothesis are of no use unless the tests are credible (i.e., worthy of belief). Two concepts, validity and reliability,[1] are paramount to credible hypothesis testing. These concepts are deeply rooted in research design and scientific inquiry.

The concepts of reliability and validity can be confusing at first, but they are actually quite simple. For example, consider Figure 1, which illustrates the idea that reliability is analogous to clustering shots on a target. Shots that are scattered all over the target, as in the left panel, are unreliable. Shots that are tightly clustered but off center, as in the right panel, are reliable but invalid. Only shots that are tightly clustered and centered on the target are reliable (consistent) and valid (accurate).

Figure 1. Less than Ideal Target Shooting (left panel: unreliable; right panel: reliable but not valid)

[1] A recommended source for a discussion of validity and reliability in research design and implementation is Mary L. Smith and Gene V. Glass, Research and Evaluation in Education and the Social Sciences (Boston: Allyn and Bacon, 1987).

Because threats to reliability and validity erode credibility, credible research and valuation-related opinions are more likely to occur when analysts understand and assess the extent to which the methods employed were both reliable and valid. Paying attention to a few simple criteria can go a long way toward ensuring credible results. For example, using logical research designs, controlling measurement error, standardizing interview protocols, using representative data, and applying appropriate analytical tools are basic and essential elements of using statistical methods to support credible valuation opinions.

Validity

Strictly speaking, validity is the extent to which a statistical measure reflects the real meaning of what is being measured. Consider a scale that consistently indicates weights that are 95% of true weight. Obviously the result is not valid when the intent is to measure true weight. Although this sort of measurement error is correctable if the error is consistent and known, measurement error is usually neither consistent nor known in many situations.

Lack of research validity stems from many sources, and assessing the validity of research involves numerous considerations such as

  • Logical validity
  • Construct validity
  • Internal validity
  • External validity
  • Statistical conclusion validity
  • Bias

Logical Validity

A research design consists of several parts, such as a problem statement, a research hypothesis, selection and definition of variables, implementation of design and procedures, findings, and conclusions. Logical validity is satisfied when each part of the overall design flows logically from the prior step. If the overall design isn’t logical, then the results aren’t likely to be valid. Appraisers should already be familiar with the elements and logical flow of research design because the valuation process is a similar algorithm.

Construct Validity

Construct validity deals with how well actual attributes, characteristics, and features are being measured. For example, although tall people generally weigh more than short people, use of a weight scale is not a valid construct for measuring height.
As this example illustrates, construct validity is a simple concept, but it can be quite nuanced in practice. Concerns about construct validity are particularly applicable to the use of interviews and questionnaires. Precise definitions of variables and the elimination of ambiguity are important in ensuring that questions are not misinterpreted by respondents or researchers. Meanings assigned by respondents should be consistent with the meanings intended by the researcher. Furthermore, meaning should be consistent from respondent to respondent.

Construct validity is especially problematic when respondents must interpret technical or scientific language, as is the case in many interviews related to real property transactions (e.g., sales confirmation). Do not assume that persons being interviewed fully understand the meanings of technical terms such as capitalization rate, internal rate of return, net operating income, effective gross income, obsolescence, and the like.

Internal Validity

Internal validity requires that all alternative explanations for causality have been ruled out. Ruling out threats to internal validity is a laborious task because it requires explicit identification of each alternative explanation for causation along with the rationale for rejecting it. If all reasonable alternative causes cannot be ruled out, the research may be inconclusive and invalid.

External Validity

External validity exists when findings and conclusions can be generalized from a representative sample to a larger or different population. Random selection from a target population is the best means of obtaining a representative sample, subject to the vagaries of sampling error, which is ubiquitous. Therefore, when a random sample has been obtained, the analyst should assess the extent to which the characteristics of the sample match the characteristics of the target population.
Statistical Conclusion Validity

Statistical conclusions will not be valid if the statistical tests being applied are inappropriate for the data being analyzed. The researcher should be aware of the assumptions underlying each statistical test and how robust the test is if those assumptions are violated.

Bias

Bias occurs when there is a systematic error in research findings. Bias can come from several sources, and it can be classified into two categories: nonsampling error and sampling error.[2] Nonsampling error includes nonresponse bias, sample selection bias, and systematic measurement error. Sample selection bias may be encountered in real property studies, which often rely on observational samples (e.g., the occurrence of a comparable sale cannot be assumed to have been a random event). Sampling error stems from the fact that a random sample can differ from the underlying population simply by chance.

Reliability

Reliability is the extent to which “the same data would have been collected each time in repeated observations [measurements] of the same phenomenon.”[3] A reliable model would produce results that can be thought of as consistent, dependable, and predictable.

As an example, assume that six appraisers are asked to measure the same 1,400-sq.-ft. house and calculate its improved living area. A set of estimates consisting of 1,360 square feet, 1,420 square feet, 1,400 square feet, 1,450 square feet, 1,350 square feet, and 1,340 square feet would not be reliable, even though they tend to bracket the true floor area. However, in comparison, a set of estimates consisting of 1,440 square feet, 1,435 square feet, 1,445 square feet, 1,445 square feet, 1,440 square feet, and 1,435 square feet would be more reliable, despite not bracketing the true floor area. Although the second set of living area estimates is more reliable (i.e., predictable and consistent), the floor area calculations exhibit a systematic, upward bias, making this set of estimates invalid.
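
The contrast between the two sets of estimates can be put in numbers. The sketch below is one possible illustration, assuming that the mean error against the true 1,400 square feet stands in for validity and the sample standard deviation stands in for reliability; the article does not prescribe these particular measures, and the function and variable names are hypothetical.

```python
from statistics import mean, stdev

TRUE_AREA = 1400  # square feet, from the example above


def summarize(estimates, truth=TRUE_AREA):
    """Return (bias, spread): mean error vs. truth and sample std. deviation."""
    return mean(estimates) - truth, stdev(estimates)


scattered = [1360, 1420, 1400, 1450, 1350, 1340]  # first set in the text
clustered = [1440, 1435, 1445, 1445, 1440, 1435]  # second set in the text

for label, data in (("scattered", scattered), ("clustered", clustered)):
    bias, spread = summarize(data)
    print(f"{label}: bias = {bias:+.1f} sq ft, spread = {spread:.1f} sq ft")
```

The first set shows a small bias (about 13 square feet low) with a large spread (about 44 square feet), while the second shows a small spread (about 4.5 square feet) with a consistent upward bias of 40 square feet, which matches the description of a reliable but invalid set.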
An ideal set of estimates would be highly consistent (reliable) and accurate (valid), such as 1,398 square feet, 1,402 square feet, 1,400 square feet, 1,395 square feet, 1,405 square feet, and 1,400 square feet.

Reliability can be difficult to attain, especially when data come from sources beyond the analyst’s control. For example, subjective assessments of condition, construction quality, and curb appeal provided by third parties may be unreliable, especially if more than one person is rendering opinions. What appears to be “excellent” to one person may be viewed as being “above average” or merely “average” to another.

Because reliability can be difficult to assess and control, it is good practice to think about possible threats to reliability that may be encountered. When data come from an outside source, ask if a standardized measurement or categorization protocol was employed. Find out if more than one person was involved in making quality or condition assessments. Think about how errors in scoring or measurement may occur, and make random checks for measurement error. Look for the use of ambiguous questions, ambiguous instructions, or idiosyncratic (technical) language that might be difficult for respondents to comprehend.

[2] David M. Levine, Timothy C. Krehbiel, and Mark L. Berenson, Business Statistics: A First Course, 3rd ed. (Upper Saddle River, N.J.: Prentice Hall, 2003), 23-25.
[3] Earl Babbie, The Practice of Social Research, 6th ed. (Belmont, Calif.: Wadsworth, 1992).

Sampling

A sample is a subset of a larger population selected for study. When the research goal is to better understand the larger population, the sample should be as similar to the larger population as possible. Statisticians use the term “representative” to indicate the similarity of a sample to the larger population.
When a sample is not representative, it is difficult to assert that the characteristics of the sample are indicative of the characteristics of the larger target population.

While sampling is a simple concept, it can be a challenging process in application. The first challenge is obtaining a sample frame, which is a list of items in, or members of, the population you want to study. Sometimes full or partial lists exist. Often they do not exist at all, or the compilers of the lists are unwilling to allow access to them. For example, if you were interested in knowing what percentage of lake homes in your state are serviced by central sewer systems and how many have on-site septic systems, you could develop a representative sample of lakeshore properties to obtain an estimate of the population proportions. However, obtaining a comprehensive list of lakeshore properties (the sample frame) in order to draw the sample could be difficult. Compiling an owner’s list yourself from county records is one option, but it would be time-consuming.

Samples can be broadly divided into two categories: probability samples and nonprobability samples. Probability samples are characterized by knowledge of the probability that an item in the population will be selected. As you would expect, the probability that an item will be selected is unknown in a nonprobability sample. Statistical inferences formed through the analysis of probability samples are preferred because inferences drawn from nonprobability samples may be unreliable and inaccurate.

Probability Samples

Numerous probability sampling methods exist, and the most common include

  • Simple random samples
  • Stratified random samples
  • Systematic random samples
  • Cluster samples

Simple Random Samples

In a simple random sample, every item in a population has the same probability of selection. A simple random sample may be selected either with replacement or without replacement.
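
The two selection modes can be sketched with Python's standard library. This is a minimal illustration under assumed inputs: the 52-item frame, the seed, and the helper name `per_draw_probability` are hypothetical and do not come from the article.

```python
import random
from fractions import Fraction

rng = random.Random(42)          # fixed seed for a reproducible sketch
population = list(range(1, 53))  # hypothetical frame: 52 numbered items

# With replacement: the same item may be drawn more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: each item can appear at most once.
without_replacement = rng.sample(population, k=5)


def per_draw_probability(n_items, n_draws):
    """Chance that a given still-unselected item is picked on each draw
    when sampling without replacement: 1/N, 1/(N-1), 1/(N-2), ..."""
    return [Fraction(1, n_items - i) for i in range(n_draws)]


print(with_replacement)
print(without_replacement)
print(per_draw_probability(len(population), 3))
```

`random.choices` draws with replacement and `random.sample` draws without it; the helper simply lists how the per-draw chance for a remaining item grows as the unsampled pool shrinks.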
When sampling with replacement, the probability of selection for each member of the population is 1/N each time a selection is made, where N represents total population size. When sampling without replacement, the probability of selection increases as items are selected. The probability of selection for the first item selected is 1/N, rising to 1/(N – 1) for the second item selected, 1/(N – 2) for the third item selected, and so forth as the unsampled population size is reduced through the sample selection process. Think of sampling with replacement as picking a card from a full deck, replacing the card, shuffling the deck, and picking another card. In contrast, think of sampling without replacement as being dealt a hand of poker, with each new card in your hand being dealt from a smaller and smaller deck.

Because each item in a population has an equal probability of selection on a given draw, simple random samples are considered to be highly representative. Nevertheless, it is still possible to randomly select a nonrepresentative sample merely by chance. This possibility is referred to as sampling error. Although sampling error cannot be totally eliminated, it can be minimized through the selection of larger samples.

Stratified Random Sample

Creating a stratified random sample begins by dividing the population into subgroups (known as strata) based on one or more essential characteristics. Once this has been done, you can select random samples from each stratum. Stratified samples ensure that the sample proportion for the stratifying characteristic is identical to the population proportion, reducing sampling error and improving the accuracy of inferences.

As a simple example of the value of stratified random sampling, assume that you want to use sampling to make a statement about the mean apartment rent in a market area. Assume also that the apartment population contains many floor plans with different bedroom and bath counts.
If a simple random sample were used, you would have no assurance (due to sampling error) that the floor plan mix of the sample would be identical to the floor plan mix of the population. Although the mix would, on average, be the same with repeated random samples, the mix is apt to differ from the population in any single sample. Use of a stratified sample allows you to control the proportion of the sample being drawn from each apartment unit type, thereby controlling for unit-mix sampling error. When this is done, sample mean rent is a more accurate estimate of population mean rent. When the parameter of an important population proportion is known, a stratified random sample mirroring the population proportion usually provides the most accurate inferences.

You might be wondering, “If stratified random sampling improves the accuracy of inferences, why isn’t it done more often?” The primary reason is insufficient understanding of the population proportion for one or more important characteristics. For instance, the simple stratification in the preceding paragraph could not be done if the population unit mix proportions were unknown.

Systematic Random Sample

A systematic sample is just what its name implies: a system employed to select the sample from the frame. Systematic sampling typically involves sorted data such as accounting records filed by date or medical records filed alphabetically. For example, if you want to sample 1,000 files out of a population of 30,000 files, you could decide to select every 30th file. You could then randomly select a file from the first 30 files as a starting point and then select every 30th file after the starting point. If you randomly chose to start with file 14, your sample would consist of files 14, 44, 74, 104, and so forth.

While systematic sampling may seem convenient, it can pose problems when there is a systematic pattern associated with how the data were sorted.
If this is the case, the sample could be biased. Say, for example, you are auditing your company and randomly choose to look at accounting records from the 4th and 23rd day of each month. You would not be happy to learn, after the fact, that a part-time employee who helped out on the 14th and 15th of each month had been embezzling money. Because you randomly chose the wrong days to audit, the theft would have gone undiscovered. Had a random sample been drawn from each month of the year, there would have been an 80% probability of picking the 14th or 15th day of at least one month.[4] The pattern in the data, along with the systematic sample’s unfortunate starting point, biased the sample by inadvertently excluding all of the dates when criminal activity occurred.

Use of a systematic sample requires an assessment of the likelihood of the existence of a pattern in the data in the sample frame that could bias the sample. When in doubt, use a different sampling method.

Cluster Sampling

Cluster sampling is often used for geographic data such as real estate where clusters are naturally occurring. City blocks, subdivisions, census tracts, and zip codes are examples of naturally occurring geographic clusters. Random selection of clusters and of items within each selected cluster constitutes a random sample.

Consider the apartment sample referred to in the earlier discussion of stratified random samples. If there were no available sample frame, you could draw a sample by randomly selecting geographic clusters (e.g., census tracts) within the study area, identifying all of the apartments within each selected cluster, and randomly selecting a sample from the identified apartments in each cluster. The resulting sample would be representative of the population if the selected clusters were representative of the market and the properties chosen from each cluster were representative of their cluster.
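
The two-stage procedure just described (pick clusters at random, then pick items at random within each chosen cluster) can be sketched as follows. This is a minimal illustration, not the article's implementation: the tract names, apartment identifiers, sample sizes, and function name are all hypothetical.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=0):
    """Randomly pick clusters, then randomly pick items within each one.

    `clusters` maps a cluster id (e.g., a census tract) to its list of items.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible sketch
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: clusters
    sample = {}
    for cluster_id in chosen:
        items = clusters[cluster_id]
        k = min(n_per_cluster, len(items))  # guard against small clusters
        sample[cluster_id] = rng.sample(items, k)  # stage 2: items
    return sample


# Hypothetical census tracts and apartment ids (illustration only).
tracts = {
    "tract_A": [f"A-{i}" for i in range(1, 41)],
    "tract_B": [f"B-{i}" for i in range(1, 26)],
    "tract_C": [f"C-{i}" for i in range(1, 61)],
    "tract_D": [f"D-{i}" for i in range(1, 11)],
}

picked = two_stage_cluster_sample(tracts, n_clusters=2, n_per_cluster=5)
for tract, apartments in picked.items():
    print(tract, apartments)
```

Drawing the same number of items from clusters of different sizes gives apartments in small clusters a higher chance of selection than those in large ones, so a real study would likely need weighting or size-proportional selection; the sketch leaves that out for brevity.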
The problem with this method is one that appraisers are familiar with from other contexts, namely compounding error. Achieving the state of “representativeness” becomes a multilayered construct in the use of cluster sampling. If the coarser selection layer (the clusters) is not representative, then the sample will not be representative regardless of how well the selected properties represent their clusters. If the coarser, cluster layer is representative, the more focused granular selection layer (properties within each cluster) may still not be fully representative if some or all of the selected properties do not represent their cluster. Due to these issues, sample size in terms of number of clusters and items selected from each cluster should be greater than the sample size required for a simple random sample or stratified sample.

When a sample frame is unavailable, cluster sampling may be the only alternative. Care should be taken, however, to ensure that the clusters are as representative of the population as possible. With geographic data this often entails selecting clusters that incorporate all of a market area’s important geographic variables. Depending on the situation, important geographic variables might include

  • School districts
  • Municipalities
  • Counties
  • Age of neighborhoods
  • Relative household incomes
  • Length of commutes

[4] Assuming a 30-day month and one randomly chosen day per month, the probability of picking the 14th or 15th in a given month is 2 ÷ 30, or 1/15. Summing over 12 months gives 12/15, which is the expected number of such picks rather than a probability. The probability of at least one such pick in 12 months is 1 − (14/15)^12, or about 56%; with two random days per month, as in the audit described, it is 1 − (378/435)^12, or about 81%.

Self-Selection and the Appraiser’s Quandary

Recall from the earlier discussion of validity that sample selection bias may be encountered in real property studies, which often rely on observational samples because a self-selection process separates properties that are offered for sale from those that are not offered for sale.
Property owners are not randomly chosen to sell their homes each month. Therefore, a sample of homes “for sale” or “sold” may not be representative of the population of all similar properties in a market.

Self-selection may or may not be a problem, depending on how and if sold properties differ from unsold or not-for-sale properties. Generally speaking, in broader and more active markets self-selection is less likely to be a relevant issue. For example, the housing market is more active than the shopping center market and is less likely to exhibit systematic differences between properties offered for sale and properties not for sale. Nevertheless, some residential neighborhoods could be affected by a localized externality such as an environmental hazard, plant closing, or change in access.[5] In such cases data from an affected location might not be representative of properties in unaffected locations.

The retail sector provides a good example of how self-selection can affect real property transaction data. Suppose a prominent and common anchor tenant is ceasing operation or reorganizing through bankruptcy, and several of the market area’s shopping centers occupied by this anchor tenant are offered for sale. If and when these properties sell, they probably would not be representative of the remaining shopping centers in the market that were not affiliated with this tenant. Statistical analysis of market transaction data biased by inclusion of these sales might misrepresent the segment of the retail population unaffected by the store closings. The same logic applies to comparable rents associated with retail centers having dark anchor stores.

Because real property offered for sale or rent is a self-selected sample rather than a random sample, appraisers should take care to ensure that the transaction data being analyzed are truly representative of the subject property’s competitive market.
Experienced appraisers should be able to determine the influence, if any, of self-selection in a market that may preclude some data from inclusion in a given analysis or study. Furthermore, competent appraisers know that unfamiliarity with a market and an inability to assess the existence of self-selection bias within it require the assistance of someone who understands the market in order to credibly assess transaction data. Nonprobability Samples Nonprobability samples are less useful for inference than the sorts of probability samples we have been talking about so far because the conclusions we can reach through statistical analysis of a nonprobability sample are sample-specific. Information obtained from the sample data may not be applicable to the larger population because there is no guarantee that the sample data are representative of the population. For example, Internet surveys where users of a site are asked their opinion on matters as varied as election outcomes, results of sports contests, or whether or not an economic recession is looming are nonprobability samples. The results of such a survey only tell us how the proportion of a Web site’s users who responded felt about the issue. We do not know if survey respondents are representative of all of the site users. Nor do we know if the opinions of the respondents mirror the opinions of the general population. The survey results could be applicable to the general population, but no statistical measure has been provided of the relationship of such a nonprobability sample to the general population. This is why expert, professional appraisal judgment is necessary when applying statistical analysis of comparable sale or rental data to a subject property or subject market. Because the generation of comparable data is largely a self-selection process, valuation expertise is required to assess whether or not comparable data items are representative of 5. 
the subject of a study. When unrepresentative data items are identified, the items can either be removed from the analysis or flagged for later treatment (i.e., attempting to statistically control and adjust for the aspects of the transactions that cause them to be unrepresentative of the subject market). A decision to exclude data or to apply some form of statistical control depends on the amount of available data and the reason for conducting the study. When a reduced data set excluding unrepresentative data is large enough for a valid study, the unrepresentative data may be excluded. For example, in a residential context it is usually preferable to exclude luxury home sales and entry-level home sales from a study of mid-priced home values. While it is possible to statistically control for the differences between luxury homes, entry-level homes, and mid-priced homes, figuring out which controls to employ may not be a simple task. However, if we needed to know the effects of an externality such as street noise or power line proximity on an entire residential market, then we would most likely want to understand the effect of the externality across all price categories: entry-level, mid-priced, and luxury. The study of the entire residential market might also include apartments, condominiums, and townhomes in the sample, depending on the scope of work applicable to the assignment.

5. Localized externalities differ from marketwide externalities affecting all properties. For example, the 2008/2009 residential foreclosure wave determines the market in many locales, and a representative sample would legitimately be expected to include foreclosed properties and foreclosure price effects, if any.

Sampling Error

Sampling error occurs when the sample differs from the population.
In hypothesis testing, sampling error may result in rejection of a null hypothesis that is actually true, or it may result in failure to reject a null hypothesis that is actually false. Either of these results will lead to an inappropriate conclusion. In the first case, the research hypothesis is flawed, but the data indicate that it is not. In the second case, the research hypothesis is correct, but the analysis doesn’t support it. The outcomes of hypothesis testing can be reduced to four possibilities:

  Actual state     Decision              Result
  H0 is true       fail to reject H0     correct result
  H0 is false      reject H0             correct result
  H0 is true       reject H0             erroneous result
  H0 is false      fail to reject H0     erroneous result

Rejecting a true null hypothesis is referred to as Type I error. With Type I error the study supports the research hypothesis even though it is based on a false supposition. The probability of rejecting a true null hypothesis is symbolized by the lowercase Greek letter alpha (α), which is called the significance level. The probability of not making a Type I error (1 − α) is referred to as the confidence level.

The probability of Type I error can be controlled by selecting the significance level α prior to performing a statistical test of the null hypothesis. The researcher decides on an acceptable probability of rejecting a true null hypothesis and rejects the null hypothesis if the statistical results are at or better than the predetermined α threshold. For example, if α is set at 5% and the statistical result is an observed significance level of 3%, the result is considered to be significant and the null hypothesis is rejected. If this seems confusing, look at it from a confidence level perspective.
Setting α at 5% is the same as saying, “If the data allow me to be 95% confident that my research supposition is correct, then I am going to reject the null hypothesis and accept the research hypothesis.” For example, if the analysis results in an observed significance level of 3%, the corresponding confidence level is 97%. In this case we have exceeded the 95% confidence level threshold, supporting the validity of our research hypothesis.

One way to guard against the erroneous rejection of a null hypothesis that is actually true is to take care in the construction of the research hypothesis. Note that the null hypothesis is true only if the research hypothesis is false. Better reasoning, logic, and understanding of underlying phenomena will help guard against flawed research designs that attempt to support false suppositions.

The erroneous failure to reject a null hypothesis that is actually false is known as Type II error. In this case the study fails to support the research hypothesis even though it is based on a true supposition. The probability of Type II error is symbolized by the lowercase Greek letter beta (β). Unfortunately β cannot be known with certainty unless you know the true population parameter you are attempting to infer. (If you knew the parameter, why would you be attempting to infer it?)

Consider the example of the effect of traffic noise on home price. If the effect of noise is substantial, then the probability of failing to support the research hypothesis is small. If the effect does exist but is not substantial, then we will have more difficulty demonstrating the effect statistically, meaning that β will be relatively large. When the effect of traffic noise is small, the statistical analysis must be more “powerful,” increasing the probability of demonstrating the effect. The power of a statistical test is symbolized by 1 − β.

[Figure 2. The Standard Normal Deviation]

Statistical power can be increased in three ways:

1. Relax α.
This choice may not be very satisfactory if the logic behind the initial decision on α has not changed (note 6).
2. Increase the size of the sample. Small effects are much easier to uncover with more data.
3. Eliminate confounding effects. In the street noise example, the street noise effect may be masked if lots abutting a thoroughfare are generally larger than lots in the interior of the same subdivision. Controlling for lot size in the analysis should improve a model’s ability to detect the effect of street noise.

Relating Choice of Significance Level α to the Standard Normal Distribution

Choice of α is a way of stating how far, in statistical distance, the sample mean must be from what the population mean would be if the null hypothesis were true in order to reject the null hypothesis. Consider a simple pair of statistical test hypotheses:

  H0: μ = 0
  Ha: μ ≠ 0

If we select α = 0.05 and the population to which μ applies is normally distributed, then we are saying, “If the sample mean is 1.96 standard deviations or more from 0, I will reject the null hypothesis that the population mean is 0.”

Why 1.96 standard deviations? If we look up Z = 1.96 in the standard normal table, we will find P(Z ≤ 1.96) = 0.975. Additionally, when we look up −1.96 in the standard normal table, we will find P(Z ≤ −1.96) = 0.025. Therefore, P(−1.96 ≤ Z ≤ 1.96) = 0.975 − 0.025 = 0.95. The confidence level of 95% is associated with α = 0.05 (5%). To be 95% confident that the null hypothesis of μ = 0 is false, x̄ must be at least 1.96 standard deviations from 0.

This concept is illustrated pictorially in Figure 2. Figure 2 shows the standard normal curve along with the locations of 0 standard deviations (the middle of the curve) and ±1.96 standard deviations.

[Figure 2 axis: standard deviations, marked at −1.96, 0, and 1.96; the area in each tail beyond ±1.96 is 0.025.]

6. Because the choice of a high level of significance reduces
β, the effect on β should have been considered in the initial selection of α.

Recall that the area under the curve is equal to 1 (100%) and the area under a portion of the curve is equal to the probability of a sample mean (x̄) being in that location when the true population mean is 0. By looking up −1.96 in the standard normal table we find that the area under the curve to the left of −1.96 is 0.025 (or a 2.5% probability of x̄ being in this location if μ = 0). By looking up 1.96 in the standard normal table we find that the area under the curve to the left of 1.96 is 0.975 (97.5%), leaving 0.025 to the right of 1.96 (2.5% probability of x̄ being in this location if μ = 0). Therefore, with α = 0.05, we can reject the null hypothesis that μ = 0 if x̄ lies 1.96 or more standard deviations below 0 or 1.96 or more standard deviations above 0.

Hypothesis testing is a skill that is developed through practice. So, let’s work through an example problem. Suppose we interviewed a representative of a fast food franchise and were told that the chain’s average restaurant floor area is 2,400 square feet. Other sources indicate that the average floor area for this particular fast food concept has grown over time as menus have expanded and adjusted to new consumption patterns. This fast food concept is fairly new to your state, and we suspect that the average floor area here exceeds 2,400 square feet. We decide to use a random sample of floor areas to test our hypothesis and decide also that if we can be 90% confident we will conclude that the average floor area in this state exceeds 2,400 square feet.

First we state the research, null, and alternative hypotheses and the significance level required to reject the null hypothesis with 90% confidence.

Research Hypothesis: Average floor area exceeds 2,400 square feet.
  H0: μ ≤ 2,400 square feet
  Ha: μ > 2,400 square feet
  α = 0.10

Notice that the null hypothesis in this example contains the “≤” symbol rather than the “=” symbol because the research hypothesis is stated as “exceeds.” Remember that the null and alternate hypotheses must be mutually exclusive and collectively exhaustive; therefore H0 must cover all of the possibilities that differ from Ha.

Next, we calculate the sample mean and assume for now that we know the population standard deviation σ.

  x̄ = 2,560 square feet
  σ = 114 square feet

$$Z = \frac{\bar{x} - \mu}{\sigma} = \frac{2{,}560 - 2{,}400}{114} = 1.40$$

Is the sample mean of 2,560 far enough from 2,400 in statistical terms to reject the hypothesis that the mean floor area is 2,400 square feet or less? We can address this question in one of two ways:

1. Select the Z value associated with α = 0.10 and compare 1.40 to this significance level threshold.
2. Assess the probability of x̄ being 1.40 standard deviations or more from the hypothesized mean, and compare this result to the required α level of 10%.

Let’s do it both ways. The Z value associated with α = 0.10 is the value, call it B, where P(Z ≤ B) = 0.90. The standard normal table indicates that this occurs with a value of approximately 1.28 standard deviations. The value of 1.28 is called the critical value of Z because an x̄ result of this amount or more is required to reject the null hypothesis (note 7). Because 1.40 is greater than the critical value of 1.28, you can reject the null hypothesis.

Alternately, the standard normal table shows that the probability of Z being less than or equal to 1.40 is 0.919. Therefore, the significance level α indicated by the sample is 1 − 0.919 = 0.081, which is less than 0.10, so we can reject the null hypothesis and state with at least 90% confidence (or, more precisely, 91.9% confidence) that the mean floor area in this state is greater than 2,400 square feet.
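The two decision routes in the floor-area example can be checked numerically. The following is a minimal Python sketch, not part of the original article. It relies on `statistics.NormalDist` from the standard library, and the helper name `one_tailed_z_test` is hypothetical, introduced only for illustration. It mirrors the article's simplified calculation, which divides the difference between the sample mean and the hypothesized mean by σ directly.

```python
from statistics import NormalDist


def one_tailed_z_test(sample_mean, hypothesized_mean, sigma, alpha):
    """Right-tailed Z test following the article's simplified formula.

    Returns (z, critical_z, p_value, reject_null).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1

    # Statistical distance between the sample mean and the hypothesized mean.
    z = (sample_mean - hypothesized_mean) / sigma

    # Route 1: the critical value B where P(Z <= B) = 1 - alpha.
    critical_z = std_normal.inv_cdf(1 - alpha)

    # Route 2: the probability of a result at least this far into the right tail.
    p_value = 1 - std_normal.cdf(z)

    return z, critical_z, p_value, z >= critical_z


# Values from the fast food floor-area example.
z, critical_z, p_value, reject = one_tailed_z_test(2560, 2400, 114, 0.10)
print(f"Z = {z:.2f}, critical Z = {critical_z:.4f}, p-value = {p_value:.3f}, reject H0: {reject}")
```

Because the sketch carries Z at full precision rather than rounding it to 1.40 before the lookup, its p-value comes out slightly below the table-based 0.081. The decision to reject the null hypothesis is the same under both routes.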
In statistics the Z value probability of 0.081 is referred to as a p-value, which is the probability of the x̄ result being at least 1.40 standard deviations above 2,400, assuming the null hypothesis is true.

In this example we rejected the null hypothesis based on what is called a one-tailed test. The null hypothesis contains a ≤ statement, so we need only be concerned with the right tail of the Z distribution to test the validity of the null hypothesis. Similarly, if the null hypothesis contained a ≥ statement, then we would only concern ourselves with the left tail of the Z distribution (also a one-tailed test). When the null hypothesis contains an = statement, it can be rejected at either tail of the Z distribution, which is referred to as a two-tailed test.

As a practical matter this exercise, though quite simple in statistical terms, could be useful in assessing whether or not an old floor plan is significantly smaller than new store requirements in support of an assessment of functional obsolescence. Or it could support a highest and best use analysis of a pad site, adjusting requisite floor area ratio to the current trend in store size.

Sample Size

Once you decide to gather sample data for a statistical study, you are immediately confronted with the issue of how much data you need. The resolution of this issue can be simple or complex, depending on the situation. If you intend to study sample means or sample proportions, the calculation of sample size may be a straightforward result of selecting the accuracy you expect to achieve and plugging that information into a simple equation. If the study involves data collection by survey, the sample size will have to be adjusted for nonresponders and inappropriate responders. Of course you may not know how many of these you will encounter until you have completed the survey (note 8).

7. Try calculating the critical value associated with 0.90 using the “=NORMINV” function in Excel to derive a more precise critical value of 1.2816.
8.
Perhaps the best reference for survey sampling design and maximizing response rate is by Don Dillman, who has written a series of books on the topic and is a highly regarded expert. His latest book is Mail and Internet Surveys: The Tailored Design Method (Hoboken, N.J.: John Wiley & Sons, 2007). If you are not anticipating conducting a Web-based survey, then one of his older books would be sufficient.

However, be aware that if you plan to employ a regression model the sample will have to be large enough to accommodate all of the variables you may need to include in the model. Unfortunately, you may not know how many variables are needed in the model in advance, which is a confounding issue.

Sample Size for Estimating Means

Suppose you want to estimate the mean rent for one-bedroom apartments from a sample representative of all one-bedroom apartments in your city. Required sample size can be calculated once you make three decisions:

1. Level of confidence you require
2. Degree of accuracy you expect to achieve
3. An estimate of the standard deviation of one-bedroom rents in the city

The level of confidence you require is 1 − α. Therefore, this decision determines α, which is required to estimate sample size. The degree of accuracy you expect to achieve is stated in terms of units of measure. For instance, if you are estimating mean rent, the degree of accuracy is stated in dollars. Degree of accuracy is called sampling error, which is symbolized as e. The standard deviation (σ) of the variable being estimated will be unknown and must be estimated. Methods of estimating σ include referencing prior studies, conducting a small pilot study, or investigating the range of the variable of interest (the range will often be approximately 6 times σ for a normal distribution).
The equation for estimating sample size required to estimate a population mean is

$$n = \frac{Z^2 \sigma^2}{e^2}$$

Picking up the one-bedroom apartment rent example again, let’s assume you decide on a 95% confidence level, expect to be accurate within ±$10.00, and estimate the range of monthly rent for one-bedroom apartments in the market area at $120 ($650 to $770). Based on these factors, you select Z = 1.96 based on α = 0.05 and the standard normal distribution, e = 10, and σ = 20 ($120 ÷ 6). The required sample size is

$$n = \frac{Z^2 \sigma^2}{e^2} = \frac{1.96^2 \cdot 20^2}{10^2} \approx 15.37$$

Sample size calculations are generally rounded up, so you would want to draw a random sample of at least 16 one-bedroom apartment rents.

Suppose you require more precision than an estimate of mean rent ±$10. For example, you may need more statistical power to compare mean rents for two types of one-bedroom apartment (say, those with and without a private balcony). Assume you need to decrease sampling error from $10 to $5 in order to have enough statistical power to detect the effect of private balconies. What does this requirement do to sample size?

$$n = \frac{Z^2 \sigma^2}{e^2} = \frac{1.96^2 \cdot 20^2}{5^2} \approx 61.47$$

Sample size essentially quadruples to 62. This emphasizes an important point: Increases in statistical power are “costly” when cost is stated in terms of sample size. Cutting sampling error in half quadruples sample size, and reducing sampling error to one-quarter of the amount illustrated in this example ($2.50) would increase sample size 16-fold. Sample size grows with the inverse square of sampling error because of the e² term in the denominator of the sample size equation.

Sample Size for Estimating Proportions

As Americans, we are accustomed to reading about proportion estimates at election time.
The following was reported by Reuters on the eve of the January 8, 2008, New Hampshire presidential primary election:

  “A Reuters/C-SPAN/Zogby poll showed Obama with a 10-point edge on Clinton in the state, 39 percent to 29 percent, as he gained a wave of momentum from his win in Iowa.”

The margin of error (e) for the statement above was reported elsewhere to have been ±4.4%. Assuming a significance level of 0.05, the poll taker was 95% confident that Obama’s proportion of the vote was between 34.6% and 43.4% and Clinton’s proportion was between 24.6% and 33.4%. Based on this information we can deduce that there were approximately 496 respondents, as we will see shortly.

The equation for the sample size required to estimate a population proportion is

$$n = \frac{Z^2 \, p(1 - p)}{e^2}$$

where Z is the standard normal value associated with the confidence level, e is the margin of error, and p is an estimate of the population proportion. For most proportion estimates p is set at 0.50 because this proportion maximizes the value of p(1 − p), ensuring that the sample is sufficiently large regardless of the true population proportion.

Returning to the New Hampshire presidential primary poll, we can apply the equation for sample size to deduce the number of respondents:

$$n = \frac{Z^2 \, p(1 - p)}{e^2} = \frac{1.96^2 \cdot 0.50(1 - 0.50)}{0.044^2} \approx 496$$

Now let’s look at a more practical problem for an appraiser. Suppose you want to estimate the proportion of recent in-migrants to a city opting for rental housing rather than home ownership during their first year of residency. Assuming you could obtain a list of recent in-migrants from which to draw a sample (e.g., from electric company records), you could determine the number of respondents you would need by deciding on a confidence level and the margin of error.
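The sample-size equations for means and for proportions are simple enough to verify directly. The following is a minimal Python sketch, not part of the original article. The function names `sample_size_for_mean` and `sample_size_for_proportion` are hypothetical helpers, and Z = 1.96 is taken from the text as the value for 95% confidence. The functions return the unrounded result so the caller can decide how to round.

```python
import math

Z_95 = 1.96  # standard normal value for a 95% confidence level, as used in the text


def sample_size_for_mean(z, sigma, e):
    """Unrounded n = Z^2 * sigma^2 / e^2 for estimating a population mean."""
    return (z ** 2) * (sigma ** 2) / (e ** 2)


def sample_size_for_proportion(z, e, p=0.5):
    """Unrounded n = Z^2 * p * (1 - p) / e^2; p = 0.5 gives the largest n."""
    return (z ** 2) * p * (1 - p) / (e ** 2)


# One-bedroom rent example: sigma estimated as range / 6.
sigma_rent = (770 - 650) / 6
n_rent_10 = sample_size_for_mean(Z_95, sigma_rent, 10)  # accuracy of +/- $10
n_rent_5 = sample_size_for_mean(Z_95, sigma_rent, 5)    # accuracy of +/- $5

# New Hampshire poll: deduce the respondent count from a 4.4% margin of error.
n_poll = sample_size_for_proportion(Z_95, 0.044)

# Planned samples are rounded up; the deduced poll size is rounded to the nearest count.
print(math.ceil(n_rent_10), math.ceil(n_rent_5), round(n_poll))  # prints: 16 62 496
```

Halving e from 10 to 5 multiplies the unrounded result by exactly four, which is the inverse-square relationship the text describes.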
If you set α = 0.05 and e = 2%, the number of randomly chosen responses you would need is:

$$n = \frac{1.96^2 \cdot 0.50(1 - 0.50)}{0.02^2} = 2{,}401$$

Comparing the presidential primary poll to the housing tenure choice sample above, we can see that reductions in the margin of error (4.4% to 2%) dramatically increase sample size and related costs. Therefore, we must carefully consider the question of what is a sufficient margin of error and the associated amount of time and money to devote to data gathering.

Marvin L. Wolverton, PhD, MAI, is a practicing real property valuation theorist and consultant currently employed as a senior director in the national Dispute Analysis and Litigation Support practice at Cushman & Wakefield, where he engages in litigation consulting and expert witness services. Wolverton is also an emeritus professor, and the former Alvin Wolff Distinguished Professor of Real Estate, at Washington State University. He is a state-certified general appraiser and has been a member of the Appraisal Institute since 1985. Wolverton is a current member of the Appraisal Journal Review Panel. He has also served as editor of the Journal of Real Estate Practice and Education and on the editorial boards of the Journal of Real Estate Research and The Appraisal Journal. He has authored more than forty articles in refereed and professional journals, including the Journal of Real Estate Research, Real Estate Economics, Journal of Real Estate Finance and Economics, Assessment Journal, Journal of Real Estate Portfolio Management, Journal of Property Valuation and Investment, Journal of Property Research, and The Appraisal Journal. He has edited and written books and chapters of books on valuation theory and specialized appraisal topics, and he teaches appraisal courses on behalf of the Appraisal Institute.
His formal education includes a bachelor of science in mining engineering from New Mexico Tech, a master of science in economics from Arizona State University, and a doctor of philosophy, specializing in real estate and decision science, from Georgia State University. Contact: marvin.wolverton@sbcglobal.net

Web Connections

Internet resources suggested by the Lum Library

  • Appraisal Institute Education Courses, “Real Estate Finance, Statistics, and Valuation Modeling”: http://www.appraisalinstitute.org/education/state_cert.aspx
  • American Statistical Association: http://www.amstat.org
  • Bureau of Labor Statistics: http://www.bls.gov
  • Commercial Real Estate Research, National Association of Realtors: http://www.realtor.org/research/research/reportscommercial
  • Data and Statistics, General Data Resources, U.S. General Services Administration Reference Center: http://www.usa.gov/Topics/Reference_Shelf/Data.shtml
  • DataQuick News: http://www.dqnews.com/
  • FedStats: http://www.fedstats.gov
  • Field Guide to Quick Real Estate Statistics, National Association of Realtors Library: http://www.realtor.org/library/library/fg006
  • Trends and Statistics, Real Estate, Internal Revenue Service: http://www.irs.gov/businesses/small/industries/content/0,,id=99264,00.html
  • U.S. Census Bureau: http://www.census.gov
  • The World Wide Web Virtual Library, Statistics, University of Florida, Department of Statistics: http://www.stat.ufl.edu/vlib/statistics.html

Copyright of Appraisal Journal is the property of Appraisal Institute and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder’s express written permission. However, users may print, download, or email articles for individual use.

Tutorials in Biostatistics, Vol.
1: Statistical Methods in Clinical … Maiti, Tapabrata Journal of the American Statistical Association; Dec 2005; 100, 472; ProQuest Central pg. 1468 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. HOW-TO-DO-IT Using a Five-Step Procedure for Inferential Statistical Analyses LAWRENCE F. KAMIN 3,218 ABSTRACT Many sialiuics texts pose inferential statistical problems in a tJisjoinli’íí way. By using a simple five-step procedure as a template for statistical inference problems, the siudeni can solve problems in an iwganized fashion. Tlie problem and Us soíuíjíin wiîî thus be ll siand-by-Uself organic whole and a single unit of though and effort. The íiescriijcií i”iedure can be used for both parametric and nonparametnc inferential tests. The ixumple given is a chi-square goodness-oj-fit test of a genetics experiment involving a dihyhid cross in com that follows a 9:3:3:1 ratio. This experimental analysis i,s commonly done in introductory biology labs. Keywords: Five-itep procedure; statistical inference; chi-square; goodness-oj-fit; dihyhrid cross; introductory bioiogy. 7.815 whole. This procedure both formalizes and crystalizes student thinking. Another advantage of this five-step procedure is that it can be used for essentially all statistical inference tests – both parametric and nonparametric. I was taught this technique in a graduate-level course in statistics, and I have been using it ever since. O The Five General Steps in Hypothesis Testing Stepi Write down the null and alternative hypotheses in both symbols and words, using complete sentences. Inferential statistics is an indispensible tool for biological hypothesis Step 2 testing. Early iri their science education, students learn about the sciCalculate ihe test statistic to the appropriate number entific method and how inductive rather than of significant figures. 
deductive reasoning is used to make the logical [We] use statistics “to leap from particular experimenlal results to one Step 3 or more general conclusions. However, before compare the data with our (a) State the given a (probability of a Type 1 any conclusion can be reached, the experimenlal error). results must be tesied for statistical significance. ideas and theones, to see (b) Calculate the degrees of freedom, After all, there is a chance that any difference (c) Give the region of rejection both in synv between two or more experimental treatments or how good a match there is. bols and in a graph. tests is attributable to random events. Therefore, we use statistics “to compare the data with our Step 4 ideas and theories, lo see how good a match there is” (Hand, 2008: Draw a conclusion based on the calculated test statistic. p. 10). The five-step procedure presented here was designed to aid in this proces5. (a) If the test statistic is in the region of rejection (RR), reject the null hypothesis and state the conclusion in one or more complete Science teachers must lead students through a strange new statistical sentences. landscape that combines logic, jargon, and mathematical calculations such as variance, standard deviation, sum of squares, and calculated test (b) If the test statistic is not in RR, accept the null hypothesis and statistics. Concepts like Type I errors, one-tailed or two-tailed alternastate the conclusion in one or more complete sentences. tive hypotheses, and p value must be defined and related to specific examples. But even in excellent statistics and biostatistics texts, data are Step 5 given, a value for a (level of significance) is given, and then, typically, a Bracket the p value. •’Whal do you conclude?” question is asked. As an afterthought, usually Example a part B to the problem, students are asked to give the p value for their conclusion. 
This method of posing statistics problems has always struck A chi-square goodness-of-fit test is quite commonly used to check ihe me as disjointed. appropriateness of a proposed model that uses categorical data. One popular experiment involves checking to see if a cross involving com I believe that the following simple procedure allows ihe given plants results in the Mendelian dihybrid phenolypic ratio of 9 purple problem to be stated, viewed, and solved as a stand-by-itself organic The American ß/o/ogy Teacher, Vol. 72, No. 3, pages 186-188. ISSN 0002-7685, eleclronic ISSN 1938-1211. ©2010 by Nafional Association of Biology Teachers. All rights reserved. Request permission lo photocopy or reproduce article content at the Universily of California Press’s Rights and Permissions Web site at wujw.ucpressjournals.com/repríntinfo.asp. DOI: t0.1525/abt.2010.72.3.11 THE AMERICAN BIOLOGY TEACHER VOLUME 72, NO, 3. MARCH 2010 population mean. Regarding H^, for this example, one could state that the data do not fit tbe proposed model or simply thai H is false. Step 2 3.218 7.815 Figure 1 . The chi-square distribution showing the region of The “expected” counts arc calculated under the assumption that H is true. Thus, the expected count for purple smooth corn grains was calculated as 9/16 X 361 (total of all com grains). The chi-square statistic is simply the sum of the last column in the table given in Step 2, or S tObs – Exp)^ / Exp, For this example, it is 3.218, The chi-square statistic was calculated to the same number oí significant figures in the chi-square table. It is assumed that the instructor has informed students of the conditions for validity of this test, namely that (1) the data represent a random sample from a large population, (2) the data are whole (counting) numbers and not percentages or standardized scores, and (3) the expected count for each class is >5 (Samuels &r Witmer, 2003: chapter 10; Mendenhall et al„ 1990: pp. 665-666). 
Step 3 smooth to 3 purple wrinkled to 3 yellow smooth to 1 yellow wrinkled corn grains. The following example and data are from such an experiment from one of my botany iab groups. Step! H^: The data fit the model of 9 purple smooth to 3 purple wrinkled to 3 yellow smooth to 1 yellow wrinkled com grains, H,: H, is false. Step 2 Phenotypic Cla5s Observed Expected (Obs – Exp)2/ Exp Purple smooth 210 203,06 0,2372 Purple wrinkled 74 67,69 0,5882 Yellow smooth 55 67.69 2.3790 Yellow wrinkled 22 22.56 0,0139 361 361.00 3,2183 Totals; X^= 3,218 Step 3 TheprobabilityofaTypelcrror, a,musi be given as part of the problem, A Type 1 error is made when a true null hypothesis (H^) is rejected. The degtïes oí freedom (dD are calculated as k – 1 , where k is ihe number of data classes. The chi-square statistic (x^) has a domain of zero to infinity The region of rejection (RR) is obtained from a statistical table of chi-square values. Step 4 This is the imporiant “Decision Rule” of many statistics books. By plotting the x\-aic value of 3.218 on the graph in Step 3, one can see that 3,218 does not lie in the region of rejection (RR) but, raiher, lies in the region of acceptance; this means that the null hypothesis is accepted. Since an absolute tmih is not known, in the sense that the conclusion could be wrong, most statisticians prefer stating that there is insufTicient evidence to reject the null hypothesis. Failing to reject H^,, under the constraints of committing a Type 1 or Type 11 error, is a better decision than simply accepting it, even though the two choices appear to give a similar conclusion. 
At this point, depending on time and the level of the class, the instructor may wish to discuss Type 11 errors, A Type II error is made if a false null hypothesis is accepted (not rejected^ The probability of a Tyjx- II error (ß) can be calcitlated after the fact (Glover ¿r Mitchell, 2006: section 5.3; Schork & Remington, 2000: pp, 17*1-181), looked up in tables for some tests (Ponney &r Watkins, 2009: p, 853), or controlled for by calculating the sample size needed for a given ß value (Mendenhall ei al,, 1990: pp, 443-446), The instnaclor may also wish to explain why in most cases, a Type I error is more insidious than a Type II error and that most problems thus give the value for a without ever mentioning ß. (a) a = 0,05 Step 5 (b) df = 4 – 1 = 3 Most statistics books offer excellent explanations for the concept of “p value,” One of the best and simplest explanations I have found is: “The term”p-viAmis used to describe the probability that we would observe a value of the test statistic as extreme or more extreme than that actually observed, if the null hypothesis were true” (Hand, 2008: p, 88), In some statistics books, 0.20 is the largest value for p found in the chi-square table, in that case. Step 5 for this example would be written as p > 0,20, (c) RR = (7,815,=o) Step 4 x\alL = 3,218 does not lie in RR; therefore, I accept H^, (the null hypothesis) and conclude that the data fit the model proposed in H^ iihove. Step 5 0 30 < p < 0.40 O Comments Stepi i^or this example, no symbols were used in Step 1, although one could use, for example, p, = 9/16, p^ = 3/16, P3 = 3/16, and p^ = 1/16. In a test for means equality, the null hypothesis might be as follows: H^: ^j = p^; and H., might be p,, i= ^.^ or fx, < p.^ or p-, > i^^, where p refers to the AMERICAN BIOLOGY TEACHER O Discussion 1 he live-step procedure for general hy-pothesis testing given here allows students to follow a handy template or procedure for statistical inference tests. 
This procedure formalizes the approach to problem solving and forces the math and logic involved in such tests to form an organic whole. The five steps stand as a unified entity. The problem is stated, a test statistic is calculated, a conclusion is reached based on a given value for α, and a confidence level is given as the last step (see Step 5 in the Comments section above). The problem and its solution thus stand as a single unit of thought and effort.

References
Glover, T. & Mitchell, K. (2006). An Introduction to Biostatistics. Long Grove, IL: Waveland Press.
Hand, D.J. (2008). Statistics: A Very Short Introduction. NY: Oxford University Press.
Mendenhall, W., Wackerly, D.D. & Scheaffer, R.L. (1990). Mathematical Statistics with Applications, 4th Ed. Boston: PWS-Kent.
Portney, L.G. & Watkins, M.P. (2009). Foundations of Clinical Research: Applications to Practice, 3rd Ed. Upper Saddle River, NJ: Prentice Hall.
Samuels, M.L. & Witmer, J.A. (2003). Statistics for the Life Sciences, 3rd Ed. Upper Saddle River, NJ: Prentice Hall.
Schork, M.A. & Remington, R.D. (2000). Statistics with Applications to the Biological and Health Sciences, 3rd Ed. Upper Saddle River, NJ: Prentice Hall.

LAWRENCE F. KAMIN is Professor of Biological Sciences at Benedictine University, 1344 Yorkshire Drive, Carol Stream, IL 60188; e-mail: lkamin@ben.edu.

THE AMERICAN BIOLOGY TEACHER VOLUME 72, NO. 3, MARCH 2010

Copyright of American Biology Teacher is the property of National Association of Biology Teachers and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder’s express written permission. However, users may print, download, or email articles for individual use.
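The corn-grain example in the article can be checked numerically. The following is a minimal Python sketch, not part of the article: the function names are hypothetical, the critical value 7.815 is taken from the article's Step 3, and the p-value uses the closed-form upper tail of the chi-square distribution that holds only for 3 degrees of freedom. Because the expected counts are left unrounded here, the statistic comes out near 3.2179 where the article's table, built from rounded expected counts, totals 3.2183; both round to 3.218.

```python
import math

# Observed counts from Step 2 of the example.
observed = {
    "purple smooth": 210,
    "purple wrinkled": 74,
    "yellow smooth": 55,
    "yellow wrinkled": 22,
}
# Model under H0: a 9:3:3:1 ratio.
ratio = {
    "purple smooth": 9,
    "purple wrinkled": 3,
    "yellow smooth": 3,
    "yellow wrinkled": 1,
}


def chi_square_gof(observed, ratio):
    """Sum of (Obs - Exp)^2 / Exp over all classes (Step 2)."""
    total = sum(observed.values())
    parts = sum(ratio.values())
    chi2 = 0.0
    for name, obs in observed.items():
        expected = total * ratio[name] / parts
        chi2 += (obs - expected) ** 2 / expected
    return chi2


def chi2_upper_tail_df3(x):
    """P(X >= x) for a chi-square variable with exactly 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


chi2_calc = chi_square_gof(observed, ratio)
df = len(observed) - 1          # Step 3(b): k - 1
critical = 7.815                # Step 3(c): lower bound of RR at alpha = 0.05
reject_h0 = chi2_calc > critical  # Step 4: decision rule
p_value = chi2_upper_tail_df3(chi2_calc)  # Step 5

print(f"chi2 = {chi2_calc:.3f}, df = {df}, reject H0: {reject_h0}, p = {p_value:.3f}")
```

The computed p-value falls inside the 0.30 < p < 0.40 bracket reported in Step 5, and the statistic lies below the critical value, so the sketch reaches the same decision as the article.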
ACS PUMS DATA DICTIONARY – 2005-2007 HOUSING
December 6, 2012

RT 1  Record Type
    H .Housing Record or Group Quarters Unit

SERIALNO 13  Housing unit/GQ person serial number
    2005000000001..2007999999999 .Unique identifier

DIVISION 1  Division code
    0 .Puerto Rico
    1 .New England (Northeast region)
    2 .Middle Atlantic (Northeast region)
    3 .East North Central (Midwest region)
    4 .West North Central (Midwest region)
    5 .South Atlantic (South region)
    6 .East South Central (South region)
    7 .West South Central (South Region)
    8 .Mountain (West region)
    9 .Pacific (West region)

PUMA 5  Public use microdata area code (PUMA)
    Designates area of 100,000 or more population. Use with ST for unique code.
    00100..08200
    77777 .combination of 01801, 01802, and 01905 in Louisiana

REGION 1  Region code
    1 .Northeast
    2 .Midwest
    3 .South
    4 .West
    9 .Puerto Rico

ST 2  State Code
    01 .Alabama/AL
    02 .Alaska/AK
    04 .Arizona/AZ
    05 .Arkansas/AR
    06 .California/CA
    08 .Colorado/CO
    09 .Connecticut/CT
    10 .Delaware/DE
    11 .District of Columbia/DC
    12 .Florida/FL
    13 .Georgia/GA
    15 .Hawaii/HI
    16 .Idaho/ID
    17 .Illinois/IL
    18 .Indiana/IN
    19 .Iowa/IA
    20 .Kansas/KS
    21 .Kentucky/KY
    22 .Louisiana/LA
    23 .Maine/ME
    24 .Maryland/MD
    25 .Massachusetts/MA
    26 .Michigan/MI
    27 .Minnesota/MN
    28 .Mississippi/MS
    29 .Missouri/MO
    30 .Montana/MT
    31 .Nebraska/NE
    32 .Nevada/NV
    33 .New Hampshire/NH
    34 .New Jersey/NJ
    35 .New Mexico/NM
    36 .New York/NY
    37 .North Carolina/NC
    38 .North Dakota/ND
    39 .Ohio/OH
    40 .Oklahoma/OK
    41 .Oregon/OR
    42 .Pennsylvania/PA
    44 .Rhode Island/RI
    45 .South Carolina/SC
    46 .South Dakota/SD
    47 .Tennessee/TN
    48 .Texas/TX
    49 .Utah/UT
    50 .Vermont/VT
    51 .Virginia/VA
    53 .Washington/WA
    54 .West Virginia/WV
    55 .Wisconsin/WI
    56 .Wyoming/WY
    72 .Puerto Rico/PR

ADJHSG 7  Adjustment factor for housing dollar amounts (6 implied decimal places)
    1062086 .2005 factor
    1028369 .2006 factor
    1000000 .2007 factor
Note: The values of ADJHSG inflation-adjust reported housing costs to 2007 dollars and apply to variables CONP, ELEP, FS, FULP, GASP, GRNTP, INSP, MHP, MRGP, SMOCP, RNTP, SMP, and WATP in the housing record. ADJHSG does not apply to AGS, TAXP, and VAL because they are categorical variables that should not be inflation-adjusted.

ADJINC 7  Adjustment factor for income and earnings dollar amounts (6 implied decimal places)
    1082467 .2005 factor (1.019190 * 1.06208580)
    1044488 .2006 factor (1.015675 * 1.02836879)
    1016787 .2007 factor (1.016787 * 1.00000000)
Note: The values of ADJINC inflation-adjust reported income to 2007 dollars. ADJINC incorporates an adjustment that annualizes the different rolling reference periods for reported income (as done in the single-year data using the variable ADJUST) and an adjustment to inflation-adjust the annualized income to 2007 dollars. ADJINC applies to variables FINCP and HINCP in the housing record, and variables INTP, OIP, PAP, PERNP, PINCP, RETP, SEMP, SSIP, SSP, and WAGP in the person record.

WGTP 4  Housing Weight
    0001..9999 .Integer weight of housing unit

NP 2  Number of person records following this housing record
    00 .Vacant unit
    01 .One person record (one person in household or any person in group quarters)
    02..20 .Number of person records (number of persons in household)

TYPE 1  Type of unit
    1 .Housing unit
    2 .Institutional group quarters
    3 .Noninstitutional group quarters

ACR 1  Lot size
    b .N/A (GQ/not a one-family house or mobile home)
    1 .House on less than one acre
    2 .House on one to less than ten acres
    3 .House on ten or more acres

AGS 1  Sales of Agriculture Products
    b .N/A (less than 1 acre/GQ/vacant/2 or more units in structure)
    1 .None
    2 .$ 1 – $ 999
    3 .$ 1000 – $ 2499
    4 .$ 2500 – $ 4999
    5 .$ 5000 – $ 9999
    6 .$10000+
Note: No adjustment factor is applied to AGS.
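The ADJHSG and ADJINC entries above store each factor as an integer with 6 implied decimal places, so a factor has to be divided by 1,000,000 before it is applied. A minimal Python sketch of that arithmetic follows; the function name and the sample dollar amounts are hypothetical, and only the factor values come from the dictionary.

```python
# Factor codes copied from the dictionary (6 implied decimal places).
ADJHSG = {2005: 1062086, 2006: 1028369, 2007: 1000000}
ADJINC = {2005: 1082467, 2006: 1044488, 2007: 1016787}


def to_2007_dollars(amount, factor_code):
    """Scale a reported dollar amount by a factor with 6 implied decimals."""
    return amount * factor_code / 1_000_000


# Hypothetical values: a monthly rent (RNTP) and a household income (HINCP)
# reported on 2005 records.
rent_2005 = 800
income_2005 = 50_000

adjusted_rent = to_2007_dollars(rent_2005, ADJHSG[2005])
adjusted_income = to_2007_dollars(income_2005, ADJINC[2005])

print(f"Rent in 2007 dollars:   {adjusted_rent:.2f}")
print(f"Income in 2007 dollars: {adjusted_income:.2f}")
```

Housing costs use ADJHSG and income variables use ADJINC, as the two notes specify; categorical variables such as AGS, TAXP, and VAL are left unadjusted.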
BDS 1  Bedrooms
    b .N/A (GQ)
    0 .No bedrooms
    1 .1 Bedroom
    2 .2 Bedrooms
    3 .3 Bedrooms
    4 .4 Bedrooms
    5 .5 or more bedrooms

BLD 2  Units in structure
    bb .N/A (GQ)
    01 .Mobile home or trailer
    02 .One-family house detached
    03 .One-family house attached
    04 .2 Apartments
    05 .3-4 Apartments
    06 .5-9 Apartments
    07 .10-19 Apartments
    08 .20-49 Apartments
    09 .50 or more apartments
    10 .Boat, RV, van, etc.

BUS 1  Business or medical office on property
    b .N/A (GQ/not a one-family house or mobile home)
    1 .Yes
    2 .No

CONP 4  Condo fee (monthly amount)
    bbbb .N/A (not owned or being bought/not condo/GQ/vacant/no condo fee)
    0001..9999 .$1 – $9999 (Rounded and top-coded)
Note: Use values from ADJHSG to adjust CONP to constant dollars.

ELEP 3  Electricity (monthly cost)
    bbb .N/A (GQ/vacant)
    001 .Included in rent or in condo fee
    002 .No charge or electricity not used
    003..999 .$3 to $999 (Rounded and top-coded)
Note: Use values from ADJHSG to adjust ELEP to constant dollars.

FS 5  Food stamp amount (yearly amount)
    bbbbb .N/A (vacant)
    0 .None
    1..99999 .$1 to $99999 (Rounded)
Note: Use values from ADJHSG to adjust FS to constant dollars.

FULP 4  Other fuel cost (yearly cost)
    bbbb .N/A (GQ/vacant)
    0001 .Included in rent or in condo fee
    0002 .No charge or these fuels not used
    0003..9999 .$3 to $9999 (Rounded and top-coded)
Note: Use values from ADJHSG to adjust FULP to constant dollars.

GASP 3  Gas (monthly cost)
    bbb .N/A (GQ/vacant)
    001 .Included in rent or in condo fee
    002 .Included in electricity payment
    003 .No charge or gas not used
    004..999 .$4 to $999 (Rounded and top-coded)
Note: Use values from ADJHSG to adjust GASP to constant dollars.

HFL 1  House heating fuel
    b .N/A (GQ/vacant)
    1 .Utility gas
    2 .Bottled, tank, or LP gas
    3 .Electricity
    4 .Fuel oil, kerosene, etc.
    5 .Coal or coke
    6 .Wood
    7 .Solar energy
    8 .Other fuel
    9 .No fuel used

INSP 5  Fire/hazard/flood insurance (yearly amount)
    bbbbb .N/A (not owned or being bought/not a one family house, mobile home, or condo/GQ/vacant)
    00000 .None
    00001..10000 .$1 to $10000 (Rounded and top-coded)
Note: Use values from ADJHSG to adjust INSP to constant dollars.

KIT 1  Complete kitchen facilities
    b .N/A (GQ)
    1 .Yes, has all three facilities
    2 .No

MHP 5  Mobile home costs (yearly amount)
    bbbbb .N/A (GQ/vacant/not owned or being bought/not mobile home/no costs)
    00000..99999 .$0 to $99999 (Rounded and top-coded)
Note: Use values from ADJHSG to adjust MHP to constant dollars.

MRGI 1  Payment include fire/hazard/flood insurance
    b .N/A (GQ/vacant/not owned or being bought/Not a one family house, MHT or condo/not mortgaged/no regular mortgage payment)
    1 .Yes, insurance included in payment
    2 .No, insurance paid separately or no insurance

MRGP 5  Mortgage payment (monthly amount)
    bbbbb .N/A (not owned or being bought/not a one family house, mobile home, or condo/GQ/vacant)
    00001..99999 .$1 to $99999 (Rounded and top-coded)
Note: Use values from ADJHSG to adjust MRGP to constant dollars.
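Several cost variables above (ELEP, FULP, GASP) reserve their lowest codes for situations such as "included in rent" and use blanks for N/A, so treating every value as a dollar amount would distort means and totals. The sketch below shows one way to separate the two cases in Python. It assumes the raw field arrives as a string in which the dictionary's "b" positions are blank spaces; the function name and return shape are hypothetical.

```python
# Reserved (non-dollar) codes, copied from the dictionary entries above.
SPECIAL_CODES = {
    "ELEP": {1: "Included in rent or in condo fee",
             2: "No charge or electricity not used"},
    "FULP": {1: "Included in rent or in condo fee",
             2: "No charge or these fuels not used"},
    "GASP": {1: "Included in rent or in condo fee",
             2: "Included in electricity payment",
             3: "No charge or gas not used"},
}


def decode_cost(variable, raw):
    """Return (dollars, status); dollars is None unless a real amount was reported."""
    text = raw.strip()
    if not text:                      # all blanks: N/A (GQ/vacant)
        return None, "N/A"
    code = int(text)
    label = SPECIAL_CODES[variable].get(code)
    if label is not None:             # reserved code, not a dollar amount
        return None, label
    return code, "reported"


for variable, raw in [("ELEP", "   "), ("ELEP", "120"), ("GASP", "002"), ("FULP", "0850")]:
    print(variable, repr(raw), decode_cost(variable, raw))
```

Only values with the status "reported" would then enter a descriptive summary, after adjustment with ADJHSG.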
MRGT 1  Payment includes real estate taxes
    b .N/A (GQ/vacant/not owned or being bought/not a one family house or condo/not mortgaged/No regular mortgage payment)
    1 .Yes, taxes included in payment
    2 .No, taxes paid separately or taxes not required

MRGX 1  Mortgage status
    b .N/A (not owned or being bought/not a one family house, mobile home, or condo/GQ/vacant)
    1 .Mortgage deed of trust, or similar debt
    2 .Contract to purchase
    3 .None

PLM 1  Complete plumbing facilities
    b .N/A (GQ)
    1 .Yes, has all three facilities
    2 .No

RMS 1  Rooms
    b .N/A (GQ)
    1 .1 Room
    2 .2 Rooms
    3 .3 Rooms
    4 .4 Rooms
    5 .5 Rooms
    6 .6 Rooms
    7 .7 Rooms
    8 .8 Rooms
    9 .9 or more rooms

RNTM 1  Meals included in rent
    b .N/A (GQ/not a rental unit/rental-NCR)
    1 .Yes
    2 .No

RNTP 5  Monthly rent
    bbbbb .N/A (GQ/not a rental unit)
    00001..99999 .$1 to $99999 (Rounded and top-coded)
Note: Use values from ADJHSG to adjust RNTP to constant dollars.

SMP 5  Second mortgage payment (monthly amount)
    bbbbb .N/A (GQ/vacant/condo/not owned or being bought/not a one family house/not mortgaged/no second mortgage)
    00001..99999 .$1 to $99999 (Rounded and top-coded)
Note: Use values from ADJHSG to adjust SMP to constant dollars.

TEL 1  Telephone in Unit
    b .N/A (GQ/vacant)
    1 .Yes
    2 .No

TEN 1  Tenure
    b .N/A (GQ/vacant)
    1 .Owned with mortgage or loan
    2 .Owned free and clear
    3 .Rented for cash rent
    4 .No cash rent

VACS 1  Vacancy status
    b .N/A (occupied/GQ)
    1 .For rent
    2 .Rented, not occupied
    3 .For sale only
    4 .Sold, not occupied
    5 .For seasonal/recreational/occasional use
    6 .For migratory workers
    7 .Other vacant

VAL 2  Property value
    bb .N/A (GQ/rental unit/vacant, not for sale only)
    01 .Less than $ 10000
    02 .$ 10000 – $ 14999
    03 .$ 15000 – $ 19999
    04 .$ 20000 – $ 24999
    05 .$ 25000 – $ 29999
    06 .$ 30000 – $ 34999
    07 .$ 35000 – $ 39999
    08 .$ 40000 – $ 49999
    09 .$ 50000 – $ 59999
    10 .$ 60000 – $ 69999
    11 .$ 70000 – $ 79999
    12 .$ 80000 – $ 89999
    13 .$ 90000 – $ 99999
    14 .$100000 – $124999
    15 .$125000 – $149999
    16 .$150000 – $174999
    17 .$175000 – $199999
    18 .$200000 – $249999
    19 .$250000 – $299999
    20 .$300000 – $399999
    21 .$400000 – $499999
    22 .$500000 – $749999
    23 .$750000 – $999999
    24 .$1000000+
Note: No adjustment factor is applied to VAL.

VEH 1  Vehicles (1 ton or less) available
    b .N/A (GQ/vacant)
    0 .No vehicles
    1 .1 vehicle
    2 .2 vehicles
    3 .3 vehicles
    4 .4 vehicles
    5 .5 vehicles
    6 .6 or more vehicles

WATP 4  Water (yearly cost)
    bbbb .N/A (GQ/vacant)
    0001 .Included in rent or in condo fee
    0002 .No charge
    0003..9999 .$3 to $9999 (Rounded and top-coded)
Note: Use values from ADJHSG to adjust WATP to constant dollars.
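A descriptive table for a categorical variable such as TEN is normally weighted by the housing weight WGTP, so that each record counts for the number of housing units it represents. The sketch below illustrates a weighted frequency distribution in Python. The seven records are invented for illustration and are not ACS data; the function name is hypothetical, and N/A values (blank in the file) are represented here as None.

```python
from collections import defaultdict

# Code labels copied from the TEN entry above.
TEN_LABELS = {
    1: "Owned with mortgage or loan",
    2: "Owned free and clear",
    3: "Rented for cash rent",
    4: "No cash rent",
}

# Hypothetical housing records: (TEN code or None for N/A, WGTP).
records = [(1, 120), (1, 80), (2, 50), (3, 150), (3, 60), (4, 40), (None, 75)]


def weighted_shares(records):
    """Weighted proportion of housing units in each tenure category."""
    totals = defaultdict(int)
    for ten, wgtp in records:
        if ten is None:          # skip N/A (GQ/vacant) records
            continue
        totals[ten] += wgtp
    grand_total = sum(totals.values())
    return {TEN_LABELS[code]: totals[code] / grand_total for code in sorted(totals)}


shares = weighted_shares(records)
for label, share in shares.items():
    print(f"{label:<30} {share:6.1%}")
```

The same pattern applies to any coded variable in the housing record; person-record variables would use the person weight instead.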
YBL 1  When structure first built
    b .N/A (GQ)
    1 .2005 or later
    2 .2000 to 2004
    3 .1990 to 1999
    4 .1980 to 1989
    5 .1970 to 1979
    6 .1960 to 1969
    7 .1950 to 1959
    8 .1940 to 1949
    9 .1939 or earlier

FES 1  Family type and employment status
    b .N/A (GQ/vacant/not a family)
    1 .Married-couple family: Husband and wife in LF
    2 .Married-couple family: Husband in labor force, wife not in LF
    3 .Married-couple family: Husband not in LF, wife in LF
    4 .Married-couple family: Neither husband nor wife in LF
    5 .Other family: Male householder, no wife present, in LF
    6 .Other family: Male householder, no wife present, not in LF
    7 .Other family: Female householder, no husband present, in LF
    8 .Other family: Female householder, no husband present, not in LF

FINCP 8  Family income (past 12 months)
    bbbbbbbb .N/A(GQ/vacant)
    00000000 .No family income
    -59999..99999999 .Total family income in dollars (Components are rounded)
Note: Use values from ADJINC to adjust FINCP to constant dollars.

FPARC 1  Family presence and age of related children
    b .N/A (GQ/vacant/not a family)
    1 .With related children under 5 years only
    2 .With related children 5 to 17 years only
    3 .With related children under 5 years and 5 to 17 years
    4 .No related children

GRNTP 4  Gross rent (monthly amount)
    bbbb .N/A (GQ/vacant, not rented for cash rent)
    0001..9999 .$1 – $9999 (Components are rounded)
Note: Use values from ADJHSG to adjust GRNTP to constant dollars.
GRPIP 3  Gross rent as a percentage of household income past 12 months
    bbb .N/A (GQ/vacant/not rented for cash rent/owner occupied/no household income)
    001..100 .1% to 100%
    101 .101% or more

HHL 1  Household language
    b .N/A (GQ/vacant)
    1 .English only
    2 .Spanish
    3 .Other Indo-European language
    4 .Asian or Pacific Island language
    5 .Other language

HHT 1  Household/family type
    b .N/A (GQ/vacant)
    1 .Married-couple family household
      .Other family household:
    2 .Male householder, no wife present
    3 .Female householder, no husband present
      .Nonfamily household:
      .Male householder:
    4 .Living alone
    5 .Not living alone
      .Female householder:
    6 .Living alone
    7 .Not living alone

HINCP 8  Household income (past 12 months)
    bbbbbbbb .N/A(GQ/vacant)
    00000000 .No household income
    -59999..99999999 .Total household income in dollars (Components are rounded)
Note: Use values from ADJINC to adjust HINCP to constant dollars.

HUGCL 1  Flag to indicate grandchild living in housing unit
    b .N/A (GQ/vacant)
    0 .HU does not contain grandchildren
    1 .HU does contain grandchildren

HUPAC 1  HH presence and age of children
    b .N/A (GQ/vacant)
    1 .With children under 6 years only
    2 .With children 6 to 17 years only
    3 .With children under 6 years and 6 to 17 years
    4 .No children

HUPAOC 1  HH presence and age of own children
    b .N/A (GQ/vacant)
    1 .Presence of own children under 6 years only
    2 .Presence of own children 6 to 17 years only
    3 .Presence of own children under 6 years and 6 to 17 years
    4 .No own children present

HUPARC 1  HH presence and age of related children
    b .N/A (GQ/vacant)
    1 .Presence of related children under 6 years only
    2 .Presence of related children 6 to 17 years only
    3 .Presence of related children under 6 years and 6 to 17 years
    4 .No related children present

LNGI 1  Linguistic isolation
    b .N/A (GQ/vacant)
    1 .Not linguistically isolated
    2 .Linguistically isolated

MV 1  When moved into this house or apartment
    b .N/A (GQ/vacant)
    1 .12 months or less
    2 .13 to 23 months
    3 .2 to 4 years
    4 .5 to 9 years
    5 .10 to 19 years
    6 .20 to 29 years
    7 .30 years or more

NOC 2  Number of own children in household (unweighted)
    bb .N/A(GQ/vacant)
    00 .No own children
    01..19 .Number of own children in household

NPF 2  Number of persons in family (unweighted)
    bb .N/A (GQ/vacant/non-family household)
    02..20 .Number of persons in family

NPP 1  Grandparent headed household with no parent present
    b .N/A (GQ/vacant)
    0 .Not a grandparent headed household with no parent present
    1 .Grandparent headed household with no parent present

NR 1  Presence of nonrelative in household
    b .N/A (GQ/vacant)
    0 .None
    1 .1 or more nonrelatives

NRC 2  Number of related children in household (unweighted)
    bb .N/A (GQ/vacant)
    00 .No related children
    01..19 .Number of related children in household

OCPIP 3  Selected monthly owner costs as a percentage of household income during the past 12 months
    bbb .N/A (not owned or being bought/not a one family house, mobile home, or condo/GQ/vacant/no HH income)
    001..100 .1% to 100%
    101 .101% or more

PARTNER 1  Unmarried partner household
    b .N/A (GQ/vacant)
    0 .No unmarried partner in household
    1 .Male householder, male partner
    2 .Male householder, female partner
    3 .Female householder, female partner
    4 .Female householder, male partner

PSF 1  Presence of subfamilies in Household
    b .N/A (GQ/vacant)
    0 .No subfamilies
    1 .1 or more subfamilies

R18 1  Presence of persons under 18 years in household (unweighted)
    b .N/A (GQ/vacant)
    0 .No person under 18 in household
    1 .1 or more persons under 18 in household

R60 1  Presence of persons 60 years and over in household (unweighted)
    b .N/A (GQ/vacant)
    0 .No person 60 and over
    1 .1 person 60 and over
    2 .2 or more persons 60 and over

R65 1  Presence of persons 65 years and over in household (unweighted)
    b .N/A (GQ/vacant)
    0 .No person 65 and over
    1 .1 person 65 and over
    2 .2 or more persons 65 and over

RESMODE 1  Response mode
    b .N/A (GQ)
    1 .Mail
    2 .CATI/CAPI

SMOCP 5  Selected monthly owner costs
    bbbbb .N/A (not owned or being bought/not a one family house, mobile home, or condo/GQ/vacant/no costs)
    00000 .No costs
    00001..99999 .$1 – $99999 (Components are rounded)
Note: Use values from ADJHSG to adjust SMOCP to constant dollars.

SMX 1  Second mortgage or home equity loan status
    b .N/A (GQ/vacant/not owned or being bought/not a one family house, mobile home, trailer or condo/not mortgaged/no second mortgage)
    1 .Yes, a second mortgage
    2 .Yes, a home equity loan
    3 .No
    4 .Both a second mortgage and a home equity loan

SRNT 1  Specified rent unit
    0 .Not specified rent unit
    1 .Specified rent unit

SVAL 1  Specified value unit
    0 .Not specified owner unit
    1 .Specified value unit

TAXP 2  Property taxes (yearly amount)
    bb .N/A (GQ/vacant/not owned or being bought/not a one-family house, mobile home or trailer or condo)
    01 .None
    02 .$ 1 – $ 49
    03 .$ 50 – $ 99
    04 .$ 100 – $ 149
    05 .$ 150 – $ 199
    06 .$ 200 – $ 249
    07 .$ 250 – $ 299
    08 .$ 300 – $ 349
    09 .$ 350 – $ 399
    10 .$ 400 – $ 449
    11 .$ 450 – $ 499
    12 .$ 500 – $ 549
    13 .$ 550 – $ 599
    14 .$ 600 – $ 649
    15 .$ 650 – $ 699
    16 .$ 700 – $ 749
    17 .$ 750 – $ 799
    18 .$ 800 – $ 849
    19 .$ 850 – $ 899
    20 .$ 900 – $ 949
    21 .$ 950 – $ 999
    22 .$1000 – $1099
    23 .$1100 – $1199
    24 .$1200 – $1299
    25 .$1300 – $1399
    26 .$1400 – $1499
    27 .$1500 – $1599
    28 .$1600 – $1699
    29 .$1700 – $1799
    30 .$1800 – $1899
    31 .$1900 – $1999
    32 .$2000 – $2099
    33 .$2100 – $2199
    34 .$2200 – $2299
    35 .$2300 – $2399
    36 .$2400 – $2499
    37 .$2500 – $2599
    38 .$2600 – $2699
    39 .$2700 – $2799
    40 .$2800 – $2899
    41 .$2900 – $2999
    42 .$3000 – $3099
    43 .$3100 – $3199
    44 .$3200 – $3299
    45 .$3300 – $3399
    46 .$3400 – $3499
    47 .$3500 – $3599
    48 .$3600 – $3699
    49 .$3700 – $3799
    50 .$3800 – $3899
    51 .$3900 – $3999
    52 .$4000 – $4099
    53 .$4100 – $4199
    54 .$4200 – $4299
    55 .$4300 – $4399
    56 .$4400 – $4499
    57 .$4500 – $4599
    58 .$4600 – $4699
    59 .$4700 – $4799
    60 .$4800 – $4899
    61 .$4900 – $4999
    62 .$5000 – $5499
    63 .$5500 – $5999
    64 .$6000 – $6999
    65 .$7000 – $7999
    66 .$8000 – $8999
    67 .$9000 – $9999
    68 .$10000+
Note: No adjustment factor is applied to TAXP.
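Because TAXP is categorical, an analysis often needs to translate a code back into the dollar interval it stands for. The intervals follow a regular pattern ($50 steps up to $999, $100 steps up to $4999, then wider bands), which the Python sketch below reproduces. The function name and the (low, high) return convention are hypothetical; a high of None marks the open-ended top category, and N/A records are assumed to have been filtered out beforehand.

```python
def taxp_range(code):
    """Return (low, high) yearly property tax in dollars for a TAXP code 1..68."""
    if code == 1:
        return (0, 0)                       # None
    if code == 2:
        return (1, 49)
    if 3 <= code <= 21:                     # $50-wide bands: $50 ... $999
        low = 50 * (code - 2)
        return (low, low + 49)
    if 22 <= code <= 61:                    # $100-wide bands: $1000 ... $4999
        low = 1000 + 100 * (code - 22)
        return (low, low + 99)
    if code in (62, 63):                    # $500-wide bands: $5000 ... $5999
        low = 5000 + 500 * (code - 62)
        return (low, low + 499)
    if 64 <= code <= 67:                    # $1000-wide bands: $6000 ... $9999
        low = 6000 + 1000 * (code - 64)
        return (low, low + 999)
    if code == 68:
        return (10000, None)                # $10000+
    raise ValueError(f"TAXP code out of range: {code}")


for code in (2, 4, 21, 22, 56, 61, 62, 63, 67, 68):
    print(code, taxp_range(code))
```

Since no adjustment factor is applied to TAXP, these intervals are used as reported rather than converted to 2007 dollars.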
WIF 1  Workers in family during the past 12 months
    b .N/A (GQ/vacant/non-family household)
    0 .No workers
    1 .1 worker
    2 .2 workers
    3 .3 or more workers in family

WKEXREL 2  Work experience of householder and spouse
    b .N/A (GQ/vacant/not a family)
    1 .Householder and spouse worked FT
    2 .Householder worked FT; spouse worked < FT
    3 .Householder worked FT; spouse did not work
    4 .Householder worked < FT; spouse worked FT
    5 .Householder worked < FT; spouse worked < FT
    6 .Householder worked < FT; spouse did not work
    7 .Householder did not work; spouse worked FT
    8 .Householder did not work; spouse worked < FT
    9 .Householder did not work; spouse did not work
    10 .Male householder worked FT; no spouse present
    11 .Male householder worked < FT; no spouse present
    12 .Male householder did not work; no spouse present
    13 .Female householder worked FT; no spouse present
    14 .Female householder worked < FT; no spouse present
    15 .Female householder did not work; no spouse present

WORKSTAT 2  Work status of householder or spouse in family households
    bb .N/A (GQ/not a family household)
    1 .Husband and wife both in labor force, both employed or in Armed Forces
    2 .Husband and wife both in labor force, husband employed or in Armed Forces, wife unemployed
    3 .Husband in labor force and wife not in labor force, husband employed or in Armed Forces
    4 .Husband and wife both in labor force, husband unemployed, wife employed or in Armed Forces
    5 .Husband and wife both in labor force, husband unemployed, wife unemployed
    6 .Husband in labor force, husband unemployed, wife not in labor force
    7 .Husband not in labor force, wife in labor force, wife employed or in Armed Forces
    8 .Husband not in labor force, wife in labor force, wife unemployed
    9 .Neither husband nor wife in labor force
    10 .Male householder with no wife present, householder in labor force, employed or in Armed Forces
    11 .Male householder with no wife present, householder in labor force and unemployed
    12 .Male householder with no wife present, householder not in labor force
    13 .Female householder with no husband present, householder in labor force, employed or in Armed Forces
    14 .Female householder with no husband present, householder in labor force and unemployed
    15 .Female householder with no husband present, householder not in labor force

FACRP 1  Lot size allocation
    0 .No
    1 .Yes
FAGSP 1  Sales of Agricultural Products allocation
    0 .No
    1 .Yes
FBDSP 1  Number of bedrooms allocation
    0 .No
    1 .Yes
FBLDP 1  Units in structure allocation
    0 .No
    1 .Yes
FBUSP 1  Business or medical office on property allocation
    0 .No
    1 .Yes
FCONP 1  Condominium fee allocation
    0 .No
    1 .Yes
FELEP 1  Electricity (monthly cost) allocation
    0 .No
    1 .Yes
FFSP 1  Food stamp amount (yearly amount) allocation
    0 .No
    1 .Yes
FFULP 1  House heating fuel (yearly cost) allocation
    0 .No
    1 .Yes
FGASP 1  Gas (monthly cost) allocation
    0 .No
    1 .Yes
FHFLP 1  House heating fuel allocation
    0 .No
    1 .Yes
FINSP 1  Fire, hazard, flood insurance allocation
    0 .No
    1 .Yes
FKITP 1  Complete kitchen facilities allocation
    0 .No
    1 .Yes
FMHP 1  Mobile home costs allocation
    0 .No
    1 .Yes
FMRGIP 1  Payment include fire, hazard, flood insurance allocation
    0 .No
    1 .Yes
FMRGP 1  Regular mortgage payment allocation
    0 .No
    1 .Yes
FMRGTP 1  Payment include real estate taxes allocation
    0 .No
    1 .Yes
FMRGXP 1  Mortgage status allocation
    0 .No
    1 .Yes
FMVP 1  When moved into this house or apartment allocation
    0 .No
    1 .Yes
FPLMP 1  Complete plumbing facilities allocation
    0 .No
    1 .Yes
FRMSP 1  Rooms allocation
    0 .No
    1 .Yes
FRNTMP 1  Meals included in rent allocation
    0 .No
    1 .Yes
FRNTP 1  Monthly rent allocation
    0 .No
    1 .Yes
FSMP 1  Second mortgage payment allocation
    0 .No
    1 .Yes
FSMXHP 1  Home equity loan status allocation
    0 .No
    1 .Yes
FSMXSP 1  Second mortgage status allocation
    0 .No
    1 .Yes
FTAXP 1  Taxes on property allocation
    0 .No
    1 .Yes
FTELP 1  Telephones in house allocation
    0 .No
    1 .Yes
FTENP 1  Tenure allocation
    0 .No
    1 .Yes
FVACSP 1  Vacancy status allocation
    0 .No
    1 .Yes
FVALP 1  Value allocation
    0 .No
    1 .Yes
FVEHP 1  Vehicles available allocation
    0 .No
    1 .Yes
FWATP 1  Water (yearly cost) allocation
    0 .No
    1 .Yes
FYBLP 1  When structure first built allocation
    0 .No
    1 .Yes

WGTP1 4  Housing Weight replicate 1
    -9999..9999 .Integer weight of housing unit
WGTP2 4  Housing Weight replicate 2
    -9999..9999 .Integer weight of housing unit
WGTP3 4  Housing Weight replicate 3
    -9999..9999 .Integer weight of housing unit
WGTP4 4  Housing Weight replicate 4
    -9999..9999 .Integer weight of housing unit
WGTP5 4  Housing Weight replicate 5
    -9999..9999 .Integer weight of housing unit
WGTP6 4  Housing Weight replicate 6
    -9999..9999 .Integer weight of housing unit
WGTP7 4  Housing Weight replicate 7
    -9999..9999 .Integer weight of housing unit
WGTP8 4  Housing Weight replicate 8
    -9999..9999 .Integer weight of housing unit
WGTP9 4  Housing Weight replicate 9
    -9999..9999 .Integer weight of housing unit
WGTP10 4  Housing Weight replicate 10
    -9999..9999 .Integer weight of housing unit
WGTP11 4  Housing Weight replicate 11
    -9999..9999 .Integer weight of housing unit
WGTP12 4  Housing Weight replicate 12
    -9999..9999 .Integer weight of housing unit
WGTP13 4  Housing Weight replicate 13
    -9999..9999 .Integer weight of housing unit
WGTP14 4  Housing Weight replicate 14
    -9999..9999 .Integer weight of housing unit
WGTP15 4  Housing Weight replicate 15
    -9999..9999 .Integer weight of housing unit
WGTP16 4  Housing Weight replicate 16
    -9999..9999 .Integer weight of housing unit
WGTP17 4  Housing Weight replicate 17
    -9999..9999 .Integer weight of housing unit
WGTP18 4  Housing Weight replicate 18
    -9999..9999 .Integer weight of housing unit
WGTP19 4  Housing Weight replicate 19
    -9999..9999 .Integer weight of housing unit
WGTP20 4  Housing Weight replicate 20
    -9999..9999 .Integer weight of housing unit
WGTP21 4  Housing Weight replicate 21
    -9999..9999 .Integer weight of housing unit
WGTP22 4  Housing Weight replicate 22
    -9999..9999 .Integer weight of housing unit
WGTP23 4  Housing Weight replicate 23
    -9999..9999 .Integer weight of housing unit
WGTP24 4  Housing Weight replicate 24
    -9999..9999 .Integer weight of housing unit
WGTP25 4  Housing Weight replicate 25
    -9999..9999 .Integer weight of housing unit
WGTP26 4  Housing Weight replicate 26
    -9999..9999 .Integer weight of housing unit
WGTP27 4  Housing Weight replicate 27
    -9999..9999 .Integer weight of housing unit
WGTP28 4  Housing Weight replicate 28
    -9999..9999 .Integer weight of housing unit
WGTP29 4  Housing Weight replicate 29
    -9999..9999 .Integer weight of housing unit
WGTP30 4  Housing Weight replicate 30
    -9999..9999 .Integer weight of housing unit
WGTP31 4  Housing Weight replicate 31
    -9999..9999 .Integer weight of housing unit
WGTP32 4  Housing Weight replicate 32
    -9999..9999 .Integer weight of housing unit
WGTP33 4  Housing Weight replicate 33
    -9999..9999 .Integer weight of housing unit
WGTP34 4  Housing Weight replicate 34
    -9999..9999 .Integer weight of housing unit
WGTP35 4  Housing Weight replicate 35
    -9999..9999 .Integer weight of housing unit
WGTP36 4  Housing Weight replicate 36
    -9999..9999 .Integer weight of housing unit
WGTP37 4  Housing Weight replicate 37
    -9999..9999 .Integer weight of housing unit
WGTP38 4  Housing Weight replicate 38
    -9999..9999 .Integer weight of housing unit
WGTP39 4  Housing Weight replicate 39
    -9999..9999 .Integer weight of housing unit
WGTP40 4  Housing Weight replicate 40
    -9999..9999 .Integer weight of housing unit
WGTP41 4  Housing Weight replicate 41
    -9999..9999 .Integer weight of housing unit
WGTP42 4  Housing Weight replicate 42
    -9999..9999 .Integer weight of housing unit
WGTP43 4  Housing Weight replicate 43
    -9999..9999 .Integer weight of housing unit
WGTP44 4  Housing Weight replicate 44
    -9999..9999 .Integer weight of housing unit
WGTP45 4  Housing Weight replicate 45
    -9999..9999 .Integer weight of housing unit
WGTP46 4  Housing Weight replicate 46
    -9999..9999 .Integer weight of housing unit
WGTP47 4  Housing Weight replicate 47
    -9999..9999 .Integer weight of housing unit
WGTP48 4  Housing Weight replicate 48
    -9999..9999 .Integer weight of housing unit
WGTP49 4  Housing Weight replicate 49
    -9999..9999 .Integer weight of housing unit
WGTP50 4  Housing Weight replicate 50
    -9999..9999 .Integer weight of housing unit
WGTP51 4  Housing Weight replicate 51
    -9999..9999 .Integer weight of housing unit
WGTP52 4  Housing Weight replicate 52
    -9999..9999 .Integer weight of housing unit
WGTP53 4  Housing Weight replicate 53
    -9999..9999 .Integer weight of housing unit
WGTP54 4  Housing Weight replicate 54
    -9999..9999 .Integer weight of housing unit
WGTP55 4  Housing Weight replicate 55
    -9999..9999 .Integer weight of housing unit
WGTP56 4  Housing Weight replicate 56
    -9999..9999 .Integer weight of housing unit
WGTP57 4  Housing Weight replicate 57
    -9999..9999 .Integer weight of housing unit
WGTP58 4  Housing Weight replicate 58
    -9999..9999 .Integer weight of housing unit
WGTP59 4  Housing Weight replicate 59
    -9999..9999 .Integer weight of housing unit
WGTP60 4  Housing Weight replicate 60
    -9999..9999 .Integer weight of housing unit
WGTP61 4  Housing Weight replicate 61
    -9999..9999 .Integer weight of housing unit
WGTP62 4  Housing Weight replicate 62
    -9999..9999 .Integer weight of housing unit
WGTP63 4  Housing Weight replicate 63
    -9999..9999 .Integer weight of housing unit
WGTP64 4  Housing Weight replicate 64
    -9999..9999 .Integer weight of housing unit
WGTP65 4  Housing Weight replicate 65
    -9999..9999 .Integer weight of housing unit
WGTP66 4  Housing Weight replicate 66
    -9999..9999 .Integer weight of housing unit
WGTP67 4  Housing Weight replicate 67
    -9999..9999 .Integer weight of housing unit
WGTP68 4  Housing Weight replicate 68
    -9999..9999 .Integer weight of housing unit
WGTP69 4  Housing Weight replicate 69
    -9999..9999 .Integer weight of housing unit
WGTP70 4  Housing Weight replicate 70
    -9999..9999 .Integer weight of housing unit
WGTP71 4  Housing Weight replicate 71
    -9999..9999 .Integer weight of housing unit
WGTP72 4  Housing Weight replicate 72
    -9999..9999 .Integer weight of housing unit
WGTP73 4  Housing Weight replicate 73
    -9999..9999 .Integer weight of housing unit
WGTP74 4  Housing Weight replicate 74
    -9999..9999 .Integer weight of housing unit
WGTP75 4  Housing Weight replicate 75
    -9999..9999 .Integer weight of housing unit
WGTP76 4  Housing Weight replicate 76
    -9999..9999 .Integer weight of housing unit
WGTP77 4  Housing Weight replicate 77
    -9999..9999 .Integer weight of housing unit
WGTP78 4  Housing Weight replicate 78
    -9999..9999 .Integer weight of housing unit
WGTP79 4  Housing Weight replicate 79
    -9999..9999 .Integer weight of housing unit
WGTP80 4  Housing Weight replicate 80
    -9999..9999 .Integer weight of housing unit

DATA DICTIONARY – 2005-2007 POPULATION

RT 1  Record Type
    P .Person Record

SERIALNO 13  Housing unit/GQ person serial number
    200500000001..200799999999 .Unique identifier

SPORDER 2  Person number
    01..20 .Person number

PUMA 5  Public use microdata area code (PUMA)
    Designates area of 100,000 or more population. Use with ST for unique code.
        00100..08200
        77777 .combination of 01801, 01802, and 01905 in Louisiana

ST 2
    State Code
        01 .Alabama/AL
        02 .Alaska/AK
        04 .Arizona/AZ
        05 .Arkansas/AR
        06 .California/CA
        08 .Colorado/CO
        09 .Connecticut/CT
        10 .Delaware/DE
        11 .District of Columbia/DC
        12 .Florida/FL
        13 .Georgia/GA
        15 .Hawaii/HI
        16 .Idaho/ID
        17 .Illinois/IL
        18 .Indiana/IN
        19 .Iowa/IA
        20 .Kansas/KS
        21 .Kentucky/KY
        22 .Louisiana/LA
        23 .Maine/ME
        24 .Maryland/MD
        25 .Massachusetts/MA
        26 .Michigan/MI
        27 .Minnesota/MN
        28 .Mississippi/MS
        29 .Missouri/MO
        30 .Montana/MT
        31 .Nebraska/NE
        32 .Nevada/NV
        33 .New Hampshire/NH
        34 .New Jersey/NJ
        35 .New Mexico/NM
        36 .New York/NY
        37 .North Carolina/NC
        38 .North Dakota/ND
        39 .Ohio/OH
        40 .Oklahoma/OK
        41 .Oregon/OR
        42 .Pennsylvania/PA
        44 .Rhode Island/RI
        45 .South Carolina/SC
        46 .South Dakota/SD
        47 .Tennessee/TN
        48 .Texas/TX
        49 .Utah/UT
        50 .Vermont/VT
        51 .Virginia/VA
        53 .Washington/WA
        54 .West Virginia/WV
        55 .Wisconsin/WI
        56 .Wyoming/WY
        72 .Puerto Rico/PR

ADJINC 7
    Adjustment factor for income and earnings dollar amounts (6 implied decimal places)
        1082467 .2005 factor (1.019190 * 1.06208580)
        1044488 .2006 factor (1.015675 * 1.02836879)
        1016787 .2007 factor (1.016787 * 1.00000000)

    Note: The values of ADJINC inflation-adjusts reported income to 2007 dollars. ADJINC incorporates an adjustment that annualizes the different rolling reference periods for reported income (as done in the single-year data using the variable ADJUST) and an adjustment to inflation-adjust the annualized income to 2007 dollars. ADJINC applies to variables FINCP and HINCP in the housing record, and variables INTP, OIP, PAP, PERNP, PINCP, RETP, SEMP, SSIP, SSP, and WAGP in the person record.

PWGTP 4
    Person’s weight
        0001..9999 .Integer weight of person

AGEP 2
    Age
        00 .Under 1 year
        01..99 .1 to 99 years (Top-coded***)

CIT 1
    Citizenship status
        1 .Born in the U.S.
          .Born in the U.S., Guam, the U.S. Virgin Islands, or the Northern
          .Marianas if current residence is Puerto Rico
        2 .Born in Puerto Rico, Guam, the U.S. Virgin Islands,
          .or the Northern Marianas
          .Born in Puerto Rico if current residence is Puerto Rico
        3 .Born abroad of American parent(s)
        4 .U.S. citizen by naturalization
        5 .Not a citizen of the U.S.

COW 1
    Class of worker
        b .N/A (less than 16 years old/unemployed who
          .never worked/NILF who last worked more than 5 years
          .ago)
        1 .Employee of a private for-profit company or
          .business, or of an individual, for wages,
          .salary, or commissions
        2 .Employee of a private not-for-profit,
          .tax-exempt, or charitable organization
        3 .Local government employee (city, county, etc.)
        4 .State government employee
        5 .Federal government employee
        6 .Self-employed in own not incorporated
          .business, professional practice, or farm
        7 .Self-employed in own incorporated
          .business, professional practice or farm
        8 .Working without pay in family business or farm
        9 .Unemployed

DDRS 1
    Difficulty dressing
        b .N/A (Less than 5 years old)
        1 .Yes
        2 .No

DEYE 1
    Vision or hearing difficulty
        b .N/A (Less than 5 years old)
        1 .Yes
        2 .No

DOUT 1
    Difficulty going out
        b .N/A (Less than 16 years old)
        1 .Yes
        2 .No

DPHY 1
    Physical difficulty
        b .N/A (Less than 5 years old)
        1 .Yes
        2 .No

DREM 1
    Difficulty remembering
        b .N/A (Less than 5 years old)
        1 .Yes
        2 .No

DWRK 1
    Difficulty working
        b .N/A (Less than 16 years old)
        1 .Yes
        2 .No

ENG 1
    Ability to speak English
        b .N/A (less than 5 years old/speaks only English)
        1 .Very well
        2 .Well
        3 .Not well
        4 .Not at all

FER 1
    Child born within the past 12 months
        b .N/A (less than 15 years/greater than 50 years/
          .male)
        1 .Yes
        2 .No

GCL 1
    Grandchildren living in this house
        b .N/A (less than 30 years/institutional GQ)
        1 .Yes
        2 .No

GCM 1
    Months responsible for grandchildren
        b .N/A (less than 30 years/grandparent not responsible for
          .grandchild/institutional GQ)
        1 .Less than 6 months
        2 .6 to 11 months
        3 .1 to 2 years
        4 .3 to 4 years
        5 .5 or more years

GCR 1
    Responsible for grandchildren
        b .N/A (less than 30 years/grandchild not living in house/institutional GQ)
        1 .Yes
        2 .No

INTP 6
    Interest, dividends, and net rental income past 12 months (signed)
        bbbbbb .N/A (less than 15 years old)
        000000 .None
        -09999 .Loss of $9999 or more (Rounded and bottom-coded)
        -00001..-09998 .Loss $1 to $9998 (Rounded)
        000001 .$1 or break even
        000002..999999 .$2 to $999999 (Rounded and top-coded)

    Note: Use values from ADJINC to adjust INTP to constant dollars.

JWMNP 3
    Travel time to work
        bbb .N/A (not a worker or worker who worked at
            .home)
        001..200 .1 to 200 minutes to get to work (Top-coded)

JWRIP 2
    Vehicle occupancy
        bb .N/A (not a worker or worker whose means of
           .transportation to work was not car, truck,
           .or van)
        01 .Drove alone
        02 .In 2-person carpool
        03 .In 3-person carpool
        04 .In 4-person carpool
        05 .In 5-person carpool
        06 .In 6-person carpool
        07 .In 7-person carpool
        08 .In 8-person carpool
        09 .In 9-person carpool
        10 .In 10-person or more carpool

JWTR 2
    Means of transportation to work
        bb .N/A (not a worker–not in the labor force,
           .including persons under 16 years; unemployed;
           .employed, with a job but not at work; Armed
           .Forces, with a job but not at work)
        01 .Car, truck, or van
        02 .Bus or trolley bus
        03 .Streetcar or trolley car (carro publico in Puerto Rico)
        04 .Subway or elevated
        05 .Railroad
        06 .Ferryboat
        07 .Taxicab
        08 .Motorcycle
        09 .Bicycle
        10 .Walked
        11 .Worked at home
        12 .Other method

LANX 1
    Language other than English spoken at home
        b .N/A (less than 5 years old)
        1 .Yes, speaks another language
        2 .No, speaks only English

MAR 1
    Marital status
        1 .Married
        2 .Widowed
        3 .Divorced
        4 .Separated
        5 .Never married or under 15 years old

MIG 1
    Mobility status (lived here 1 year ago)
        b .N/A(less than 1 year old)
        1 .Yes, same house (nonmovers)
        2 .No, outside US if current residence is US;
          .No, outside Puerto Rico and US if current residence is
          .Puerto Rico
        3 .No, different house in US if current residence is US;
          .No, different house in Puerto Rico is current residence is
          .Puerto Rico

MIL 1
    Military service
        b .N/A (less than 17 years old)
        1 .Yes, now on active duty
        2 .Yes, on active duty during the last 12 months, but not now
        3 .Yes, on active duty in the past, but not during the last 12
          .months
        4 .No, training for Reserves/National Guard only
        5 .No, never served in the military

MILY 1
    Years of active duty military service
        b .N/A (less than 17 years/no active duty
          .military service)
        1 .Less than 2 years of service
        2 .2 years or more of service

MLPA 1
    Served September 2001 or later
        b .N/A (Less than 17 years old/no active duty)
        0 .Did not serve this period
        1 .Served this period

MLPB 1
    Served August 1990 – August 2001 (including Persian Gulf War)
        b .N/A (Less than 17 years old/no active duty)
        0 .Did not serve this period
        1 .Served this period

MLPC 1
    Served September 1980 – July 1990
        b .N/A (Less than 17 years old/no active duty)
        0 .Did not serve …
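
The ADJINC and INTP notes in the dictionary above say that dollar amounts should be adjusted to constant 2007 dollars using ADJINC, which is stored as a 7-digit integer with 6 implied decimal places. The following is a minimal Python sketch of that arithmetic, not part of the data dictionary or of any SPSS procedure. The function name `adjust_to_2007_dollars`, the use of `None` to stand for a blank (N/A) field, and the rounding to whole dollars are all illustrative assumptions.

```python
# Sketch only: the function name, the None-for-blank convention, and the
# whole-dollar rounding are illustrative assumptions, not part of the
# data dictionary.

ADJINC_IMPLIED_DECIMALS = 6  # dictionary: "6 implied decimal places"


def adjust_to_2007_dollars(amount, adjinc):
    """Scale a reported dollar amount (e.g. INTP) by the raw ADJINC code.

    amount -- reported dollars as an int, or None for a blank (N/A) field
    adjinc -- raw 7-digit ADJINC value, e.g. 1082467 for the 2005 factor
    """
    if amount is None:
        return None  # N/A stays N/A; there is nothing to adjust
    factor = adjinc / 10 ** ADJINC_IMPLIED_DECIMALS  # 1082467 -> 1.082467
    return round(amount * factor)  # assumption: report whole dollars


if __name__ == "__main__":
    # Adjust a reported amount of 1000 using the 2005 factor.
    print(adjust_to_2007_dollars(1000, 1082467))
```

The same scaling would apply to the other variables the ADJINC note lists (for example PINCP or WAGP in the person record). In SPSS the equivalent step would typically be a computed variable, but the exact syntax is left to the course materials.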