Data Analysis & Interpretation
Once the necessary information has been collected through observation, survey or experiment, the following steps are taken.
1 Editing the data
Information may have been noted in haste and may now need to be deciphered. Data should be edited before being presented as information to ensure that figures and words are accurate. Editing can be done manually, by computer, or both, depending on whether the medium is paper or electronic.
Editing is done on two levels: micro and macro. In micro-editing, the basic records are corrected. Usually, all records are scrutinized one by one for apparent mistakes. The intent is to determine the consistency of the data. For example, distances may be given in miles in one place and in km in another. Or there may be an obvious mistake, such as showing a distance of 100 km where it should be only 10 km or less.
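The micro-level checks described above can be sketched in a few lines of Python. This is only an illustration: the records, the `to_km` helper and the 10 km limit are assumptions made for the example, not part of any particular survey.

```python
MILES_TO_KM = 1.609344  # international mile expressed in kilometres

def to_km(value, unit):
    """Convert a distance to kilometres; unit must be 'km' or 'miles'."""
    if unit == "km":
        return value
    if unit == "miles":
        return value * MILES_TO_KM
    raise ValueError(f"unknown unit: {unit!r}")

# Hypothetical records: (record id, distance, unit as written on the form)
records = [(1, 6.0, "miles"), (2, 8.5, "km"), (3, 100.0, "km")]
MAX_PLAUSIBLE_KM = 10.0  # assumed common-sense limit for this survey

# Step 1: bring every record to the same unit.
harmonized = [(rid, round(to_km(dist, unit), 2)) for rid, dist, unit in records]

# Step 2: list records whose value looks like an obvious mistake.
suspect = [rid for rid, km in harmonized if km > MAX_PLAUSIBLE_KM]

print(harmonized)
print("records to re-check:", suspect)
```

Suspect records are only listed for review here; correcting them still requires going back to the source document.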
At the macro level, aggregates are compared with data from other surveys or files, or with earlier versions of the same data. This is done to determine compatibility. For example, one survey has estimated the total number of residents in a sector at 2,000. In another survey, of family size, the total number of residents works out to 2,500. Obviously, one of the estimates is wrong. If the figure of 2,000 is considered correct because it has been double-checked, the second has to be reviewed for mistakes in totaling or multiplication.
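A macro-level comparison of this kind might look as follows. The 5% tolerance and the split of 2,500 into 500 households of 5 persons are assumptions chosen for the sketch; only the two totals come from the example above.

```python
def aggregates_compatible(total_a, total_b, tolerance=0.05):
    """Return True if two totals differ by no more than `tolerance` of the larger one."""
    return abs(total_a - total_b) <= tolerance * max(total_a, total_b)

sector_survey_total = 2000  # residents according to the first survey

# Hypothetical breakdown of the family-size survey (households x average size)
households, average_family_size = 500, 5
family_survey_total = households * average_family_size

compatible = aggregates_compatible(sector_survey_total, family_survey_total)
print(family_survey_total, "compatible" if compatible else "needs review")
```

A failed check does not say which survey is wrong. It only signals that the totaling or multiplication in at least one of them should be reviewed.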
Several types of data edits are available. Validity edits ensure that the specified units of measure (such as kg, liters or sq. meters) are written. Range edits check that the values are within pre-established or common-sense limits. Similarly, there are edits for duplication, consistency and history.
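Validity, range and duplication edits can be combined in one pass over the records. The sketch below assumes a hypothetical file of body weights where every row should be in kg and fall between 1 and 200; the field names and limits are illustrative only.

```python
EXPECTED_UNIT = "kg"        # assumed unit for this hypothetical data file
VALUE_RANGE = (1.0, 200.0)  # assumed common-sense limits

def run_edits(rows):
    """Return (record id, failed edit) pairs for duplication, validity and range edits."""
    problems = []
    seen_ids = set()
    low, high = VALUE_RANGE
    for row in rows:
        rid = row["id"]
        if rid in seen_ids:                      # duplication edit
            problems.append((rid, "duplicate"))
            continue
        seen_ids.add(rid)
        if row.get("unit") != EXPECTED_UNIT:     # validity edit
            problems.append((rid, "validity"))
        if not low <= row["value"] <= high:      # range edit
            problems.append((rid, "range"))
    return problems

rows = [
    {"id": 1, "value": 72.0, "unit": "kg"},
    {"id": 2, "value": 68.0, "unit": ""},      # unit not written
    {"id": 3, "value": 720.0, "unit": "kg"},   # outside the limits
    {"id": 1, "value": 72.0, "unit": "kg"},    # entered twice
]
problems = run_edits(rows)
print(problems)
```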
On the other hand, there are data errors such as (i) unasked questions, (ii) unrecorded answers and (iii) inappropriate responses.
Sometimes, a researcher is confronted with an exceptional but true figure, such as a very unusual temperature of 90 F (32.2 C). Such figures are “unrepresentative” or “outlying” observations in a data set. What should we do about the “outliers” in a sample? Whether such data should be deleted is for the researcher to decide.
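One common way to spot such observations, though not the only one, is the interquartile-range rule (Tukey's fences). The sketch below flags outliers without deleting them, leaving the decision to the researcher. The temperature readings are hypothetical.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return the values lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily temperatures in Fahrenheit, including one unusual reading
temps_f = [70, 72, 71, 73, 74, 72, 71, 90]
flagged = flag_outliers(temps_f)
print("flagged for review:", flagged)
```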
2 Handling blank response
If more than 25% of a questionnaire is left blank, discard the questionnaire. In other cases, delete only the particular question. Sometimes, a mid-point value is assigned so as not to distort the average.
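The 25% rule and the mid-point option can be sketched as follows. The example assumes answers on a 1 to 5 scale (so the mid-point is 3) and represents a blank as `None`; both are assumptions for illustration.

```python
BLANK_LIMIT = 0.25   # discard the questionnaire above this share of blanks
SCALE_MIDPOINT = 3   # assumes a 1-5 response scale

def handle_blanks(answers):
    """Return None if the questionnaire should be discarded,
    otherwise the answers with blanks replaced by the scale mid-point."""
    blank_share = sum(a is None for a in answers) / len(answers)
    if blank_share > BLANK_LIMIT:
        return None
    return [SCALE_MIDPOINT if a is None else a for a in answers]

mostly_complete = [4, None, 5, 2, 3, 1, 4, 5]        # 1 of 8 blank
mostly_blank = [None, None, None, 2, 3, 1, 4, 5]     # 3 of 8 blank

print(handle_blanks(mostly_complete))
print(handle_blanks(mostly_blank))
```

Dropping only the affected question, the other option mentioned above, would instead exclude that answer from the analysis of that item.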
3 Coding the data
The final phase of the data collection process is the quantification of qualitative data, that is, the transformation of answers into a format that a computer can understand.
Alphanumeric codes are used to convert responses such as good and bad, or poor and strong. Data are transferred, if necessary, to coding sheets. Responses to negatively worded questions are reversed so that all items point in the same direction.
Coding is a “systematic way in which to condense extensive data sets into smaller analyzable units through the creation of categories and concepts derived from the data.”
It is the “process by which verbal data are converted into variables and categories of variables using numbers, so that the data can be entered into computers for analysis.”
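As an illustration of coding and of reversing negatively worded items, the sketch below maps verbal answers to numbers. The five-point agreement scale, the item names `q1` and `q2`, and the choice of `q2` as the negatively worded item are all hypothetical.

```python
# Hypothetical code book for a five-point agreement scale
SCALE = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
NEGATIVE_ITEMS = {"q2"}  # assumed to be worded negatively

def code_response(item, answer):
    """Convert a verbal answer to its numeric code, reversing negative items."""
    code = SCALE[answer.strip().lower()]
    if item in NEGATIVE_ITEMS:
        # Reverse so that a high code always means a favourable answer.
        code = min(SCALE.values()) + max(SCALE.values()) - code
    return code

respondent = {"q1": "Agree", "q2": "Agree"}
coded = {item: code_response(item, answer) for item, answer in respondent.items()}
print(coded)
```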
[Figure: Coding the data]
4 Categorizing the data
To categorize data, the researcher puts them into categories, classes or segments that are mutually exclusive. Examples are gender, age, religion, etc. Nominal scales are used for this purpose.
The category is determined by the query. Is it about income levels, education, customs or habits? The categories are drawn up accordingly and the items listed in a table.
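A minimal sketch of categorizing by income level follows. The class boundaries, labels and incomes are invented for the example. The point is that every value falls into exactly one class, after which the classes can be tabulated.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical class boundaries: low < 20,000 <= middle < 50,000 <= high
BOUNDARIES = [20_000, 50_000]
LABELS = ["low", "middle", "high"]

def income_class(income):
    """Assign an income to exactly one class (the classes are mutually exclusive)."""
    return LABELS[bisect_right(BOUNDARIES, income)]

incomes = [12_000, 20_000, 35_000, 80_000, 49_999]  # hypothetical responses
table = Counter(income_class(x) for x in incomes)

for label in LABELS:
    print(f"{label:<8}{table[label]}")
```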
5 Entering Data
Technology has made life easy. Data can be collected on scannable answer sheets, which enable a researcher to enter them directly into a computer file. In other cases, raw data are entered manually into the computer as a data file. Here, software such as the SPSS Data Editor can be used to enter, edit and view the contents. It is easy to add, change or delete values after the data have been entered.
There are three objectives of data analysis:
- getting a feel of the data,
- testing validity and reliability, and
- testing the hypotheses of the investigation.
1. Feel of the Data
Lists or statements are summarized to get a feel of the data. Descriptive statistics help reduce large data sets to meaningful indicators showing central tendency and spread. Three measures of central tendency are commonly used in statistical analysis: the mode, the median and the mean. Each measure is designed to represent a typical score. The choice depends upon the shape of the distribution (whether normal or skewed) and the variable's level of measurement (nominal, ordinal or interval).
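The three measures can be computed with Python's standard `statistics` module. The scores below are hypothetical and deliberately skewed by one large value, to show how the choice of measure changes the "typical" score.

```python
import statistics

# Hypothetical scores with one very large value (a skewed distribution)
scores = [2, 3, 3, 4, 5, 6, 40]

mode = statistics.mode(scores)      # most frequent value
median = statistics.median(scores)  # middle value of the sorted scores
mean = statistics.mean(scores)      # arithmetic average

print("mode:", mode, "median:", median, "mean:", mean)
```

In this skewed example the mean is pulled upward by the single large score, while the median and the mode stay close to where most of the scores lie.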
Averages do not tell us everything. At times, they can be misleading. The per capita income of Brunei is US$ 53,100. One tends to feel that there would hardly be any poor people. But there are people below the poverty level even in such a rich country. Such information is disclosed by measures of dispersion, deviation or spread. These measures tell us how wealth is distributed in the country. The presence of both super-rich and very poor people is indicated by a large spread or standard deviation.
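To see how two groups can share an average and still differ greatly, consider the sketch below. Both income lists are invented (in thousands of US$) and are not data for Brunei or any real country.

```python
import statistics

# Two hypothetical groups of five incomes, in thousands of US$
evenly_spread = [48, 50, 52, 50, 50]
very_unequal = [5, 10, 20, 15, 200]

for name, incomes in [("evenly spread", evenly_spread), ("very unequal", very_unequal)]:
    mean = statistics.mean(incomes)
    spread = statistics.pstdev(incomes)  # population standard deviation
    print(f"{name}: mean = {mean}, standard deviation = {spread:.1f}")
```

Both groups have the same mean, but the standard deviation of the second is far larger, which is what reveals the presence of very rich and very poor members.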
[Figure: Feel of the data]
2. Reliability & Validity
The data should be both reliable and valid. Reliability shows trustworthiness and dependability, while validity shows appropriateness, authenticity, suitability or genuineness.
SAMPLE PROBLEM USING A SAMPLING DISTRIBUTION
3. Hypotheses Testing
Once any doubt as to reliability and validity has been cleared, the researcher can go ahead and test the hypotheses already formed for the report.
A car manufacturing company plans to test a new engine to find out whether it meets new air pollution standards. The average emission should be less than 20 parts per million of carbon. Ten engines were picked for testing and, after their emission levels were determined, the average was found to be 17.17. Apparently, this is well below 20, but it is based on a small sample, and the average emission of all engines produced may be higher.
The General Manager wants to find out whether the engines meet the pollution standards. This should be evaluated at 99% confidence, that is, with a 1% chance of error.
NULL HYPOTHESIS: The engines do not meet the requirement, the average being 20
ALTERNATE: The engines meet the requirement, the average being less than 20
REJECTION REGION: For α = 0.01 and df (degrees of freedom) = n - 1 = 9, the one-tailed rejection region is t < -t(0.01, 9) = -2.821
Using the standard one-sample formula, t = (sample mean - 20) / (s / √n), where s is the sample standard deviation and n = 10, the General Manager found the t-value to be -3.00. This is below the critical value of -2.821 from the t-table and so falls in the rejection region. The Null Hypothesis was therefore rejected, which means that the engines do meet the requirement. (Under these conditions, the chance of being wrong is at most one in one hundred, which is a very low probability.)
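The calculation can be reproduced with a short sketch. The text does not state the sample standard deviation, so the value 2.98 used below is an assumption, back-computed so that the formula yields the reported t-value of about -3.00. The mean, sample size, hypothesized mean and critical value come from the example.

```python
import math

def one_sample_t(sample_mean, mu0, sample_sd, n):
    """t statistic for testing a sample mean against a hypothesized mean mu0."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

SAMPLE_MEAN = 17.17  # from the example
MU0 = 20.0           # value under the null hypothesis
N = 10               # engines tested
SAMPLE_SD = 2.98     # ASSUMPTION: not given in the text, implied by t = -3.00
T_CRITICAL = 2.821   # t-table value for alpha = 0.01, df = 9, one tail

t_value = one_sample_t(SAMPLE_MEAN, MU0, SAMPLE_SD, N)
reject_null = t_value < -T_CRITICAL  # lower-tail test

print(f"t = {t_value:.2f}, reject null hypothesis: {reject_null}")
```

Because this is a lower-tail test, the null hypothesis is rejected only when t falls below the negative critical value, not merely when its magnitude is large.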
Once the necessary data have been collected through surveys or experiments, they should be edited to ensure that only correct data are used. Next, they are coded, categorized and entered into tables or computers.
Based on the query or question, hypotheses should be developed and tested using appropriate and reliable measures.
The results should be interpreted and a decision taken to solve the problem.