Data Analysis & Interpretation

Flow diagram of Data Analysis Process
Once the necessary information has been collected through observation, survey or experiment, the following steps are taken.

1 Editing the data

Information may have been noted in haste and may now need to be deciphered. Data should be edited before being presented as information, to ensure that the figures and words are accurate. Editing can be done manually, by computer, or both, depending on whether the medium is paper or electronic.

Editing is done on two levels: micro and macro. In micro-editing, the basic records are corrected. Usually, all records are scrutinized one by one for apparent mistakes. The intent is to determine the consistency of the data. For example, distances may be given in miles in one place and in km in another. Or there may be an obvious mistake, such as showing a distance of 100 km where it should be only 10 km or less.

At the macro level, aggregates are compared with data from other surveys or files, or with earlier versions of the same data. This is done to determine compatibility. For example, one survey has estimated the total number of residents in a sector at 2,000. In another survey, of family size, the total number of residents works out to 2,500. Obviously, one of the estimates is wrong. If the figure of 2,000 is considered correct because it was double-checked, the second would have to be reviewed for mistakes in totaling or multiplication.
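A macro-level check of this kind can be sketched in a few lines of Python. This is only an illustration: the function name and the 5% tolerance are assumptions, not part of any standard procedure.

```python
def aggregates_compatible(total_a, total_b, tolerance=0.05):
    """Return True if two estimates of the same total differ by no more
    than `tolerance`, measured relative to the larger estimate.
    The 5% default is an assumed, illustrative threshold."""
    larger = max(abs(total_a), abs(total_b))
    if larger == 0:
        return True
    return abs(total_a - total_b) / larger <= tolerance

# The two survey totals from the example: 2,000 vs 2,500 residents.
compatible = aggregates_compatible(2000, 2500)
print(compatible)  # the gap is 20% of the larger figure, so this prints False
```

A result of False only says that the two aggregates disagree. Deciding which survey to review is still the editor's job.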

Several types of data edits are available. Validity edits ensure that the specified units of measure (such as kg, liters or sq. meters) are written. Range edits check that the values are within pre-established or common-sense limits. Similarly, there are edits for duplication, consistency and history.
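The validity, range and duplication edits described above might be automated roughly as follows. The records, field names and limits are hypothetical. The sketch only flags problems and leaves the correction to the researcher.

```python
# Hypothetical records; field names and limits are assumptions for illustration.
records = [
    {"id": 1, "distance": 8.0, "unit": "km"},
    {"id": 2, "distance": 100.0, "unit": "km"},    # suspicious: expected 10 km or less
    {"id": 3, "distance": 5.0, "unit": "miles"},   # unit inconsistent with the others
    {"id": 3, "distance": 5.0, "unit": "miles"},   # same id entered twice
    {"id": 4, "distance": 6.5, "unit": ""},        # unit not written
]

ALLOWED_UNITS = {"km"}     # validity edit: the unit must be stated and expected
MAX_DISTANCE_KM = 10.0     # range edit: common-sense upper limit

def edit_checks(rows):
    """Return a list of (id, problem) pairs; nothing is corrected automatically."""
    problems = []
    seen_ids = set()
    for row in rows:
        if row["id"] in seen_ids:
            problems.append((row["id"], "duplicate record"))
            continue
        seen_ids.add(row["id"])
        if row["unit"] not in ALLOWED_UNITS:
            problems.append((row["id"], "invalid or missing unit"))
        elif not 0 <= row["distance"] <= MAX_DISTANCE_KM:
            problems.append((row["id"], "value out of range"))
    return problems

flagged = edit_checks(records)
for record_id, problem in flagged:
    print(record_id, problem)
```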

On the other hand, there are data errors such as (i) unasked questions, (ii) unrecorded answers and (iii) inappropriate responses.

Sometimes, a researcher is confronted with an exceptional but true figure, such as a very unusual temperature of 90 F (32.2 C). Such figures are “unrepresentative” or “outlying” observations in a data set. What should we do about the “outliers” in a sample? Whether such data should be deleted is for the researcher to decide.
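Before deciding, it helps to list the outlying values. One common convention, which is an assumption here and not something prescribed above, is to flag any value lying more than 1.5 times the interquartile range beyond the quartiles. The temperature readings below are invented for illustration.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Flag values beyond k * IQR from the quartiles (a common rule of thumb).
    Values are only flagged; deleting them remains the researcher's decision."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily temperatures in F, with one unusual but genuine reading.
temperatures = [70, 72, 71, 69, 73, 74, 90, 72, 70, 71]
outliers = flag_outliers(temperatures)
print(outliers)
```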

2 Handling blank responses

If more than 25% of a questionnaire is left blank, discard the questionnaire. In other cases, delete only the particular question. Sometimes, a mid-point value is assigned so as not to distort the average.
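The rule can be written down as a small function. This is a sketch under assumptions: blanks are represented as None, answers are on a 1 to 5 scale, and the mid-point option is the one applied.

```python
def handle_blanks(responses, scale_min=1, scale_max=5, max_blank_share=0.25):
    """Return None if the questionnaire should be discarded; otherwise return
    a copy in which each blank (None) is replaced by the scale mid-point.
    The 1-5 scale is an assumption made for this illustration."""
    if not responses:
        return None
    blanks = sum(1 for r in responses if r is None)
    if blanks / len(responses) > max_blank_share:
        return None
    midpoint = (scale_min + scale_max) / 2
    return [midpoint if r is None else r for r in responses]

kept = handle_blanks([4, None, 5, 3, 2, 1, 4, 5])   # 1 of 8 answers blank
discarded = handle_blanks([None, None, 3, 4])        # 2 of 4 answers blank
print(kept, discarded)
```

Dropping the question from the analysis instead of filling it in is equally valid. The choice should be stated in the report.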

3 Coding

The final phase of the data collection process is the quantification of qualitative data, that is, the transformation of answers into a format that a computer can understand.

Alphanumeric codes are used to convert responses such as good and bad, or poor and strong. Data are transferred, if necessary, to coding sheets. Responses to negatively worded questions are reversed so that all items run in the same direction.
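As a rough illustration of coding and reversal, the sketch below maps answers on an assumed five-point agreement scale to numbers and reverses the code for negatively worded questions. The scale labels and the function name are hypothetical.

```python
# Assumed five-point scale; real studies define their own code book.
CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def code_response(answer, negatively_worded=False):
    """Convert a verbal answer to its numeric code. For a negatively worded
    question the code is reversed (1<->5, 2<->4) so all items point the same way."""
    code = CODES[answer.strip().lower()]
    if negatively_worded:
        code = min(CODES.values()) + max(CODES.values()) - code
    return code

print(code_response("Agree"))                          # 4
print(code_response("Agree", negatively_worded=True))  # 2
```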

Coding is a “systematic way in which to condense extensive data sets into smaller analyzable units through the creation of categories and concepts derived from the data.”

It is the “process by which verbal data are converted into variables and categories of variables using numbers, so that the data can be entered into computers for analysis.”

CODING THE DATA

4 Categorizing

To categorize data, the researcher puts them into categories or classes or segments which are mutually exclusive. Examples are gender, age, religion etc. Nominal scales are used for this purpose.

The category is determined by the query. Is it about income levels, education, customs or habits? The categories are drawn accordingly and the items listed in a table.
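For a query about income levels, mutually exclusive categories and the resulting table could be produced as sketched below. The bracket boundaries, labels and incomes are invented for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bracket boundaries: each income falls into exactly one class.
BOUNDS = [20000, 50000, 100000]
LABELS = ["low", "lower-middle", "upper-middle", "high"]

def income_category(income):
    """An income equal to a boundary goes to the higher class, so no value
    can belong to two classes at once."""
    return LABELS[bisect_right(BOUNDS, income)]

incomes = [12000, 20000, 48000, 75000, 150000]
table = Counter(income_category(x) for x in incomes)
for label in LABELS:
    print(label, table[label])
```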

5 Entering Data

Technology has made life easy. Data can be collected on scannable answer sheets, which enable a researcher to enter them directly into a computer file. In other cases, raw data are entered manually into a computer as a data file. Here, software such as the SPSS Data Editor can be used to enter, edit and view the contents. It is easy to add, change or delete values after the data have been entered.

Data Analysis

There are three objectives of data analysis:

  1. getting a feel of the data,
  2. checking reliability and validity, and
  3. testing the hypotheses of the investigation.

1. Feel of the Data

Lists or statements are summarized to get a feel of the data. Descriptive statistics help reduce a large body of data to meaningful indicators showing central tendency and spread. Three measures of central tendency are commonly used in statistical analysis: the mode, the median and the mean. Each measure is designed to represent a typical score. The choice depends upon the shape of the distribution (whether normal or skewed) and the variable’s level of measurement (nominal, ordinal or interval).
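The three measures can be computed with Python's standard statistics module. The scores below are invented and deliberately skewed by one large value, to show why the choice of measure matters.

```python
import statistics

# Hypothetical scores; the single large value skews the distribution.
scores = [1, 2, 2, 3, 3, 3, 4, 30]

mean = statistics.mean(scores)      # pulled upward by the extreme score
median = statistics.median(scores)  # middle of the ordered scores
mode = statistics.mode(scores)      # most frequent score

print(mean, median, mode)
```

Here the mean is twice the median, so for skewed data like these the median or the mode gives a better picture of a typical score.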

Averages do not tell us everything, and at times they can be misleading. The per capita income of Brunei is US$ 53,100. One tends to feel that there would hardly be any poor people there. But there are people below the poverty level even in such a rich country. Such information is disclosed by measures of dispersion, deviation or spread, which tell us how wealth is distributed in the country. The presence of both the super rich and the very poor shows up as a large spread or standard deviation.
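The point can be checked with two made-up sets of incomes that share the same mean but differ widely in spread. All figures are hypothetical.

```python
import statistics

# Two hypothetical groups of incomes with identical averages.
even_incomes = [1490, 1500, 1510, 1500]
uneven_incomes = [100, 200, 700, 5000]

for name, incomes in [("even", even_incomes), ("uneven", uneven_incomes)]:
    mean = statistics.mean(incomes)
    spread = statistics.pstdev(incomes)  # population standard deviation
    print(f"{name}: mean = {mean}, standard deviation = {spread:.1f}")
```

Both groups average 1,500, yet the second has a far larger standard deviation because it contains both very low and very high incomes.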

FEEL OF THE DATA

2. Reliability & Validity

The data should be both reliable and valid. Reliability indicates trustworthiness and dependability, while validity indicates appropriateness, authenticity, suitability or genuineness.

SAMPLE PROBLEM USING A SAMPLING DISTRIBUTION

3. Hypotheses Testing

Once any doubt as to reliability and validity has been cleared, the researcher can go ahead and test the hypotheses already formed for the report.

A car manufacturing company plans to test a new engine to find out whether it meets new air pollution standards. The average emission should be less than 20 parts per million of carbon. Ten engines were picked for the test and, after their emission levels were determined, the average was found to be 17.17. This appears well below 20, but it is based on a small sample, and the average emission of all engines produced may be higher.

The General Manager wants to find out whether the engines meet the pollution standards. This should be evaluated at 99% confidence, that is, with a 1% chance of error.

NULL HYPOTHESIS: The engines do not meet the requirement, the average being 20

ALTERNATE: The engines meet the requirement, average is less than 20

REJECTION REGION: For α = 0.01 and d.f. (degrees of freedom) = n - 1 = 9, the one-tailed rejection region is t < -2.821

Using the standard formula t = (sample mean - 20) / (s / √n), where s is the sample standard deviation and n = 10, the General Manager found the t-value to be -3.00. This falls below the table value of -2.821 and so lies in the rejection region. Hence the Null Hypothesis was rejected, which means that the engines do meet the requirement. (Under these conditions, the chance of being wrong is at most one in one hundred, which is a low probability.)
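The calculation can be reproduced in a few lines. The sample standard deviation is not stated in the text, so the value of 2.98 used below is an assumption, chosen because it is consistent with the reported t-value of -3.00. The other figures come from the example.

```python
import math

sample_mean = 17.17        # average emission of the ten engines (from the text)
hypothesized_mean = 20.0   # standard stated in the null hypothesis
n = 10                     # engines tested
critical_t = -2.821        # one-tailed table value for alpha = 0.01, 9 d.f.

# ASSUMPTION: the standard deviation is not given in the text; 2.98 is
# back-calculated so that the result matches the reported t = -3.00.
sample_sd = 2.98

standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - hypothesized_mean) / standard_error
reject_null = t_stat < critical_t

print(f"t = {t_stat:.2f}, reject null hypothesis: {reject_null}")
```

With raw emission readings available, a statistics package would also give the exact probability of error, not just the comparison against the table value.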

Conclusion

Once the necessary data have been collected through survey or experiment, they should be edited to ensure that only correct data are used. Next, they are coded, categorized and entered into tables or computers.

Based on the query or question, hypotheses should be developed and tested using appropriate and reliable measures.

The results should be interpreted and a decision taken to solve the problem.

Comments (26)

Rufi Shahzada 6 years ago from Karachi

Dear Sir,

Marvelous HUB on DATA ANALYSIS AND INTERPRETATION, but I have one question that many people are very much biased and not even read the questions of the questionnaire and just tick randomly to save time or whatsoever reason, So how come we could eliminate or possibly reduce the element of biasness in such conditions?

Regards,

Rufi Shahzada


hafeezrm 6 years ago from Pakistan Author

There are many techniques, and the questionnaire is just one of them. Usually, the researcher knows the degree of bias associated with certain questions. Projective techniques are helpful in digging out the truth.

If you ask a student, "Do you cheat in exams?", the answer will invariably be "no". But if asked, "Do your classmates cheat in exams?", the student is more likely to reply honestly.


Rufi Shahzada 6 years ago from Karachi

That is so true.

Thanks a lot Sir!

RUFI SHAHZADA


haaris_1 6 years ago

Sir great hub very informative and very well organized. Thanks for sharing.


Data Cleaning 6 years ago

Thanks for such a nice blog post....i was searching for something like that.


UBEDULLAH KHAN MAHSOOD 6 years ago

DEAR HAFEEZ UR REHMAN, YOUR ARTICLE ON DATA ANALYSIS AND INTERPRETATION IS VERY INFORMATIVE AND BRINGS MUCH INFORMATION TO ME I HOPE YOU WILL PROVIDING ME INFORMATION AND KNOWLEDGE IN FUTURE.

AND I APPRICIAT YOUR WORK YOU ARE DOING A GREAT WORK WHICH IS DEMAND OF TIME, NO ONE CAN IGNOR YOUR EFFORT IN THIS ASPECT.

THANKS.


kamran 6 years ago

sir..wow...this is deep..ive made a copy of this...reading again and again. worthy to note and implement in organizations. thnx


data cleaner 6 years ago

While reading your blog it seems that you research on this topic very much. I must tell you that your blog is very informative and it helps other also.


Asad Ali 5 years ago

Dear Sir...

Your article of Data Analysis and Interpretation is very good after going through this article all concepts are clear and this has enhance my knowledge.

Thanks for sharing this information this has given me valuable insight and I will also implement in organization.


Trilochan Pokharel 5 years ago

It's fascinating and worthy collection of information for one who is interested to have quick knowledge on data analysis. It would be much valuable if one more topic "Data Management" could be included.


Data Cleaning Software 5 years ago

This is such a great blog post. I have been searching for something like this for ages.


Data Cleaning Software 5 years ago

Thanks for such a nice blog post....i was searching for something like that.


sanjay maheshwari 4 years ago

sirji thanks very good information about data analysis again thanks


hafeezrm 4 years ago from Pakistan Author

Thanks Trilochan Pokharel, @data cleaning software and Sanjay Maheshwari for your comments.


dsa 4 years ago

sadas


hafeezrm 4 years ago from Pakistan Author

Thanks


ammyjames 4 years ago

Reading through such an excellent piece of writing is always amusing for me where i can found an element of captivation along with some informative material.

http://www.selftestengine.com/HP0-J29.html


hafeezrm 4 years ago from Pakistan Author

Thanks @ammyjames for your comments.


getachew workineh 4 years ago

your hub is essential and informative which used as a short hand reference book for the beginner researchers and students! if you try to inculcate others such as regression(both linear and logistics) it becomes really smart and tough HUB!


hafeezrm 4 years ago from Pakistan Author

Thanks @getachew workineh for your comments. Linear regression and logistic or curvilinear regression are methods used for forecasting. I will certainly deal with them at some point.


mercieful mercita 4 years ago

congratulations sir hazeez honestly this is a good job.keep it up and may God/Allah bless u abundantly.


hafeezrm 4 years ago from Pakistan Author

Thanks Mercieful Mercita for your comments.


MUHAMMAD KHAN 3 years ago

SALAM SIR,

I collected data through structure questionnaire, scale was (strongly disagree, disagree, neutral, strongly agree and agree) now for a particular question mean is 3.02 Std. Error of Mean is 0.08, Std. Deviation is 1.26 and variance is 1.60.

please tell me sir in simple words what it reflects, ie mean, Std. Error of Mean, Std. Deviation and variance.

thank u

m. khan


hafeezrm 3 years ago from Pakistan Author

All the terms you have mentioned are part of descriptive statistics, which can be divided into two categories: central tendency and spread. Mean, median and mode are measures of central tendency, while variance and standard deviation reflect the spread of the data. The standard error reflects the spread of the sample mean, that is, how precisely the mean has been estimated.

Two countries may have the same per capita income, say US$1,500 per head. But when we study their standard deviations, these may be quite different, e.g. 10 and 1,000. The country with a mean income of $1,500 and a standard deviation of $10 is quite homogeneous, and its people are more likely to live in peace and harmony. The other country, with the greater standard deviation, would have a wide rich-poor gap and is more likely to see disgruntlement, strikes and conflicts.

The standard deviation of a population is usually difficult to compute directly. Conceptually, if many samples are taken and their means computed, the standard deviation of those sample means is the standard error. In practice, it is estimated from a single sample as the standard deviation divided by the square root of the sample size. If it is very large, a larger sample should be drawn and the standard error re-computed.
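As a rough check on the figures quoted in the question (standard deviation 1.26, standard error 0.08, variance 1.60), the relationships can be worked through in Python. The number of respondents is not stated, so the value computed below is only what those figures imply.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)

sd = 1.26            # standard deviation quoted in the question
reported_se = 0.08   # standard error quoted in the question

variance = sd ** 2                     # about 1.59, in line with the quoted 1.60
implied_n = (sd / reported_se) ** 2    # NOT stated in the question; implied only

print(round(variance, 2), round(implied_n), round(standard_error(sd, 248), 2))
```

The quoted figures are therefore consistent with roughly 250 respondents, and the small standard error means the mean of 3.02 is a fairly precise estimate of the average answer.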


m.khan 3 years ago

thank you sir, may you live long.


bkaushal 3 years ago

very good notes sir
