How to Improve Survey Data Quality for Actionable Results

November 5, 2014

Are you tired of your survey results not being acted on? You put a lot of time and effort into your survey project and got buy-in from stakeholders early in the process, yet no one is taking your results seriously.

Maddening, isn’t it?

Could it be that your data quality is suspect? If so, it is wise not to act on the data.

Bad data leads to bad decisions.

Causes of Poor Data Quality and How to Avoid Them

All too often, survey creators focus on the quantity of responses rather than their quality. The assumption is that more responses lead to more accurate conclusions. This is not necessarily true.

You can increase the quality of your data by avoiding these four contributors to poor data:

1. The Effect of Selection Bias on Data Quality

Selection bias occurs when the responses you collect do not represent the population you are targeting.

Obviously, it is important to have a statistically valid sample size. But if the target population is large, the responses collected may not accurately reflect the entire population unless each member had an equal chance of being included in the sample. For example, an online survey of a rural population may miss the large portion of that population that has no internet access.

The source of the sample is far more important than sample size when it comes to reliable results. A sample from a homogeneous population will give you consistent results. If, however, you are surveying a heterogeneous population, a larger sample size is required so that all subgroups are represented.

After choosing your sample, you still need to use proper controls, such as disqualifying logic, to filter out respondents who are not part of the target population.

2. The Effect of Question Bias on Data Quality

Certainly, leading questions will skew your results and ruin your credibility. This is a common problem if you are emotionally tied or close to the subject matter. You may not even be aware that you have angled a question for a particular answer. You can avoid this by having others who are not as emotionally involved review your survey.

The way you set up your questions and answer options can also create bias. Often, when people are unsure how to respond, they will choose the middle option when presented with a ranking question. If it is a multiple choice question, people tend to choose the first or last option if they are unsure how to respond. You can avoid skewing results by randomizing your question and answer options.
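As a rough illustration of answer randomization, the sketch below shuffles the options for each respondent while keeping catch-all choices at the end. The function name, the `pinned_last` convention, and seeding by respondent ID are assumptions made for this example, not a feature of any particular survey tool.

```python
import random

def randomized_options(options, respondent_id, pinned_last=("Other", "None of the above")):
    """Return answer options in a per-respondent random order.

    Options named in pinned_last stay at the end (a hypothetical convention).
    """
    movable = [o for o in options if o not in pinned_last]
    pinned = [o for o in options if o in pinned_last]
    # Seed with the respondent ID so the same person always sees the same order.
    rng = random.Random(respondent_id)
    rng.shuffle(movable)
    return movable + pinned

options = ["Price", "Quality", "Support", "Other"]
shown = randomized_options(options, respondent_id=42)
print(shown)
```

Because each respondent sees a different order, any tendency to pick the first or last option is spread across all options instead of favoring one.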

3. The Effect of Irrelevant Questions on Data Quality

This is a big one: don’t present respondents with irrelevant questions. This will always result in poor-quality data. You can avoid this by using question and page logic to display only the questions that are relevant to the respondent based on a previous answer.
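Question logic can be pictured as a simple rule attached to each question. The sketch below is only a minimal illustration; the `show_if` field and the question IDs are made up for this example, and real survey tools configure this through their own interfaces.

```python
# Each question may carry a hypothetical "show_if" rule: (earlier question ID, required answer).
QUESTIONS = [
    {"id": "owns_car", "text": "Do you own a car?"},
    {"id": "car_brand", "text": "Which brand is your car?", "show_if": ("owns_car", "Yes")},
]

def visible_questions(questions, answers):
    """Return the IDs of the questions a respondent should see, given answers so far."""
    visible = []
    for q in questions:
        condition = q.get("show_if")
        # Show the question if it has no rule, or if the rule matches an earlier answer.
        if condition is None or answers.get(condition[0]) == condition[1]:
            visible.append(q["id"])
    return visible

print(visible_questions(QUESTIONS, {"owns_car": "No"}))
print(visible_questions(QUESTIONS, {"owns_car": "Yes"}))
```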

4. The Effect of Duplicate Responses on Data Quality

If you’re collecting votes, if your survey is incentivized, or if you use a panel company where individuals are paid to take your survey, respondents may be tempted to take the survey more than once to increase their odds of winning or collecting benefits.

Be sure to use vote protection so that each respondent can take the survey only once.
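If duplicates have already made it into an exported data set, they can also be filtered afterward. The sketch below keeps the first response per respondent; it assumes each response carries an identifying field (here a hypothetical `email` key), which will not be true of anonymous surveys.

```python
def drop_duplicates(responses, key="email"):
    """Keep only the first response for each respondent identifier."""
    seen = set()
    unique = []
    for response in responses:
        # Normalize so "Ann@Example.com " and "ann@example.com" count as one person.
        ident = response[key].strip().lower()
        if ident in seen:
            continue
        seen.add(ident)
        unique.append(response)
    return unique

responses = [
    {"email": "ann@example.com", "vote": "A"},
    {"email": "bob@example.com", "vote": "B"},
    {"email": "Ann@Example.com ", "vote": "A"},
]
unique = drop_duplicates(responses)
print(len(unique))
```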


Indicators of Poor Responses That Lead to Low Data Quality

Even if you follow the best practices listed above, you may still have poor quality responses.

“How do I know if I have bad data?” you may ask. There are several indicators of poor data quality.


Speeders

These are respondents who sped through your survey without giving much thought to their answers, which makes those answers less accurate. Advanced online survey tools record start and finish times so that you can determine how long it took a respondent to complete a survey. More advanced survey tools even have a page timer so that you can see how long it took a respondent to complete each page of the survey.
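With start and finish times in hand, a timing check might look like the sketch below. The cutoff of half the median completion time is an assumption chosen for illustration, as are the field names; pick a threshold that suits the length of your own survey.

```python
from datetime import datetime
from statistics import median

def flag_speeders(responses, fraction=0.5):
    """Return IDs of responses completed faster than `fraction` of the median time."""
    durations = {
        r["id"]: (r["finished"] - r["started"]).total_seconds() for r in responses
    }
    cutoff = fraction * median(durations.values())
    return [rid for rid, seconds in durations.items() if seconds < cutoff]

responses = [
    {"id": 1, "started": datetime(2014, 11, 5, 9, 0), "finished": datetime(2014, 11, 5, 9, 10)},
    {"id": 2, "started": datetime(2014, 11, 5, 9, 0), "finished": datetime(2014, 11, 5, 9, 9)},
    {"id": 3, "started": datetime(2014, 11, 5, 9, 0), "finished": datetime(2014, 11, 5, 9, 1, 30)},
]
speeders = flag_speeders(responses)
print(speeders)
```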

Straight-liners and Pattern Responses

Responses that have all the same answers (straight-lining) or answers in a pattern (zigzag or Christmas-tree) can indicate that the data may not represent the respondent’s actual opinion.

These behaviors indicate that the respondent wanted to get through the survey quickly rather than reading the question and giving the answer some thought. This behavior is common with incentivized surveys.
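Simple pattern checks can be automated. The sketch below, written for answers to a grid of rating questions, detects straight-lining and a two-value zigzag; the function names are hypothetical, and longer patterns such as a Christmas tree would need additional rules.

```python
def is_straight_line(answers):
    """True when every answer in the grid is identical, e.g. 3, 3, 3, 3."""
    return len(answers) > 1 and len(set(answers)) == 1

def is_zigzag(answers):
    """True when answers alternate between exactly two values, e.g. 1, 5, 1, 5."""
    if len(answers) < 4 or len(set(answers)) != 2:
        return False
    # In an alternating pattern every answer equals the one two positions earlier.
    return all(answers[i] == answers[i - 2] for i in range(2, len(answers)))

print(is_straight_line([3, 3, 3, 3, 3]))
print(is_zigzag([1, 5, 1, 5, 1]))
print(is_zigzag([2, 4, 3, 5, 1]))
```

A flagged pattern is a prompt to review the response, not proof that it is bad: a respondent may genuinely hold the same opinion on every item.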

Red Herring Questions

Professional researchers have learned to plant trap questions that reveal when a respondent is not engaged.

These are simple questions with obvious answers, such as “Does the sun rise in the East?” If the question is not answered correctly, it is a flag that the response is of poor quality.

Consistency Checks

Another way to determine how engaged respondents are is to use consistency checks. These questions repeat a previous question in a different format. If the answers are different, then it is likely that the respondent is not engaged.
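One common form of consistency check is to repeat a rating item with reversed wording. The sketch below assumes a 1-to-5 agreement scale and a tolerance of one point; both are illustrative choices, and other question formats would need a different comparison.

```python
def is_inconsistent(answer, reversed_answer, scale_max=5, tolerance=1):
    """Compare a rating with its reverse-worded twin on a 1..scale_max scale.

    A respondent who picks 5 on "I like the product" would be expected to pick
    about 1 on "I dislike the product".
    """
    expected = scale_max + 1 - answer
    return abs(reversed_answer - expected) > tolerance

print(is_inconsistent(5, 1))  # answers agree with each other
print(is_inconsistent(5, 5))  # strongly agrees with both statements
```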

Fake and Gibberish Answers

Fake answers (e.g., lorem ipsum, test, etc.) to open-ended questions may indicate that the respondent did not provide truthful answers elsewhere in the survey. Nonsense answers to open-ended questions may also indicate that the respondent was not engaged.
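A few crude heuristics catch the most obvious fake and gibberish answers. The placeholder list and the thresholds in the sketch below are assumptions for illustration, they only suit English text, and they will miss plenty of cases, so treat a match as a reason to look at the response rather than a verdict.

```python
import re

# Hypothetical list of placeholder answers; extend it for your own surveys.
PLACEHOLDERS = {"lorem ipsum", "test", "asdf", "qwerty"}

def looks_fake(text):
    """Heuristically flag placeholder or keyboard-mash answers to open-ended questions."""
    cleaned = text.strip().lower()
    if not cleaned:
        return False  # empty answers are a separate issue
    if cleaned in PLACEHOLDERS or cleaned.startswith("lorem ipsum"):
        return True
    if re.search(r"(.)\1{3,}", cleaned):
        return True  # same character four or more times in a row
    letters = re.sub(r"[^a-z]", "", cleaned)
    if letters:
        vowel_ratio = sum(c in "aeiou" for c in letters) / len(letters)
        return vowel_ratio < 0.2  # very few vowels suggests random key presses
    return False

print(looks_fake("Lorem ipsum dolor sit amet"))
print(looks_fake("sdfkjhsdf"))
print(looks_fake("The checkout page was slow."))
```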

All or Single Checkbox Answers

Checkbox questions with a single option selected in specific scenarios may indicate poor data quality. Professional survey takers understand that some surveys will disqualify them if they don’t check anything on a screening question. To circumvent this, they will just mark one answer so that they can proceed.

One Word Answers

Single-word answers to open-ended questions may indicate that a form-filler program was used or that the respondent was not engaged.


The Problem with Data Cleaning to Enhance Data Quality

If you are not cleaning your data, or you do not know how to clean your data, you are not alone. According to the Alchemer Benchmark Guide Survey, only 70 percent of online surveyors clean their data before analyzing.

Why aren’t more surveyors scrubbing their data?

There are two problems with data cleaning:

1. It’s Time Consuming

While it is well worth the effort to ensure that you weed out invalid and biased responses, data cleaning is a time-consuming process. It requires sifting through all of your data and removing bad responses.

2. It’s a Subjective Process

Besides being time-consuming, data cleaning is a subjective process, and you can easily introduce your own bias when interpreting the data.

How to Eliminate Data Interpretation Bias to Improve Data Quality

Some advanced survey tools, such as Alchemer, have an automated data cleaning feature that can help eliminate personal bias when interpreting the data.

Alchemer’s data cleaning tool normalizes responses and flags those that show suspect behavior, such as speeding, pattern responses, and data inconsistencies.

A Dirty Data Score is calculated for each response using the flags that you have enabled. Each flag is multiplied by the weight that you selected for that flag type. All Dirty Data Scores are then normalized on a curve so that the response with the highest score receives a 100 and the response with the lowest score receives a 0. You can then set the threshold above which responses should be quarantined.

This tool still requires human intervention to set the acceptable threshold and to decide whether a flagged response is indeed bad, but it makes identifying suspect responses much easier and faster.
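To make the scoring idea concrete, here is a minimal sketch of a weighted flag score rescaled to a 0 to 100 range. It is not Alchemer’s implementation: the flag names, the weights, and the use of simple min-max scaling for the normalization step are all assumptions made for this example.

```python
def dirty_data_scores(flags_by_response, weights):
    """Weighted sum of flags per response, rescaled so lowest = 0 and highest = 100."""
    raw = {
        rid: sum(weights.get(flag, 0) * count for flag, count in flags.items())
        for rid, flags in flags_by_response.items()
    }
    low, high = min(raw.values()), max(raw.values())
    if high == low:
        return {rid: 0.0 for rid in raw}  # every response looks equally clean or dirty
    return {rid: 100 * (score - low) / (high - low) for rid, score in raw.items()}

def quarantine(scores, threshold):
    """Return the IDs of responses whose score meets or exceeds the threshold."""
    return [rid for rid, score in scores.items() if score >= threshold]

# Hypothetical weights and flag counts for three responses.
weights = {"speeder": 3, "straight_line": 2, "fake_text": 1}
flags_by_response = {
    "r1": {},
    "r2": {"speeder": 1},
    "r3": {"speeder": 1, "straight_line": 1, "fake_text": 1},
}
scores = dirty_data_scores(flags_by_response, weights)
print(scores)
print(quarantine(scores, threshold=75))
```

Quarantined responses should still be reviewed by a person before they are removed from the analysis.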

Other Data Quality Tests

While the Data Cleaning tool will detect suspect responses, you may still want to run further tests to detect outliers within individual questions. The Engineering Statistics Handbook defines an outlier as “an observation that appears to deviate from other observations in the sample”.

Outliers often indicate bad data. If you are using averages in presenting your data findings, outliers can have a significant impact. Before simply deleting them, you will want to do some investigating.

It could be that the normality assumptions used are invalid. Incorrect assumptions can generate wildly inaccurate conclusions. A normal probability plot will help you determine whether the data are approximately normally distributed.

Outliers could also indicate that the survey was set up, administered, or coded incorrectly. Or the outlier could be due to random variation.

You will want to export your data so that you can do some more analysis. Scatter plots, box plots, and histograms are common graphical methods for detecting outliers.
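The rule that box plots conventionally use to draw outlier points can also be applied directly to exported numbers. The sketch below flags values more than 1.5 times the interquartile range beyond the quartiles; the 1.5 multiplier is the usual convention rather than anything stated in this article, and the sample data is made up.

```python
from statistics import quantiles

def find_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical answers to "How many hours per week do you use the product?"
hours = [21, 22, 23, 24, 25, 26, 27, 95]
print(find_outliers(hours))
```

As noted above, a flagged value deserves investigation before deletion: it may be a data entry error, or it may be a genuine but unusual respondent.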

Drawing Conclusions From Your Data Results

Clean data allows you to draw valid conclusions that you can act on. It will give you the confidence you need to present your data findings and make the best possible decisions.

Source: Alchemer Market Research Benchmark Guide: 2012, Survey Techniques Survey, “Do you clean your data before reporting on it?”, n=1,070 Total Sample

Sample Size
Engineering Statistics Handbook
Identifying Statistical Outliers In Your Survey Data
