How To Check And Validate Consistency Of The Data

Validating the data you collect is the most crucial step towards a successful benchmark. Many associations make the mistake of assuming that the data they collected is good, validated and perfect for benchmarking without so much as double-checking it for errors. We always advocate putting each questionnaire through a battery of validation steps to ensure that you’re only using the best data, which, as you’d expect, yields the best benchmarking results.

Note: The number of validation steps depends primarily on the length of the questionnaire. The internal relations between questions and the benchmark’s frequency also play an important role in how much validation the data needs.

Associations need to remember that there is a dollar value attached to the time and effort spent validating data. The more data you need to verify, the more time it takes and the more it costs. That’s why it is so crucial that each question on the survey sheet is clear, well thought out and tested before it’s sent to the majority of members. Taking these steps means that your data validation takes less time and consequently costs less.

Checking for Internal Consistency

The questionnaire you send to members needs to have some form of internal consistency. What does that mean? Well, it means that when you ask for a total, such as total revenue, alongside its components, such as revenue per client, the components should add up to the total. You can build the validation check into the questionnaire so that participants aren’t able to submit incorrect information. In other words, if the participant reports $10,000 of profits a month but puts the annual profit sum of $40,000, the system should flag this as incorrect. This kind of automatic check isn’t as easy to build for other types of questions, though. In those situations, the answers will need to be validated individually.
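As a rough illustration of the checks described above, here is a minimal Python sketch. The function names and the 1% rounding tolerance are assumptions made for this example, not part of any particular benchmarking tool, and the annual check assumes the monthly figure is meant as a steady monthly amount.

```python
def check_annual_total(monthly_profit, annual_profit, tolerance=0.01):
    """Return True when the annual figure matches 12 x the monthly figure.

    tolerance is a relative margin (1% by default, an assumed value)
    that allows for rounding in the reported numbers.
    """
    expected = monthly_profit * 12
    if expected == 0:
        return annual_profit == 0
    return abs(annual_profit - expected) <= tolerance * abs(expected)


def check_parts_sum_to_total(parts, total, tolerance=0.01):
    """Return True when the itemised parts add up to the reported total."""
    parts_sum = sum(parts)
    if total == 0:
        return parts_sum == 0
    return abs(parts_sum - total) <= tolerance * abs(total)


# The example from the text: $10,000 a month cannot give $40,000 a year.
print(check_annual_total(10_000, 40_000))   # False, so the entry is flagged
print(check_annual_total(10_000, 120_000))  # True

# Revenue per client should add up to total revenue.
print(check_parts_sum_to_total([4_000, 6_000], 10_000))  # True
```

A questionnaire form could run checks like these before accepting a submission and ask the participant to correct any entry that fails.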

Tip: Using an intelligent benchmark tool makes checking for internal consistency quicker, cheaper and easier thanks to the automated validation checks within the tool. So, if you have a large member base, let the system work for you.

Checking a Single Period for External Consistency

External consistency within a single period means that the data from all participants for a given metric is consistent. Checking it entails calculating the aggregated metrics and key performance indicators and then looking for outliers. An outlier can signal that the underlying data is wrong, though in most cases it isn’t. Even valid outliers tend to bias the mean, however.

Take all the submitted questionnaires, spreadsheets or whatever else the data was collected in. Then use your software to sort the answers to every question in ascending or descending order (there is no right or wrong way of doing it) and analyse them for quality.

Example: In the survey you asked for revenue and the total number of employees. This allows you to calculate the average revenue generated per employee. All businesses can then be ranked by their revenue per employee in ascending or descending order and checked for outliers. You can look at the lowest and the highest values and compare them to the rest. Upon checking the numbers, you may notice that some businesses didn’t distinguish between full-time and part-time employees and grouped everyone under a single headcount. Unless all employees work full time, or all work part time, there will be a deviation in the results.
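The ranking step in this example can be sketched in a few lines of Python. The member names and figures below are invented for illustration, and the interquartile-range rule used to flag outliers is one common convention chosen for this sketch, not something the method above prescribes. The sketch also assumes every record has a revenue and a non-zero number of employees.

```python
import statistics


def revenue_per_employee(records):
    """Return (name, revenue / employees) pairs sorted in ascending order."""
    ratios = [(r["name"], r["revenue"] / r["employees"]) for r in records]
    return sorted(ratios, key=lambda pair: pair[1])


def flag_outliers(ranked, k=1.5):
    """Return names whose value lies more than k x IQR outside the quartiles.

    The IQR rule and k=1.5 are assumptions; any outlier rule could be used.
    """
    values = [value for _, value in ranked]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [name for name, value in ranked if value < low or value > high]


# Hypothetical survey data.
records = [
    {"name": "A", "revenue": 1_000_000, "employees": 10},
    {"name": "B", "revenue": 2_200_000, "employees": 20},
    {"name": "C", "revenue": 950_000, "employees": 10},
    {"name": "D", "revenue": 2_100_000, "employees": 20},
    {"name": "E", "revenue": 600_000, "employees": 5},
    {"name": "F", "revenue": 4_000_000, "employees": 10},
]

ranked = revenue_per_employee(records)
print(ranked[0], ranked[-1])   # lowest and highest revenue per employee
print(flag_outliers(ranked))   # members worth a closer look
```

A flagged member is not necessarily wrong; the flag only tells you where to look first, for instance at how that business counted its part-time staff.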

Measuring External Consistency Between Specific Periods

External consistency should also hold between the questionnaires that the same member business submits for two or more periods. Working with the example given above, revenue per employee should be roughly the same in both periods unless there has been some sort of significant change.

Consistency between periods does not mean that every item has to develop in the same way. Some categories can increase in value while others decrease. However, forgotten categories or very large differences should set off alarm bells. Identify the differences and ask the member business to clarify them. In most cases, at least in our experience, the difference is the result of an error. It could be correct, though, in which case it is essential to understand the circumstances under which it happened. You can then decide whether the participant should be included in the growth figures.
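A period-over-period check along these lines might look like the following Python sketch. The category names, the figures and the 50% threshold for a "very large" difference are all assumptions for illustration; in practice the threshold would depend on the metric.

```python
def compare_periods(previous, current, max_change=0.5):
    """List categories that are missing or changed by more than max_change.

    previous and current map category names to reported values.
    max_change is a relative threshold (0.5 = 50%, an assumed value).
    """
    findings = []
    for category, old in previous.items():
        new = current.get(category)
        if new is None:
            findings.append((category, "missing in current period"))
            continue
        if old == 0:
            continue  # relative change is undefined when the old value is 0
        change = (new - old) / abs(old)
        if abs(change) > max_change:
            findings.append((category, f"changed by {change:+.0%}"))
    return findings


# Hypothetical answers from one member business for two periods.
previous = {"revenue": 1_000_000, "staff_costs": 400_000, "marketing": 50_000}
current = {"revenue": 1_050_000, "staff_costs": 900_000}

for category, issue in compare_periods(previous, current):
    print(category, issue)
```

Each finding is a prompt to contact the member business, not an automatic rejection of its data.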

How to Fix Errors During Validation?

One of the easiest ways to correct errors during validation, and something we touched upon briefly above, is to contact the participants. Ask them to explain why they entered what they did so you can double-check whether it was, in fact, an error or was deliberate. If you are familiar with the participant and know for sure that it is an error, you can fix it yourself without having to bother the participant.

Note: Participants can be contacted via email or over the phone. Sometimes they may not respond, though. In that case, their data will have to be removed from the benchmark to preserve the integrity of the final results.

Check Calculations and Important Settings

Always make sure that the calculations entered by participants are correct. You will also need to check that all the figures add up to the final result produced. Most KPIs you run into will have a numerator and a denominator, and both need a value in order to calculate the KPI. But can you calculate the KPI correctly if one of these values happens to be missing?

Take, for instance, the revenue each employee generates for the company (the example above). What if the member business enters its number of employees but not its revenue? Without the revenue figure, you cannot know the average revenue per employee. Likewise, if the participant enters a revenue figure but not the number of employees, the result will be empty. Should both values be present for the benchmark? Obviously they should, because you can only calculate a KPI if both the numerator and the denominator have real values. Sure, the numerator could be zero in some instances, but that is highly unlikely in the example above.
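The rule that a KPI needs both values can be sketched as a small Python helper. The name safe_kpi is hypothetical, and None is used here to represent an empty answer; a zero denominator is also treated as "no result" since the division is undefined.

```python
def safe_kpi(numerator, denominator):
    """Return numerator / denominator, or None when the KPI cannot be computed.

    None stands for an empty answer. A zero denominator also yields None
    because the ratio is undefined.
    """
    if numerator is None or denominator is None:
        return None
    if denominator == 0:
        return None
    return numerator / denominator


print(safe_kpi(500_000, 10))    # 50000.0 revenue per employee
print(safe_kpi(None, 10))       # None: revenue is missing
print(safe_kpi(500_000, None))  # None: number of employees is missing
print(safe_kpi(0, 10))          # 0.0: a zero numerator is still a real value
```

Returning None rather than 0 keeps an incomplete answer out of the group averages instead of dragging them down.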

What do you do with the 0 and the empty value?

Sometimes a zero ‘0’ should be used as such; sometimes it should be converted to an empty value. Either way, you should take this into account in your validation checks. You will also want to set rules for cases where a value is empty but should in fact be zero ‘0’.

A ‘0’ counts as a value, so it influences the mean and various other group values. A variable that has no associated value, i.e., one that is left blank, does not influence the other values in the group.
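The difference is easy to see with a small Python sketch, in which None stands for a blank answer and the figures are made up for illustration:

```python
from statistics import mean


def group_mean(values):
    """Mean of the filled-in values; blanks (None) are left out entirely."""
    filled = [v for v in values if v is not None]
    return mean(filled) if filled else None


with_zero = [100, 200, 0]       # the third participant entered 0
with_empty = [100, 200, None]   # the third participant left it blank

print(group_mean(with_zero))    # 100: the zero pulls the mean down
print(group_mean(with_empty))   # 150: the blank is ignored
```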

In the breakdown of a total, you want the empties to be converted to zeros. In individual variables that should not be zero (where a zero is mostly entered because the respondent doesn’t know the answer), you want the zero to be converted to empty. And you want the conversion to happen automatically. Only a sophisticated benchmarking survey tool will do this for you.

So, one thing you’d want to do when checking for consistency is a ‘0-empty’ or ‘empty-0’ conversion. What this means is that you either convert the zeros to empties or the empties to zeros. But when do you do that? If a zero has no meaning or consequence, it can be converted into an empty. Take the number of employees, for instance. This figure can’t be zero, so it has to be empty. If the participant enters a ‘0’, that entry has to be blocked or, if accepted, converted into a blank automatically, which then has to be verified and rectified.
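Both conversions can be sketched in Python as follows. The function and field names are hypothetical, None stands for an empty answer, and which fields count as "cannot be zero" is a decision you would make per questionnaire.

```python
def empties_to_zeros(breakdown):
    """'empty-0': in the breakdown of a total, treat blanks as zero."""
    return {key: (0 if value is None else value)
            for key, value in breakdown.items()}


def zeros_to_empties(record, nonzero_fields):
    """'0-empty': blank out zeros in fields where zero is not a valid answer."""
    return {key: (None if key in nonzero_fields and value == 0 else value)
            for key, value in record.items()}


# Breakdown of total revenue: an unfilled line counts as 0.
breakdown = {"products": 70_000, "services": 30_000, "other": None}
print(empties_to_zeros(breakdown))

# The number of employees cannot be zero, so a 0 becomes a blank to verify.
record = {"employees": 0, "revenue": 100_000, "bad_debts": 0}
print(zeros_to_empties(record, nonzero_fields={"employees"}))
```

In the second example the zero for bad_debts is kept, because zero is a plausible answer there, while the zero for employees becomes a blank that still has to be verified with the participant.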

Conclusion

Benchmarking a group of businesses takes an association a great deal of time, and much of that time is taken up with validating and checking the data for consistency. It is a huge part of making sure that the data fed into the benchmarking system is solid. Even leading benchmarking programs like the Compare Method will not be able to fix consistency errors, and the results will be affected by incorrect information.

Whenever an association plans on sending questionnaires, it has to make sure that the questions are crystal clear. Ideally, each question should elicit a binary response like ‘yes’ or ‘no’, or a figure. Admittedly, it isn’t always that simple, which is why some data has to be verified and validated manually.

How we support you

We believe that benchmarking could be much simpler, cheaper and more efficient through professional automation. And we believe you can add much more value for your clients with intelligent automation.

More than 15 years of experience in benchmarking
Help on implementing the questionnaire
Support on analysing and reporting the results
Training you to conduct your own benchmarking surveys successfully