In the previous chapter we narrowed down the potential causes of your problem in the measure phase. Until now, your main problem-solving tool was the brainpower of your team. Many people overlook the risk that, so far, there is no real “proof” that your ideas and suggestions are actually true. You might have experienced something called “groupthink” in your meetings. At a certain point you might find that it is not the best idea that wins but the one with the most supporters, be it for political reasons, because the boss likes it, or because some employees are more vocal and aggressive in their opinions than others.
But instead of trying to change human nature, we will try to circumvent the downsides of brainstorming by applying mathematical methods to prove or disprove the impact of your “vital few Xs”. To do so, we convert our practical theories about the causes (the Xs) into scientific hypotheses and then use mathematical analysis tools to prove or disprove them. As a side note: strictly speaking, a hypothesis can never be fully proven; a test can only reject it or fail to reject it. For the sake of argument, we will stick to the simpler wording.
Our next step is to collect enough data for each X. This is a very critical moment for a company. Did you actually collect enough historical data? Where is this data stored? In what format is it stored? Did you invest enough time and money over the last years to have a solid database to work with? Nothing is more frustrating than realizing that you have to drop certain Xs, or postpone your project for months to start collecting data, because right now you have no database to work with. Or that you logged the wrong data for the last six months. I strongly advise that you understand your existing data very well yourself before you talk to your outsourcing partner. Data is the fuel your whole project will run on, and their engine will only run as well as the fuel you provide.
Regarding the data, you should be able to answer the following questions: Is my data accessible? Is it accurate? Do I have the right employees on my team to support the outsourcing partner in collecting and, if necessary, changing data? Just imagine your outsourcing partner needs only a subset of a table. Do you have employees on standby who can work with your existing databases or write SQL statements? And if not, do you feel comfortable enough to give an outside partner full access to your servers to “get the data for themselves”?
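To make the “subset of a table” scenario concrete, here is a minimal sketch of what such a request could look like when handled in-house. Everything in it is an assumption for illustration: the in-memory SQLite database stands in for your real server, and the table `process_log`, its columns, and the helper `log_subset` are hypothetical names.

```python
import sqlite3

# Hypothetical in-memory database standing in for a company server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE process_log ("
    "id INTEGER PRIMARY KEY, logged_on TEXT, line TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO process_log (logged_on, line, value) VALUES (?, ?, ?)",
    [
        ("2023-01-05", "A", 4.0),
        ("2023-01-06", "B", 5.0),
        ("2023-02-10", "A", 3.0),
        ("2023-02-11", "C", 6.0),
    ],
)


def log_subset(connection, line, since):
    """Return only the rows the partner asked for, not the whole table."""
    query = (
        "SELECT logged_on, line, value FROM process_log "
        "WHERE line = ? AND logged_on >= ? ORDER BY logged_on"
    )
    # Parameters are passed separately so user input never ends up in the SQL text.
    return connection.execute(query, (line, since)).fetchall()


subset = log_subset(conn, "A", "2023-02-01")
print(subset)
```

If someone on your team can write and run a query like this, the partner receives exactly the rows they need and never has to log in to your servers.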
In the following passage we will briefly discuss what statistical testing could look like. I stress the word briefly, because these texts are written for a general audience. This means we will not go into mathematical depth.
Mathematical hypothesis testing
Today we will look at mathematical hypothesis testing. The purpose of appropriate hypothesis testing is to integrate the Voice of the Process with the Voice of the Business to make data-based decisions to resolve problems.
Hypothesis testing always compares two competing statements, and the mathematical analysis tells you which of them the data supports. Imagine your overall problem is low customer satisfaction with the coffee in your café. One of your assumed Xs is that different coffee bean suppliers cause statistically significant differences in the taste of your coffee.
Your first hypothesis, the null hypothesis H0, would then say:
H0: There is no difference in taste depending on which coffee bean supplier you choose.
The alternative hypothesis H1, however, would say the following:
H1: There is a difference in taste depending on which coffee bean supplier you choose.
In order to run a mathematical analysis, you need to have your customers rate the coffee over a certain period of time (for best results on a 5- to 7-point scale). At the same time, you need to record which coffee beans were used for each rating.
The mathematical analysis will help you to answer the following questions: How many samples do I have to take? Which variables in the collected data will I use to make my judgement? How different do my results have to be to disprove H0 or H1?
Just imagine, supplier A beans show an average customer satisfaction of 4.3, supplier B beans yield an average customer satisfaction of 4.5 and supplier C beans produce an average customer satisfaction of 4.7.
Would you be able to say with certainty, and without mathematical help, that supplier C is significantly better? Is a difference of 0.2 already enough to call one supplier “better”? And what does average actually mean? In a worst-case scenario, an average of 4.3 could mean that half of your customers loved the coffee (rating it 7) and the other half hated it (rating it 1.6 on average). As you see, you also need to look at the so-called variance of the collected data.
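The point about averages can be checked with a few lines of code. The following is a minimal sketch with made-up ratings: both lists are hypothetical, and the 1.6 values simply reuse the figure from the text even though a real customer could not enter such a rating on a whole-number scale.

```python
from statistics import mean, pvariance

# Hypothetical ratings. Both groups average 4.3, like the supplier A figure.
polarized = [7.0] * 5 + [1.6] * 5  # half love it, half hate it
consistent = [4.0, 4.5, 4.3, 4.2, 4.5, 4.3, 4.1, 4.4, 4.4, 4.3]

for name, ratings in (("polarized", polarized), ("consistent", consistent)):
    # pvariance treats the list as the whole population of ratings.
    print(name, "mean:", round(mean(ratings), 2),
          "variance:", round(pvariance(ratings), 3))
```

Both groups report the same mean, but the variance of the polarized group is far larger. Looking at the average alone would hide exactly the situation you most need to know about.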
To answer all these questions, we would use inferential statistics. Inferential statistics is the cool brother of descriptive statistics, which you are probably already using in Excel. Bar and pie charts are examples of descriptive statistics, and all they do is describe data.
Inferential statistics is, as Wikipedia puts it, “the process of using data analysis to infer properties of an underlying distribution […] Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates.”
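As one possible illustration of testing H0 against H1, here is a minimal sketch of a permutation test, which is only one of several suitable techniques (a statistician might well choose an analysis of variance instead). The ratings are invented so that the group averages match the 4.3, 4.5 and 4.7 from the supplier example; the function names and the choice of test statistic are assumptions made for this sketch.

```python
import random


def spread_of_means(groups):
    """Test statistic: largest group mean minus smallest group mean."""
    means = [sum(g) / len(g) for g in groups]
    return max(means) - min(means)


def permutation_p_value(groups, n_permutations=5000, seed=0):
    """Estimate how often randomly shuffled supplier labels produce a
    spread at least as large as the one actually observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = spread_of_means(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if spread_of_means(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate from being exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical ratings on a 7-point scale, ten per supplier.
supplier_a = [4, 5, 4, 3, 5, 4, 5, 4, 4, 5]  # mean 4.3
supplier_b = [5, 4, 5, 4, 5, 4, 5, 4, 5, 4]  # mean 4.5
supplier_c = [5, 5, 4, 5, 5, 4, 5, 5, 4, 5]  # mean 4.7

p_value = permutation_p_value([supplier_a, supplier_b, supplier_c])
print("estimated p-value:", round(p_value, 3))
```

A small p-value (a common but arbitrary threshold is 0.05) would speak against H0. With only ten ratings per supplier, a gap of this size can easily arise by chance, which is why the question of how many samples to take matters so much.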
In the above example we assumed a simple, direct connection between supplier and coffee taste. But in reality there would be a group of Xs interacting with each other to create the taste of the coffee. Imagine your barista Betty is amazing at working the machine while Michele is not as good. And your machine probably also produces better results right after being cleaned in the morning. So in this example you have three interacting variables that influence the taste of your coffee: the supplier, the barista, and the state of the machine. These interaction effects can also be modeled with inferential statistics.
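What “interacting” means can be shown with a small calculation. The sketch below is simplified to two of the three variables (barista and machine state), and all ratings are invented. It only describes the pattern in the data; whether the pattern is statistically meaningful would still need a proper test, for example a two-way analysis of variance.

```python
from statistics import mean

# Hypothetical ratings keyed by (barista, machine state).
ratings = {
    ("Betty", "clean"): [6, 6, 7, 6],
    ("Betty", "used"): [6, 5, 6, 6],
    ("Michele", "clean"): [5, 5, 6, 5],
    ("Michele", "used"): [3, 4, 3, 3],
}

cell_means = {cell: mean(values) for cell, values in ratings.items()}

# Effect of a freshly cleaned machine, computed for each barista separately.
cleaning_effect_betty = (
    cell_means[("Betty", "clean")] - cell_means[("Betty", "used")]
)
cleaning_effect_michele = (
    cell_means[("Michele", "clean")] - cell_means[("Michele", "used")]
)

# Interaction contrast: near zero would mean cleaning helps both equally.
interaction = cleaning_effect_michele - cleaning_effect_betty

print("cleaning effect for Betty:", cleaning_effect_betty)
print("cleaning effect for Michele:", cleaning_effect_michele)
print("interaction contrast:", interaction)
```

In this made-up data a clean machine helps Michele much more than Betty, so the effect of one X depends on the level of another. That dependence is what an interaction is, and it is why testing each X in isolation can be misleading.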
If your models get too complicated and no one in your company is able to run these analyses, get help early on to get this right, because identifying the vital few root causes of your problem will later boost the results of your outsourcing partner.
As the old saying goes: “Good data in, good data out”.