Championing Data Transparency

Data transparency is essential to building trust and confidence among those who use the data to make decisions. Unfortunately, too often, when we ask to review the data model, we are told it is confidential or, better yet, proprietary. The Freakonomics podcast recently ran multiple episodes on scandals within the academic research community that surfaced when other researchers tried to replicate published research. They found both accidental and purposeful manipulation of the data.

Manipulation of the Manipulation

A very common manipulation of the data is what I refer to as manipulation of the manipulation. It often shows up in a simple weighting of data where you remove some of the highest and lowest numbers from the dataset. This is called a “trimmed mean” or “adjusted mean.” The trimming itself is not malicious; it is intended to remove outliers at both ends of the data range. A simple example is a review where you score something within a range. If you want to help your favorite reach the final consideration, you should never give it a perfect score or a very low score, as those scores will be removed from the set as outliers. Scores inside the trimmed range are still aggregated; the ones at the extremes never reach the mean.
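To make the mechanics concrete, here is a minimal sketch of a trimmed mean. The panel scores and the choice to drop exactly one score from each end are assumptions for illustration; real scoring systems may trim a percentage of the scores or use a different outlier rule.

```python
def trimmed_mean(scores, trim_each_end=1):
    """Average the scores after dropping the lowest and highest values.

    trim_each_end is a hypothetical parameter: how many scores to
    discard from each end of the sorted list.
    """
    if trim_each_end < 0:
        raise ValueError("trim_each_end must not be negative")
    if len(scores) <= 2 * trim_each_end:
        raise ValueError("not enough scores left after trimming")
    ordered = sorted(scores)
    kept = ordered[trim_each_end:len(ordered) - trim_each_end]
    return sum(kept) / len(kept)


# Hypothetical panel scores on a 1-10 scale, including one perfect 10.
panel_scores = [6, 7, 7, 8, 8, 10]

plain_mean = sum(panel_scores) / len(panel_scores)
adjusted_mean = trimmed_mean(panel_scores)

print(f"plain mean:   {plain_mean:.2f}")
print(f"trimmed mean: {adjusted_mean:.2f}")
```

In this sketch the perfect 10 and the lowest 6 are both discarded before averaging, so the perfect score never reaches the result that is reported.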

Can Others Trust the Outcomes?

During business school, I worked on building a decision support system that helped companies identify the best mode of entry into a market. The team identified over 900 variables that a company needs to consider before deciding. Each variable had a weighting that indicated how easy or complex that factor made market entry. Each factor was folded into a category and then into the total score for a mode of entry and a market.

I lost a lot of sleep thinking about how an overweighted or underweighted variable could negatively impact the calculations. One of the variables was electricity. What was the outlet type, quality, availability, and cost per unit? If our ratios were off, it might dissuade someone from entering the market. That’s why I argued for transparency in our weightings so users could understand our logic and even potentially update the ratio based on their own research. I created an extensive addendum explaining our range and the reasoning behind the weightings. We ultimately gave users the option to adjust the ratio for anomalies we did not identify.
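The roll-up from variables to categories to a total score, with weights the user can see and override, might be sketched as follows. Everything here is an assumption for illustration: the factor names, ratings, weights, and the use of weighted averages are hypothetical and are not the original system's actual variables or formula.

```python
from collections import defaultdict

# Hypothetical factors: (category, factor name, rating 0-10, default weight).
DEFAULT_FACTORS = [
    ("infrastructure", "electricity_cost", 4, 2.0),
    ("infrastructure", "electricity_availability", 8, 1.0),
    ("regulation", "licensing", 6, 1.0),
]


def score_market(factors, weight_overrides=None):
    """Fold factor ratings into category scores, then into one total.

    weight_overrides maps a factor name to a user-supplied weight,
    so the published default can be replaced with local research.
    """
    overrides = weight_overrides or {}
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for category, name, rating, default_weight in factors:
        weight = overrides.get(name, default_weight)
        weighted_sum[category] += rating * weight
        weight_total[category] += weight
    # Weighted average per category (an assumed aggregation rule).
    category_scores = {
        category: weighted_sum[category] / weight_total[category]
        for category in weighted_sum
        if weight_total[category] > 0
    }
    # Simple average of the categories for the total (also assumed).
    total = sum(category_scores.values()) / len(category_scores)
    return category_scores, total


default_categories, default_total = score_market(DEFAULT_FACTORS)
adjusted_categories, adjusted_total = score_market(
    DEFAULT_FACTORS, weight_overrides={"electricity_cost": 0.5}
)
print(f"default total:  {default_total:.2f}")
print(f"adjusted total: {adjusted_total:.2f}")
```

Because the weights are plain data rather than hidden constants, a user who disagrees with how heavily electricity cost is weighted can change that one number and see how the total moves.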

Sharing and Validating Data

Sharing data can invite challenges, but being able to defend our methods is essential. One way we did this was by showing raw data. For example, if a client’s website traffic didn’t increase after our changes, the raw numbers showed that we hadn’t made enough impact. I wanted clients to understand this for two reasons.

First, it could reveal areas where we should have focused more. One example is a company that made a midsection-shaping garment it called “shapewear.” While they knew that “girdle” was a more widely recognized term, they refused to use it because it was dated and did not reflect the brand’s essence. When we showed them the data, they understood that their niche focus limited their potential growth, but they persisted. They took a similar stand on denim vs. jeans. A few months later, after failing to meet their sales numbers, they acknowledged that our advice was right: we needed to connect with customers at their point of reference and then introduce them to the modern vernacular.

Second, sharing data helps create a more transparent and honest relationship with clients. It allows them to see the logic behind our decisions and better understand the impact of our work. By being open with our data, we can foster trust and collaboration with those we serve. We wanted to demonstrate that by trusting the numbers and models, we could inform better decisions.

Data Differs from Expectations

I have had companies block several of our initiatives because our models did not support what they hoped for. Too many startups use the Pets.com model of opportunity. They will say there are X people in the world who like, need, or want something, and therefore if we can reach Y percent, we will make a gazillion dollars.

We have had cases where the client wanted detailed projections that were unrealistic or that assumed away the blockers keeping us from achieving those goals. For those with overblown expectations, we would simplify our models further to show that each of these five actions should contribute X percent to the performance we’re expecting, and that the sum of those contributions would be our result.

We often needed to introduce a “blocker variable,” a negative adjustment for every action that was delayed or held up by a blocker outside our control that had not been removed. This blocker calculator showed the traffic we would have expected had our recommendations been implemented when we presented them. It was a simple calculation using Excel’s NOW() function, compounding the expected opportunity into a figure labeled Missed Opportunity. It compared the actual numbers against the projected numbers and added the previously missed projection values.

Using this same ratio, projected over a year, they would miss out on 2.7 million people. We then added the cost to acquire these individuals another way, for example through an ad agency at $2 or $3 a click, which represented millions of dollars in incremental costs to replace the low-cost traffic.
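A rough sketch of that arithmetic follows. The monthly visit figures are hypothetical and the function names are illustrative; the original was an Excel calculation, not this code. Only the 2.7 million missed visitors and the $2 to $3 cost per click come from the account above.

```python
def missed_opportunity(projected_visits, actual_visits):
    """Running total of the shortfall between projected and actual visits.

    Each period's shortfall is added to the shortfalls from earlier
    periods, so the missed opportunity compounds while a blocker
    stays in place.
    """
    running_total = 0
    cumulative = []
    for projected, actual in zip(projected_visits, actual_visits):
        shortfall = max(projected - actual, 0)
        running_total += shortfall
        cumulative.append(running_total)
    return cumulative


def replacement_cost(missed_visits, cost_per_click):
    """Cost of buying the missed visits back through paid clicks."""
    return missed_visits * cost_per_click


# Hypothetical monthly figures for a recommendation that stayed blocked.
projected = [100_000, 120_000, 150_000]
actual = [100_000, 90_000, 95_000]
print(missed_opportunity(projected, actual))

# The yearly figure from the text, priced at $2 and $3 per click.
low_estimate = replacement_cost(2_700_000, 2)
high_estimate = replacement_cost(2_700_000, 3)
print(f"${low_estimate:,} to ${high_estimate:,}")
```

At those click prices, replacing 2.7 million missed visitors with paid traffic would cost between $5.4 million and $8.1 million.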

Sometimes, executives see this and realize they missed out on a quarter’s progress due to silly errors, egos, or unresolved viewpoints. We position this as a lost opportunity, as once the search is gone, it cannot be regained.

Collective Contribution

An example of integration and collaboration towards a common goal was a project for a major insurance company. Like many in the dot-com boom, they had an idea that they should create a knowledge portal for small businesses. They would come for the information, and we could advise them.

They were budgeted $50 million for building the site, content, and advertising. Their ultimate goal was to get the email addresses of 500,000 small businesses, with an initial goal of capturing 50,000 at launch. As often happens, the launch day was approaching, and the site still wasn’t ready. The last-minute plan was to put up a logo and a teaser telling visitors to stay tuned: the new site would rock your business, or something crazy like that.

The night before, around midnight, before millions of dollars of print, radio, and news show advertising were to hit, I logged in and added an email signup option, as it would be crazy to just lose any of this traffic. Around 2 am, I got a call on my cell from the development manager screaming at me for adding that without his permission. A series of emails was exchanged early in the morning, with a few less ego-driven managers saying, “Leave it. What’s it going to hurt?”

By the end of the first day, that signup box had captured 52,000 email addresses, and by the end of the week it had captured over 300,000, while the site was still not ready. In the meeting later that week, the ads team was taking credit for the email signups, as was the development team, which had nothing to do with it. In the end, SEO, on the back of the massive and expensive media push

The Benefits of Data Transparency

Through data transparency, clients can see and understand the models, which combine predictions and actuals. When we used these combined models, we overlaid what we predicted might happen with what actually did happen, and the clients felt more confident about the status of the project. When it first launched, we called it our “No Bullshit Model” since it showed whether or not we met the projections. This type of model represented reality: we could see the current state and the results of our efforts, and whether we were on track. Most clients appreciated the transparency and said, “We never really believed the projections, but this model made them believable. Here’s the increase, and this is the decrease.” That transparency is crucial, especially for highlighting the real value of our deliverables.

This data transparency from the No Bullshit models allowed people to believe in what we were saying. Rather than just saying, “Hey, you got a 500% increase. Let’s go drink some champagne!”, the reports showed the increases, the decreases, and the places where there was no change, which forced us to explain why. In the end, even if we were not perfect in all areas, people loved the transparency. They loved the integrity of showing both wins and losses.
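A report of this kind could be sketched roughly as below. The metric names, the numbers, and the 2 percent tolerance for calling something “no change” are all assumptions for illustration, not the layout or thresholds of the original reports.

```python
def variance_report(rows, tolerance=0.02):
    """Label each metric as an increase, decrease, or no change,
    and flag whether the actual value met the projection.

    rows holds (metric, previous, projected, actual) tuples.
    tolerance is a hypothetical band treated as "no change".
    """
    report = []
    for metric, previous, projected, actual in rows:
        if previous <= 0:
            raise ValueError(f"{metric}: previous value must be positive")
        change = (actual - previous) / previous
        if change > tolerance:
            trend = "increase"
        elif change < -tolerance:
            trend = "decrease"
        else:
            trend = "no change"
        report.append(
            {
                "metric": metric,
                "trend": trend,
                "change_pct": round(change * 100, 1),
                "met_projection": actual >= projected,
            }
        )
    return report


# Hypothetical figures: (metric, previous, projected, actual).
rows = [
    ("organic visits", 1000, 1500, 1600),
    ("signups", 200, 260, 180),
    ("page views", 5000, 5500, 5050),
]
report = variance_report(rows)
for line in report:
    print(line)
```

Listing the wins, the losses, and the flat lines side by side is what forces the explanation: every row that missed its projection needs a reason next to it.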

In a few organizations, we were asked to update the reporting to show how we were contributing to a person’s manager’s goals. This is interesting because, in reality, we are all simply contributing to our manager’s goals, so why not show both elements?

Conversely, we have had some clients who did not want us to present this type of report. Most felt the transparency of the reporting would set unrealistic expectations. In some cases, we could not realize gains because the client would not make any changes. In these cases, we would indicate what the barriers were and show the opportunity lost over time because the required changes were not made.

At the end of the day, I believe everyone must demonstrate their economic value and worth. I believe whether you’re writing something academic that challenges the norm, demonstrating added value, or showing how blockers are preventing you from achieving your goals, you need transparency.
