Some datasets are missing the data that we really need to build a valuable dashboard.
In a world of big data, organizations often gather information simply for the sake of having data. They collect whatever data is easily available in their current state and do not design a data collection and storage system with any specific insights in mind. This approach is generally simpler, but it can often result in data lacking the key elements that would turn an ordinary dataset into a powerful source of insights.
We often examine data in the California Open Data Portal. The portal's datasets are useful for building proofs of concept and helping public-sector departments better understand the capabilities of Microsoft Power BI. We have found, however, that the datasets are often missing data points that would help us build a dashboard that tells a more complete data story.
A simple example is the following question:
How much would it cost to upgrade the California Department of Transportation’s fleet with more environmentally friendly vehicles?
To answer this question, we would need information on the existing fleet, some of which is available from the state's public data portal in the california-state-fleet dataset. Some important data points that already exist in this dataset are:
- Make and Model (but only in general terms, such as “Ford F150”; the trim, such as “EcoBoost”, is not specified, which may impact data quality)
- Type of fuel
- Age of equipment (but not the actual mileage)
This is a good start, but to fully answer the question, we would also need to know the cost of maintaining the current fleet and the availability of alternative fuel infrastructure, such as EV chargers and hydrogen or LNG terminals. Modern, economical vehicles should get better fuel economy than a legacy fleet; however, their long-term maintenance costs may be higher over their lifespan. All of this information would be needed to get a complete picture of the cost of replacing the fleet.
Average fuel economy, maintenance costs, and lifespan can be estimated from vehicle databases that may exist with the EPA or the federal Department of Transportation. But it is not straightforward to join that data to the existing California fleet dataset, because the published make and model lack specificity. To return to the earlier example, an F150 can get anywhere between 15 and 24 mpg depending on the specification and trim, and the existing dataset labels them all as Ford F150.
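To make the join problem concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the field names, the placeholder trim labels, and the mpg values (chosen only to span the 15 to 24 mpg range mentioned above) do not come from the actual state fleet dataset or from any EPA table.

```python
from collections import defaultdict

# Hypothetical fleet rows, shaped like the public dataset: make and model, no trim.
fleet = [
    {"vehicle_id": 1, "make": "Ford", "model": "F150", "fuel": "Gasoline"},
    {"vehicle_id": 2, "make": "Ford", "model": "F150", "fuel": "Gasoline"},
]

# Hypothetical reference rows keyed by make, model, AND trim.
# The trim labels and mpg values are illustrative placeholders, not real figures.
reference = [
    {"make": "Ford", "model": "F150", "trim": "trim_a", "mpg": 15},
    {"make": "Ford", "model": "F150", "trim": "trim_b", "mpg": 20},
    {"make": "Ford", "model": "F150", "trim": "trim_c", "mpg": 24},
]


def mpg_range_by_vehicle(fleet_rows, reference_rows):
    """Join on (make, model) and return the (min, max) mpg each vehicle could have."""
    # Collect every candidate mpg value for each make/model pair.
    candidates = defaultdict(list)
    for ref in reference_rows:
        candidates[(ref["make"], ref["model"])].append(ref["mpg"])

    ranges = {}
    for row in fleet_rows:
        matches = candidates.get((row["make"], row["model"]), [])
        # An unmatched vehicle gets no estimate rather than a guessed one.
        ranges[row["vehicle_id"]] = (min(matches), max(matches)) if matches else None
    return ranges


result = mpg_range_by_vehicle(fleet, reference)
print(result)
```

Because the fleet rows carry no trim, each vehicle matches every trim in the reference table, so the most the join can honestly report is a range of possible fuel economy values rather than a single figure.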
Age is also available, but not actual mileage. Along the same lines, a 10-year-old vehicle with 150k miles on it will have significantly higher expected maintenance costs than the same model with only 50k miles. The resale value of the 50k-mile vehicle is also significantly higher. Caltrans may well have all of this data at its private disposal, but this is a good example of the types of data points that are needed to produce meaningful insights but are not necessarily available in a public-facing dataset.
Published public datasets like these are great starting points for users to create their own data visualizations. The gaps in the data, however, mean that users can answer some questions, but not all.
Kiefer can help your organization determine how best to put your data to work and how to build dashboards that support better, data-driven decisions. The best way to start is by taking an iterative approach and building a proof of concept. This engagement will help your organization see what is possible, the importance of complete datasets, and the long-term benefits of building a strong enterprise data culture.