
Experienced data scientists will tell you data prep is (almost) everything and is where they spend the majority of their time. Blue Hill Research reports that data analysts spend at least 2 hours per day on data preparation activities. At 2 hours per day, Blue Hill estimates that preparing data for analytics costs about $22,000 per year per data analyst.
One of the reasons that prep takes up so much time is that it is generally a very manual process. You can throw tons of technology and systems at your data, but the front end of the data analytics workflow still depends on hands-on work. Automated tools are available to help with data preparation, but they have not yet taken the manual effort out of this step.
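To put the Blue Hill figures above in perspective, here is a quick back-of-the-envelope calculation in Python. The 2 hours per day and $22,000 per year come from the report as quoted; the 250 working days per year is my own assumption, not a Blue Hill number, so treat the result as a rough illustration only.

```python
def implied_hourly_cost(annual_cost, hours_per_day, working_days_per_year):
    """Cost per hour of data prep implied by an annual cost figure."""
    total_hours = hours_per_day * working_days_per_year
    return annual_cost / total_hours


# 22000 and 2 are the figures quoted from Blue Hill.
# 250 working days per year is an assumption made for this sketch.
cost = implied_hourly_cost(annual_cost=22000, hours_per_day=2, working_days_per_year=250)
print(cost)  # 44.0
```

Under that assumption, each hour of manual prep works out to roughly $44 per analyst. Change the working-days value to match your own organization.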
Data preparation is important. But…what exactly is it?
The Importance of Data Preparation
Data prep is really nothing more than making sure your data meets the needs of your plans for that data. Data needs to be high quality, describable, in a format that is easily used in future analysis, and accompanied by some context.
There are tons of ways to ‘do’ data preparation. You can use databases, scripts, data management systems or just plain old Excel. In fact, according to Blue Hill, 78% of analysts use Excel for the majority of their data preparation work. Interestingly, 89% of those same analysts claim that they use Excel for the majority of their entire data analytics workflow.
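As a rough illustration of those four needs (quality, describability, a usable format, and context), here is a minimal sketch using only the Python standard library. The function name `prepare_records`, the column names (`customer`, `order_date`, `amount`), and the `sales.csv` source label are all hypothetical and chosen just for this example; real data prep will look different depending on your data.

```python
from datetime import datetime


def prepare_records(raw_rows, source_name):
    """Clean a list of dicts of raw strings; return (clean_rows, report)."""
    clean_rows = []
    rejected = 0
    for row in raw_rows:
        # Describable: normalize column names and strip stray whitespace.
        row = {k.strip().lower(): (v or "").strip() for k, v in row.items()}
        try:
            # Quality: a row without a customer name is not usable.
            if not row["customer"]:
                raise ValueError("missing customer")
            # Usable format: convert text into real dates and numbers.
            clean_rows.append({
                "customer": row["customer"].title(),
                "order_date": datetime.strptime(row["order_date"], "%Y-%m-%d").date(),
                "amount": float(row["amount"].replace(",", "")),
            })
        except (KeyError, ValueError):
            rejected += 1
    # Context: record where the data came from and what was dropped.
    report = {"source": source_name, "kept": len(clean_rows), "rejected": rejected}
    return clean_rows, report


# Hypothetical sample rows: one good, one with a bad date, one with no customer.
raw = [
    {" Customer ": " alice smith ", "Order_Date": "2016-07-01", "Amount": "1,200.50"},
    {"Customer": "bob", "Order_Date": "07/02/2016", "Amount": "30"},
    {"Customer": "", "Order_Date": "2016-07-03", "Amount": "10"},
]
rows, report = prepare_records(raw, "sales.csv")
print(report)  # {'source': 'sales.csv', 'kept': 1, 'rejected': 2}
```

The sketch counts rejected rows instead of silently dropping them, so you can see how much of the raw data failed your checks before you start any analysis.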
As I mentioned before, there are some tools / systems out there today to help with data prep, but they are still in their infancy. One of these companies, Paxata, is doing some very interesting stuff with data preparation, but I think we are a few years off before these types of tools become widespread.
Data preparation is integral to successful data analytics projects. Doing it right takes a considerable amount of time, often the majority of a data analyst’s time. Whether you use Excel, databases or a fancy system to help you with data prep, just remember the importance of data preparation.
If you don’t prepare your data correctly, your data analytics may fail miserably. The old saying of “garbage in, garbage out” definitely applies here.
How focused are you on data preparation within your organization?
from Eric D. Brown http://ericbrown.com/data-analytics-data-preparation.htm