Data-ism in the Information Era

Thursday, June 25th, 2015

I just finished reading the book “Data-ism” by Steve Lohr (check the talk the author gave about the book), and it really changed how I view big data and its impact. In general, the concept of data-ism is useful in the business world, where many companies may not have gone much further than a big data approach to aggregating or mining copious amounts of data for different applications and business processes. People say we have never had so much information on people and things. New cloud hosting solutions and other sophisticated data systems, especially software, have also led to the rise of data skeptics, who push back against the idea that good data handling can provide infinite results without other types of planning.

Big data refers to a process that is used when traditional data mining and handling techniques cannot uncover the insights and meaning of the underlying data. Data that is unstructured, time-sensitive or simply very large cannot be processed easily by relational database engines, let alone by one person. This type of data requires a different processing approach called big data analytics, which uses massive parallelism on readily available hardware.

Quite simply, big data reflects the changing world we live in. The more things change, the more the changes are captured and recorded as data, mainly personal information: where we bought our new shoes, how much we spent on our credit cards, where we are in real time, and so on.

Take weather as an example. For a weather forecaster, the amount of data collected around the world about local conditions is substantial, but sometimes wrong. Weather forecasting relies on statistics. Logically, it would make sense that local environments dictate regional effects and regional effects dictate global effects, but it could well be the other way around.
One way or another, this weather data reflects the attributes of big data: a massive amount of data needs real-time processing, and the large number of inputs can come from machines, personal observations or outside forces like sunspots or magnetism.

However, getting to that big data payoff is proving a difficult challenge for many organizations. Big data is often voluminous and tends to change rapidly, making it hard to get a handle on and difficult to access. The majority of tools available to work with big data are complex and hard to use, and most enterprises don’t have the in-house expertise to perform the data analysis and manipulation required to draw out the answers that the business is seeking.

New technologies tend to spawn utopian and dystopian thinking in equal measure. For all his caveats about the unproven promise of big data, the book’s author, Steve Lohr, is clearly one of the enthusiasts. He has been captured by data-ism, evincing open admiration for those on its leading edge. Perhaps for this reason, he focuses more on the benign “stumbles” of big data than on the serious ones, for example the humorous mistakes of IBM’s artificially intelligent computer system, Watson, as it trained to compete on the game show “Jeopardy!” (check the article “In ‘Data-ism’ Steve Lohr gives his take on how Big Data will shape our future” from The Washington Post).

Controversial or not, data-ism is affecting both science and technology, since these areas are becoming more and more data-driven. One example is how we can sequence genomes and collect other clinical information from patients using sensors. Collecting 100,000 data points per second for several variables in a single patient, for example, requires software and big data analytics tools to make sense of the data. Science that depends on single individuals generating and interpreting the information is endangered. We are living in the data-driven scientific era.
Hypothesis-driven science will disappear, and institutions and enterprises will compete for data scientists. To give just a glimpse of the impact of data: in the last decade we have already generated more information than in all the rest of human history combined. Be prepared for a future full of information! The challenge will always be making sense of too much data, especially in science… Well, but science was always challenging, wasn’t it?