Data models should work across regions, across domains, across data sets. This is what "data universality" means. If natural phenomena exhibit universal patterns such as geometry, outliers, the 80-20 principle, mean reversion, and fat tails, then the data those phenomena generate should express similar behavior; and in fact it does. Yet we still treat data sets as sacrosanct silos: stock market data belongs to the financial analyst, subatomic data to the physicist, social network data to the marketer.
We see similar trends in all data sets, but we don't mix and match subatomic, social, and stock market data. Why? Because it would feel like blasphemy if the Higgs particle showed a high correlation with a high-frequency dollar signal. The objection would be that stock market data can't be reconciled with subatomic data because the respective elements vibrate at different frequencies. True, but social data does share a workable frequency with stock markets. This is why Twitter forecasting is in vogue, and why companies rely on sentiment today to understand consumers and market trends. It would not shock me if there are market systems out there trading on Twitter data.
What's the problem? The problem is what Charles Handy called paraphernalia: calling a seamstress a designer does not change her real role. In the process of finding and worshiping big data tools, we claim to have moved to the next stage without acknowledging the elephant in the room. It's the same elephant; we just call it something else. We choose to ignore that the answer to tomorrow's problems lies not in a discipline, but between disciplines.
About 36 months ago we started compiling a small non-capital-market exercise. We took Google search data for Fortune 500 companies and for various emotions (fear, greed, happiness, etc.) and ran our data algorithms on it. Just as with gold, oil, and the dollar, we could construct cycles of growth and decay for simple web data. We could predict which Fortune 500 companies would be searched more and which would see their searches decay. Last week we went a step further and benchmarked the approach against the Google search data itself. Treating Fortune 500 search data as a set of time series, like stocks, we applied the ORMI (Orpheus Risk Management Index) Active methodology to select which of the top Fortune 500 companies would be searched more, counting it as a gain if our portfolio of 10 selected companies was indeed searched more. This is what companies actually need to know: are they going to be searched more or less (assuming positive search is positive bandwidth)?
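The selection step can be sketched in code. The ORMI Active methodology itself is not described in the text, so the sketch below substitutes a simple momentum ranking as a stand-in; the company names and weekly search counts are made up for illustration.

```python
# Hypothetical sketch: rank "tickers" by search-volume momentum and keep
# the strongest. A stand-in for the (undisclosed) ORMI selection logic.

def momentum(series, lookback=4):
    """Relative change over the last `lookback` observations."""
    return series[-1] / series[-1 - lookback] - 1.0

def select_top(search_data, n=3, lookback=4):
    """Rank names by search momentum and keep the top n."""
    ranked = sorted(search_data,
                    key=lambda k: momentum(search_data[k], lookback),
                    reverse=True)
    return ranked[:n]

# Weekly search-volume counts (made-up sample data).
search_data = {
    "ACME":    [100, 104, 103, 110, 118],
    "GLOBEX":  [200, 198, 195, 190, 185],
    "INITECH": [50, 55, 60, 66, 73],
}

print(select_top(search_data, n=2))  # names with strongest search growth
```

Any ranking signal could be swapped in for `momentum`; the point is only that search counts can be handled with the same machinery as price series.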
In a short period of 24 months (search data has limited history), our ORMI Fortune Index moved up from 100 points to 120, while an equal-weighted index of Google search data for the Fortune 500 fell by 10%. The ORMI Fortune Index thus outperformed its respective universe by 30% over 24 months. This outperformance is a proof of predictiveness, which is where data mining should head, rather than toward subjective extrapolation that can't be quantified. How much your data mining adds to your bottom line should be quantifiable. The top 10 potential search-growth companies on Google search led to six selections, viz. AMGEN, DOW CHEMICAL, HALLIBURTON, MCKESSON, DANAHER, and CHEVRON; the remaining four components of the model were cash. How we integrate this model with stock market forecasting is another step, but the current work shows how much today's data tools are lacking. We have illustrated more such non-capital-market or social mood cases over the last few years: when Spain went from negative outlier to World Cup winner, "Long Football-Short Baseball" showed how a sports time series can be dissected like any stock market time series.
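For readers checking the arithmetic, the 30% figure reads as the difference between the index return (+20%, from 100 to 120) and the universe return (-10%):

```python
# The figures from the text: index 100 -> 120, universe down 10%.
# The quoted 30% outperformance is the arithmetic return difference.

def simple_return(start, end):
    """Fractional return from start level to end level."""
    return end / start - 1.0

index_return = simple_return(100, 120)   # +0.20
universe_return = -0.10                  # equal-weighted universe fell 10%

outperformance = index_return - universe_return
print(f"{outperformance:.0%}")  # 30%
```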
We explained the problem with current web search in 'Jazz and Trading'. Despite all the current search tools, I can't find the American jazz singer I heard a few years back. Search is not cognitive or semantic yet. Google could not lead me to the singer because I needed related search parameters: What was her age? Was she African-American? What was her net worth? Suddenly, something so relevant to me got lost in the deluge of information. The problem with search is the lack of smart, cataloged databases that can understand each other. Only when databases can understand each other can data come alive and make search smarter. In the article 'Researching Google Search', we explained how everything, from the number of searches to what we search for, is connected with time, and how any predictive cycle tool that can measure and anticipate emotion can help researchers understand where a society is headed, where it was when hope was rising, and where it will go when hope falls.
The age of big data brings numerous data types: web data, social data, consumer data. Hence it has become essential to lay down a framework for data universality, meaning a commonality of behavior, of patterns, and of data character. Such guidelines could make data visualization, transformation, and interpretation easier. The natural universalities underlying data universality can harmonize big data classification and improve predictive models.
What does real data seasonality tell us? Variable growth and decay over multiple periods (intraday, weekly, quarterly, or for the decade ahead). Why is it important? Because nature is not just about growth; it is also about decay. Extrapolating a trend is incomplete science; understanding when that extrapolation will peak is just as important. What is missing today, and where is the need for value addition? Society is missing the connection of growth with decay (seasonality), the temporal element of growth and decay, data universality, an acceptance of the interconnectedness of everything with everything, and the ability to anticipate whether it is really the butterfly creating the tornado, or whether the role switched to the tortoise last week when the butterfly decided not to flutter. What does this mean in business terms? Everything: if you can't anticipate, you can't recreate the past or construct the future, and you don't understand the data set. What is the secret sauce? The secret sauce is universality. If all data sets exhibit universal behavior, data manipulation should be built on those universalities and should seek out, identify, and enhance the respective natural patterns. Tomorrow's data should know that Google and googly are two different things.
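A minimal sketch of "knowing when the extrapolation peaks", on a synthetic series invented for illustration: smooth the data with a moving average and flag the first point where growth flips to decay.

```python
# Illustrative only: the smoothing window and the series are assumptions,
# not part of any methodology described in the text.

def moving_average(xs, window=3):
    """Trailing moving average; returns len(xs) - window + 1 points."""
    return [sum(xs[i:i + window]) / window
            for i in range(len(xs) - window + 1)]

def first_peak(xs):
    """Index of the first point where growth turns to decay, else None."""
    for i in range(1, len(xs) - 1):
        if xs[i - 1] < xs[i] >= xs[i + 1]:
            return i
    return None

# A synthetic growth-then-decay series.
series = [10, 12, 15, 19, 24, 27, 28, 27, 24, 20]
trend = moving_average(series)
print(first_peak(trend))  # index where the smoothed trend tops out
```

Real seasonality work would of course use richer cycle tools, but even this toy version makes the point: the interesting question is not where the trend line goes, but where it turns.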