
How to cope with open source data in the age of artificial intelligence

As the Data Economy booms, differentiating and standing out from the crowd is crucial: just look at IBM’s acquisition of The Weather Company in 2016.

When data replaces code as the secret sauce for analytics, it should come as no surprise that an open data movement exists, which, like the open source movement, seeks to ensure useful big data sets are freely available to all.

Moshe Kranc (MK), CTO at Ness Digital Engineering, talks to Data Economy (DE) on how calls for open data stand to impact both large and small companies, as well as the future of Artificial Intelligence (AI).

DE: How are companies adjusting to this Open Data movement?

MK: Some companies are aggressively safeguarding their data, while others are taking a contrarian position by releasing their proprietary data to the public as a means of generating PR kudos.

One key example of the latter approach is Uber Movement, which uses data from the billions of rides Uber has provided to let planning agencies and researchers track car travel times between any two locations at any time of day.

DE: Why should the Uber Movement be taken seriously when it comes to Open Source Data?

MK: Uber deserves major kudos for releasing rides data. Having reached the conclusion there was no way they could directly monetize this data, Uber could have just sat on it.

Instead, they recognized the value of goodwill potentially generated by releasing this data. In the long term, they may even derive some financial benefit.

As they negotiate the right to run taxi services in cities across the globe, Uber often runs into opposition from local taxi drivers. It certainly helps their cause to have generated goodwill with urban planners and residents.

DE: Are there any downsides to releasing your data?

MK: Companies that offer goodwill gestures in terms of releasing data must be careful to not inadvertently violate customer privacy. AOL Research made a similar goodwill gesture to Uber’s in 2006, when they released their search logs to better help researchers tune their search algorithms.

Although the data was anonymous, an industrious New York Times reporter successfully traced specific searches by the same anonymous user to locate that user. The resulting lawsuits ensured that no one would ever release search logs again. Let’s hope Uber’s goodwill gesture meets a better fate than AOL’s.

DE: In terms of big data, how are companies approaching this concept in the market?

MK: Some companies have taken a very aggressive approach to benefitting from Big Data. I experienced an example of this several years ago, when a book I authored in 2004 went out of print.

Several months later I was surprised to discover that my book was available, as scanned page images, on Google Books, even though I had never given them permission.

I wrote to Google to complain and received a response explaining that Google scanned any book that went out of print in order to analyze the text so they could improve their natural language processing (NLP) algorithms.

Google offered me a choice: receive $5 for the rights to the book, or have the book removed from Google Books. I took the $5.

In retrospect, though, I regret my choice, because having such a vast corpus of books gives Google an unfair advantage over competitors in training its NLP algorithms.

DE: Have algorithms for artificial intelligence become readily accessible?

MK: The algorithms for artificial intelligence have indeed become readily accessible, thanks to the major cloud vendors (Amazon, Google, Microsoft and IBM), which provide cheap access to complex algorithms in order to create lock-in to their cloud offerings. Each comes at it from a different angle.

For example, Google TensorFlow gives expert users lots of options, while Amazon’s tools are geared towards less technical users who appreciate ease of use. But the bottom line is that nowadays you don’t need to know a whole lot about how collaborative filtering algorithms work to add a recommendation box to your ecommerce web page.
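
For readers curious about what sits behind such a recommendation box, here is a minimal sketch of item-based collaborative filtering in plain Python. It is an illustration only, not how any cloud vendor implements it: the purchase data, the `item_similarity` and `recommend` helpers, and the choice of cosine similarity are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical purchase history: user -> set of items bought (invented data).
purchases = {
    "ann": {"tent", "stove", "lantern"},
    "bob": {"tent", "stove"},
    "cat": {"tent", "lantern", "boots"},
    "dan": {"boots", "socks"},
}


def item_similarity(a, b, data):
    """Cosine similarity of two items, based on which users bought each."""
    buyers_a = {user for user, items in data.items() if a in items}
    buyers_b = {user for user, items in data.items() if b in items}
    if not buyers_a or not buyers_b:
        return 0.0
    return len(buyers_a & buyers_b) / sqrt(len(buyers_a) * len(buyers_b))


def recommend(user, data, top_n=3):
    """Rank items the user has not bought by similarity to items they have."""
    owned = data[user]
    catalog = set().union(*data.values())
    scores = {
        candidate: sum(item_similarity(candidate, o, data) for o in owned)
        for candidate in catalog - owned
    }
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    # Drop items with no overlap at all with the user's purchases.
    return [item for item, score in ranked[:top_n] if score > 0]


print(recommend("bob", purchases))
```

The point the sketch makes is the same one made above: the algorithm is short and generic, so the quality of the recommendations depends almost entirely on the purchase data fed into it.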

DE: If everyone has access to the same algorithms, how can companies make their products and services stand out?

MK: The only real differentiator is the data used to train the algorithms. The software giants realized this last year, and began an all-out war to acquire exclusive access to meaningful data sets.

For example, in 2016 IBM acquired The Weather Company’s data gathering infrastructure, which gives them exclusive access to highly detailed information about weather, based on time and location.

This kind of data is invaluable for data scientists because the weather has an impact on just about any kind of analysis you can imagine, from shopping patterns to machinery failure predictions. Microsoft jumped into the fray by acquiring LinkedIn, in part for the data it collects about its customers.

LinkedIn knows where you work, what your skills and professional interests are and who you work with. Microsoft plans to mine this data in order to enhance its office productivity products. For example, imagine a Microsoft Calendar app that can automatically give you strategic background information about every person you invite to a meeting.

DE: How can mid-size and small companies compete against vendors or end-user companies?

MK: While the giants are busy buying up data sources to make them proprietary, the rest of us must rely on publicly available data sources. For example, the US government has been an excellent source of Open Data, with a variety of agencies providing free access to detailed data and data mining tools via initiatives such as The Opportunity Project and Open Data 500. Small and mid-sized companies are the biggest beneficiaries of the Open Data movement and can help advance the movement by opening up their own data for public use.

DE: What are your predictions for open source data and AI?

MK: I believe the Open Data movement will follow the same trajectory as the Open Source software movement. When Richard Stallman launched the GNU free software project in 1983, it seemed a utopian cause at best.

Today most CPUs run Open Source operating systems like Linux, and most Big Data projects use Open Source data platforms like Hadoop, Spark and Cassandra. The Open Source movement succeeded because it tapped into the wisdom of the crowd – no commercial software company can compete with the massive community resources that create, evolve and debug an Open Source software product.

Similarly, the Open Data movement will succeed when masses of individuals and small companies contribute their data for the benefit of all, creating a huge data corpus that no single company can match.

As for AI, it’s here and it’s real. The Cloud vendors have lowered the barrier to entry by providing algorithms and training data to the point where AI has become table stakes.

Expect to see personalized web shopping become ubiquitous, expect AI-based Intelligent Virtual Assistants to begin replacing humans in performing repetitive office tasks, and expect voice interfaces to become far more common.

These are easy predictions because they are already happening. As for the harder questions (Will AI actually make business more efficient? What effect will AI have on the unemployment rate when it automates tasks that are done today by humans?) – ask me next year.

