Data scientists in hot demand thanks to Big Data

Dr. Andreas Jedlitschka, Fraunhofer IESE, explains in an interview why data scientists are in such demand today. Credit: Fraunhofer IESE

Data scientist is one of the most attractive jobs of the 21st century. This impression is confirmed when you take a look at relevant online job portals. According to a study by the McKinsey Global Institute, in the USA demand exceeds supply by far – and this does not appear to be any different in Germany. But what is it that makes this job so interesting in the first place? Someone who knows this is Dr. Andreas Jedlitschka, Head of the Data Engineering Department at the Fraunhofer Institute for Experimental Software Engineering IESE and a member of the Expert Committee on Data Science of the Personal Certification Body at the Fraunhofer Institute for Applied Information Technology FIT, Sankt Augustin.

Why do companies have such an enormous need for data specialists?

With the increasing networking among all areas all the way to digital ecosystems, the deluge of data in companies and organizations also increases exponentially. At the same time, the growing data availability and the success stories published in the press also lead to an increasing desire to use data systematically, i.e., to perform data analyses, and thus the need arises for experts who can perform these. These "data specialists" are frequently combined under the term data scientists.

What makes a data scientist in the first place?

First of all, I would like to define the term "Data Science": Data Science is about extracting knowledge from data and doing so ideally for the benefit of the company. To do so, methods and techniques from computer science, mathematics, and statistics are used. The job profile is varied and ranges from Big Data analytics and visual analytics via Big Data architecture to integration. In addition, business models must be taken into account or developed, and thus must also be understood. Furthermore, you need to talk to the customer, i.e., the user of the information as the addressee, and to the domain expert.

What are the tasks that data scientists do, and which skills do they need?

Data scientists must be experts in several disciplines at the same time: They do not only assess data, but must also understand the business contexts in companies and organizations. They must identify suitable data sources, determine and improve data quality, put together data, prepare and perform analyses, and then assess the results in terms of given criteria. If you work as a data scientist, you often bear great responsibility since far-reaching strategic decisions or even human lives may depend on the results of the data analyses; just think of systems used for diagnosis support in the medical domain or learning processes used in various areas in autonomous vehicles. This is why the underlying data and the analysis results must be continually checked in terms of plausibility, completeness, correctness, and relevance, in cooperation with domain experts. The requirements profile of a data scientist grows according to how their work is embedded in the company and includes not only technical skills, but also a number of soft skills such as the ability to work in a team, strong communication skills, and creativity.

How does one become a data scientist? What are the prerequisites, and what previous knowledge is required?

At Fraunhofer, we offer a certified course in the context of the Big Data Alliance, where we make the participants fit for Big Data projects. The participants are often decision makers, but mainly business developers, analysts, data managers, and software developers. The prerequisite is basic knowledge of computer science and mathematics. In the beginner courses, the participants learn about the important fundamentals, processes, and best practices for dealing with large amounts of data and for the development of smart solutions with high standards on privacy and security. In the advanced courses, individual processes are studied in detail; then the focus is on being able to apply what was learned. In these courses, we teach state-of-the-art knowledge in a manufacturer-neutral, practically relevant, and at the same time theoretically sound manner.

Young scientists coming from university also benefit from your certification course. Which background is needed to get the chance to become a qualified data scientist?

Researchers who come straight from university have excellent subject knowledge, especially from their study program, such as computer science or mathematics. What the young scientists are often lacking, however, is a broad overview and the practical experience required to collaborate in Big Data projects. And this is exactly what they learn in our data scientist course. The training is designed for a wide range of applications. They learn how business developers unlock the potential of Big Data in their company, how data engineers describe and integrate data, how analysts use machine learning processes to detect patterns and trends, and how software engineers use modern databases and distributed calculation methods to develop robust and scalable Big Data systems. All this while taking into account privacy and security. The aim is to get basic knowledge in all relevant areas. Those who want can then go on to become certified data scientists.

Book your tickets for the Data Science Summit here:


What is data science? A method for turning data into value

Data science is a method for transforming business data into assets that help organizations improve revenue, reduce costs, seize business opportunities, improve customer experience, and more.


What is data science?

Data science is a method for gleaning insights from structured and unstructured data using approaches ranging from statistical analysis to machine learning. For most organizations, data science is employed to transform data into value that might come in the form of improved revenue, reduced costs, business agility, improved customer experience, the development of new products, and the like.

"The amount of data you can grab, if you want, is immense, but if you're not doing anything with it, turning it into something interesting, what good is it? Data science is about giving that data a purpose," says Adam Hunt, chief data scientist at RiskIQ.

Data science vs. analytics

While closely related, data analytics is often viewed as a component of data science, used to understand what an organization’s data looks like. Data science takes the output of analytics to solve problems.

"Data science is coming to conclusions that drive your data forward," Hunt says. "Understanding what your data looks like is analytics, but there's no outcome beyond the data itself. If you're not solving a problem with data, if you're just doing an investigation, that's just analysis. If you're actually going to use the outcome to explain something, you're going from analysis to science. Data science has more to do with the actual problem-solving than looking at, examining, and plotting [data]."

Data science vs. big data

Data science and big data are often viewed as connected concepts, but data scientists don't just work with big data. Data science can be used to extract value from data of all sizes, whether structured, unstructured, or semi-structured.

Big data is useful to data science teams in many cases, because the more data you have, the more parameters you can include in a given model.

"With big data, you're not necessarily bound to the dimensionality constraints of small data," Hunt says. "Big data does help in certain aspects, but more isn't always better. If you take the stock market and try to fit it to a line, it's not going to work. But maybe, if you only look at it for a day or two, you can set it to a line."

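Hunt's point about fitting a line can be illustrated with a small sketch. The series below is synthetic (a smooth oscillation standing in for a price history), and the names `fit_line`, `prices`, `rmse_long`, and `rmse_short` are hypothetical, chosen only for this example. It is a minimal least-squares illustration under those assumptions, not a model of any real market.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b, rmse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    # Root-mean-square error of the fitted line against the data
    rmse = math.sqrt(sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / n)
    return a, b, rmse

# Synthetic, hypothetical "price" series: a slow oscillation around 100
t = list(range(200))
prices = [100 + 10 * math.sin(x / 15) for x in t]

# Fit one line to the whole history, and one to a short window only
_, _, rmse_long = fit_line(t, prices)
_, _, rmse_short = fit_line(t[:5], prices[:5])

print(f"RMSE over the full series: {rmse_long:.3f}")
print(f"RMSE over a 5-point window: {rmse_short:.3f}")
```

Over the short window the series is nearly straight, so the line fits closely; over the full history the same model leaves a large error. More data does not rescue a model that is too simple for it.
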
The business value of data science

The business value of data science depends on the organization it's serving. Data science could help an organization build tools to predict hardware failures, allowing the organization to perform maintenance and prevent unplanned downtime. It could be used to predict what to put on supermarket shelves, or how popular a product will be based on its attributes.

"The biggest value a data science team can have is when they are embedded with business teams. Almost by definition, a novelty-seeking person, someone who really innovates, is going to find value or leakage of value that is not what people otherwise expected," says Ted Dunning, chief application architect at MapR Technologies. "Often they'll surprise the people in the business. The value wasn't where people thought it was at first."

Organization of data science teams

Data science is generally a team discipline. Data scientists are the forward-looking core of most data science teams, but moving from data to analysis, and then transforming that analysis into production value requires a range of skills and roles. For example, data analysts should be on board to investigate the data before presenting it to the team and to maintain data models. Data engineers are necessary to build data pipelines to enrich data sets and make the data available to the rest of the company.

Mark Stange-Tregear, vice president of analytics at eBates, says it's essential to think in terms of teams, rather than seeking a "unicorn" — an individual that combines non-linear thinking with advanced mathematics and statistics knowledge and the ability to code.

"Data engineering I don't think of as a key data scientist trait," Stange-Tregear explains. "I want someone that actually adds something else. If I can have someone build a model, be able to evaluate the statistics, and communicate the benefits of that model to the business, then I can hire data engineers that are sophisticated enough to take that model and implement it."

The embedded approach to data science

Rather than isolate data science teams, some organizations opt to commingle data scientists with other functions. For example, MapR's Dunning recommends organizations follow a DataOps approach to data science, by embedding data scientists in DevOps teams with business line responsibilities. These DataOps teams tend to be cross-functional — cutting across "skill guilds" like operations, software engineering, architecture and planning, and product management — and can orchestrate data, tools, code, and environments from beginning to end. DataOps teams tend to view analytic pipelines as analogous to manufacturing lines.

"It's not a data science team's job to do data science in some abstract sense," Dunning says. "You want to get value out of that part of the business using data. An isolated data science team might want to deploy the most sophisticated model. The embedded data scientist is going to look for cheap wins that are maintainable. They're mercenary, pragmatic, about solutions they pick."

That said, data scientists aren't necessarily permanently embedded in DataOps teams.

"Typically, there's a data scientist embedded in the team for a time," Dunning says. "Their capabilities and sensibilities begin to rub off. Someone on the team then takes on the role of data engineer and kind of a low-budget data scientist. The actual data scientist embedded in the team then moves along. It's a fluid situation."

Data science goals and deliverables

The goal of data science is to construct the means for extracting business-focused insights from data. This requires an understanding of how value and information flow in a business, and the ability to use that understanding to identify business opportunities. While that may involve one-off projects, more typically data science teams seek to identify key data assets that can be turned into data pipelines that feed maintainable tools and solutions. Examples include credit card fraud monitoring solutions used by banks, or tools used to optimize the placement of wind turbines in wind farms.

Incrementally, presentations that communicate what the team is up to are also important deliverables.

"Making sure they're communicating out results to the rest of the company is incredibly important," RiskIQ's Hunt says. "When a data science team goes dark for too long, it starts to get in a little trouble. Product managers take work for granted unless we're talking about it all the time, selling it internally."

Data science processes and methodologies

Production engineering teams work on sprint cycles, with projected timelines. That's often difficult for data science teams to do, Hunt says, because a lot of time upfront can be spent just determining whether a project is feasible.

"A lot of times, the first week, or even first month, is research — collecting the data, cleaning it," Hunt says. "Can we even answer the question? Can we do it efficiently? We spend a ton of time doing design and investigation, much more than a standard engineering team would perform."

For Hunt, data science should follow the scientific method, though he notes that it's not always the case, or even feasible.

"You're trying to extract some insight out of data. In order to do that repeatedly and confidently, and to make sure you're not just blowing smoke, you have to use the scientific method to accurately prove your hypothesis," Hunt says. "But I don't think many data scientists actually use any science whatsoever."

Real science takes time, Hunt says. You spend a little bit of time confirming your hypothesis and then a lot of time trying to disprove yourself.

"With data science, you're almost always in a for-profit company that doesn't want to take the time to dive deeply enough into the data to validate these hypotheses," Hunt says. "A lot of the questions we're trying to answer are short-lived. In security, for instance, we're trying to find the threat actor tomorrow, not next year — tomorrow, before he can release his threat to the wild."

As a result, data science can often mean going with the "good enough" answer rather than the best answer, Hunt says. The danger, though, is results can fall victim to confirmation bias or overfitting.

"If it's not actually science, meaning you're using scientific method to confirm a hypothesis, then what you're doing is just throwing data at some algorithms to confirm your own assumptions."

Data science tools

Data science teams make use of a wide range of tools, including SQL, Python, R, Java, and a cornucopia of open source projects like Hive, Oozie, and TensorFlow. These tools are used for a variety of data-related tasks, ranging from extracting and cleaning data, to subjecting data to algorithmic analysis via statistical methods or machine learning.

"The first tool a data scientist needs is eyeballs and fingers," MapR's Dunning says. "It's very, very common that the simplest things provide value, especially when people are starting. Look critically at very simple aspects of the data. Look for hints about how things work."

Tools will help data science teams extend those eyeballs and fingers.

"You need good visualization tools. Programming tools — Python is an odds-on favorite at this point. You need the tools that will actually build interesting models. You can't survive with just one," Dunning says.

When MapR surveyed its customer data teams, Dunning says, the smallest number of modeling tools used by a team was five, and that didn't even get into visualization tools.

"Things are becoming more polyglot because people are more suspicious. Will this other modeling technique produce a better model?" Dunning says. 

Data science salaries

Here are some of the most popular job titles related to data science and the typical salary range for each position, according to data from PayScale:

  • Analytics manager: $82K-$120K
  • Business intelligence analyst: $55K-$81K
  • Data analyst: $45K-$68K
  • Data architect: $75K-$152K
  • Data engineer: $63K-$131K
  • Data scientist: $79K-$120K
  • Research analyst: $43K-$63K
  • Research scientist: $58K-$97K
  • Statistician: $58K-$90K

Data science skills

Data science is an evolving discipline, and there are many ways to become involved. While the number of data science degree programs is increasing at a rapid clip, they aren't necessarily what organizations look for when seeking data scientists.

eBates' Stange-Tregear says he looks for candidates that have a statistics background so they know whether they are looking at real results, domain knowledge to put results in context, and communication skills that allow them to communicate results to business users.

"If I've got a data scientist that can do all of those things, then I'll worry about getting that implemented through the data engineering team," he says.

RiskIQ's Hunt is attracted to candidates with PhDs.

"I'm biased toward people who have PhDs, but I wouldn't pass up someone who has a lot of experience," Hunt says. "What a PhD tells me is you're capable of doing very deep research on a topic, and you're able to disseminate that information to others. But having a solid background or personal project is incredibly interesting."

Hunt says he particularly looks for PhDs in physics, math, computer science, economics, or even social science. He wouldn't turn his nose up at applicants with degrees in data science or analytics, but he does have reservations. "My personal experience is I find they're very useful, but they focus too much on the operations of the models and not the mindset," he says.

MapR's Dunning cares less about the letters behind an applicant's name than their ability to show him something new. "What I interviewed for first and foremost [when hiring data scientists] was: Did the interviewee teach me something? I did not want to find people that knew how to do what I knew how to do," Dunning says. "I desperately wanted to find people that could do stuff that I couldn't do, or that could teach the team stuff."

Dunning notes that some of the best data scientists or leaders in data science groups have non-traditional backgrounds. Some of the best he's worked with include someone who spent six years working as a gardener before going to college, a person with a background in fine arts, another with a French literature degree, and yet another who was a journalism student with very little formal computer training.

"You want to test people in terms of data perception, not knowing formulas," Dunning says. "You want the ability to look at things and understand them."

Data science training

Given the current shortage of data science talent, many organizations are building out programs to develop internal data science talent.

Bootcamps are another fast-growing avenue for training workers to take on data science roles.

Data science degrees

According to US News and World Report, these are the top graduate degree programs in data science:

  • Master of Science in Statistics: Data Science at Stanford University
  • Master of Information and Data Science: Berkeley School of Information
  • Master of Computational Data Science: Carnegie Mellon University
  • Master of Science in Data Science: Harvard University John A. Paulson School of Engineering and Applied Sciences
  • Master of Science in Data Science: University of Washington
  • Master of Science in Data Science: Johns Hopkins University Whiting School of Engineering
  • MSc in Analytics: University of Chicago Graham School

Data science certifications

Organizations need data scientists and analysts with expertise in techniques for analyzing data. They also need big data systems architects to translate requirements into systems, data engineers to build and maintain data pipelines, developers who know their way around Hadoop clusters and other technologies, and system administrators and managers to tie everything together. Certifications are one way for candidates to show they have the right skillset.

Some of the top big data and data analytics certifications include:

  • Analytics: Optimizing Big Data Certificate
  • Certificate in Engineering Excellence Big Data Analytics and Optimization (CPEE)
  • Certification of Professional Achievement in Data Sciences
  • Certified Analytics Professional
  • Cloudera Certified Associate (CCA) Administrator
  • Cloudera Certified Associate (CCA) Data Analyst
  • Cloudera Certified Associate (CCA) Spark and Hadoop Developer
  • Cloudera Certified Professional (CCP): Data Engineer
  • EMC Proven Professional Data Scientist Associate (EMCDSA)
  • IBM Certified Data Architect – Big Data
  • IBM Certified Data Engineer – Big Data
  • Microsoft Certified Solutions Expert (MCSE): Data Management and Analytics
  • Mining Massive Data Sets Graduate Certificate
  • MongoDB Certified DBA Associate
  • MongoDB Certified Developer Associate
  • Oracle Business Intelligence Foundation Suite 11 Certified Implementation Specialist
  • SAS Certified Big Data Professional
  • SAS Certified Data Scientist Using SAS 9
  • Stanford Data Mining and Applications Graduate Certificate


Mapping the brain with data science

A visualization of the human connectome. (Purdue University image)

Visualizing neural connections gives scientists new insight into neurological diseases

WEST LAFAYETTE, Ind. — Patients with dementia and other neural diseases show physical symptoms such as stumbling and confusion, but identifying the problem isn’t as simple as taking an X-ray. A group of researchers at Purdue University is designing data-driven tools that will help clinicians better understand the progression of neurodegenerative diseases by identifying and tracking changes in the brain.

“We’re not to the point where we’re taking X-rays to see if you have a broken bone in your leg, but we’re at least at the stage where we’re saying, ‘Your gait is very funny,’” said Tom Talavage, professor of electrical and computer engineering and biomedical engineering, and a co-investigator for the project. “We can narrow it down to something wrong with your leg, and we can make inferences about what’s wrong with your leg. We can say, ‘You probably have a broken leg because of how you’re walking.’ That’s what we’re really getting at.”

The project is led by Joaquín Goñi, an assistant professor of biomedical engineering who studies the network of neural connections composing the human brain. This network is called the connectome, the focus of an emerging field of study known as brain connectomics. Brain-imaging techniques, such as diffusion weighted imaging and functional magnetic resonance imaging (fMRI), allow neuroscientists to model and examine the connectome to understand communication between different regions of the brain. This helps them see which parts of the brain are functioning normally, and which regions are not, by observing changes over time.

“What we’re really doing is starting to create the means to see symptoms of neurodegenerative diseases; they become physical in the graphical approaches we apply. It’s very visual,” Talavage said.

The team is using data from the Human Connectome Project, a collection of data sets from different projects focused on the connectome, to research and develop their method. Goñi and postdoctoral researcher Enrico Amico recently published a paper in Nature Scientific Reports using data from the Human Connectome Project to propose a data-driven method for assessing the connectome. Their work demonstrates that individual connectomes are unique enough to be identifiable – and could potentially be used to better understand the differences between people’s individual connectomes and how this relates to their health.
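The idea that a connectome can identify an individual can be sketched with a toy matching exercise. Everything here is an assumption made for illustration: the data are synthetic random vectors rather than brain images, and the correlation-based matching in the hypothetical `identify` function is a generic technique, not the method from the team's paper.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def identify(target, candidates):
    """Index of the candidate profile most correlated with the target."""
    scores = [pearson(target, c) for c in candidates]
    return scores.index(max(scores))

random.seed(42)
n_subjects, n_edges, noise_sd = 5, 50, 0.3

# Synthetic "true" connectivity profile per subject (hypothetical values)
profiles = [[random.gauss(0, 1) for _ in range(n_edges)]
            for _ in range(n_subjects)]

def scan(profile):
    """Simulate one noisy measurement of a subject's profile."""
    return [w + random.gauss(0, noise_sd) for w in profile]

session_1 = [scan(p) for p in profiles]
session_2 = [scan(p) for p in profiles]

# Match each second-session scan to the closest first-session scan
hits = sum(identify(session_2[i], session_1) == i for i in range(n_subjects))
accuracy = hits / n_subjects
print(f"identification accuracy: {accuracy:.2f}")
```

When the individual signal is strong relative to the measurement noise, each scan matches its own subject far better than anyone else's, which is the sense in which a profile is "identifiable".
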

Their project uses brain-imaging data that shows both “structural” and “functional” connectivity – physical connections between different regions of the brain, and the communication between those regions as a subject performs a given task. Making sense of an individual person’s network of connections and understanding their cognitive health depends on images of brain activity collected during these tasks.

“When my brain is performing activities, is it more consistently, uniquely me? Or does the task actually make me look like everybody else?” Talavage said. “The end results indicate that we are more uniquely ourselves even when we’re performing the same task.”

Taking these detailed pictures of the connectome can be a challenge for neuroscientists. Often the neuro-images are affected by visual noise, which makes it more difficult to make sense of the data.

“We want accurate representations of both structural and functional connectivity. Individual estimations of brain connectivity are quite noisy, meaning that they are not necessarily accurate in representing the subject’s characteristics,” said Goñi, who is the principal investigator for the research project. “We know that if we take the average of all those mapping connections, we get fairly good representations of the human connectomes.”
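Why averaging helps with noisy estimates can be shown with a short simulation. The edge weights, noise level, and function names below are hypothetical, and the sketch only demonstrates the general statistical effect of averaging independent noisy measurements; it does not reproduce the team's processing.

```python
import random

random.seed(0)

# Hypothetical "true" connection strengths for four edges
true_connectivity = [0.8, 0.1, 0.5, 0.3]

def noisy_estimate(truth, noise_sd):
    """One noisy estimate: the truth plus independent Gaussian noise."""
    return [w + random.gauss(0, noise_sd) for w in truth]

def mean_abs_error(estimate, truth):
    """Average absolute deviation of an estimate from the truth."""
    return sum(abs(e - t) for e, t in zip(estimate, truth)) / len(truth)

# 200 independent noisy estimates of the same underlying connectivity
estimates = [noisy_estimate(true_connectivity, 0.2) for _ in range(200)]

# Edge-by-edge average across all estimates
average = [sum(column) / len(column) for column in zip(*estimates)]

mean_single_error = sum(
    mean_abs_error(e, true_connectivity) for e in estimates
) / len(estimates)
average_error = mean_abs_error(average, true_connectivity)

print(f"typical error of a single estimate: {mean_single_error:.3f}")
print(f"error of the averaged estimate:     {average_error:.3f}")
```

The averaged estimate lands much closer to the underlying values than a typical single estimate does. The trade-off is that an average describes the group, not any one person.
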

One of the areas their research explores is to what extent individuals’ neural connections resemble each other because of genetics. Goñi and his colleagues are using the Human Connectome Project’s data from pairs of identical (“monozygotic”) and fraternal (“dizygotic”) twins to see if genetic resemblance plays a role. If their method incorrectly identified one monozygotic twin as their identical counterpart, the team could then correct this mistake and improve the reliability of their results with each test.

“Joaquín is working on a large study with twin data to document whether the twins, who were monozygotic versus dizygotic, looked like one another in terms of brain connectivity, and of course, they do,” Talavage said. “But they are still able to pull out the individual twin from the other twin. Even two identical twins do not look closer to one another in the end.”

Although the result did not surprise the team, it did reassure them.

“I don’t think it’s particularly shocking,” Talavage said. “But at the same time it is a very important illustration about the subtleties that he can start pulling out. Yes, you look more like your twin than everybody else, but you’re still differentiable.”

Ultimately the team hopes to release a proven tool for assessing the human connectome that will help clinicians track disease progression and neural health.

“In many diseases, it’s not like there is a unique progression of the disease that could be assumed for all the patients,” Goñi said. “Having individual, accurate structural and functional connectomes is crucial for better understanding how the progression is affecting different parts of the brain, different subsequets and to what extent how much these are affected. These have implications in behavioral and neuropsychological evaluations.”

“It’s really just allowing us to look at the overall functioning of the brain in a very concise manner,” Talavage said. “Then we can assess the state of the brain and the trajectory it’s following.”

The team’s project was one of eight that were chosen for Purdue’s Integrative Data Science Initiative after a competition for funding. The initiative will advance Purdue’s data science-related research and education by funding projects that target four focus areas: health care; defense; fundamentals, methods, and algorithms; and ethics, society and policy. The projects were chosen to integrate data science programs with campus-wide courses and curricula, and are hosted by Purdue’s Discovery Park.

The interdisciplinary research team is composed of four faculty members from several engineering departments at Purdue. Other members of the team include Yunjie Tong, assistant professor of biomedical engineering, and Mario Ventresca, assistant professor of industrial engineering. After the conclusion of the two-year funding period with the Integrative Data Science Initiative, the team hopes to work with clinicians on implementing their tool.

“When I came three years ago to Purdue, one of my most ambitious aims was that we get Purdue to be known for our work on brain connectomics,” Goñi said. “I really think that thanks to this project, we have a chance to make that happen. In my name and in the name of all the other team members, we are really grateful for this opportunity and this project. We are super excited to move forward.”


What Does Data Science Really Mean? More Rescued Kittens


When data scientist Joanne Lin was looking for patterns in adoption data from the Austin Animal Center, a surprising result jumped out: cats with names are much more likely to find a home than those without names. In fact, having a name turns out to be one of the most important determinants of whether a cat is adopted, second only to being spayed or neutered. Having a name is more important than how old the cat is or the color and pattern of its coat.  

I love this story, as it shows the power of data scientists to effect change. Joanne’s discovery that 63 percent of cats with names are adopted from the shelter while only 17 percent of cats without names enjoy the same positive outcome is incredibly helpful information that can be put to use immediately by shelters everywhere to improve outcomes for cats in their care.
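The two adoption rates quoted above can be compared in a few different ways. The short calculation below uses only the 63 percent and 17 percent figures from the text; the variable names are illustrative, and the odds ratio is a derived summary for this example, not a statistic reported by Joanne.

```python
# Adoption rates quoted in the text
named_rate = 0.63     # share of named cats that are adopted
unnamed_rate = 0.17   # share of unnamed cats that are adopted

# Absolute gap, in percentage points
difference_points = (named_rate - unnamed_rate) * 100

# How many times more likely a named cat is to be adopted
relative_rate = named_rate / unnamed_rate

# Odds of adoption (adopted vs. not adopted) for each group, and their ratio
named_odds = named_rate / (1 - named_rate)
unnamed_odds = unnamed_rate / (1 - unnamed_rate)
odds_ratio = named_odds / unnamed_odds

print(f"gap: {difference_points:.0f} percentage points")
print(f"relative rate: {relative_rate:.1f}x")
print(f"odds ratio: {odds_ratio:.1f}")
```

On these figures, a named cat is roughly 3.7 times as likely to be adopted as an unnamed one. The comparison describes an association in the shelter's records; by itself it does not show that naming a cat causes the adoption.
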

Data science is more than just the buzzword du jour. The discipline underlies some of the most significant social and economic drivers of our age, and mastery of its concepts opens fantastic opportunities to the rising generation of tech workers. It’s estimated that by 2020, there will be 2.7 million jobs in the field. Data scientists are among the country’s best paid and happiest workers.

But the challenge to meet the demand is daunting

First, most data science roles require some form of postgraduate education. While there is some debate within the data science community about the necessity of these programs, the proof lies in the job descriptions of current openings. Of the 80+ available entry-level data science jobs in Atlanta, for example, nearly every position requests more than a bachelor’s degree.

Compounding the issue is that few have the prerequisite skills necessary to enroll in those postgraduate programs. The typical master’s or bootcamp student must know at least one programming language and have an understanding of statistics and probability. It’s estimated that there are about 1.26 million software engineers working in the U.S. today. If every single one of those folks decided to enroll in a postgraduate data science program, we’d still come up more than 50 percent short.
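The "more than 50 percent short" claim can be checked with the article's own numbers. This sketch assumes the comparison is between the 1.26 million software engineers and the 2.7 million jobs projected earlier in the piece; the variable names are illustrative.

```python
# Figures quoted in the article
projected_jobs = 2_700_000       # estimated data science jobs by 2020
software_engineers = 1_260_000   # estimated U.S. software engineers today

# Share of projected jobs that could be filled if every engineer retrained
coverage = software_engineers / projected_jobs

# Remaining gap, as a count and as a share of projected jobs
shortfall_jobs = projected_jobs - software_engineers
shortfall_share = 1 - coverage

print(f"coverage: {coverage:.1%}")
print(f"shortfall: {shortfall_jobs:,} jobs ({shortfall_share:.1%})")
```

Under that assumption, the pool would cover a little under half of the projected jobs and leave a gap of about 1.44 million, or roughly 53 percent, which is consistent with the claim.
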

And finally, few can actually define Data Science. The discipline is most commonly associated with the marketing machines of tech giants like Facebook and Google, and recent events like the Cambridge Analytica fiasco reinforce a perception of data science as a tool for manipulation. That’s why we at Thinkful launched the WTF is Data Science project, a free resource that presents the discipline in a friendly and accessible format. It’s also why we highlight our students’ work, like Joanne’s, or that of another graduate whose capstone project used neural network techniques to investigate improvements in the identification of cervix types, which could lead to improvements in the treatment of precancerous conditions.

In May my husband and I adopted our first cat — and yes, the shelter had named her. I’m glad they did: I wonder how we can use data science to get more cats into owners’ arms, where they belong!


Malaysia makes good progress in building analytics talent

The country’s talent development and funding schemes have brought the benefits of big data to many organisations, but the quality and volume of its analytics talent can be improved


Malaysia has made good progress in building up its big data analytics talent pool, but more can be done to improve the volume and quality of its talent, according to a new report by IDC.

In its Demand and talent review 2018, IDC lauded Malaysian-driven initiatives such as the ASEAN Data Analytics Exchange (Adax), a regional platform that brings together talent and development models and showcases the latest analytics technologies.

Since its inception in 2017, Adax has helped to train 1,800 people from 298 companies across 19 industries as data practitioners, data managers and data leaders. Malaysia has set a goal of nurturing 20,000 data professionals by 2020.

IDC also singled out Malaysia’s Human Resources Development Fund (HRDF), which has evolved into a one-stop centre for small and medium-sized enterprises to build up their big data analytics capabilities.

According to HRDF, Malaysia’s workforce is 15 million strong, but fewer than 30% of employees have adequate access to the training needed to upskill and reskill for the changing nature of work. Also, about 70% of Malaysian workers are currently not covered by any structured training programmes.

Training an extra percentage point of a company’s workforce translates through to a one percentage point increase in productivity among Malaysian companies. In fact, HRDF-registered firms show a 3% increase in productivity, according to HRDF analysis.

“With the initiatives currently in progress, many organisations, both on the demand and supply sides of the big data analytics and AI [artificial intelligence] ecosystems have benefited in terms of investment, talent, advice and funding,” said IDC.

To close its talent gap further, Malaysia could follow the example of strategies employed by countries such as China and India, said IDC.

“China, in particular, was seen to be on ambitious talent programmes designed to bring in a massive number of fresh graduates trained in technical areas of data science roles,” it said, adding that Malaysia could consider similar programmes, although not necessarily on the same scale.

India, too, has embarked on similar initiatives and Nasscom, the country’s national IT trade body, has proposed that engineering institutes include big data and data analytics in their courses.

But with more employers looking beyond just technical skills, IDC said any review of existing academic curricula in tertiary institutions should include not only industry-specific expertise, but also business skills such as consulting.

“This will help bring the courses and modules into line with the demand from industry sectors and will also motivate students to take up similar courses in higher volume,” it said.

According to IDC, global spending on big data analytics and cognitive and AI systems was estimated at $159bn in 2017. This is expected to increase global GDP by up to 14% by 2030 because of the accelerating development and take-up of AI, a recent PricewaterhouseCoopers report revealed.


Ethical Data Science Is Good Data Science

Alexander Supertramp/Shutterstock

There’s no doubt about it: The future will be machine driven, and central to this future are the advanced algorithms, which are fueled by the data they’re trained on. Every ad you see, every car driving itself, every medical diagnosis provided by a machine will be based on your data – and lots of it.

Without your data, we inherit a world without machine learning, and most would argue that companies without machine learning will fail. At least that’s where we’re heading; it sounds like a big problem, and it is.

Concepts around “big data” are completely incompatible with how people expect their data to be protected and how laws are shaping those protections. In fact, the GDPR, a data privacy regulation enacted in the EU, treats your data as if it’s an extension of your body.  And more regulations like GDPR are coming.

One of the key tenets of the GDPR, and of the new wave of data regulations, is limiting data usage to specific purposes. The GDPR does not simply restrict how or what data is collected, it restricts how the collected data is being used.

The GDPR requirements show us a pathway that can actually translate into better data privacy protections, and ultimately, better data science. Put simply, GDPR is a manifestation of the data governance initiatives all organizations should have been doing all along. Building data governance across machine learning activities will accelerate innovation – not stifle it.

So where does this leave organizations building products based on user data – and organizations running their businesses with algorithms powered by user data? Is the Facebook fiasco the beginning of the end of data-driven initiatives? What are the lessons and steps we can take in reaction to this?

The utility of the show on the Hill is that it could start a real conversation on how to protect both the innovations driven by algorithms and consumers’ privacy when it comes to their data. Here are three steps that businesses – and the technology companies supporting them – need to take:

End the Data Usage Free-for-All

Just because you’re able to collect massive amounts of data does not mean that every user in an organization should be able to use and touch all aspects of that data. GDPR terms this “privacy by design”, but I term it common sense.

Should all your data scientists see all your data subjects’ Social Security numbers when they’re building a fraud analytic? No.

When you work with third parties, where your data is “better together,” should you share it all? No.

This means enforcing fine-grained controls on your data. Not just coarse-grained role-based access control (RBAC), but down to the column and row level of your data, based on user attributes and purpose (more on that below).  You need to employ techniques such as column masking, row redaction, limiting to an appropriate percentage of the data, and even better, differential privacy to ensure data anonymization.
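
As a rough illustration of the techniques named above, the sketch below applies column masking, row redaction and a differentially private count to a handful of made-up records. The record fields, the function names and the choice of the Laplace mechanism are assumptions for the example, not something any regulation or product prescribes.

```python
import random

# Hypothetical records; every field name and value is invented for illustration.
RECORDS = [
    {"name": "A", "ssn": "111-11-1111", "region": "EU", "amount": 120.0},
    {"name": "B", "ssn": "222-22-2222", "region": "US", "amount": 75.5},
    {"name": "C", "ssn": "333-33-3333", "region": "EU", "amount": 310.0},
]


def mask_column(rows, column, token="***"):
    """Column masking: return copies of the rows with one column hidden."""
    return [{**row, column: token} for row in rows]


def redact_rows(rows, allowed_regions):
    """Row redaction: keep only the rows the caller is entitled to see."""
    return [row for row in rows if row["region"] in allowed_regions]


def laplace_noise(scale, rng):
    """Laplace(0, scale) noise, drawn as the difference of two exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(rows, epsilon, rng):
    """Laplace mechanism for a count query, whose sensitivity is 1."""
    return len(rows) + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(7)
    visible = mask_column(redact_rows(RECORDS, {"EU"}), "ssn")
    print(visible)
    print(private_count(visible, epsilon=0.5, rng=rng))
```

A smaller epsilon adds more noise and gives stronger privacy. In a real deployment these controls would be enforced in the data layer rather than in application code.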

In almost all cases, your data scientists will thank you for it. It provides accelerated, compliant access to data and with that a great deal of comfort, freedom, and collaboration that comes when everyone knows they are compliant in what they are doing and can share work more freely. This freedom to access and share data comes when data controls are enforced at the data layer consistently and dynamically across all users. It provides the strong foundation needed to enable a high performing data science team.

Purpose-Based Restrictions Give Accountability

All use of data should have a purpose – and data usage should be restricted to those purposes. This is critical to modernizing the way data is governed. While seemingly incompatible with the promise of big data, purpose-based restrictions represent the future of privacy.

This also creates a new level of accountability across an organization – providing a broader understanding of how – and why – data is being used.
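
One way to picture a purpose-based restriction is a lookup that ties each declared purpose to the columns it may read and logs every request. The sketch below is a minimal version under assumed names: the purposes, columns, `PURPOSE_POLICY` and `read_for_purpose` are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical policy: the columns each declared purpose is allowed to read.
PURPOSE_POLICY = {
    "fraud_detection": {"account_id", "amount", "merchant"},
    "marketing": {"account_id", "region"},
}

# Every request is recorded, whether or not it was granted.
AUDIT_LOG = []


def read_for_purpose(rows, user, purpose, columns):
    """Return only the requested columns, if the stated purpose permits them."""
    allowed = PURPOSE_POLICY.get(purpose, set())
    denied = sorted(set(columns) - allowed)
    AUDIT_LOG.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "purpose": purpose,
        "columns": sorted(columns),
        "granted": not denied,
    })
    if denied:
        raise PermissionError(f"purpose {purpose!r} may not read {denied}")
    return [{col: row[col] for col in columns} for row in rows]


if __name__ == "__main__":
    rows = [{"account_id": 1, "amount": 9.5, "merchant": "m1", "region": "EU"}]
    print(read_for_purpose(rows, "analyst_1", "marketing", ["account_id", "region"]))
    print(AUDIT_LOG)
```

The audit log is what supplies the accountability: it records who asked for which data and for what stated reason.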

Monitoring Your Models

Last is the importance of “bookkeeping” for machine learning. This sounds simple, but in practice is more difficult than it seems. Keep track of all the data that is going into all your models. Models need constant monitoring, but this should not be limited to the output. It also includes the inputs and understanding the risk and regulations associated with that data. 

Imagine if you could monitor, similar to an infrastructure dashboard, the risk associated with all your production models based on the data that went into training them. Or understand data value across your organization based on how often it’s being leveraged for predictions. Sounds useful – and it is! Equally important, this type of approach lets you manage change more easily. If a policy on how you can use data changes, this lets you understand what models it impacts and react appropriately.
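
A minimal sketch of this kind of bookkeeping is a ledger that records which datasets fed each model, so that a policy change on one dataset can be traced to the models it touches. The class, method, model and dataset names below are hypothetical, and counting models per dataset is only a crude stand-in for how often data is leveraged for predictions.

```python
from collections import Counter


class ModelLedger:
    """Records which datasets went into training each model."""

    def __init__(self):
        self._inputs = {}  # model name -> set of dataset names

    def record_training(self, model, datasets):
        """Note that `model` was trained on the given datasets."""
        self._inputs.setdefault(model, set()).update(datasets)

    def models_using(self, dataset):
        """Models affected if the usage policy for `dataset` changes."""
        return sorted(m for m, used in self._inputs.items() if dataset in used)

    def dataset_usage(self):
        """Number of models that depend on each dataset."""
        counts = Counter()
        for used in self._inputs.values():
            counts.update(used)
        return counts


if __name__ == "__main__":
    ledger = ModelLedger()
    ledger.record_training("churn_model", ["crm_contacts", "billing"])
    ledger.record_training("fraud_model", ["billing", "card_transactions"])
    print(ledger.models_using("billing"))
    print(ledger.dataset_usage())
```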

The future of data privacy is about more than letting consumers know how and where their data is collected. Data-driven companies and the developers of tomorrow’s technology need to think beyond privacy checkboxes and build technology that allows us to manage how our data is used.

As the Facebook circus shows, there may not be a penalty now for how you’re using data, but the oncoming GDPR regulations will change everything. It’s simply a matter of time before the U.S. enacts similar regulations. Getting ahead of it now can save a lot of heartache tomorrow. You might literally not be able to afford to wait.


Decoding Shopper Behavior with Data Science

Can data science and calculated marketing strategies really keep up with the fickle emotions of shoppers?

There’s no formulaic approach to help retailers predict the whims of consumers. But there are some behavioral patterns every business must know if they want to cover all bases and avoid missing out on a potential sale. AI cannot read someone’s mind, but it can be taught to learn what triggers purchases and why we shop the way we do today: irrational decisions sometimes rationalized by days of comparative research and sometimes the morning after an impulse check-out at 1am.

Turns out, our brain and heart both matter when it comes to retail decision-making. Experts say, “we buy on emotion then justify with logic.” This interesting fact raises a loaded question: How do retailers design their merchandising and marketing strategies to fit into this pattern and convert into sales? Striking the balance between emotions and rationality can help companies achieve a boost in sales and loyalty.

Irrational Science

Physical places may trigger impulse purchases more than you would’ve thought. For that matter, any scenario, virtual or physical, that gives a sense of urgency while overwhelming the shoppers’ emotions (in a positive way) is a key trigger to fuel purchases.

While traveling, we often feel the pressure of time sensitivity and a sense of compulsion to buy from airports’ Duty-Free shops. A fear of missing out (FOMO) kicks in, combined with confined spaces and lack of time that seems to make us buy something, anything. There’s no scope for doing a two-hour session of price verification across all stores online as we would do at home.
The travelers’ propensity for boredom as they await their flight makes the duty-free atmosphere appealing, which induces consumers to buy. Research confirms time pressure at duty-free shops can increase impulse purchases, satisfaction, and repurchase intention.

Everyday FOMO exists, too. For instance, teens wanting to get the next cool thing are already shopping Snapchat and Instagram stories from sellers and brands they’ve never heard of. The ‘new and now, limited offer’ looms around every 5-second influencer fad.

If the makeover booths at beauty outlets are any proof, the appeal of magically transforming into a better version of themselves tugs at people’s emotions more than a $2 million billboard budget ever did.

Physical Presence

Consumers make an effort to shop in store for personal service during the buying process and the immediacy of being able to take home their desired product. In one study, nearly 80% of respondents said instant gratification was the key benefit to buying in person; 75% appreciated human connection of shopping face-to-face, including social shopping with friends.

Consumers prefer to buy the less familiar products in store to see and feel, and reduce the risk of having to return it. What this implies is that in-store options are weighed for a lot of factors, vs. online where the thumbnail’s appeal and the price can overpower all other aspects of the product.

For relatively riskier purchases – new, unfamiliar or expensive products – a generous, hassle-free returns policy can help. Easy returns, when coupled with rave reviews, can reassure shoppers and convince them to buy. Categories that consumers prefer to buy in-store include: automotive, major appliances, hardware, jewellery and electronics. In short, high-value, risky purchases that need demos, product warranties, and an accommodating return policy. Does that mean perishable products like chocolates and fresh flowers are best sold online? What about check-out lane products like mints and single-serve snacks?

Of Britain’s 7 million online grocery shoppers, only 12 percent visit a confectionery-related webpage, and just half of those actually buy anything there, according to Kantar Media.

Not surprisingly, nonperishables enjoy the most online purchasing. Confectionery and biscuits were at less than 1 percent.

Is this where physical stores can tempt shoppers? Perhaps with the right sensory and emotional triggers, yes.

Pop-up shops and flea markets can play into consumers’ desire for thrilling experiences and great deals. The pop-up industry has grown to approximately $10 billion in sales, thanks to unique and fun experiences, and localized assortments. Again, FOMO kicks in, as “Customers are attracted to exclusivity. They’re attracted to a ‘here today, gone tomorrow’ type of concept.”

The Left Brain

Online shoppers crave the convenience of saving time and effort, as retailers cater to them by bringing the store to their front door. These shoppers prefer to buy familiar, branded items with qualities they can reasonably predict. Customers buy online because they expect abundant variety, transparency about inventory levels and the ability to research prices, customer reviews and promotional offers.[vii]

Categories that consumers prefer to buy online include: books, toys and games, and entertainment. In Q3 2017, online orders placed from a desktop device had an average value of $84 US. During the same period, online orders placed from a tablet had an average value of $102 US.

Competitive considerations: Retailers must know their rivals. For instance, retailers selling a Herman Miller chair must know which e-commerce rivals offer exact and similar matches – and at what prices.

To appeal to consumers’ logical side, Amazon uses a rather rational, information-driven approach. The company sells nearly 500 million products spanning multiple categories, at various prices. Retail companies of all sizes can appeal to consumers’ logic. For instance, they can list product benefits, like efficiency, and add reviews to convince consumers of a product’s reliability.

For instance, consumers want to know: Is the company legitimate, reputable and likely to deliver on time? Is the product well-reviewed, good quality and able to satisfy their needs? How consumer-friendly is the return policy?

Of course, consumers also care about the price of the product and whether there’s free delivery. The total basket value is very important, including taxes, shipping, and handling. In one study, 93% of respondents said free shipping encouraged them to buy more online. That’s because they felt secure and successful at getting a great deal.

Savvy consumers take the pricing challenge even further, scouring the web to score the best deal. They search for online coupon codes on sites such as Groupon and SlickDeals to minimize the price of their desired product.

Figure vs. Value

While every customer has some degree of price sensitivity, the most price-sensitive consumers are exceptional comparison shoppers. Their rational behavior includes researching multiple websites to discover the lowest possible price before they buy.

There’s also a link between customer satisfaction and price sensitivity. To convince a price-sensitive customer to pay slightly more, retailers must add value with superior customer service. This can include online support, fast delivery, and convenient return and exchange policies.

In summary, retailers must appeal to consumers’ heads, hearts and wallets. By offering rational reasons for consumers to buy a product, such as quality, efficiency, and reputation, you can lure the most rational of comparison shoppers. By appealing to untapped emotions at the point of sale, such as a sense of urgency, a feeling of exclusivity, and sheer excitement, you can give the same shoppers a justifiable reason to buy anything.


Northwestern Mutual partnering with UWM, Marquette to start data science institute

If Milwaukee is to grow and keep young workers with skills to turn big data into insights that companies can use to make better decisions, the city needs to step up its game.

Northwestern Mutual Life Insurance Co. said Wednesday it expects to pay out more than $5.3 billion in policy owner dividends in 2018. Rick Wood / Milwaukee Journal Sentinel

That's why Northwestern Mutual Life Insurance, Marquette University and the University of Wisconsin-Milwaukee on Wednesday announced they are launching a Northwestern Mutual Data Science Institute.

It's touted as both a cutting-edge partnership for the Midwest and a step toward advancing Milwaukee as a hub for technology, research, business and talent development — something Northwestern Mutual chairman and CEO John Schlifske has been talking about for months.

Data scientists are in big demand, and West Coast companies now have a major advantage in attracting college graduates with those skills.

Over the next five years, Northwestern Mutual and its foundation will contribute $15 million to support an endowed professorship at each university, expand the universities' curricula around data science, fund research projects and develop learning opportunities for K-12 students, according to a company news release.

Northwestern Mutual also will bring students and professors to its downtown campus by providing classroom, event and office space in its new Cream City Labs — a 17,000-square-foot innovation lab under construction in the 733 N. Van Buren St. building.

For their part, Marquette and UWM will each invest about $12 million in data science education and research by existing faculty.

"Regions thrive when organizations creatively address common challenges from different perspectives," Marquette President Michael Lovell said.

"The physical proximity of UWM and Marquette to Northwestern Mutual is a win-win," said Karl Gouverneur, vice president of digital workplace, corporate solutions and head of digital innovation at Northwestern Mutual.

Not only will data science students and professors have a chance to help Northwestern Mutual solve business challenges and identify opportunities for growth, the company will help guide what college students learn about data science and how it's applied in the real world.

When students graduate, they'll be well-rounded potential employees for Northwestern Mutual and other companies in Milwaukee that use data to help inform decisions, including Robert W. Baird, Johnson Controls and Rockwell Automation.

What is data science? 

The volume of data collected in today's world is so huge, it's beyond the capacity of humans to analyze without the help of a machine. The average Facebook user knows algorithms are used to determine their interests, and guide targeted advertising their way.

Businesses like Northwestern Mutual harness data to help them make better-informed decisions, and to identify promising growth areas. Big data plays a role in fields ranging from marketing and ecommerce to security, image processing and genetic testing.

Data scientists create complex computer models with algorithms to find patterns that would otherwise be missed by humans. They use predictive analytics, artificial intelligence and machine learning.

Big data can be used to help describe, explain or solve a problem. It can make predictions, recognize speech and faces, improve drug development, and support investment and business decision-making.

UWM data scientists work in disciplines across six of the university’s schools and colleges. 

Marquette three semesters ago launched an undergraduate degree in data science, and so far has 30 students. The university expects to grow that number to 50 students in the next couple of semesters. Marquette also has master's and doctoral programs that train students for research and data science.

Why businesses need data scientists

Data scientists not only understand mathematical algorithms and computer coding, they apply scientific research techniques.

They also are expected to understand the needs of the business that employs them, to home in on a question or problem, and to explain and visualize whatever the data tells them.

Confidentiality of clients is protected in the use of any data collected, Gouverneur said. If it's collected by Northwestern Mutual, all identifying information is wiped, and clients are asked for permission first.

Northwestern Mutual formed its first formal data analytics team as part of its IT department in 2013. It's evolved into an Enterprise Data & Analytics department.

The company has about 5,600 employees; about 400 of them are Marquette alums and 900 are UWM alums.

Universities across the country are starting data-science programs to meet growing demand.

How the idea got started

Gouverneur said the concept for the institute was first floated about 18 months ago as part of the larger discussion about the need for Milwaukee to become a technology hub.

Milwaukee has ranked second-to-last, ahead of Pittsburgh, for the third straight year in startup activity as measured by the Ewing Marion Kauffman Foundation, one of the country’s leading entrepreneurship advocacy and research organizations. Wisconsin ranks 50th among the 50 states.

Last October, Northwestern Mutual and Aurora Health Care announced they would each commit $5 million to venture funds that will invest in startup companies in the Milwaukee area.

The two new funds — Northwestern Mutual's Cream City Venture Capital and Aurora Health Care's InvestMKE — could spur other large companies in the Milwaukee area to set up similar funds, company officials said.

That announcement was followed in November by another that Northwestern Mutual would partner with Rockwell Automation, Kohl's, Baird and Milwaukee Institute to provide funding and other services to support early-stage startup companies based in Milwaukee.

That partnership will cover operating costs for gBETA, a seven-week mentoring and coaching accelerator program that will be held multiple times at UWM in 2018 and 2019. The program is led by gener8tor, a startup accelerator.


6 Important tips to kickstart your career in Data Science


In a world dominated by data, Data Science is the ladder to building a promising career in unique and challenging job positions. Kickstarting your career in Data Science is now easier than ever thanks to the vast pool of online platforms offering Data Science courses. These courses are specially designed to walk you through the concepts and intricacies of Data Science.

But, do you know the exact way to climb the ladder? Fret not, for we’re here to show you how!

So, let’s begin, shall we?

1. Decide – What Role Would Best Fit You?

The Data Science industry is a dynamic one, and hence it demands many varied roles. There are data scientists, data analysts, data engineers, statisticians, machine learning engineers, and many more. Do your groundwork and research the kind of skill set and job responsibilities each of these positions demands. Choose the one that is closest to your background and professional experience.

If you feel overwhelmed and confused by the choices, you can always seek help from outside. Try connecting with professionals in the industry and ask them about their job responsibilities and requirements. Find a trusted mentor and seek their guidance. This will help you gain an outsider’s perspective and help you choose the role that’d best fit you.

2. Choose – A Specialization Course

Although often the job responsibilities get blurred in the field of Data Science, each job has its distinct set of skill and specialization requirements. For instance, data scientists and data analysts both have to be extremely well-versed with high-end programming languages like R, Java, Python, Scala, JavaScript, etc. However, a data scientist must have a strong mathematical and statistical background whereas a data analyst needs to have excellent data mining and visualization skills to extract meaningful information from a vast amount of raw data.

So, now that you have sorted out the ideal job role for yourself, you need to find a data science course that caters to the specific demands of that job role. There are many helpful MOOCs that you could choose from.

3. Develop – Technical And Analytical Skills

If you choose to leap into the world of Data Science, having the right set of technical and analytical skills is a must. SAS and SPSS are established commercial platforms for nurturing your analytical skills, whereas open source tools like R and Python are great options for coding.

If you are a rookie at coding, we suggest you start off with GUI-based tools and then move on to programming languages like Java, R, and Python. When working with these tools, always remember that the most efficient way of learning is through a hands-on approach. So, try to write your own code and run it on dummy data sets. This way you’ll not only learn faster, but you will also get better with time.

Apart from programming languages, a good data professional is expected to have basic know-how of SQL and of a relational database system such as MySQL.

4. Fortify – Your Statistics And Machine Learning Foundation

Dealing with enormous amounts of data on a daily basis and extracting meaningful information from it requires a strong statistical and mathematical background, especially for a data scientist. You will be awestruck and inspired to see what your statistical knowledge can do when combined with the right ML tools and algorithms.

The incorporation of statistics and ML algorithms to the emerging field of Data Science allows data analysts and scientists to extract information faster and obtain better results from Big Data projects.

Keeping this in mind, you should build on your statistical skills and knowledge first, starting with basic concepts such as the normal distribution, hypothesis testing, and the central limit theorem. From there, you can work your way up to Machine Learning techniques like linear regression, logistic regression, decision trees, and cluster analysis, to name a few.
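
A hands-on way to get a feel for one of these basics, the central limit theorem, is a short simulation. The sketch below averages batches of uniform random numbers and checks that the averages cluster around 0.5 with a spread close to what the theorem predicts. The function name, sample sizes and seed are arbitrary choices for the example.

```python
import math
import random
import statistics


def sample_means(n_samples, sample_size, rng):
    """Draw n_samples batches of uniform(0, 1) values and return each batch mean."""
    return [
        statistics.fmean(rng.random() for _ in range(sample_size))
        for _ in range(n_samples)
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    sample_size = 30
    means = sample_means(2000, sample_size, rng)

    # A uniform(0, 1) variable has mean 1/2 and variance 1/12, so the
    # batch means should be roughly normal with this standard deviation.
    predicted_sd = math.sqrt(1 / 12) / math.sqrt(sample_size)

    print("mean of batch means:", statistics.fmean(means))
    print("observed spread:    ", statistics.stdev(means))
    print("predicted spread:   ", predicted_sd)
```

Although the individual draws are flat rather than bell-shaped, a histogram of the batch means would already look close to a normal curve.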

5. Read – About The Business Applications of Data Science

Data Science is ever-evolving, with newer applications and technologies cropping up from time to time. Thus, to stay relevant in the field of Data science, it is essential that you continually assimilate knowledge about the changing dynamics of Data Science.

Being a Data Science professional is not just about possessing the requisite technical knowledge; you must also have extensive knowledge about the business implications of Data Science. Data scientists, data analysts, and data visualizers all have one goal in common: to extract meaningful patterns and trends from massive chunks of data and use that information to transform businesses for the better. Thus, as a data professional, you are expected to be able to decide what kind of data approach you should take to solve a particular business issue.

Reading about business applications of Data Science, such as how cluster analysis can be used to segment customers, how market basket analysis helps retailers with product bundling, and how logistic regression serves as a tool for fraud detection, can help you understand how the tools of Data Science function in the business landscape.
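
To make one of these applications concrete, here is a small sketch of market basket analysis on a made-up set of shopping baskets. It computes two standard measures for item pairs: support (the share of baskets containing both items) and confidence (how often the second item appears given the first). The baskets, the threshold and the function name are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for the example.
BASKETS = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def pair_rules(baskets, min_support):
    """Return (antecedent, consequent, support, confidence) for frequent pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields a rule in both directions.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


if __name__ == "__main__":
    for a, b, support, confidence in pair_rules(BASKETS, min_support=0.5):
        print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

A retailer might read a high-confidence rule as a hint that two products are worth bundling or placing near each other.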

6. Invest – In Networking

Once you’ve gotten the hang of all the basics of Data Science, it’s time to expand your network. Data Science conferences, industry events, and tech meets are where you should be. These meets are nothing less than a talent pool, clubbed in one place. You can not only learn from the leaders in the industry, but you also get a chance to interact with peers and mentors and engage in fruitful conversations.

Also, the more events you attend, the more involved you become with the Data Science community. The contacts you make here might open the doors to new possibilities and career opportunities for you in the future.

As the demand for skilled and expert professionals in Data Science is on the rise, the key to success would be to follow a structured approach to the field. By following these steps, you can build the right set of skills that are demanded of Data Science specialists and with time, you can emerge as one of the best in your field of expertise. As Thomas Edison had wisely stated,

Genius is 1% inspiration and 99% perspiration.

So, sweat it out now and make your way to the top!


Organizations Striving To Close The Data Science Skills Gap

Image courtesy of Shutterstock

Big data is undoubtedly one of the hottest trends of our age, and the promise of the fundamental transformation possible as a result of these enormous amounts of data is considerable. For many, however, the promise remains just that, with numerous barriers holding them back, whether it's a lack of board-level buy-in or poor quality data.

Arguably the most substantial drag on our efforts, however, has been a lack of skills.  It's a situation that is likely to see companies aim to triple the size of their data science teams in the next few years.  That's the finding of a recent paper from ESADE researchers.

The researchers examined over 100 Spanish companies from across a range of sectors, most of which had over €200 million in turnover.  The results revealed the long way we still have to go before data is at the heart of organizational behavior.

Slow progress

Despite big data being technologically feasible for several years, over half of the organizations revealed that they are yet to have a culture of data-based decision making, whilst 40% admitted that they don't have a specific leadership role for data.

This reticence is important, as the study found that companies with a more analytical culture performed better than those without.  This was reflected in both their financial performance and the perception of staff at the companies.  Indeed, some 78% of companies who were regarded as very analytical thought that this culture had a significant impact upon their performance.

The study found that data professionals tended to fall into one of two categories:

  1. Data scientists, who tend to perform advanced analyses.
  2. Data managers, who provide the business vision to connect these analyses to the strategy of the business.

The typical data team would have between 5 and 20 members, but pretty much every organization reported finding it difficult to find the talent they needed.  Despite these recruitment challenges, the majority of organizations wanted to considerably increase the size of their data teams in the next three years, with three times as many data scientists and 2.5 times the number of data managers.

Train or recruit?

The desire for data science skills is clear, but this study suggests that most companies want to hire in external talent, or in other words the finished article. This strategy would be fine, except that by all accounts that talent doesn't currently exist in the marketplace, so there appears to be an inherent hope that external bodies will train people for them.

I've written previously about a similar issue when it comes to artificial intelligence skills, and data science and AI are so intertwined that the same surely applies.

Rather than attempting to hire in the finished article in an increasingly barren marketplace, companies are surely better off investing in data-science training and therefore upgrading their existing talent pool.  This approach has numerous advantages, not least of which is raising data skills across the board at a time when a growing number of organizations are attempting to democratize data science capabilities across the workforce rather than concentrate it within a data science function.

Organizations can achieve quick initial results by identifying employees with existing programming, analytical and quantitative skills and augmenting them with both the latest data-science skills and access to powerful tools, such as Python and Hadoop.

Spreading the availability of data education across the business, into marketing, finance, engineering and various other functions provides data literacy to people from various backgrounds.  This in turn will help to spread the data-driven culture that data advocates so crave.

A good example of this in practice is the Data University that Airbnb has created to provide anyone who wants to learn about data an opportunity to do so.  Already the company has trained over 500 employees (one-eighth of the workforce), with dividends already being reaped in the shift towards data-based decision making.

There has never been a better time to invest in the skills and talents of your workforce, with data promising to transform functions and processes throughout organizations that are already experimenting with a range of data science and machine learning initiatives.  Expertise is the principal barrier holding these back, so now really is the time to invest in the training that will bridge that gap.

Get your tickets for the Data Science Summit here:


Analytics and data science industry in India growing at CAGR of 33.5 per cent, says new study

The study suggests that a sizeable 22% of the total revenue generated can be attributed to big data; whereas advanced analytics, predictive modelling and data science together contribute 11%.

BENGALURU: According to a new study by Analytics India Magazine in association with AnalytixLabs, the data science, analytics and big data industry generates over $2.71 billion in revenues annually.

It is also estimated to be growing at a healthy rate of 33.5% Compound Annual Growth Rate (CAGR).

The study suggests that a sizeable 22% of the total revenue generated can be attributed to big data; whereas advanced analytics, predictive modelling and data science together contribute a total of 11%.

Just like last year, the maximum revenue from analytics exports comes from the US, amounting to 64% of the revenue generated, which has increased by 45% year-on-year. 

The UK comes a distant second at 9.6%, and only 4.7% of analytics revenues come from Indian firms.

According to the study, banking and finance continues to be the largest sector being served by analytics in India, contributing about $1 billion in revenues.

It is followed by marketing and advertising, e-commerce, and others.

The travel and hospitality industry saw the biggest jump in analytics revenues, from $34 million to $54 million, a jump of 61 per cent.

In terms of cities, a sizeable $759 million comes from Delhi and NCR, which is followed by Bengaluru at 27 per cent.

The study suggests interesting numbers in terms of work experience of analytics professionals.

The average work experience of analytics professionals in India is 7.9 years, up from 7.7 years last year.

Also, 16,000 freshers were added to the analytics workforce in India this year.

Of the total analytics professionals in India, almost 40 per cent are employed with large-sized companies, followed by 32 per cent at mid-sized organisations and 28 per cent at startups.

Sumeet Bansal, Founder and CEO, AnalytixLabs said, "A thriving CAGR of over 33% reaffirms the spike of data science adoption we have seen in last one year. Recently we have seen strong demand even from sectors like manufacturing, infrastructure, power and energy, which traditionally used to have relatively lower analytics penetration."

The addition of 16,000 fresher candidates to the analytics workforce is another beacon of sustained growth and an encouraging trend for data science professionals.

Bhasker Gupta, Founder and CEO, Analytics India Magazine said, "The numbers suggest that analytics and data science industry in India is growing at an exponential rate, with the industry expected to grow seven times in the next seven years."
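The two growth figures quoted above can be checked against each other with a little compounding arithmetic. The sketch below is illustrative only: it assumes the 33.5% rate stays constant for seven years, which the study does not claim, and the function names are hypothetical. Under that assumption the industry would end up a little over seven times its current size, which is in line with the "seven times in the next seven years" estimate.

```python
def growth_multiple(rate: float, years: int) -> float:
    """How many times larger a quantity becomes after compounding annually at `rate`."""
    return (1.0 + rate) ** years


def project(value: float, rate: float, years: int) -> float:
    """Projected value after `years` of compounding at a constant annual `rate`."""
    return value * growth_multiple(rate, years)


if __name__ == "__main__":
    revenue_bn = 2.71  # annual revenue in billions of dollars, from the study
    cagr = 0.335       # 33.5% CAGR, assumed constant (an assumption, not a forecast)
    for year in range(1, 8):
        multiple = growth_multiple(cagr, year)
        value = project(revenue_bn, cagr, year)
        print(f"year {year}: {multiple:.2f}x, about ${value:.1f} billion")
```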

Startups have contributed significantly to the overall output of analytics in India.

Although small in absolute terms, the impact of small and midsize organizations in India has increased significantly. Overall, the study paints a positive picture for the Indian analytics industry and suggests that both startups and large-sized companies contribute to the overall output of analytics in India.


CAES launches Certificate in Agricultural Data Science

Photo by Clint Thompson

From remote moisture sensors that produce a real-time feed of soil conditions to drones that use optical data to spot plant disease, new streams of data will fuel the next green revolution.

Remote sensing technologies will offer farmers the ability to customize irrigation and fertilizer applications for areas that have unique characteristics within fields, which will reduce ecological impacts and costs. However, putting precision agriculture strategies into practice requires agricultural scientists who are equipped to interpret the data that these sensors generate.

In fall 2018, the University of Georgia College of Agricultural and Environmental Sciences will launch an Interdisciplinary Certificate in Agricultural Data Science to equip CAES graduate students with the data analysis expertise that they will need to capitalize on this big data revolution.

“In other disciplines—business and health care—programs that are focused on data science have already taken off,” said Harald Scherm, professor and head of UGA’s department of plant pathology. “But there is no such formal program in agricultural data science. We think there is a need for that.”

CAES’ certificate program will be one of the first of its kind in the nation.

CAES faculty have heard from students, researchers and employers that there is a need for data analysis expertise in agricultural research and applied agricultural science, said Scherm, who worked with colleagues in the UGA statistics and computer science departments and in the UGA College of Engineering to develop the certificate program.

Through the certificate, current and future CAES graduate students will plan a schedule of elective and related courses that will complement their agricultural research and expose them to a wide range of principles and practices of data analysis.

“The goal of the graduate certificate is to develop a curriculum that will produce cross-disciplinary and cross-functional, data-smart graduates who can bridge the gaps between the generation, analysis and interpretation of complex data in the agricultural field,” Scherm said. “We’re not looking to train computer scientists, but we want them to be able to discuss data issues and incorporate analysis into their practice.”

A summer 2017 survey of CAES graduate students showed that almost 90 percent were interested in the certificate program, and almost 50 percent said they were definitely interested in learning to integrate big data science into their disciplines. The certificate program will be open to all graduate students at UGA but will be most helpful to those studying agriculture or environmental sciences, Scherm said.

CAES’ Interdisciplinary Certificate in Agricultural Data Science will leverage UGA’s strength in agricultural research and UGA’s campus-wide informatics initiative to build a reputation as a leader in agricultural data science, Scherm said. Elective courses will be drawn from four colleges (CAES, Franklin College of Arts and Sciences, Warnell School of Forestry and Natural Resources and the College of Family and Consumer Sciences) and two institutes (GII and Institute of Bioinformatics).

Many areas of agricultural research and practice generate big data streams, from consumer analytics to crop modeling, statistical genetics and precision agriculture, among others. Precision agriculture refers to farming in which data, collected from an ever-expanding array of sensors ranging from satellites to soil-moisture sensors, helps farmers decide how to vary the application of agricultural inputs like irrigation, pesticides and fertilizers within a field to meet crop needs rather than applying these inputs uniformly across the field.

This more judicious approach to using inputs is critical to helping farms increase their efficiency and profitability while reducing their ecological footprint, said George Vellidis, precision agriculture researcher, professor of crop and soil sciences, and director of academic programs at the UGA Tifton campus.

“With the increasing number of sensors that we use on a daily basis in agriculture, we are collecting terabytes of data each growing season, and precision agriculture has morphed into information agriculture,” Vellidis said. “At the moment, we do not have the systems in place to fully mine these tremendous data sets and capture all the knowledge that is embedded in them. Our certificate will allow our graduates to do this.”


NIH Data Science Plan Aims to Boost Data Analytics, Access

NIH’s Strategic Plan for Data Science will provide a roadmap for improved healthcare data analytics, access, and sharing.

Source: Thinkstock

June 07, 2018 - The National Institutes of Health (NIH) has released the final draft of its Strategic Plan for Data Science, which seeks to enhance biomedical research by boosting healthcare data analytics capabilities, data access, and data sharing.

In order for researchers to facilitate medical breakthroughs and improve health outcomes, their data resources must be clean and accessible. However, as NIH noted, this is not an easy task.

“The generation of most biomedical data is highly distributed and is accomplished mainly by individual scientists or relatively small groups of researchers,” NIH wrote in the document.

“Moreover, data also exist in a wide variety of formats, which complicates the ability of researchers to find and use biomedical research data generated by others and creates the need for extensive data ‘cleaning.’”

The organization cited a 2016 survey that found data scientists spend about 80 percent of their work time collecting and organizing existing data. This leaves little time for them to mine data for patterns that could lead to new discoveries.

The Strategic Plan for Data Science maps a general path to improve the biomedical data ecosystem over the next five years. NIH released a draft of the plan for public comment in March 2018, and in this finalized version the organization details how it will maximize the value of research-generated data.

NIH stated that it plans to prioritize the development and distribution of health IT tools that will accelerate data management and analytics.

NIH will also help establish a more competitive marketplace for tool developers and providers, aiming to make new technology more available and less costly for the research community. 

In addition, the organization will establish programs that allow engineers to optimize and refine tools developed in academia, making them more efficient, cost-effective, and useful for biomedical research.

NIH will also work to develop and adopt health IT tools that will improve the collection and integration of data from disparate sources. These new tools have the potential to transform big data into actionable clinical information, allowing researchers to identify patient needs and predict poor outcomes in vulnerable populations.

NIH will aim to increase researchers’ access to data as well. The organization intends to improve data accessibility by utilizing large-scale cloud computing platforms, which have the potential to streamline NIH data use by allowing rapid and seamless access.

NIH will leverage partnerships with cloud-service providers to facilitate access to large, high-value NIH datasets, and will ensure that these cloud environments are stable and secure to protect against data compromise.

Additionally, the organization plans to make smaller datasets from individual laboratories more accessible. NIH will create an environment in which individual laboratories can use intuitive interfaces to link datasets to publications in the National Center for Biotechnology Information (NCBI) database.

Data sharing is also a high priority for NIH. According to NIH, more than 3,000 different groups and individuals submit data to NCBI systems each day. These data can include human genome sequences, chemical structures and properties, or clinical trial results.

NIH will work to build a framework to ensure that these datasets can exist together, instead of isolated data silos. NIH will connect new data resources to other systems upon implementation, and when appropriate, develop connections to non-NIH data resources.

NIH expects that expanded data sharing will benefit not only biomedical researchers, but also policymakers and the public.   

Ultimately, the organization anticipates that its Plan for Data Science will foster breakthroughs in research to improve health outcomes.

“Data science holds significant potential for accelerating the pace of biomedical research,” NIH concluded.

“To this end, NIH will continue to leverage its roles as an influential convener and major funding agency to encourage rapid, open sharing of data and greater harmonization of scientific efforts.”


Data Science: Why Retail Will Reap the Biggest Rewards

In this special guest feature, Sarah Kampman, VP of Product at Square Root, discusses why retail is positioned to reap the biggest benefits of data analytics today. Sarah shares why machine learning and AI may be the secret weapon to solving for the challenge of operating hundreds, even thousands, of disparate store locations, and how the approaches can be used to drive store performance, increase alignment and impact decision making at every level of a retail org. As the VP of Product at Square Root, Sarah finds solutions for customers that they didn’t even realize they needed. With 15 years in product management and more than 20 years in technology overall, Sarah specializes in creating long-term focus groups with valued clients, helping them use technology to meet their business needs. She understands that building relationships through empathy leads to the most dynamic ideas and strategies. Driven by a passion for behavioral economics and a desire to study how people make decisions, Sarah received a BA in Cognitive Science from the University of California at Berkeley.

Data science is a big buzzword in business today. Beyond the hype, organizations are using advanced analytics to do everything from understanding their customers to improving forecasting, driving better, faster results. While the impact of these approaches is being felt across nearly every industry, retail stands to reap the biggest benefits. With more big box retailers announcing layoffs, store closures, and bankruptcy, data science may just be the secret weapon for success.

Well-positioned to win

Retail organizations are among the most complex, with thousands of employees across multi-layered teams, and hundreds or thousands of disparate locations. Competition is increasing every day, with new market entrants ranging from brick and mortar stores to home delivery services. Brands must also deal with the complexities of omni-channel integration — seamlessly integrating a constantly growing number of shopping channels — from online to in-store to mobile. They’re expected to deliver a consistent brand experience, while also providing a personalized experience across all channels.

Retail is an innately people-oriented business which makes it ripe for data science impact. Despite the massive amounts of data available, many of today’s decisions are driven by human observation and opinion. This leaves room for bias and error, and wastes time in human-directed data analysis that could be better spent taking action on the insights.

Success in retail operations relies on equipping teams with critical information, empowering them to take swift action. However, the industry has fallen behind in providing the necessary tools and technology. Retail teams are often stuck using outdated tools like manual spreadsheets, legacy technology — even pen and paper — to analyze data across the business. In a recent survey, nearly half of all Store Managers and more than half of all District Managers reported that they rely on aging technology to perform major functions of their role. Topping Store Managers’ wish list was better software, with 25% reporting it would positively impact store performance.

Putting data science into action

Data science has the potential to unlock insights to win and retain customers, drive business efficiencies, and ultimately improve performance. It can also help retailers uncover trends, but more importantly, data science can identify the KPIs’ drivers to make smarter, faster decisions.

Consider a retailer’s cross-promotional marketing efforts. Despite the same external promotions, sales for a particular brand of jeans are up at one store, while at another store, sales of the same jeans are flat. The store manager may assume it’s simply consumer preference. But advanced analytics can reveal that the increase in sales was related to an in-store cross-promotion with a sneaker brand. That actionable insight can now be shared with other stores to improve sales.

Customer experience is another area where data science can support data-driven decisions. Decisions today are heavily influenced by human bias, driven by what retail leaders believe. But managers are often only half right when it comes to understanding in-store problems and customer behavior. Data science can help combat that bias, arming managers with data insights and best practices to make tailored improvements to the customer experience.

Lastly, an often overlooked area for data science and one of the biggest opportunities to influence performance is employee satisfaction. In an industry where people are at the center of success, retailers must get employee satisfaction right. When reviewing eNPS, corporate teams can leverage natural language processing and correlational analysis to uncover what’s driving low satisfaction and help their stores solve for those challenges, improving satisfaction and retention.

Although the finance and logistics arms of retail have already embraced data science, applying advanced analytics to store operations is an as-yet untapped area of opportunity. Data science can help retail operations leaders make smarter, faster decisions. Those who get it right will find themselves quickly pulling ahead of the competition, with the insights needed to win customer loyalty, drive business efficiencies, and ultimately improve performance.


Leveraging blockchain power with data science

© Shutterstock / zoommachine

Articles about blockchain, big data and data science have flooded tech news for some years now. The real question is what the link between these technologies is. How can these be used to create value, decrease transaction costs and increase transparency and reliability? Also, are there other potential applications of looking at data from existing blockchains?

However, first of all, let’s understand the working model of blockchain. References to bitcoin will only be used as examples, to avoid the confusion that blockchain technology only applies to cryptocurrencies and specifically bitcoin.

In fact, blockchain is an alliance among a network of computers, called nodes, that collaborate on keeping a shared ledger.  The aim is to keep an updated version including all transactions that have been verified by the majority of the participants. This verification step is the most resource intensive, since once an operation is approved, it is impossible to delete or modify. This is why blockchain is considered extremely safe and reliable.
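To make the "impossible to modify" property more concrete, here is a minimal sketch of a hash-chained ledger. It is a simplification under stated assumptions, not how bitcoin or any particular blockchain is implemented: there is only a single copy of the ledger, no network of nodes and no majority verification, and the function names and block layout are hypothetical. It only shows why changing an already recorded transaction becomes detectable: each block stores the hash of the previous one.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, transactions, prev_hash):
    """SHA-256 over a canonical JSON encoding of the block's contents."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions):
    """Add a block that points at the hash of the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    block = {
        "index": index,
        "transactions": transactions,
        "prev_hash": prev_hash,
        "hash": block_hash(index, transactions, prev_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check that each block links to its predecessor."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["transactions"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


if __name__ == "__main__":
    ledger = []
    append_block(ledger, [{"from": "alice", "to": "bob", "amount": 5}])
    append_block(ledger, [{"from": "bob", "to": "carol", "amount": 2}])
    print(is_valid(ledger))  # the untouched ledger verifies
    ledger[0]["transactions"][0]["amount"] = 500  # tamper with history
    print(is_valid(ledger))  # the stored hash no longer matches
```

In a real network, every node would hold its own copy and run a check of this kind, so a tampered copy would disagree with the majority.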

How Can Blockchain Data Become Valuable?

Since it is a ledger, blockchain contains vast amounts of transactional data which can be analyzed to extract patterns, as a specialist from data science company InData Labs explains. This could be helpful for predicting sales trends, preventing fraud or even studying social events.

Enforce Trust

Once a block has passed a thorough verification process and been added to the ledger, it is impossible to change, which offers a seal of quality and reliability. Through its public nature, blockchain provides anyone the opportunity to trace any specific transaction or all the activity of a certain participant. This could be used to create scores for merchants and to have on public display the reputation of vendors.

Uncover Patterns

In a time when data is becoming a form of currency, any blockchain can be considered a vault. Estimations go as high as $100 billion in annual revenue by 2030. The way to get the info out of the rough recordings is to motivate people to perform data mining. There is even a crypto coin, called SpreadCoin, which aims to reinvent bitcoin mining into data mining. They seek to create a “Big Data Market”: a layer built on bitcoin mining.

Share Knowledge

Machine learning algorithms need data to learn from. An adequately labeled blockchain record can act as training material for such algorithms. This approach speeds up development and prevents teams from reinventing the wheel or making the same mistake twice. If a group of researchers has already used a set of data to solve a problem, their findings can act as the base for further developments. To make this into a workable model, financial incentives are required. Groups that have data can sell it on a platform and receive proper compensation.

Fraud Prevention

The beauty of blockchain is that due to its open nature it can offer any contributing party the ability to verify transactions by studying patterns. When a participant detects abnormal behavior of the ledger, all other parties are notified immediately, and the malicious node is removed. The highly distributed nature also means that no one actor has the potential to make a significant impact or cause a real problem. There is only one situation when it could become an issue, and that is the case of mining pools which can have considerable computing power.

Predict Social Data

Our transactions reflect our behaviors. Therefore, by looking at data from the ledger, one can identify social trends and power centers. As generations change, the users of digital instruments like cryptocurrency, smart contracts and other artifacts built on top of blockchain will be able to accurately predict social patterns. The logic behind this relies on the fact that cryptocurrency users are roughly the same as social media users, trades are mostly by individuals, not organizations, and the value is only influenced by demand.

Potential Blockchain Drawbacks

Like most emerging technologies, blockchain has a few drawbacks that need to be considered before investing.

The technical limitations derive from the cost of the electrical energy necessary to power blockchain. With the ever-increasing complexity of the problems to be solved, the amount of electricity required can be a real concern. For example, the bitcoin blockchain is expected to need more power than Argentina, which makes it unsustainable in the long run. Another technical limitation is the latency of the verification process.

A second problem has to do with the acceptance and adoption of this new approach. It has yet to make the transition from the stage of innovation towards mainstream acceptance. Most companies are aware of its existence, but few can see immediate applications of blockchain for their daily operations or are willing to invest. For now, organizations are waiting for pioneers to make the first mistakes and to learn at the expense of others.

The novelty of blockchain means that there are pending security and privacy issues to be solved. Although the underlying algorithm is deemed one of the safest possible, the connecting APIs and data processing can be subject to attacks by hackers. A connected problem is the lack of regulations and compliance rules in this area, which makes it challenging to use blockchain models for more conservative areas such as finance and healthcare.

New Business Models

The most significant advantage of blockchain is that it creates a framework to develop new data monetization opportunities. Access to transaction-level data could mean that companies will pinpoint their marketing efforts, reduce costs and increase revenue.

Looking at the ledger and tracing transactions back to their origin could mean better supply chain management, enhanced trust and even identifying vendors who underdeliver. For example, adding a “return” flag to a transaction could help build scoring models.
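As a rough illustration of the "return" flag idea, the sketch below scores vendors by the share of their transactions that were not returned. The record layout (a `vendor` name and a boolean `returned` field) and the scoring rule are assumptions made for this example only; a real scoring model would likely weigh transaction value, recency and other signals.

```python
from collections import defaultdict


def vendor_scores(transactions):
    """Score each vendor as the fraction of its transactions that were not returned."""
    totals = defaultdict(int)
    returns = defaultdict(int)
    for tx in transactions:
        totals[tx["vendor"]] += 1
        if tx.get("returned", False):  # hypothetical flag added to the ledger entry
            returns[tx["vendor"]] += 1
    return {vendor: 1 - returns[vendor] / totals[vendor] for vendor in totals}


if __name__ == "__main__":
    ledger = [
        {"vendor": "A", "returned": False},
        {"vendor": "A", "returned": False},
        {"vendor": "A", "returned": True},
        {"vendor": "A", "returned": False},
        {"vendor": "B", "returned": True},
        {"vendor": "B", "returned": False},
    ]
    for vendor, score in sorted(vendor_scores(ledger).items()):
        print(vendor, score)
```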

Blockchain is also the best environment to create data markets. Here, companies could leverage their data in any way they see fit and create additional revenue streams.


6 Steps for Applying Data Science to Security

Two experts share their data science know-how in a tutorial focusing on internal DNS query analysis.

Image Source: Ryzhi via Shutterstock

Security practitioners are being told that they have to get smarter about how they use data. The problem is that many data scientists are lost in their world of math and algorithms and don’t always explain the value they bring from a business perspective.

Dr. Kenneth Sanford, analytics architect and sales engineering lead at Dataiku, says security pros have to work more closely with data scientists to understand what the business is trying to accomplish. For example, is compliance the goal? Or is the company looking to determine what it might cost if they experienced a ransomware attack?

"It’s really important to define the business problem," Sanford says. "Something like what downtime would cost the business, or what the monetary fine would be if the company were out of compliance."

Bob Rudis, chief data scientist at Rapid7, adds that companies need to take a step back and look at their processes and decide what could be done better via data science.

"Companies need to ask themselves how the security problem is associated with the business problem," Rudis says.

Sanford and Rudis created a six-step process for how to build a model to analyze internal DNS queries – the goal of which would be to reduce or eliminate malicious code from the queries. 

Image Source: Fabrik Bilder via Shutterstock

1. Define the business problem

Too often security practitioners get lost in the details of the technology and they don’t always think through the business issue at hand. For example, if the goal is to analyze DNS requests, it’s important to decide if you want to focus on the thousands or possibly millions of internal DNS requests or the external DNS requests on a web site or ecommerce site. Once you decide what’s more important, a data scientist can build a model to analyze those activities.

Image Source: Sergey D via Shutterstock

2. Decide what data sources would be best to solve the problem

Here’s where you would decide what the model would look like to solve the business problem. For example, if the company decides it wants to stop internal users from clicking on links that result in phishing attacks, it needs to build a model of all internal DNS requests. In terms of the data required, you will need a set of legitimate emails, a set of corrupted emails and the IP addresses and domains of where those emails originate. The data scientist needs to be creative to imagine a world where all the data are available.

Image Source: everything possible via Shutterstock

3. Take an inventory of the data

Here’s where you have to take an inventory of the data that’s available. While you should aim for perfection, recognize the constraints. Keeping with the DNS theme, most DNS data comes from routers, mobile phones, servers and workstations. Take an inventory of the type of queries being made and then determine if it’s in a format you can work with and whether you have the IT infrastructure available to store it and access it properly. For example, if you don’t have adequate storage, you’ll need to figure out what you need and what that investment will cost.

Image Source: Lagarto Film via Shutterstock

4. Experiment with many data science techniques

Now it’s time to put your hands to the keyboard and experiment with which data science technique works best. You may decide on a highly explainable linear model or a deep learning algorithm, but whatever you do, the idea is not to deploy an algorithm for the sake of doing high math. The goal should always be to pick the best way for the machine to deliver analysis that a human couldn’t do that will let the business make good decisions. In the case of our DNS example, you will want to build models that can consistently tell you with high confidence that a DNS request is malicious.
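As one possible starting point for such experiments, the sketch below scores a queried domain name with a small, explainable linear model. Everything specific in it is an assumption for illustration rather than Sanford's or Rudis's method: the three features (label length, character entropy, share of digits) are a common heuristic for spotting machine-generated names, and the weights are hand-picked placeholders where a real project would learn them from labeled DNS data.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Entropy in bits per character; 0.0 for an empty or single-symbol string."""
    if not text:
        return 0.0
    total = len(text)
    return sum((n / total) * math.log2(total / n) for n in Counter(text).values())


def features(domain: str) -> dict:
    """Simple features of the leftmost label of a queried domain name."""
    label = domain.lower().rstrip(".").split(".")[0]
    digits = sum(ch.isdigit() for ch in label)
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        "digit_ratio": digits / len(label) if label else 0.0,
    }


# Hypothetical, hand-picked weights; a real model would be fitted to labeled data.
WEIGHTS = {"length": 0.05, "entropy": 0.8, "digit_ratio": 2.0}
BIAS = -3.5


def malicious_probability(domain: str) -> float:
    """Logistic score in (0, 1); higher means the name looks more suspicious."""
    f = features(domain)
    z = BIAS + sum(WEIGHTS[name] * f[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    for name in ["mail.example", "x7k2q9vz4m1p8d3w.example"]:
        print(name, round(malicious_probability(name), 3))
```

Because each weight maps to one readable feature, an analyst can see why a given query scored high, which is the main appeal of the explainable linear option mentioned above.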

Image Source: everything possible via Shutterstock

5. Test for a real-world perspective

When testing, the team will want to determine if the model generates too many false positives or too many false negatives, and if the analysis happens fast enough to be of use to the business. It’s always important to have a real-world perspective on the purpose of the model you are building. In the DNS example, you should ask if the model will reduce the number of malicious DNS queries the company makes internally.
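The false-positive and false-negative checks can be expressed as two simple rates computed on a labeled test set. The sketch below assumes each DNS query has a true label and a model prediction, both booleans where True means malicious; the function name and the sample data are hypothetical.

```python
def error_rates(y_true, y_pred):
    """Return the false positive and false negative rates for boolean labels."""
    pairs = list(zip(y_true, y_pred))
    false_pos = sum(1 for truth, pred in pairs if not truth and pred)
    false_neg = sum(1 for truth, pred in pairs if truth and not pred)
    negatives = sum(1 for truth, _ in pairs if not truth)
    positives = len(pairs) - negatives
    return {
        # share of benign queries wrongly flagged as malicious
        "false_positive_rate": false_pos / negatives if negatives else 0.0,
        # share of malicious queries the model missed
        "false_negative_rate": false_neg / positives if positives else 0.0,
    }


if __name__ == "__main__":
    truth = [True, True, True, False, False, False, False, False]
    preds = [True, True, False, True, False, False, False, False]
    print(error_rates(truth, preds))
```

Which rate matters more is a business decision: a high false positive rate blocks legitimate traffic, while a high false negative rate lets malicious queries through.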

Image Source: Alfa Photo via Shutterstock

6. Follow-up and continuous improvement

Once the testing is complete, a process that can take several weeks, it’s time to put the model into production. However, it’s really important to understand that these models require constant monitoring and continuous improvement. It’s not like deploying antivirus software where every couple of weeks you will get new signatures you can update. The model has to be continuously monitored to ensure that it’s meeting the company’s goal of stopping malicious DNS queries hitting the internal network.


Prime Minister challenges UK to transform care through AI and data science

We’ve today backed a challenge from the Prime Minister, Theresa May, to make the UK a world leader in the use of data and artificial intelligence to help transform the diagnosis and treatment of chronic diseases in the UK.

Speaking in Macclesfield, the Prime Minister challenged the NHS, leading health charities and industry to accelerate progress in using Artificial Intelligence (AI) to quicken the diagnosis of conditions including heart and circulatory disease, cancer and dementia. 

The speech supports the Government’s Industrial Strategy, which includes four Grand Challenges to put the UK at the forefront of future technologies and industries. This includes growing the artificial intelligence and data driven economy and managing an ageing society. 

Cutting edge science 

Data science is the use of maths, statistics and computer science to get answers from large, complex data sets, while AI is the use of computer algorithms to draw conclusions from this type of data without direct human input. 

Applying these techniques to secure health data has already shown huge promise in improving diagnosis, including for people at high risk of heart attack, stroke and heart failure. 

Our Chief Executive, Simon Gillespie, welcomed the challenge from Macclesfield: 

“Accelerating research using health data and artificial intelligence will build on the UK’s reputation for cutting-edge science, and lead to transformative improvements in treating patients within the NHS.” 

“Our research, including through initiatives like UK Biobank, is already showing the huge potential of data science to transform care for the millions of people living with heart and circulatory disease in the UK. For example, there is promising evidence that using artificial intelligence to analyse CT scans could spot early signs of heart disease which may be missed by current techniques. This could lead to a quicker diagnosis with more personalised treatment that could ultimately save lives.”

“Through investment in innovation we will also accelerate the adoption of new data-led technologies, for instance to detect and monitor conditions like atrial fibrillation, diabetes and high blood pressure, all of which significantly increase the risk of a deadly heart attack or stroke.”

Leading the data revolution

Our research is already revealing how data science and AI could transform the diagnosis of heart and circulatory diseases. 

For example, research we’re funding at the John Radcliffe Hospital in Oxford is using AI techniques to develop a new test using CT scans that could identify people at high risk of heart attacks and strokes earlier than is currently possible. It could also identify people who are missed by current techniques. This could allow far more people to be given preventative treatments that could ultimately save their lives.


Oracle buys Culver City data science firm


Oracle Corp. has acquired a Culver City, California-based data science platform. The platform centralizes data science tools, projects and infrastructure for enterprise operations. Data science teams use the platform to organize work, access data and computing resources, and execute end-to-end model development workflows.

The company’s clients, including Amgen, Rio Tinto and Sonos, use the platform to improve productivity, reduce operational costs and deploy machine learning solutions faster.

"Data science requires a comprehensive platform to simplify operations and deliver value at scale," said CEO Ian Swanson in a statement. "With [the platform], customers leverage a robust, easy-to-use platform that removes barriers to deploying valuable machine learning models in production."

Oracle (NYSE: ORCL) embeds artificial intelligence and machine-learning capabilities across its software-as-a-service and platform-as-a-service solutions, including big data, analytics and security operations. The company plans to integrate the acquired platform into its solutions to provide customers with a single data science platform.

Terms of the deal were not disclosed.

"Every organization is now exploring data science and machine learning as a key way to proactively develop competitive advantage, but the lack of comprehensive tooling and integrated machine learning capabilities can cause these projects to fall short," said Amit Zavery, executive vice president of Oracle Cloud Platform. "With the combination of Oracle and [the platform], customers will be able to harness a single data science platform to more effectively leverage machine learning and big data for predictive analysis and improved business results."


Harvard's New Data Science Program Signals a Big Shift for Businesses

Data science informs entrepreneurs in a way that listening to their gut just can't.

Image credit: monsitj | Getty Images

Harvard hosts some of the most prestigious programs in the world, especially in business and law. So it was big news in the data science industry when the university announced a new master's program in data science in March 2017 -- and it was bigger news when the university recently credited the program for a 2 percent increase in international applications to the Graduate School of Arts and Sciences.

The fact that elite universities are now investing in -- and seeing results from -- data science programs should send a signal to entrepreneurs: It’s time to start seriously considering the implications of data science in every industry. Like blockchain, data science has quickly emerged from virtually nowhere to find applications in every sector. Until now, bigger universities have been slow to keep up, which has allowed independent, alternative educators such as Kaggle, Udemy and Coursera to drive the industry.

The establishment of a data science curriculum at one of the oldest and most reputable universities in America hints at the potential impact data science will have on entrepreneurialism -- after all, cryptocurrency and blockchain are still relegated to extracurricular clubs at Harvard, even though they've attracted fervent interest from startups and venture capitalists.

Despite some skepticism from those already in the data science industry, the increase in applications shows that the program at Harvard has staying power. But the true value of data science will be realized by entrepreneurs who are willing to embrace the vast possibilities for disruption that the industry offers.

Data is leveling the playing field.

The economy basically demanded a graduate course in data science from an elite university. IBM analysts predict that job openings for data and analytic talent will increase by 364,000 in the U.S. by 2020, and jobs for advanced data scientists will reach nearly 62,000 by the same year.

Data science is an appealing career path because of its flexibility. Data science tactics can be used in everything from corporate business and law to startups in new technologies such as the Internet of Things, virtual reality and SEO. As startups race to use new tech to gather as much data as possible, data scientists are the ones who will leverage all that data to create value. Now, every disruptive company needs to have a data science component.

But incorporating data science can signal a foundational change in company structure. As more data becomes freely available from both people and devices, it will become a natural resource anyone can harvest; a solo entrepreneur gathering and analyzing data effectively can compete with a giant company that uses its data ineffectively. In this sense, data is a market equalizer; scaling will become less of a priority for companies with sound data practices.

And those ignoring data streams are already being quickly outmatched. In retail, for example, McKinsey & Company found that a business leveraging big data efficiently can increase its operating margin by 60 percent. Data science can unlock new potential for businesses in any industry. Focusing on four fundamental objectives can help entrepreneurs leverage data and keep up with the competition:

1. Decode market shifts using advanced metrics.

Data science is most useful as a way to view and organize large data sets to optimize insights. Don't rely on conventional wisdom; use data to uncover actual market conditions such as need, demand, competition performance and other industry-specific metrics.

Be sure to engage critically with the data and its implications; it's not enough to glance at a spreadsheet. Eighty-five percent of respondents to an executive survey by New Vantage said they use big data, but only 37 percent have found success with it. Execution is more important than intent when it comes to data integration.

2. Build out robust customer profiles.

Most startups understand that customer retention is a priority over customer acquisition -- and fortunately, existing customers offer a lot more data than potential customers. This data can help uncover customer behavior patterns, which help entrepreneurs develop effective retention strategies.

When applied to consumer behavior, data science techniques can help organizations of any size. The town of Derry, New Hampshire, hired customer analytics startup Buxton to study its citizens' consumer behaviors and make recommendations to help recruit new businesses. By leveraging that data set, the town (and, in turn, the businesses located there) was able to better understand its customer base.

3. Uncover what works best about a product or service.

Sometimes consumers act differently than they speak. Data science is useful in uncovering realities about a product rather than perceptions. By uncovering who is using a product and for what reason, startups will be able to make tweaks -- or pivot entirely -- with lower risk and more efficiency.

Zurich Insurance, for example, recently implemented AI-driven data analytics to reduce operational inefficiencies in its injury claims system. According to a case study presented to Gartner, the company saved $5 million per year by using the program to reduce medical report assessment times from one hour to a few seconds.

4. Fail more effectively.

Data science allows entrepreneurs to be know-it-alls, a major perk of which is finding information that was previously hidden. Trial and error is a key process of entrepreneurship -- the "right" idea often comes on the fourth or fifth unique attempt. Data helps business owners learn more from their failures and maximize future successes.

UPS, for example, uses data to tweak its massive distribution network and save hundreds of millions of dollars using its On-Road Integrated Optimization and Navigation system. Data science provides a broad overview of the huge number of possibilities that can improve efficiency in a business.

Entrepreneurs should have a reason behind every move they make, and data science gives them the best reasons to make the best moves. Harvard isn’t the first organization to jump on board the data science bandwagon. Its prestige, however, will carry over into industries both old and new.

Solving global business problems with data analytics

David Simchi-Levi leads the Accenture and MIT Alliance in Business Analytics to develop novel solutions to the most pressing challenges faced by global companies.

Photo: David Sella

David Simchi-Levi is a professor of engineering systems with appointments at the Institute for Data, Systems, and Society and the Department of Civil and Environmental Engineering (CEE) at MIT. His research focuses on developing and implementing robust and efficient techniques for supply chains and revenue management. He has founded three companies in the fields of supply chain and business analytics: LogicTools, a venture focused on supply chain analytics, which became a part of IBM; OPS Rules, a business analytics venture that was acquired by Accenture Analytics; and Opalytics, which focuses on cloud computing for business analytics.

In addition to his role as a professor of engineering systems, Simchi-Levi leads the Accenture and MIT Alliance in Business Analytics. The alliance brings together MIT faculty, PhD students, and a host of partner companies to solve some of the most pressing challenges global organizations are facing today. The alliance is cross-industry, collaborating with companies in sectors ranging from retail space, to government and financial services, to the airline industry. This diversity enables the alliance to be cross-functional, with projects that focus on everything from supply chain optimization to revenue generation and from predictive maintenance to fraud detection. In many cases, these endeavors have led to companywide adoption of MIT technology, analytics, and algorithms to increase productivity and profits.

Putting theory to practice, Simchi-Levi and his team worked with a large mining company in Latin America to improve its mining operations. Their algorithm receives data every five seconds from thousands of sensors and predicts product quality 10, 15, and 20 hours prior to product completion. Specifically, they used these data to identify impurities, such as silica level in the finished product, and to suggest corrective strategies to improve quality.

In the realm of price optimization, Simchi-Levi’s alliance has worked with a number of major online retailers, including Groupon; B2W, Latin America’s largest online retailer; and Rue La La. Rue La La operates in the flash-sale industry, in which online retailers use events to temporarily discount products.

“But how do you price a product on the website the first time if you have no historical data?” Simchi-Levi asks. “We applied machine learning algorithms to learn from similar products and then optimization algorithms to price products the company never sold before, and the impact was dramatic, increasing revenue by about 11 percent.”
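To make the idea of learning from similar products concrete, here is a minimal Python sketch. It is not the alliance's actual method: it swaps in a plain k-nearest-neighbors average for the demand model and a brute-force search over candidate prices for the optimization step. The function names, the record layout and the choice of k are all hypothetical.

```python
import math


def predict_demand(history, features, price, k=3):
    """Estimate units sold as the average over the k most similar past records.

    history is a list of (features, price, units_sold) tuples. Similarity is
    Euclidean distance over the features plus the price, so in practice the
    features and prices should be scaled to comparable ranges first.
    """
    if not history:
        raise ValueError("history must not be empty")

    def distance(record):
        past_features, past_price, _ = record
        return math.dist(list(past_features) + [past_price], list(features) + [price])

    nearest = sorted(history, key=distance)[:k]
    return sum(units for _, _, units in nearest) / len(nearest)


def best_price(history, features, candidate_prices, k=3):
    """Pick the candidate price with the highest predicted revenue (price x demand)."""
    return max(
        candidate_prices,
        key=lambda price: price * predict_demand(history, features, price, k),
    )
```

A new product with no sales history only needs a feature vector (for example, category, brand tier or season encoded as numbers) to receive a first price suggestion from products that resemble it.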

It’s a deceptively simple answer. But for Simchi-Levi, well known as a visionary thought leader in his field, solving tough problems is at the heart of the work of the Accenture and MIT Alliance in Business Analytics.

“In the case of Groupon and B2W, we developed a three-step process to optimize and automate pricing decisions,” he says. First, they utilize machine learning to combine internal historical data with external data to create a complete profile of consumer behavior. Second, they post pricing decisions on the website and observe consumer behavior. Third, they learn and improve pricing decisions based on that behavior in order to optimize the final price. “In all of these cases, we made a big impact on the bottom line: increasing revenue, increasing profit, and increasing market share,” he says.
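The three steps can be pictured as a simple learn-while-pricing loop. The sketch below is an assumption-laden illustration, not the process used for Groupon or B2W: it uses an epsilon-greedy rule, a standard exploration technique, and the `PriceLearner` class, its parameters and the revenue-per-visitor metric are hypothetical.

```python
import random


class PriceLearner:
    """Epsilon-greedy price selection over a fixed set of candidate prices."""

    def __init__(self, prior_revenue, epsilon=0.1, seed=0):
        # Step 1: start from prior revenue-per-visitor estimates for each
        # candidate price, assumed to come from a forecasting model.
        self.estimates = dict(prior_revenue)
        self.counts = {price: 1 for price in prior_revenue}  # prior counts as one observation
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose_price(self):
        # Step 2: post a price. Mostly the current best, occasionally a random
        # one so that other prices keep being tested.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.estimates))
        return max(self.estimates, key=self.estimates.get)

    def observe(self, price, revenue):
        # Step 3: fold the observed revenue into a running average for that price.
        self.counts[price] += 1
        self.estimates[price] += (revenue - self.estimates[price]) / self.counts[price]
```

In use, `choose_price()` and `observe()` alternate for each visitor or time period, so a price whose forecast proves too optimistic loses its lead as real observations accumulate.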

At any point in time, Simchi-Levi’s business analytics alliance, which has been going strong since 2013, has between 10 and 20 projects running simultaneously. He suggests the reason so many companies are turning to MIT for their business challenges has a lot to do with recent technology trends and the Alliance’s role at the forefront of those developments.

Specifically, he mentions three technology trends: digitization; automation; and analytics, including the application of machine learning and artificial intelligence algorithms. However, he observes that executives initially find it difficult to accept that black-box analytics can do a better job at pricing a product than the merchants who know the product and have been working in the industry for 25 years. While Simchi-Levi concedes that this is partially true, he notes that with thousands upon thousands of products to price, merchants can focus only on the top 10 percent, whereas MIT’s analytics can match their performance on the top 10 percent while performing comparably well on the middle 50 percent and on the long tail.

More precisely, “While the company merchant will focus on a small portion, we can focus on the entire company portfolio,” he says. “We’re talking about the ability to use data and analytics to optimize prices for thousands of products.”

“Business analytics is a very exciting area. If you open any business journal you will see references to data science and data analytics,” Simchi-Levi says. But his expertise has led him to explore a deeper truth about this obsession with data analytics: “My experience is that while there is a lot of excitement around this area, industry actually does very little [in the way of] using data and analytics to automate and improve processes.”

He says there are three main challenges industry faces in the area of data analytics: data quality, information silos, and internal resistance. “What we do at MIT is bring all of these opportunities together by improving the data quality, convincing executives to start experimenting with some of the technology, and connecting different data sources into an effective platform for analytics.”
