I meet a lot of aspiring
data scientists, people starting out who are often switching from academia or
finance. They are all bright-eyed and bushy-tailed, drawn in by the tales of
advanced algorithms from Netflix, the latest competition at Kaggle or the shiny
new visualization from Facebook. However, when it comes to e-Commerce, they are
kind of stumped. They don't really grasp the scope of how data science can help
a business that sells physical “stuff”. They get the idea of recommendation
engines baked into almost every chunk of Amazon's website, of course, but beyond
that, they find it hard to imagine how else data scientists may spend their
days in such companies.
The purpose of this post,
then, is a brief, almost superficial, overview of some of the different aspects
of a typical e-Commerce business where data scientists can add value.
Before I start, however, I
want to mention a couple of caveats:
• All of the areas below are serviced by a swathe of
specialized vendors. They can do a great job, potentially far superior to that of
an in-house data science team because their business is so focused and
specialized and their tools so developed, but it usually comes at a price.
At small company scale, an individual data scientist or team may be able to
provide something that is sufficiently good to meet the company's needs or to
demonstrate the need for a specialized service. At larger scale, it may make
sense to build such systems in-house using the data science team.
• In the list below, there is a broad overlap between
the responsibilities of a typical analyst and a data scientist. Some aspects,
such as “implement a recommendation engine” are clearly in the data science
camp. Other areas, such as those relating to customer insights, are usually performed
by analysts. In this case, however, the data scientists may be able to help the
business and analysts with more sophisticated statistical approaches (say, feature
reduction or unsupervised clustering of customers rather than a priori
slicing and dicing based on age, gender, zip code, etc.), in other words advanced
analytics, or more programmatic approaches (e.g., using an API to pull down
supplementary data).
Few individual companies
will use data scientists for all of these aspects. The point here is to
highlight different aspects where data scientists can and do get involved and
provide some value and insight.
Recommendation and Personalization
Let's get the obvious one
out of the way. Consumers are increasingly reliant on recommendations these
days, whether it is for news, restaurants, bands or items to purchase. Many, if
not most, e-Commerce sites have some sort of recommendation engine under the
hood and it is typically the data scientist's role to help conceive the type,
features and weights and, in many cases, to implement it. These engines are used for
cross-sell (“you are ordering this iPad so you probably want one of these cases
to protect it”), up-sell (“you have been looking at this camera, here is the
next level up which is even more awesome”) and personalization. It is the data
scientist's role to learn the attributes and relationships among products and
when possible to learn the tastes and anticipate needs of the customers. They
can then help tailor the customer's experience. This might involve changing the
ordering of products in the search results or gallery pages specifically for the
customer.
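To make the cross-sell idea concrete, here is a minimal sketch of one of the simplest possible approaches: counting which products show up in the same order. The order data, product names and the helper names `build_cooccurrence` and `recommend` are all made up for illustration, and a production engine would typically weigh many more signals than raw co-occurrence.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(orders):
    """Count how often each pair of products appears in the same order."""
    counts = defaultdict(Counter)
    for order in orders:
        # Use a set so a product bought twice in one order is counted once.
        for a, b in permutations(set(order), 2):
            counts[a][b] += 1
    return counts


def recommend(counts, product, k=3):
    """Return up to k products most often bought together with `product`."""
    # Sort by descending count, then by name so that ties are deterministic.
    ranked = sorted(counts[product].items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical order history: each inner list is one customer's basket.
orders = [
    ["ipad", "ipad_case", "stylus"],
    ["ipad", "ipad_case"],
    ["camera", "sd_card"],
    ["ipad", "stylus", "sd_card"],
]
counts = build_cooccurrence(orders)
print(recommend(counts, "ipad"))
```

Raw counts favor popular items, so in practice one would probably normalize by how often each product sells overall before ranking.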
Many of us have
supermarket loyalty cards. So do some e-Commerce sites (think Amazon Prime).
They are a source of extremely valuable data (so much so that it may even be
worth making some amount of loss on those customers). Coupons and discounts can
drive new purchase behavior and provide insights for whole segments of
customers not in the loyalty program itself. Those programs need to be
conceived and managed, and the data they generate put to maximal use.
Product strategy
All e-Commerce sites have
to tackle the questions: what should we sell, at what price and when. Data
scientists can help define and optimize the product mix. In some cases, such as
my current employer Warby Parker, the company may design and manufacture their
products. That is, they own the whole process from product conception to final
sale to a customer. While there is typically a product team that owns that
design process, data scientists can and do help with forecasting. Is there a
hole in our product mix, what should we make and when should we sell it? How
many units should we order in the initial batch from the factory? When should
we retire products? Analysts will typically tackle the retrospective analysis
(how much did we sell, what are the duds) whereas data scientists can help with
the more advanced prescriptive and predictive analytics.
Supply chain
If an e-Commerce business is to
sell “stuff,” it needs the right amount of the right stuff in the right place
at the right time. Supply chain is a particularly complex and important part of
the business. It is complex because it often involves multiple vendors and
factories, significant time lags for international shipping, significant
shipping costs (especially if one gets it wrong and has to expedite pallets of
goods to warehouses) and significant capex. Also, there can be very narrow
windows of demand for a product and if you miss that window, you might be stuck
with a big pile of useless inventory (think of “Happy New Year 2014” products
on Jan 2). Finally, demand can be highly unpredictable and might correlate
strongly with exogenous factors such as unseasonable weather. Ideally, an
e-Commerce company will work with specialized vendors to handle supply chain or hire an
expert in-house operations research team. However, in many e-Commerce sites,
especially when small, there is plenty of scope for data scientists to perform
detailed analysis and develop predictive models that can help minimize
risk, inform strategy and optimize customer satisfaction.
Customer Service
A company that puts the
customers first is going to have a great customer service team that handles
issues, deals with returns and complaints and generally tries to keep the
customers happy. These teams generate a trove of data from phone calls, instant
messages and email interactions with the customers and back end systems. They
also tend to be fairly metric driven: how long on average does it take to
answer the phone, to resolve a case, what is the size of the case backlog etc.?
Data scientists can help with predictive models and visualization. They can
also use their skills with natural language processing. For instance, they
could use keyword extraction and topic modeling to understand the types of
complaints and issues being filed.
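As a rough illustration of the keyword-extraction idea, the sketch below counts how many customer messages mention each word after dropping common filler words. The messages, the tiny stop-word list and the function name `top_keywords` are invented for this example; real work would more likely lean on a proper NLP library and a much larger stop-word list.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {
    "a", "an", "and", "the", "my", "is", "was", "it", "to", "i",
    "of", "in", "for", "not", "has", "have", "still",
}


def top_keywords(messages, k=5):
    """Return the k words that appear in the most messages, with their counts."""
    counts = Counter()
    for message in messages:
        # Lowercase, split into words, and count each word once per message.
        words = set(re.findall(r"[a-z']+", message.lower()))
        counts.update(words - STOP_WORDS)
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical customer service messages.
messages = [
    "My order arrived late and the box was damaged",
    "Refund for damaged item has not arrived",
    "The order is late, still waiting for a refund",
]
for word, n in top_keywords(messages):
    print(word, n)
```

Even something this crude can surface recurring themes (late deliveries, damage, refunds) that are worth a closer look with topic modeling.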
Fraud
Fraud is, unfortunately,
very common and the strategies employed by the thieves are varied and in some cases
sophisticated. They range from the use of stolen credit cards to non-returned
items and items returned shrink-wrapped but without the
original product. Again, there is potential for data scientists to develop
models, monitoring or alerting systems.
HR
Hiring is tough,
especially in technology where it is extremely competitive to hire good
engineers (and data scientists of course). Hiring is time consuming and
expensive because of the cost of recruiters, fees, and time spent interviewing.
In addition, a bad hire can be counter-productive to the team or company and
expensive to manage. Increasingly, companies are interested in honing their recruiting
process: what makes a good fit for our company, where can we streamline the
interview process, what are good discriminating interview questions and so on.
Models can be used to understand attrition and retention, identify who should
be rejected at the resume phase, and analyze and optimize the interview
pipeline.
Customer insights
An e-Commerce site has
stuff to sell but who are the people buying it? What are they interested in?
Where do they live? How can we serve them better? What makes them tick? These
questions are typically answered by analysts in a group akin to customer
insights or, as the company scales, to specialized teams that might work within
just one realm within the product space. As above, data scientists can help
here with more advanced analytics (classifiers, predictive modeling,
unsupervised clustering and segmentation and so on). This team is often
responsible for customer surveys and so there is ample opportunity to help them
with natural language processing including keyword extraction and topic
modeling. (I wrote about this earlier in my post about matching misspelled brand names.)
Marketing
OK, so the site has
product to sell and they know something about their customers. An obvious next
question for them to ask is how do they get more customers or encourage
existing customers to purchase more? Here we enter the realm of marketing. Once
again, there is lots of scope for data scientists to contribute. This might
include AdWords buying optimization, channel mix optimization (by that I mean
print vs web vs TV), ad retargeting optimization, and SEO. Most e-Commerce sites
send out a lot of emails, especially if they are in flash sales. There is a lot
of scope for understanding, optimizing and A/B testing subject lines, content,
send times and so on. (At One Kings Lane, a home decor flash sales site, we
sent customers up to 17 emails per week.) There is a careful balance between
reminding customers about your presence and what you offer and turning people
off by being a nuisance. In many e-Commerce sites, cart abandonment rates reach
dizzy heights and understanding and addressing that can pay rich rewards. Data
scientists can often form a core part of personalization programs.
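For the email A/B testing mentioned above, one common (though by no means the only) way to compare two subject lines is a two-proportion z-test on open rates. The sketch below assumes that approach; the function name `two_proportion_z_test` and the open counts are hypothetical.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both variants perform equally.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: subject line A was opened 200 times out of 1,000 sends,
# subject line B 250 times out of 1,000 sends.
z, p_value = two_proportion_z_test(200, 1000, 250, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference in open rates is unlikely to be chance alone, although with many subject lines tested every week one would also want to account for multiple comparisons.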
Web analytics
Another obvious area where
data scientists can contribute is web analytics. How do people come to the site
(this relates to SEO, search terms and referer URL analysis)? What paths do
they take? When and where do they bounce? At which stages of the checkout funnel do
we lose the most customers? How can we make the experience more frictionless,
enjoyable and relevant? Which products that we do not currently supply, but
should, are customers typing into our search box? These are all areas which should
be covered by a specialized analyst team as the company scales but where data
scientists can help with data munging, visualization, advanced clickstream
analysis, A/B testing as well as contributing data products for the site
(personalization and recommender APIs).
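The checkout funnel question lends itself to a very small calculation. As a sketch, assuming we already have visitor counts per stage (the stage names, numbers and the helper `funnel_dropoff` below are invented), the share of customers lost at each step can be computed like this:

```python
def funnel_dropoff(stage_counts):
    """Given ordered (stage, visitors) pairs, return the share lost at each step."""
    dropoffs = []
    # Pair each stage with the one that follows it.
    for (stage, n), (next_stage, next_n) in zip(stage_counts, stage_counts[1:]):
        lost = 1 - next_n / n if n else 0.0
        dropoffs.append((stage, next_stage, lost))
    return dropoffs


# Hypothetical visitor counts for each stage of a checkout funnel.
funnel = [
    ("cart", 1000),
    ("shipping", 600),
    ("payment", 450),
    ("confirmation", 400),
]
steps = funnel_dropoff(funnel)
for stage, next_stage, lost in steps:
    print(f"{stage} -> {next_stage}: {lost:.0%} lost")

# The step with the largest loss is the natural first candidate for A/B tests.
worst = max(steps, key=lambda step: step[2])
```

In practice the counts would come from clickstream data, and the interesting work is usually in segmenting these drop-off rates by traffic source, device or customer type.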
And
there you have it: a whirlwind tour of an e-Commerce site from the perspective
of data scientists. It is this breadth that makes being a data scientist fun,
rewarding and challenging. You get to work with a broad spectrum of partners
across the organization, dip into different domains, and make a difference in a
variety of ways.