Deloitte UK Data Scientist Interview: Applied AI Case Study Breakdown

德勤英国数据科学家面试：应用AI案例全流程拆解

5 January 2025

Anonymous Candidate

2025 Deloitte UK Applied AI Interviewee

摘要 Summary

Real interview experience from Deloitte's Applied AI division, featuring a demand forecasting case for a UK supermarket chain with end-to-end ML pipeline thinking.

德勤应用AI部门真实面试经历，详解英国连锁超市需求预测案例，展示端到端机器学习项目思维。

Interview Overview| 面试概述

I just finished my interview for the Data Scientist position in Deloitte UK's Applied AI division, and I feel like I've been put through the wringer. The interviewer didn't bother with fancy algorithm discussions—instead, they threw me into a business scenario and watched to see if I could lead a data science project from start to finish like a real Consultant.

刚面完Deloitte UK的Applied AI部门的DS岗，感觉自己被扒了一层皮。面试官根本不跟你聊那些花里胡哨的算法，而是直接把你扔到一个商业场景里，看你能不能像个真正的Consultant一样，从头到尾地主导一个数据科学项目。

The Interview Scenario| 面试场景

The interviewer played the role of a client—a large UK supermarket chain. They opened with a sigh: "Our fresh produce, like fruits and vegetables, has a terrible spoilage rate. We're throwing away so much every day—it's killing us financially. You're the AI experts—can you build a model to predict how much of each vegetable each store will need for the next week?"

面试官扮演客户，是一家英国的大型连锁超市。他上来就唉声叹气：「我们最近的生鲜产品（Fresh Produce），比如蔬菜水果，浪费率（Spoilage Rate）太高了，每天都得扔掉好多，亏死了。你们是搞AI的，能不能帮我们建个模型，预测一下未来一周，我们每个门店的每种蔬菜，大概需要备多少货？」

Classic Demand Forecasting project. Game on.

得，一个经典的Demand Forecasting项目就这么来了。

My Approach: Thinking Like a Consultant| 我的「表演」开始：像顾问一样思考

I didn't jump straight into discussing LSTM or Prophet models. Instead, I took control of the conversation and started with a "requirements clarification" round, treating the interviewer like a real client.

我没急着跟他聊LSTM或者Prophet模型，而是先反客为主，把他当成客户，开始了一轮「需求澄清」。

Step 1: Business & Data Understanding| 第一步：定义问题与目标 (Business & Data Understanding)

I told the interviewer: "Building a model is the easy part. First, let's make sure we're solving the same problem. I want to confirm a few things:"

我跟面试官说：「模型好建，但我们得先确保我们解决的是同一个问题。我想先确认几个点：」

Prediction Granularity: Are we predicting at the "store-SKU-day" level? For example, "London Store #1 - Broccoli - Next Wednesday" demand?
预测的颗粒度 (Granularity)：我们是预测到「门店-SKU-天」这个级别吗？比如，「伦敦一号店-西兰花-下周三」的需求量？
Key Metrics: How do we measure success? Pure prediction accuracy (like MAPE), or the actual waste reduction in monetary terms, or profit improvement?
关键指标 (Metrics)：我们这个项目的成功，是用什么来衡量的？是单纯的预测准确率（比如MAPE），还是最终能帮你们减少多少浪费金额，或者提高多少利润？
Data Availability: To make these predictions, I'll need historical sales data, inventory data, promotional data, even weather data. Do we have these? What's the quality like?
数据的可用性 (Data Availability)：要做这个预测，我需要历史销售数据、库存数据、促销活动数据、甚至是天气数据。这些数据我们有吗？质量怎么样？
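
To make the Key Metrics question concrete, here is a minimal sketch of how pure accuracy metrics and a monetary view of the same forecast can be computed side by side. The demand numbers, `unit_cost`, and `unit_margin` are made up for illustration, and the cost model (overstock spoils, understock is a lost sale) is a simplifying assumption rather than anything the client specified.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, skipping days with zero actual demand."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def wape(actual, forecast):
    """Weighted absolute percentage error: total absolute error / total demand."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / sum(actual)


def stocking_cost(actual, stocked, unit_cost, unit_margin):
    """Monetary cost of a stocking plan under two simplifying assumptions:
    units stocked above demand spoil (lost at unit_cost), and demand above
    stock is a lost sale (lost at unit_margin)."""
    waste = sum(max(s - a, 0) for a, s in zip(actual, stocked)) * unit_cost
    lost_sales = sum(max(a - s, 0) for a, s in zip(actual, stocked)) * unit_margin
    return waste + lost_sales


# Hypothetical week of broccoli demand (units) for one store.
actual = [40, 35, 0, 50, 60, 80, 75]
forecast = [44, 30, 5, 50, 66, 70, 75]

print(f"MAPE: {mape(actual, forecast):.3f}")
print(f"WAPE: {wape(actual, forecast):.3f}")
print(f"Cost: {stocking_cost(actual, forecast, unit_cost=0.60, unit_margin=0.40):.2f}")
```

MAPE is undefined on days with zero sales, which are common for slow-moving produce, so the sketch skips them. WAPE and the monetary cost both stay well defined, which is one reason to agree on the metric with the client up front.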

The interviewer acknowledged my questions positively—clearly I wasn't just a model-building bookworm.

面试官对我的提问表示认可，感觉我不是一个只会建模的书呆子。

Step 2: Data & Feature Engineering| 第二步：数据与特征工程 (Data & Feature Engineering)

Once the problem was clear, I started "requesting data" and explaining how I'd process it.

问题明确后，我开始跟他「要数据」，并说明我会怎么处理这些数据。

Required Data:| 所需数据：

Historical Sales Data: At least two years of daily sales by store and SKU
历史销售数据 (Historical Sales Data)：过去至少两年的，每个门店、每个SKU、每一天的销售量
Product Attributes: Is this vegetable locally sourced or imported? Organic? What's the shelf life?
产品信息 (Product Attributes)：比如，这个蔬菜是本地产的还是进口的，是不是有机的，保质期多长
Promotional Data: Any past discounts, buy-one-get-one-free deals? What impact did they have on sales?
促销信息 (Promotional Data)：过去有没有搞过打折、买一送一的活动？这些活动对销量的影响有多大？
External Factors: Weather data (people might want root vegetables for stew when it's cold), holiday information (demand for Christmas turkey side dishes will spike)
外部数据 (External Factors)：我还会引入天气数据（天冷了大家可能更想买炖汤的根茎蔬菜）、节假日信息（圣诞节的火鸡配菜需求肯定大增）等

Feature Engineering:| 特征工程 (Feature Engineering)：

Time Features: Day of week, beginning/end of month, payday
时间特征：星期几、是不是月初/月末、是不是发薪日
Lag Features: For example, average sales over the most recent 7 days available at forecast time
滞后特征 (Lag Features)：比如，过去7天的平均销量
Price Elasticity: How much does a 1% price change affect sales volume?
价格弹性 (Price Elasticity)：价格变动1%，对销量有多大影响
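
The three feature groups above could be sketched roughly as follows with pandas and NumPy. The column names (`store`, `sku`, `date`, `units`), the three-day definition of month start and month end, and the log-log regression used for elasticity are all assumptions made for this example. Payday is left out because it would depend on a calendar supplied by the client.

```python
import numpy as np
import pandas as pd


def add_time_and_lag_features(df, horizon=7):
    """Add calendar and lag features to a store-SKU-day sales table.

    Assumes columns: store, sku, date (datetime64), units.
    """
    df = df.sort_values(["store", "sku", "date"]).copy()
    df["day_of_week"] = df["date"].dt.dayofweek  # Monday = 0
    df["is_month_start"] = df["date"].dt.day <= 3
    df["is_month_end"] = df["date"].dt.days_in_month - df["date"].dt.day < 3

    grouped = df.groupby(["store", "sku"])["units"]
    # Shift by the forecast horizon before averaging, so a feature used to
    # predict day t only sees sales up to day t - horizon (no leakage).
    df["units_mean_7d"] = grouped.transform(
        lambda s: s.shift(horizon).rolling(7).mean()
    )
    return df


def price_elasticity(prices, units):
    """Slope of log(units) on log(price): % change in units per 1% price change."""
    slope, _intercept = np.polyfit(np.log(prices), np.log(units), deg=1)
    return slope


# Tiny synthetic example: three weeks of sales for one store and one SKU.
dates = pd.date_range("2024-01-01", periods=21, freq="D")
sales = pd.DataFrame(
    {"store": "London-1", "sku": "broccoli", "date": dates, "units": np.arange(21.0)}
)
features = add_time_and_lag_features(sales)
print(features[["date", "day_of_week", "units_mean_7d"]].tail(3))
```

The shift matters more than the average itself: for a week-ahead forecast, a 7-day mean that includes yesterday's sales would not be available at prediction time.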

Step 3: Modeling & Interpretability| 第三步：模型选择与解释 (Modeling & Interpretability)

My approach: "For modeling, I'd go from simple to complex in phases. First, use a classic ARIMA or Exponential Smoothing model as our Baseline Model—see how accurate we can get using just the time series information itself."

我的思路：「模型方面，我会从简单到复杂，分步进行。先用一个经典的ARIMA或者Exponential Smoothing模型，作为我们的基线模型 (Baseline Model)，看看只用时间序列本身的信息，能做到多准。」
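
As a rough illustration of what a baseline means here, the sketch below hand-rolls two very simple forecasters: a seasonal naive forecast (same weekday last week) and simple exponential smoothing. These are simpler than a full ARIMA fit and are written from scratch rather than taken from a forecasting library. The sales history and `alpha` are illustrative values.

```python
def seasonal_naive(history, horizon=7, season=7):
    """Forecast each future day as the value from the same weekday last week."""
    return [history[len(history) - season + (h % season)] for h in range(horizon)]


def simple_exponential_smoothing(history, alpha=0.3, horizon=7):
    """Flat forecast at the exponentially weighted level of past sales."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon


# Two hypothetical weeks of daily broccoli sales for one store (Mon..Sun).
history = [10, 12, 11, 13, 20, 25, 22, 11, 13, 12, 14, 21, 26, 23]

print("Seasonal naive:", seasonal_naive(history))
print("Exp. smoothing:", [round(x, 1) for x in simple_exponential_smoothing(history)])
```

Any richer model then has to beat these numbers on held-out weeks to justify its extra complexity.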

"Then, I'd try LightGBM or XGBoost, both tree-based models. The advantage is that we can feed in all the features we discussed (product, promotional, weather, etc.), which typically improves accuracy over a pure time-series baseline."

「然后，我会尝试用LightGBM或XGBoost这种Tree-based的模型。这类模型的好处是，可以把我们刚才提到的所有特征（产品、促销、天气等）都加进去，通常效果会更好。」

Crucially, I emphasized Model Interpretability: "A number isn't enough; we need store managers to trust the prediction. I'd use tools like SHAP to explain the model's predictions. For example, if the model says to stock more broccoli tomorrow, SHAP would explain: 'Because rain is forecast tomorrow, and there's a broccoli promotion this week.'"

我特别强调了模型的可解释性 (Interpretability)：「模型预测出一个数字还不够，我们得让门店经理相信这个数字。我会用SHAP这样的工具，来解释模型的预测结果。比如，模型告诉你明天要多备货西兰花，SHAP会告诉你，这是因为它预测明天会下雨，而且这周正好有西兰花的促销活动。」
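
To show the idea behind that kind of explanation, the sketch below computes exact Shapley values by brute force for a tiny hypothetical model with two features. It does not use the `shap` library, which is built to handle real models far more efficiently. The model, its coefficients, and the baseline day are all invented for illustration.

```python
from itertools import permutations


def toy_demand_model(features):
    """Hypothetical stand-in for a trained model: units of broccoli to stock."""
    units = 40.0
    if features["rain"]:
        units += 6.0
    if features["promo"]:
        units += 15.0
    if features["rain"] and features["promo"]:
        units += 4.0  # assumed interaction between rain and promotion
    return units


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over all feature orderings. Only feasible for a handful of features."""
    names = list(instance)
    totals = dict.fromkeys(names, 0.0)
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


baseline = {"rain": False, "promo": False}  # a dry day with no promotion
tomorrow = {"rain": True, "promo": True}

contributions = shapley_values(toy_demand_model, tomorrow, baseline)
print("Baseline forecast:", toy_demand_model(baseline))
print("Tomorrow forecast:", toy_demand_model(tomorrow))
print("Contributions:", contributions)
```

The contributions add up to the gap between tomorrow's forecast and the baseline forecast, which is what lets a store manager read them as "this many extra units because of rain, this many because of the promotion".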

Step 4: Deployment & Business Impact| 第四步：落地与商业价值 (Deployment & Business Impact)

Finally, I needed to show that I deliver solutions that create ongoing value—not just a pile of code.

最后，我得让他看到，我交付的不是一堆代码，而是一个能持续产生价值的解决方案。

"Once the model is built, we'd deploy it as an automated Data Pipeline. Every day it automatically extracts the latest data from the database, retrains the model, and outputs next week's predictions to a BI dashboard. Every store manager can see clear stocking recommendations when they open their computer."

「模型建好后，我们会把它部署成一个自动化的Data Pipeline。每天自动地从数据库里抽取最新的数据，重新训练模型，然后把未来一周的预测结果，输出到一个BI仪表盘上。每个门店的经理，打开电脑就能看到清晰的备货建议。」
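
The daily job described above could be outlined as below. This is only a sketch of the control flow: `extract`, `train`, and `publish` are hypothetical callables standing in for the real database query, model training, and dashboard write, and the stub implementations exist only so the example runs.

```python
from datetime import date, timedelta


def run_daily_forecast(today, extract, train, publish, horizon=7):
    """One scheduled run: pull data, retrain, forecast next week, publish."""
    history = extract(up_to=today)  # latest sales from the database
    model = train(history)          # retrain on everything available so far
    forecast_days = [today + timedelta(days=d) for d in range(1, horizon + 1)]
    forecast = {day: model(day) for day in forecast_days}
    publish(forecast)               # e.g. write to the table behind the BI dashboard
    return forecast


# Stubs for illustration only.
def extract_stub(up_to):
    return [40, 42, 38, 41]


def train_stub(history):
    mean = sum(history) / len(history)
    return lambda day: mean  # "model" that always predicts the historical mean


published = []
result = run_daily_forecast(date(2025, 1, 6), extract_stub, train_stub, published.append)
for day, units in result.items():
    print(day.isoformat(), units)
```

Passing the three steps in as functions keeps the orchestration testable without a database or a dashboard, whatever scheduler ends up running it.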

"We'd also continuously monitor model performance (Model Monitoring). If prediction accuracy drops, it triggers an alert for data scientists to investigate. This is a continuous iteration and optimization process."

「我们还会持续地监控模型的表现（Model Monitoring），如果发现预测准确率下降了，就会触发警报，让数据科学家介入调查。这是一个持续迭代优化的过程。」
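
One simple way to express that monitoring rule is a rolling error metric with an alert threshold, sketched below. The 7-day window, the 20% threshold, and the choice of WAPE as the metric are assumptions for the example, and the sketch assumes every window has non-zero total demand.

```python
def rolling_wape(actual, forecast, window=7):
    """WAPE over each trailing window of `window` days."""
    scores = []
    for end in range(window, len(actual) + 1):
        a = actual[end - window:end]
        f = forecast[end - window:end]
        scores.append(sum(abs(x - y) for x, y in zip(a, f)) / sum(a))
    return scores


def drift_alerts(actual, forecast, threshold=0.20, window=7):
    """Day indices (last day of each window) where rolling WAPE exceeds the threshold."""
    scores = rolling_wape(actual, forecast, window)
    return [i + window - 1 for i, score in enumerate(scores) if score > threshold]


# Hypothetical series: forecasts are accurate for a week, then start overshooting.
actual = [50] * 14
forecast = [50] * 7 + [70] * 7

print("Rolling WAPE:", [round(s, 3) for s in rolling_wape(actual, forecast)])
print("Alert days:", drift_alerts(actual, forecast))
```

Using a trailing window means a single bad day does not page anyone, while a sustained drop in accuracy does.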

Key Takeaways| 面试心得总结

Deloitte's DS interview is really looking for a "full-stack" data scientist. You need to understand algorithms and write code, but you also need to understand business, communicate well, and turn a technical solution into a compelling business story that wins client confidence.

Deloitte的DS面试，真的是在寻找一个「全栈型」的数据科学家。你不仅要懂算法、会写代码，你还得懂业务、会沟通、能把一个技术方案，讲成一个能让客户信服的商业故事。

Preparation Advice: Don't just tune parameters in Jupyter Notebook. Take several business problems and think through them end-to-end: How would you define the problem? Find data? Build models? Evaluate value? This end-to-end problem-solving ability is what they value most.

备考建议：别光顾着在Jupyter Notebook里调参了。多找几个商业问题，从头到尾地想一遍，如果你是项目负责人，你会怎么定义问题，怎么找数据，怎么建模型，怎么评估价值。这种端到端的problem-solving能力，才是他们最看重的。

Good luck to everyone!

祝大家好运！