## Week 8 (16.11.-20.11.)

**Topic**: Markov Simulation**Lessons**: Markov Chains, Vector & Matrix Multiplication, Python Classes, NumPy Arrays, Program Design, Composition & Inheritance, PEP8**Project**: Simulate the paths of new customers in a supermarket**Dataset**: internal**Code**: GitHub

The goal of this project was to create an Markov Chain Monte Carlo (MCMC) simulation of new customers in a supermarket, based on data about customer paths from entrance to checkout through four aisles (fruit, drinks, dairy, and spices) recorded on five days from 7 am to 10 pm.

This project was particularly challenging for two reasons: it involved OOP and team work. So in this post I’ll give you an overview of our workflow.

### Planning the project

I teamed up with two colleagues and right from the start we had to plan and divide our tasks. There are many things to track when working on a team project, but they can be classified into three main categories:

Idea | Design | Implementation |

user story | state diagram | code-debug cycle |

requirements.txt | ML model architecture | skeleton code |

workflow tracker | class diagram | prototype |

entity-relationship diagram | baseline models | |

flowchart | ||

data flow diagram | ||

pseudocode |

### Creating the simulation

**Markov-Chain Monte Carlo (MCMC) **is a method used to approximate the posterior distribution of a parameter of interest by random sampling in a probabilistic space. As the name says, it combines two properties:

*Markov-Chain*uses each random sample (generated by a sequential process) as a stepping stone to generate the next random sample. While each new sample depends on the one before it, new samples do*not*depend on any samples before the previous one.*Monte Carlo*estimates the properties of a distribution by examining random samples from the distribution.

The first step in our project was to calculate the** transition probability matrix**, i.e. the probability that a customer moves from one aisle to another. For example, in our data there was a 28% probability that a customer goes from drinks to dairy, or 51% from fruit to checkout.

Next, we had to generate new customers and simulate their paths based on the calculated transition probabilities. And here comes OOP! We created three classes:

- a
*Customer class*that simulates a new customers and their path from entrance to checkout, - a
*Supermarket class*that manages multiple Customer instances that are currently in the market, - a
*SupermarketMap class*which visualizes the supermarket background. We used**numpy**and**OpenCV**to make the supermarket visualization, by basically creating and slicing arrays, and inserting icons in specific tiles.

To calculate the customers’ path from entrance to checkout, we used the **A* (star) algorithm**. This is a search algorithm used for finding the shortest path in environments where there are obstacles along the way (like walls or aisles in the supermarket). It works as a weighted graph, which in this case is our supermarket with 8 nodes (entrance, fruit, spices, dairy, drinks, three checkouts) and edges (paths) between them. A* starts from a specific starting node of a graph (entrance) and aims to find a path to the given goal node (checkout) having the smallest cost (least distance travelled, shortest time, etc.). Specifically, A* selects the path that minimizes

**f (n) = g(n) + h(n)**

*n*is the next node on the path,*g(n)*is the cost of the path from the start node to n,*h(n)*is a heuristic function that estimates the cost of the cheapest path from n to the goal.

A* terminates when the path it chooses to extend is a path from start to goal or if there are no paths eligible to be extended.

## Friday Lightning Talk

At the end of the week, we gave a team presentation of our project. I presented the exploratory data analysis and visualization with Plotly, while my colleagues talked about the implementation of the A* algorithm and our experience and lessons learned from pair programming.