Random Forest Decision Tree

Day 2


Now let us look at another problem. Suppose that we are a company that remodels kitchens. We do everything from $20,000 jobs to ones costing over $100,000.

It takes anywhere from 3 to 16 weeks to complete a remodeling job.

We keep track of whether we are on schedule and on budget. We also keep track of the bid: whether it was awarded to us or we did not get the contract.

We keep track of our jobs in the following spreadsheet.

Kitchen Remodel

Download this spreadsheet and print it out, as we will need it to create a dataset in Python.

As you can see, we keep track of the cost, the number of weeks to complete the job, whether we guarantee that we will be on schedule, and whether we guarantee that we will not go over budget.

We have coded some of the fields to reflect true or false. A 0 means false and a 1 means true.

For example, looking at index 3, our $45,000 kitchen was estimated to take 4 weeks to complete. We did not guarantee completion within 4 weeks. We were, however, able to guarantee that we would be on budget.
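To make the 0/1 coding concrete, here is a minimal sketch of how the index 3 job could be written as one row of a pandas DataFrame. The column names are taken from the header of the output table later in this lesson; the variable names `row` and `flags` are just for illustration.

```python
import pandas as pd

# The index 3 job described in the text: $45,000, 4 weeks,
# no schedule guarantee (0), budget guarantee (1).
row = pd.DataFrame(
    {"Cost": [45000], "Weeks": [4], "onSchedule": [0], "onBudget": [1]},
    index=[3],
)
print(row)

# Read the 0/1 codes back as False/True to check the coding.
flags = row[["onSchedule", "onBudget"]].astype(bool)
print(flags.loc[3].to_dict())
```

The same pattern extends to the whole spreadsheet: each column becomes a list with one value per job.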

Our Python model, using the Random Forest algorithm, will attempt to analyze this data, determine the most important factor in awarding contracts, and give us a tool for predicting our success in getting future contracts.
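As a rough preview of what such a model looks like, here is a minimal sketch using scikit-learn's `RandomForestClassifier`. It is not the assignment code. The eight rows are made-up placeholders, not the spreadsheet values, and the column name `awarded` is an assumption for the bid result.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Placeholder rows invented for illustration only.
# Replace them with the values from the spreadsheet.
jobs = pd.DataFrame({
    "Cost":       [25000, 48000, 90000, 30000, 72000, 110000, 55000, 39000],
    "Weeks":      [4, 6, 12, 5, 9, 16, 8, 6],
    "onSchedule": [1, 1, 0, 1, 0, 0, 1, 1],
    "onBudget":   [1, 0, 0, 1, 1, 0, 1, 1],
    "awarded":    [1, 1, 0, 1, 0, 0, 1, 1],  # assumed name: 1 = we got the contract
})

features = ["Cost", "Weeks", "onSchedule", "onBudget"]
X = jobs[features]
y = jobs["awarded"]

# Fit a forest of 100 decision trees on the jobs.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# How much each column contributed to the trees' decisions.
importances = pd.Series(clf.feature_importances_, index=features)
print(importances.sort_values(ascending=False))

# Predict the outcome of a hypothetical future bid.
new_job = pd.DataFrame([[60000, 7, 1, 1]], columns=features)
prediction = clf.predict(new_job)
print(prediction)
```

With placeholder data the importances mean nothing; the point is only to show where "the most important factor" and "predicting future contracts" come from in the code.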

Code for this assignment came from the Data to Fish article "Example of Random Forest in Python" [3].

I modified it for use in our kitchen remodel assignment.

Start Python and Jupyter Notebook for this problem. Open a new notebook.


Use the Copy Text button to put the above Python code on the clipboard.

Paste it into your Jupyter Notebook.

Save it.


Enter the information from the spreadsheet just below the code you just pasted.

Use the Copy Text button to put the above Python code on the clipboard.

Paste it into your Jupyter Notebook.

Save it.


In Jupyter Notebook, click Insert, then Insert Cell Below, to get a new cell for the next piece of code.

Use the Copy Text button to put the above Python code on the clipboard.

Paste it into your Jupyter Notebook.

Save it.


Day 3


Now run your code, either cell by cell or by clicking Kernel, then Restart & Run All.

You should get the entire data frame printed out on your screen.

Use the Copy Text button to put the above Python code on the clipboard.

Paste it into your Jupyter Notebook in a new cell.

Save it.


This code is where the test and training sets (X and y) are created. The test set represents 25% of all cases, and its rows are chosen at random. The remaining 75% is used to train the model.

Run this cell just to check the syntax.
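The 25% random split can be sketched with scikit-learn's `train_test_split`. To avoid inventing data, this sketch splits row numbers only. The total of 40 jobs is an assumption, inferred from the ten test rows shown in the output later in this lesson, and `random_state=0` is an arbitrary choice that makes the split repeatable.

```python
from sklearn.model_selection import train_test_split

# Assumption: the spreadsheet holds 40 jobs, numbered 0 to 39.
row_ids = list(range(40))

# Hold out 25% of the rows, chosen at random, for testing.
train_ids, test_ids = train_test_split(row_ids, test_size=0.25, random_state=0)

print(len(train_ids), len(test_ids))  # prints: 30 10
```

Passing the feature table and the label column instead of row numbers works the same way and returns four pieces: training and test features, and training and test labels.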

Use the Copy Text button to put the above Python code on the clipboard.

Paste it into your Jupyter Notebook in a new cell.

Save it.


When you run all the cells, here is the output you will get. Let's examine these predictions and see what they mean.


Index   Cost  Weeks  onSchedule  onBudget
   22   6500      6           1         1
   20  10000      9           1         1
   25  28500      4           1         1
    4  35600      4           0         1
   10   4100      4           1         1
   15  78200      6           1         1
   28  67250      6           1         1
   11   5500      9           0         0
   18  53500      4           1         1
   29  87500      8           1         0

[1 1 1 1 1 1 1 1 1 0]
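One way to read this output is to line up each prediction with its test row. The sketch below does that in plain Python, assuming that a prediction of 1 means the model expects the contract to be awarded (following the 0/1 coding used for the other fields). Only the Weeks, onSchedule and onBudget values are copied here to keep it short.

```python
# (Weeks, onSchedule, onBudget) for the ten test rows, in printed order.
test_rows = [
    (6, 1, 1), (9, 1, 1), (4, 1, 1), (4, 0, 1), (4, 1, 1),
    (6, 1, 1), (6, 1, 1), (9, 0, 0), (4, 1, 1), (8, 1, 0),
]
predictions = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

# Count how many test jobs the model expects us to win.
predicted_awards = sum(predictions)
print(f"{predicted_awards} of {len(predictions)} test jobs predicted as awarded")

# List the rows the model expects us to lose.
not_awarded = [
    (position, weeks, on_schedule, on_budget)
    for position, ((weeks, on_schedule, on_budget), pred)
    in enumerate(zip(test_rows, predictions), start=1)
    if pred == 0
]
for position, weeks, on_schedule, on_budget in not_awarded:
    print(f"row {position}: weeks={weeks}, onSchedule={on_schedule}, onBudget={on_budget}")
```

Under that assumption, nine of the ten test jobs are predicted as awarded, and the only predicted loss is the last row: an 8-week job with a schedule guarantee but no budget guarantee.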