Multiple Linear Regression Models

Day 3: Getting the Program File

There are two models that we are going to use.





  1. Copy salesCalls.ipynb to clipboard.
  2. Click on the Jupyter Notebook (anaconda3) program.
  3. Click on New.
  4. Select Python 3 as the file type.
  5. Click in the first frame.
  6. Press Ctrl+V to paste the text into the notebook.
  7. Click on File and save it as salesCalls.ipynb.

Day 4: Running salesCalls.ipynb

from sklearn import linear_model

From the sklearn linear_model module we will use the LinearRegression() class to create a linear regression object.

This object has a method called fit() that takes the independent and dependent values as parameters and fills the regression object with data that describes the relationship.

regr = linear_model.LinearRegression()

regr.fit(X, y)

Now we have a regression object that is ready to predict sales based on a number of calls and items pitched.

# Predict the sales where the number of calls equals 6 and the number of items pitched equals 24

predictedSales = regr.predict([[6, 24]])
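As a minimal sketch of the steps above, the snippet below fits the same kind of model on a tiny made-up data set. The numbers are hypothetical (they are not the 57 rows in salesCalls.ipynb), and the sales figures are generated from an invented rule, sales = 100 + 500 x calls + 200 x items pitched, so that the fitted result is easy to check by eye.

```python
from sklearn import linear_model

# Hypothetical data, NOT the course data set: each row is [calls, items pitched].
X = [[2, 6], [3, 10], [3, 14], [4, 16], [5, 18], [6, 26], [8, 32]]

# Invented rule so the answer is easy to verify: sales = 100 + 500*calls + 200*items.
y = [100 + 500 * calls + 200 * items for calls, items in X]

regr = linear_model.LinearRegression()
regr.fit(X, y)  # learn the relationship between (calls, items) and sales

# Predict sales for 6 calls and 24 items pitched.
predictedSales = regr.predict([[6, 24]])
print(predictedSales)

# The fitted coefficients (one per column of X) and the intercept.
print(regr.coef_, regr.intercept_)
```

Because the made-up sales follow the rule exactly, the coefficients should come out very close to 500 and 200 and the prediction very close to 7,900. With real data, such as the notebook's, the fit will not be exact.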

  1. Click on Run to see what the program does.
  2. The (57, 4) printed by print(df.shape) is the shape of the data: 57 rows and 4 columns.
  3. It has 57 rows, one for each employee.
  4. print(df) prints four columns: the salesperson's name, number of calls, number of items pitched and the resulting actual sales.
  5. print(df.describe()) gives us some very useful statistics: mean, standard deviation, minimum, maximum and the values in between.
  6. The mean represents the average for calls, items pitched and sales.
  7. To find the mean number of calls, add up all the calls and divide by 57, the number of salespeople.
  8. To find the mean of the items pitched, add up all the items pitched and divide by 57.
  9. To find the mean sales number, add up all the sales and divide by 57.
  10. Standard deviation is a number that describes how spread out the observations are from the mean.
  11. A low standard deviation means that most of the numbers are close to the mean.
  12. A high standard deviation means the values are more spread out from the mean.
    • To calculate the standard deviation for calls made:
    • Get the mean by adding up all the calls and dividing by 57, the number of salespeople
    • Subtract the mean, 3.73, from each salesperson's number of calls
    • Square each resulting number
    • Add up the squared numbers and divide the total by 57 (df.describe() divides by 56, one less than the number of rows, so its figure is slightly larger)
    • Get the square root of this number to get the standard deviation
    • Excel spreadsheet showing standard deviation
  13. The minimum number represents the worst performance.
  14. The maximum number represents the best performing salesperson.
  15. Twenty-five percent of salespersons (14) make fewer than 3 calls.
  16. Fifty percent represents the median, the middle value when the numbers are sorted (the most frequently occurring number is the mode, not the median).
  17. The medians are 3 calls, 16 items pitched and $5,900 worth of sales; half of the salespersons fall at or below each of these values.
  18. Seventy-five percent of salespersons make fewer than 5 calls.
  19. Your worst performing salesperson, with 2 calls, 6 items pitched and $2,450 in sales, is Robert Orlov.
  20. The best performing salesperson is Piers Thron, with 8 calls, 32 items pitched and $10,000 worth of sales.
  21. The 2839.05 is a prediction of what a salesperson would generate making 2 calls and pitching 8 items.
  22. The line that reads predictedSales = regr.predict([[2, 8]]) is the one that generates this prediction.
  23. Try changing the numbers in this line to 6 and 24.
  24. What are the sales predicted using these numbers?
  25. Does this number look like a decent prediction? Hint: compare it with the actual sales in the table above.
  26. print(regr.coef_) prints the model's coefficients, which show how much predicted sales change when a salesperson makes one more call or pitches one more item.
  27. [522.16909967 204.03645735]
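Returning to the summary statistics in steps 5 through 18, the hand calculation can be sketched in a few lines of Python. The five call counts below are hypothetical stand-ins for the 57 rows in the notebook, and the variable names are illustrative. The sketch computes the standard deviation both ways: dividing by n (the hand method in the steps) and dividing by n - 1 (the sample standard deviation, which is what df.describe() reports).

```python
import math
import statistics

# Hypothetical call counts for five salespeople (not the course data).
calls = [2, 3, 3, 4, 8]

n = len(calls)
mean_calls = sum(calls) / n  # add up all the calls and divide by the count

# Subtract the mean from each value, then square each difference.
squared_diffs = [(c - mean_calls) ** 2 for c in calls]

# Divide the sum of squares by n (population) or n - 1 (sample), then take the root.
population_std = math.sqrt(sum(squared_diffs) / n)
sample_std = math.sqrt(sum(squared_diffs) / (n - 1))

# The median is the middle value of the sorted list.
median_calls = statistics.median(calls)

print(mean_calls, round(population_std, 4), round(sample_std, 4), median_calls)
```

The two standard deviations differ only in the divisor, so the gap between them shrinks as the number of rows grows; with 57 rows it is small.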

In regression with multiple independent variables, each coefficient tells you how much the dependent variable, sales in our case, is expected to increase when that independent variable increases by one, holding all other independent variables constant.
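One way to see this interpretation in action is to compare two predictions that differ by exactly one call (or one item) while the other input stays fixed. The sketch below does that on a small hypothetical data set; the values are invented for illustration and are not the notebook's data, so the coefficients it prints will differ from the ones shown above.

```python
from sklearn import linear_model

# Hypothetical data, NOT the course data set: each row is [calls, items pitched].
X = [[2, 6], [3, 10], [3, 14], [4, 16], [5, 18], [6, 26], [8, 32]]
y = [2400, 3800, 4500, 5600, 6300, 8600, 10000]  # invented sales figures

regr = linear_model.LinearRegression()
regr.fit(X, y)

base = regr.predict([[3, 12]])[0]           # 3 calls, 12 items
one_more_call = regr.predict([[4, 12]])[0]  # one extra call, items held constant
one_more_item = regr.predict([[3, 13]])[0]  # one extra item, calls held constant

# Each difference should match the corresponding coefficient.
print(one_more_call - base, regr.coef_[0])
print(one_more_item - base, regr.coef_[1])
```

The differences match the coefficients no matter which starting point is chosen, because the model is a straight-line (linear) relationship.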

As Sales Manager, let's look at how these coefficients will impact our sales.

We have 57 salespeople. If each one can increase their sales by $522.17 by making one additional call, the overall increase in sales can be calculated by multiplying 522.17 by 57.

522.17 times 57 = 29,763.69. Adding 29,763.69 to 302,685, our original total sales, we get $332,448.69.

That is about a 9.8% increase (29,763.69 divided by 302,685).

The salesforce believes that it might be easier to just add another pitch to each call.

Our model predicted that one additional pitch by each salesperson would increase that salesperson's sales by $204.04.

204.04 times 57 = 11,630.28.

302,685.00 + 11,630.28 equals $314,315.28.

That is about a 3.8% increase (11,630.28 divided by 302,685.00).
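The arithmetic for both scenarios can be checked with a short script. The figures (57 salespeople, 302,685 in original sales, and the two rounded coefficients) come from the text above; the function and variable names are illustrative. The percentage here is measured against the original total, which is the usual way to express an increase.

```python
n_salespeople = 57
original_total = 302_685.00
gain_per_call = 522.17   # rounded coefficient for one additional call
gain_per_pitch = 204.04  # rounded coefficient for one additional item pitched


def projected(gain_per_person):
    """Return (added sales, new total, percent increase over the original total)."""
    added = gain_per_person * n_salespeople
    new_total = original_total + added
    pct_increase = added / original_total * 100
    return added, new_total, pct_increase


call_added, call_total, call_pct = projected(gain_per_call)
pitch_added, pitch_total, pitch_pct = projected(gain_per_pitch)

print(round(call_added, 2), round(call_total, 2), round(call_pct, 1))
print(round(pitch_added, 2), round(pitch_total, 2), round(pitch_pct, 1))
```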

Armed with this information, you can revise the marketing strategy for our grocery products company.

Underperforming Salespersons

In your job as Sales Manager, you have decided that you need to address those salespersons making only 2 calls a month.

You want to see what they would generate in sales by adding one additional call and pitching 12 products during that call.

The line we are going to change is: predictedSales = regr.predict([[3, 12]])

Twelve salespersons are making only 2 calls.

You believe that they can do a better job by making more calls, 4, and presenting at least 10 items instead of 12.

Make the change. What are the predicted sales now for these employees?