One challenge for teachers is finding artificial intelligence problems they can relate to. I found that, to understand how things work, I needed a practical business application for the information being analyzed.

I have been lucky enough to teach Life Science and Biology for about 15 of my 35 years in the classroom, in addition to my business and computer courses, so I can relate to the iris dataset that is often used to demonstrate how these algorithms work.

The challenge I have found in developing these units for students and teachers is the lack of data that relates to business problems.

In many of the preceding lessons, I created my own databases: Toyota auto sales, McDonald's food purchases, a computer dating survey, food truck sales, Generation X and Y preferences, Midas Touch Jewelry and Action Sporting Goods.

The purpose of this unit is to help teachers and students develop relevant datasets for their studies in AI.

First, we will look at creating a frequent itemset dataset in several ways: within the Python code itself, in Excel spreadsheets, in CSV files and in files that reside on Google Drive.

I have always found that working with familiar data gives you an advantage in understanding what the algorithms actually do.

For my example, I will use song titles from the Billboard charts of 1961. By making up a week of song downloads, I will try to find the most popular combinations of downloaded songs and use them to create a playlist.

For the list of the most popular songs of 1961, I used this Wikipedia page:

Top 100 songs of 1961

I used the Apriori algorithm in Python to find the songs that are frequently downloaded together.

Frequent itemset generation algorithms search a dataset to determine which combinations of items occur together frequently. Here we are looking for groups of songs that are often downloaded together. This is market basket analysis, and Apriori is the algorithm we will use.

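For a fixed support threshold s, the algorithm determines which sets of items of a given size k are contained in at least s of the t transactions. The support of an itemset is the fraction of transactions that contain it, so the same threshold can also be written as a fraction of t, such as 40%.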
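To make the definition concrete, here is a minimal plain-Python sketch of the level-wise Apriori idea. It is not the code this lesson runs (the lesson calls a library apriori function on a DataFrame); the function name apriori_sketch and the four sample baskets are hypothetical, chosen only for illustration. Support is computed as the fraction of transactions containing every item in the set, and candidates of size k are built only from frequent itemsets of size k-1, which is what lets Apriori skip most combinations.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support.

    A minimal, unoptimized illustration of Apriori, not a library implementation.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of the n transactions that contain every item in itemset
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: every single item that meets the threshold
    singles = {frozenset([item]) for b in baskets for item in b}
    current = {s for s in singles if support(s) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a candidate survives only if all of its (k-1)-subsets are frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support once per surviving candidate and keep the frequent ones
        counted = {c: support(c) for c in candidates}
        current = {c for c, s in counted.items() if s >= min_support}
        frequent.update((c, counted[c]) for c in current)
        k += 1
    return frequent


# Hypothetical mini-dataset in the same list-of-lists format as the music dataset
sample = [
    ['Traveling Man', 'Hello Mary Lou'],
    ['Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
    ['Angel Baby'],
    ['Traveling Man'],
]
result = apriori_sketch(sample, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(f"{s:.2f}", sorted(itemset))
```

With min_support=0.5 this prints Traveling Man at 0.75, then Hello Mary Lou and the pair Hello Mary Lou + Traveling Man at 0.50. Calendar Girl and Angel Baby each appear in only one of the four baskets (0.25), so no larger set containing them is ever counted.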

Day 1: Creating the dataset by manually typing in the data.

There are multiple ways to create DataFrames in Python. The simplest is to type the data directly into your program as Python lists or dictionaries, which only works for small datasets that you can type out by hand.

Below is the dataset that I made up using songs from 1961. Each row represents one customer's downloaded songs. For example, the first customer downloaded three songs: I Fall to Pieces, Exodus and Hit the Road Jack.

music = [
['I Fall to Pieces', 'Exodus', 'Hit the Road Jack'],
['Runaway', 'On the Rebound', 'Quarter to Three'],
['I like it Like That'],
['There is a Moon Out Tonight'],
['Surrender', 'Walk Rigt Back', 'The Way You Look Tonight'],        
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],        
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],        
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],        
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],        
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],        
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],        
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],        
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],        
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],                
['Tossing and Turnin', 'Raindrops', 'Take Good Care of My Baby'],
['Where the Boys Are', 'Pony Time'],
['Mother-in-Law', 'A Hundred Pounds of Clay', 'Shop Around', 'Calendar Girl', 'Blue Moon', 'Little Sister', 'Hello Mary Lou'],
['Calendar Girl', 'Runaround Sue', 'A Hundred Pounds of Clay', 'Walk Tight Back', 'Stand By Me', 'Hats Off to Larry'],
['Big Bad John', 'Pretty Little Angel Eyes', 'Raindrops', 'Calendar Girl', 'Blue Moon', 'A Hundred Pounds of Clay', 'The Mountains High'],
['Bristol Stomp', 'Traveling Man', 'Mother-in-Law', 'Michael', 'Tossing and Turning', 'I Fall to Pieces'],
['Runaway', 'Crying', 'Running Scared', 'Dedicated to the One I Love', 'Will you Love Me Tomorrow', 'Exodus', 'Hit the Road Jack'],
['Where the Boys Are', 'Traveling Man', 'Mother-in-Law', 'RainDrops', 'Shop Around', 'The Mountains High'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Quarter to Three', 'Hello Mary Lou', 'Surrender', 'I love How You Love Me', 'School is Out', 'The Way You Look Tonight'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Quarter to Three', 'Hello Mary Lou', 'Surrender', 'I love How You Love Me', 'School is Out', 'The Way You Look Tonight'],
['Wooden Heart', 'Running Scared', 'Take Good Care of My Baby'],
['A Hundred Pounds of Clay', 'Traveling Man','Hello Mary Lou'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['I Fall to Pieces', 'Will You Love Me Tomorrow', 'Sad Movies', 'Traveling Man', 'Hello Mary Lou'],
['Where the Boys Are', 'Shop Around', 'A Hundred Pounds of Clay', 'The Mountains High', 'Calendar Girl'],
['Little Sister', 'Runaround Sue', 'Walk Right Back', 'The Way You Look Tonight'],
['Mother-in-Law', 'A Hundred Pounds of Clay', 'Shop Around', 'Calendar Girl', 'Blue Moon', 'Little Sister', 'Hello Mary Lou'],
['Calendar Girl', 'Runaround Sue', 'A Hundred Pounds of Clay', 'Walk Tight Back', 'Stand By Me', 'Hats Off to Larry'],
['Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['Traveling Man'],
['Traveling Man'],
['Hello Mary Lou', 'Little Sister', 'Runaround Sue'],
['Tossing and Turning'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['Runaway','Raindrops', 'Dedicated to the One I Love'],
['Michael', 'Crying'],
['Traveling Man', 'Hello Mary Lou'],
['Take Good Care of My Baby', 'Raindrops', 'Will You Love Me Tomorrow'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['Where the Boys Are', 'Bristol Stomp', 'A Hundred Pounds of Clay', 'The Mountains High'],
['Stand By Me', 'Those Oldies but Goodies', 'His Latest Flame', 'Spanish Harlem'],
['Mama Said', 'Take Good Care of My Baby', 'Moody River'],
['Traveling Man', 'Hello Mary Lou'],
['Traveling Man', 'Hello Mary Lou'],
['Traveling Man', 'Hello Mary Lou'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Mama Said'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['Angel Baby', 'Little Sister', 'Runaround Sue', 'Surrender'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['School Is Out', 'Moody River', 'Traveling Man', 'Hello Mary Lou'],
['Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['Take Good Care of My Baby', 'RainDrops', 'Where the Boys Are'],
['Will You Love Me Tomorrow', 'Dedicated to the One I Love'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['Take Good Care of My Baby', 'RainDrops', 'Where the Boys Are'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['Will You Love Me Tomorrow', 'Dedicated to the One I Love'],
['Ive Told Every Little Star', 'Stand By Me', 'Baby Blue'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['Take Good Care of My Baby', 'RainDrops', 'Where the Boys Are'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['Will You Love Me Tomorrow', 'Dedicated to the One I Love'],
['Traveling Man', 'Hello Mary Lou'],
['Take Good Care of My Baby', 'RainDrops', 'Where the Boys Are'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['Take Good Care of My Baby', 'RainDrops', 'Where the Boys Are'],
['Will You Love Me Tomorrow', 'Dedicated to the One I Love'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou'],
['Take Good Care of My Baby', 'RainDrops', 'Where the Boys Are'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
['Walk Right Back', 'Moody River', 'Hats Off to Larry']
]

Run Spyder and create a new file.
First, choose a descriptive name for the dataset; this example uses music.
The entire dataset is enclosed in square brackets.
Each song title is enclosed in single quotation marks, and each customer's list of songs is enclosed in square brackets.
Songs are separated by commas.
Each customer's line ends with a comma, except the last one.
Spacing and spelling matter. Don't leave extra spaces inside the single quotes, and type each title the same way every time: 'RainDrops' and 'Raindrops' are counted as two different songs.
Put the dataset onto your clipboard by clicking the Copy text button.
Paste the dataset into Spyder with CTRL+V.
Save your Python file using a .py extension in your working folder.
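Because a stray space or a different capitalization silently creates a new song, it can help to scan the list before running Apriori. The helper below, find_entry_problems, is a hypothetical plain-Python sketch and not part of the lesson's code. It flags titles with a leading or trailing space (for example ' On the Rebound') and titles that become identical once spaces are trimmed and case is ignored (for example 'RainDrops' and 'Raindrops').

```python
from collections import defaultdict


def find_entry_problems(transactions):
    """Flag titles that would silently become separate columns (hypothetical helper)."""
    titles = {item for order in transactions for item in order}

    # Titles with a space at the start or end, e.g. ' On the Rebound'
    spacing = sorted(t for t in titles if t != t.strip())

    # Group titles that are identical once spaces are trimmed and case is ignored
    groups = defaultdict(set)
    for t in titles:
        groups[t.strip().lower()].add(t)
    clashes = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    return spacing, clashes


# Hypothetical sample containing the kinds of mistakes described above
sample = [
    ['Runaway', ' On the Rebound', ' Quarter to Three'],
    ['Quarter to Three', 'RainDrops'],
    ['Raindrops'],
]
spacing, clashes = find_entry_problems(sample)
print(spacing)
print(clashes)
```

This check only catches spacing and capitalization. Genuine misspellings, such as 'Walk Rigt Back' versus 'Walk Right Back', still need a human eye.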

Adding the code to the project.

Click on the Copy text button to select all the code and put it on the clipboard.
Paste the code into Spyder just after the dataset containing the music.
Save the file using a .py extension.
To run the program, highlight the entire dataset, then press F9.
Next, move down to the line that imports pandas and press F9.
Continue pressing F9 to execute each line of code.
The results will appear in the console.
Now let's examine those results.

Out[7]: 
     A Hundred Pounds of Clay      ...       Wooden Heart
0                       False      ...              False
1                       False      ...              False
2                       False      ...              False
3                       False      ...              False
4                       False      ...              False
5                       False      ...              False
6                       False      ...              False
7                       False      ...              False
8                       False      ...              False
9                       False      ...              False
10                      False      ...              False
11                      False      ...              False
12                      False      ...              False
13                      False      ...              False
14                      False      ...              False
15                      False      ...              False
16                      False      ...              False
17                      False      ...              False
18                      False      ...              False
19                      False      ...              False
20                      False      ...              False
21                      False      ...              False
22                      False      ...              False
23                      False      ...              False
24                      False      ...              False
25                       True      ...              False
26                       True      ...              False
27                       True      ...              False
28                       True      ...              False
29                       True      ...              False
..                        ...      ...                ...
148                      True      ...              False
149                      True      ...              False
150                      True      ...              False
151                     False      ...              False
152                     False      ...              False
153                      True      ...              False
154                     False      ...              False
155                      True      ...              False
156                      True      ...              False
157                     False      ...              False
158                     False      ...              False
159                     False      ...              False
160                      True      ...              False
161                     False      ...              False
162                     False      ...              False
163                      True      ...              False
164                      True      ...              False
165                      True      ...              False
166                      True      ...              False
167                      True      ...              False
168                      True      ...              False
169                      True      ...              False
170                     False      ...              False
171                      True      ...              False
172                      True      ...              False
173                      True      ...              False
174                      True      ...              False
175                      True      ...              False
176                      True      ...              False
177                     False      ...              False

[178 rows x 54 columns]

Pressing F9 on the line containing df displays a portion of the DataFrame.

The first column is the row index, 0 through 177; each row is one customer.

There are 178 customers' downloaded songs in the DataFrame.

The next column represents the first song alphabetically, A Hundred Pounds of Clay.

The column marked ... stands for all of the columns (songs) between the first and the last that pandas does not display.

The next column shows the last song alphabetically, Wooden Heart.

Customer 0 did not download either A Hundred Pounds of Clay or Wooden Heart.

Customer 25 did download A Hundred Pounds of Clay. So did customers 26, 27, 28 and 29.
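The apriori calls later in this lesson, apriori(df, min_support=..., use_colnames=True), match the apriori function in the mlxtend library, which expects a True/False table like df; the pasted code that builds df is not reproduced here. As a hedged plain-Python sketch of what that table contains, the hypothetical helper one_hot_encode below creates one column per distinct song, sorted alphabetically, and marks True where a customer downloaded that song. Sorting also explains why A Hundred Pounds of Clay comes first: a space sorts before any letter, so 'A Hundred...' precedes 'Angel Baby'.

```python
def one_hot_encode(transactions):
    """Return (columns, rows): one sorted column per item, one True/False row per transaction."""
    columns = sorted({item for order in transactions for item in order})
    # Convert each order to a set once, then test membership for every column
    rows = [[col in basket for col in columns] for basket in map(set, transactions)]
    return columns, rows


# Two hypothetical customers, in the same format as the music dataset
sample = [
    ['Angel Baby', 'Surrender'],
    ['A Hundred Pounds of Clay', 'Traveling Man', 'Hello Mary Lou', 'Calendar Girl'],
]
columns, rows = one_hot_encode(sample)
print(columns)
for i, row in enumerate(rows):
    print(i, row)
```

If pandas is available, pd.DataFrame(rows, columns=columns) turns this into a DataFrame with the same layout as df: a row index on the left and one True/False column per song.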

Press F9 again to find the itemsets with a support of at least 40%, that is, the songs or groups of songs downloaded by at least 40% of customers.

frequent_itemsets = apriori(df, min_support=0.40,use_colnames=True)

frequent_itemsets
Out[11]:
support itemsets
0 0.415730 (A Hundred Pounds of Clay)
1 0.432584 (Hello Mary Lou)
2 0.438202 (Traveling Man)
3 0.415730 (Hello Mary Lou, Traveling Man)

A Hundred Pounds of Clay was downloaded by 41.5730 percent of all customers in the dataset.

Hello Mary Lou was downloaded by 43.2584 percent of all customers.

Traveling Man was downloaded by 43.8202 percent of all listeners.

41.5730 percent of all listeners downloaded both Hello Mary Lou and Traveling Man.
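Support is simply a count divided by the number of customers. Working backward from the values above and the 178 customers in df (this arithmetic is derived from the reported numbers, not printed by the program):

```latex
\text{support}(X) = \frac{\text{number of customers whose downloads contain every song in } X}{178}

\text{support}(\{\text{Traveling Man}\}) = \tfrac{78}{178} \approx 0.438202, \qquad
\text{support}(\{\text{Hello Mary Lou}\}) = \tfrac{77}{178} \approx 0.432584

\text{support}(\{\text{Hello Mary Lou}, \text{Traveling Man}\}) = \tfrac{74}{178} \approx 0.415730
```

The pair's support can never exceed the support of either song on its own, because every customer who downloaded both songs also downloaded each one. Apriori relies on exactly this property to skip combinations that contain an infrequent song.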

To construct a playlist, we would want to pay close attention to songs that were downloaded by more than 40 percent of our listeners.

Can we see any relationships between songs in this group?

On closer examination, we discover that Traveling Man and Hello Mary Lou were recorded by the same artist, Ricky Nelson.

frequent_itemsets = apriori(df, min_support=0.20,use_colnames=True)
frequent_itemsets
Out[15]:
support itemsets
0 0.415730 (A Hundred Pounds of Clay)
1 0.342697 (Angel Baby)
2 0.241573 (Calendar Girl)
3 0.432584 (Hello Mary Lou)
4 0.365169 (Little Sister)
5 0.365169 (Runaround Sue)
6 0.359551 (Surrender)
7 0.438202 (Traveling Man)
8 0.230337 (A Hundred Pounds of Clay, Calendar Girl)
9 0.387640 (Hello Mary Lou, A Hundred Pounds of Clay)
10 0.376404 (Traveling Man, A Hundred Pounds of Clay)
11 0.342697 (Angel Baby, Little Sister)
12 0.342697 (Angel Baby, Runaround Sue)
13 0.342697 (Angel Baby, Surrender)
14 0.207865 (Hello Mary Lou, Calendar Girl)
15 0.415730 (Hello Mary Lou, Traveling Man)
16 0.353933 (Little Sister, Runaround Sue)
17 0.342697 (Little Sister, Surrender)
18 0.342697 (Runaround Sue, Surrender)
19 0.207865 (Hello Mary Lou, A Hundred Pounds of Clay, Cal...
20 0.376404 (Hello Mary Lou, Traveling Man, A Hundred Poun...
21 0.342697 (Angel Baby, Runaround Sue, Little Sister)
22 0.342697 (Angel Baby, Surrender, Little Sister)
23 0.342697 (Angel Baby, Runaround Sue, Surrender)
24 0.342697 (Little Sister, Runaround Sue, Surrender)
25 0.342697 (Angel Baby, Runaround Sue, Surrender, Little ...

Lowering the minimum support to 20% produces the results above.

We can also see some triplets, i.e., groups of three songs that were downloaded together, and one group of four.

Our playlist might include those three songs streamed back to back to make it easier for our listeners to download them.

We would want our playlist to also repeat the most popular songs.

frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(lambda x :len(x))

frequent_itemsets
Out[15]:
support itemsets length
0 0.415730 (A Hundred Pounds of Clay) 1
1 0.342697 (Angel Baby) 1
2 0.241573 (Calendar Girl) 1
3 0.432584 (Hello Mary Lou) 1
4 0.365169 (Little Sister) 1
5 0.365169 (Runaround Sue) 1
6 0.359551 (Surrender) 1
7 0.438202 (Traveling Man) 1
8 0.230337 (Calendar Girl, A Hundred Pounds of Clay) 2
9 0.387640 (Hello Mary Lou, A Hundred Pounds of Clay) 2
10 0.376404 (Traveling Man, A Hundred Pounds of Clay) 2
11 0.342697 (Little Sister, Angel Baby) 2
12 0.342697 (Angel Baby, Runaround Sue) 2
13 0.342697 (Surrender, Angel Baby) 2
14 0.207865 (Hello Mary Lou, Calendar Girl) 2
15 0.415730 (Traveling Man, Hello Mary Lou) 2
16 0.353933 (Little Sister, Runaround Sue) 2
17 0.342697 (Surrender, Little Sister) 2
18 0.342697 (Surrender, Runaround Sue) 2
19 0.207865 (Hello Mary Lou, Calendar Girl, A Hundred Poun... 3
20 0.376404 (Traveling Man, Hello Mary Lou, A Hundred Poun... 3
21 0.342697 (Little Sister, Angel Baby, Runaround Sue) 3
22 0.342697 (Little Sister, Surrender, Angel Baby) 3
23 0.342697 (Surrender, Angel Baby, Runaround Sue) 3
24 0.342697 (Surrender, Little Sister, Runaround Sue) 3
25 0.342697 (Little Sister, Surrender, Angel Baby, Runarou... 4

Pressing F9 two more times adds a length column and displays the table again. The length is the number of songs in each itemset; the itemsets themselves are the same ones found with a minimum support of 20%.

Itemsets of length 1 are single songs: A Hundred Pounds of Clay, Angel Baby, Calendar Girl, Hello Mary Lou, Little Sister, Runaround Sue, Surrender and Traveling Man.

There are 11 frequent itemsets of two songs, six of three songs and one of four songs. These are counts of song combinations, not of customers.

Press F9 two more times and you should get the following results.

frequent_itemsets[ (frequent_itemsets['length'] == 2) & (frequent_itemsets['support'] >=0.35)]
Out[16]:
support itemsets length
9 0.387640 (Hello Mary Lou, A Hundred Pounds of Clay) 2
10 0.376404 (A Hundred Pounds of Clay, Traveling Man) 2
15 0.415730 (Hello Mary Lou, Traveling Man) 2
16 0.353933 (Runaround Sue, Little Sister) 2

For example, from this data you can determine that 41.573% of your listeners (support 0.415730) downloaded both Hello Mary Lou and Traveling Man.

To see groups of three songs downloaded together by at least 25% of your listeners, highlight the next line and press F9.
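The pandas filter above keeps only the rows where both conditions hold. A plain-Python equivalent makes the logic explicit. This is a sketch: the helper filter_itemsets is hypothetical, and the support values are copied from the 20% output above (single songs and pairs only, for brevity).

```python
# (support, itemset) pairs copied from the min_support=0.20 output above (lengths 1 and 2 only)
FREQUENT = [
    (0.415730, {'A Hundred Pounds of Clay'}),
    (0.342697, {'Angel Baby'}),
    (0.241573, {'Calendar Girl'}),
    (0.432584, {'Hello Mary Lou'}),
    (0.365169, {'Little Sister'}),
    (0.365169, {'Runaround Sue'}),
    (0.359551, {'Surrender'}),
    (0.438202, {'Traveling Man'}),
    (0.230337, {'A Hundred Pounds of Clay', 'Calendar Girl'}),
    (0.387640, {'Hello Mary Lou', 'A Hundred Pounds of Clay'}),
    (0.376404, {'Traveling Man', 'A Hundred Pounds of Clay'}),
    (0.342697, {'Angel Baby', 'Little Sister'}),
    (0.342697, {'Angel Baby', 'Runaround Sue'}),
    (0.342697, {'Angel Baby', 'Surrender'}),
    (0.207865, {'Hello Mary Lou', 'Calendar Girl'}),
    (0.415730, {'Hello Mary Lou', 'Traveling Man'}),
    (0.353933, {'Little Sister', 'Runaround Sue'}),
    (0.342697, {'Little Sister', 'Surrender'}),
    (0.342697, {'Runaround Sue', 'Surrender'}),
]


def filter_itemsets(frequent, length, min_support):
    """Keep itemsets with exactly `length` songs and support >= min_support."""
    return [(s, items) for s, items in frequent if len(items) == length and s >= min_support]


for s, items in filter_itemsets(FREQUENT, length=2, min_support=0.35):
    print(f"{s:.6f}", sorted(items))
```

With length=2 and min_support=0.35 it returns the same four pairs as Out[16]; with length=1 and min_support=0.40 it returns the three single songs from the 40% run.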

You should see the following results.

In [17]: frequent_itemsets[ (frequent_itemsets['length'] == 3) & (frequent_itemsets['support'] >=0.25)]
Out[17]:
support itemsets length
20 0.376404 (Hello Mary Lou, A Hundred Pounds of Clay, Tra... 3
21 0.342697 (Runaround Sue, Little Sister, Angel Baby) 3
22 0.342697 (Surrender, Little Sister, Angel Baby) 3
23 0.342697 (Surrender, Runaround Sue, Angel Baby) 3
24 0.342697 (Surrender, Runaround Sue, Little Sister) 3

The next line filters for groups of four songs with a support of at least 20%. About 34% of your listeners (support 0.342697) downloaded all four songs: Surrender, Runaround Sue, Little Sister and Angel Baby.

frequent_itemsets[ (frequent_itemsets['length'] == 4) & (frequent_itemsets['support'] >=0.20)]
Out[18]:
support itemsets length
25 0.342697 (Surrender, Runaround Sue, Little Sister, Ange... 4

Day 2: Creating your own dataset in Python

Now it is time for you to create your own dataset in Python.

First, decide on a category of products that people buy either in a store or online.

Here are a few suggestions.

Appliances
Apps and Games
Arts and Crafts
Automotive Parts
Beauty Products
Books
CDs & Vinyl
Cell Phones & Accessories
Clothing
Digital Music
Electronics
Luggage
Movies & TV
Office Products
Pet Supplies
Software
Sports & Outdoors
Tools
Toys & Games
Video Games

Once you have decided on a category, open Spyder, type the name of your dataset followed by an equal sign and a [, then press Enter.

Begin each customer's order with [ and end it with ], (a closing bracket followed by a comma).

Put each item ordered in single quotes.

Keep saving the project as you go.

Once you have a representative set of orders, copy a section with CTRL+C and paste it at the end of the dataset with CTRL+V to create repeated orders.

Create at least 200 individual customer orders with multiple purchases.

When creating the dataset, make sure that multiple customers purchase the same items so that you get usable results.
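Before running Apriori on your own data, a quick summary can confirm that it meets the guidelines above. The sketch below uses a hypothetical helper, summarize_orders, and made-up Pet Supplies orders purely for illustration; replace pets with the name of your own dataset.

```python
from collections import Counter


def summarize_orders(orders, min_orders=200):
    """Report the checks suggested for a hand-typed market basket dataset."""
    # Count each item once per order; dict.fromkeys keeps first-seen order for ties
    item_counts = Counter(item for order in orders for item in dict.fromkeys(order))
    # Identical orders (ignoring item order) count as the same basket
    basket_counts = Counter(frozenset(order) for order in orders)
    return {
        'orders': len(orders),
        'enough_orders': len(orders) >= min_orders,
        'multi_item_orders': sum(1 for order in orders if len(order) > 1),
        'distinct_items': len(item_counts),
        'top_items': item_counts.most_common(3),
        'repeated_baskets': sum(1 for c in basket_counts.values() if c > 1),
    }


# Hypothetical Pet Supplies orders, in the same format as the music dataset
pets = [
    ['Dog Food', 'Leash'],
    ['Dog Food', 'Leash'],
    ['Cat Litter'],
    ['Dog Food', 'Chew Toy', 'Leash'],
]
for key, value in summarize_orders(pets).items():
    print(key, value)
```

A useful dataset should report enough_orders as True, many multi-item orders and some repeated baskets; otherwise Apriori will find few itemsets above a reasonable support threshold.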

Add the additional Python code to the project.

The code is the same as the code for the oldies music dataset; just change music to the name of your dataset wherever it appears.

Run your program.

Day 3: Pandas DataFrames: Creating the dataset as a CSV file in Notepad

Instead of coding the dataset directly into the Python program, the program can read it from a comma-separated values (CSV) file.

A CSV file can easily be created with Spyder, Notepad, Notepad++ or any other text editor.

I used the same subject matter, Billboard's popular songs of 1961, and created the dataset in the Notepad text editor.

The file appears below. Copy it to the clipboard and save a copy for yourself. Remember to use the .csv extension when saving the file.

Notice that the first line of the file is a header row: Song, Artist, Year, Genres, Number.

Song,Artist,Year,Genres,Number
Tossing and Turning,Bobby Lewis,1961,Rythum and Blues,Number 1
I Fall to Pieces,Patsy Cline,1961,Country,Number 2
Michael,The Highwaymen,1961,Folk,Number 3
Crying,Roy Orbison,1961,Ballads,Number 4
Runaway,Del Shannon,1961,Rock and Roll,Number 5
My True Story,The Jive Five,1961,Rythum and Blues,Number 6
Pony Time,Chubby Checker,1961,Rock and Roll,Number 7
Wheels,The StringALongs,1961,Instrumental,Number 8
Raindrops,Dee Clark,1961,Rhythum and Blues,Number 9
Wooden Heart,Joe Dowell,1961,Rock and Roll,Number 10
Calcutta,Lawrence Welk,1961,Instrumental,Number 11
Take Good Care of My Baby,Bobbie Vee,1961,Rock and Roll,Number 12
Running Scared,Roy Orbison,1961,Ballads,Number 13
Dedicated to the One I Love,The Shirelles,1961,Doo-Wap,Number 14
Last Night,The Mar-Keys,1961,Instrumental,Number 15
Will You Love Me Tomorrow,The Shirelles,1961,Doo-Wap,Number 16
Exodus,Ferrante & Techier,1961,Instrumental,Number 17
Where the Boys Are,Connie Francis,1961,Pop,Number 18
Hit The Road Jack,Ray Charles,1961,Rhytum and Blues,Number 19
Sad Movies Make Me Cry,Sue Thompson,1961,Pop,Number 20
Mother-in-Law,Ernie K-Doe,1961,Rhythum and Blues,Number 21
Bristol Stomp,The Dovells,1961,Doo-Wap,Number 22
Traveling Man, Ricky Nelson,1961,Pop,Number 23
Shop Around, The Miracles,1961,Rhythum and Blues,Number24
The Boll Weevil Song,Brock Benton,1961,Blues,Number 25
A Hundred Pounds of Clay, Gene McDaniels,1961,Rock and Roll,Number 26
The Mountains High, Dick and Dee Dee,1961,Rhythum and Blues,Number 27
Don't Worry, Marty Robbins,1961,Country/Pop,Number 28
On the Rebound,Floyd Cramer,1961,Instrumental,Number 29
Portrait of My Love,Steve Lawrence,1961,Pop,Number 30
Quarter to Three,Gary U.S. Bonds,1961,Rock and Roll,Number 31
Who Put the Bomp (in the Bomp_Bomp_Bomp),Barry Mann,1961,Doo-Wap,Number 32
Calendar Girl,Neil Sedaka,1961,Pop,Number 33
I Like it Like That,Chris Kenner,1961,Rhythum and Blues,Number 34
Apache,Jergen Ingmann,1961,Instrumental,Number 35
Don't Bet Money Honey,Linda Scott,1961,Pop,Number 36
Without You,Johnny Tilotsen,1961,Country,Number 37
Wings of a Dove,Ferlin Husky,1961,Country,Number 38
Little Sister,Elvis Presley,1961,Rock and Roll,Number 39
Blue Moon,The Marcels,1961,Doo-Wap,Number 40
Daddy's Home,Shep and the Limelites,1961,Doo-Wap,Number 41
This Time,Troy Shondell,1961,Pop,Number 42
I Don't Know Why but I Do,Clarence Frogman Henry,1961,Pop,Number 43
Asia Minor,Kokomo,1961,Instrumental,Number 44
Hello Walls,Faron Young,Country,1961,Number 45
Runaround Sue,Dion,1961,Rock and Roll,Number 46
Yello Bird,Arthur Lyman,Jazz,1961,Number 47
Hurt,Timi Youro,Rhythum and Blues,1961,Number 48
Hello Mary Lou,Ricky Nelson,1961,Rock and Roll,Number 49
There's a Moon Out tonight,The Capris,1961,Doo-Wap,Number 50
Surrender,Elvis Presley,1961,Rock and Roll,Number 51
I Love How You Love Me,The Paris Sisters,1961,Girl Group,Number 52
YaYa,Lee Dorsey,1961,Pop,Number 53
School Is Out,Gary U.S. Bonds,1961,Rock and Roll,Number 54
Mexico,Bob Moore,1961,Instrumental,Number 55
You Don't Know What You've Got(Until You Lose It,Ral Donner,1961,Rock and Roll,Number 56
Walk Right Back,The Everly Bothers,1961,Rock and Roll,Number 57
The Way You Look Tonight,The Letterman,1961,Pop,Number 58
Moody River,Pat Boone,1961,Pop,Number 59
One Mint Julip,Ray Charles,1961,Rhythum and Blues,Number 60
Take Good Care of Her,Adam Wade,1961,Rhythum and Blues,Number 61
Gee Whiz,Carla Thomas,1961,Soul,Number 62
Stand By Me,Ben E King,1961,Rhythum and Blues,Number 63
Spanish Harlem,Ben E King,1961,Rhythum and Blues,Number 64
It's Gonna Work Out Fine,Ike and Tina Turner,1961,Rock and Roll,Number 65
Baby Blue,The Echoes,1961,Rock and Roll,Number 66
Baby Sittin' Boogie,Buzz Clifford,1961,Rhythum and Blues,Number 67
Hats Off to Larry,Del Shannon,1961,Rock and Roll,Number 68
Those Oldies but Goodies,Little Caesar & The Romans,1961,Pop,Number 69
The Fly,Chubby Checker,1961,Rock and Roll,Number 70
Marie's the Name of His Latest Flame,Elvis Presley,1961,Rock and Roll,Number 71
Wonderland by Night,Bert Kaempfert,1961,Jazz,Number 72
Bless You,Tony Orlando,1961,Pop,Number 73
I've Told Every Little Star,Linda Scott,1961,Rock and Roll,Number 74
One Track Mind,Bobby Lewis,1961,Rock and Roll,Number 75
Angel Baby,Rosie and the Originals,1961,Doo-Wap,Number 76
Pretty Little Angel Eyes,Curtis Lee,1961,Doo-Wap,Number 77
Think Twice,Brook Benton,1961,Rock and Roll,Number 78
Does Your Chewing Gum Lose Its Flavour,Lonnie Donegan,1961,Pop,Number 79
Breakin' in a Brand New Broken Heart,Connie Francis,1961,Pop,Number 80
Mama Said,The Shirelles,1961,Doo-Wap,Number 81
Let the Four Winds Blow,Fats Domino,1961,Rock and Roll,Number 82
The Writing on The Wall,Adam Wade,1961,Pop,Number 83
My Kind of Girl,Matt Monro,1961,Pop,Number 84
Tonight My Love,Paul Anka,1961,Pop,Number 85
San Antonio Rose,Floyd Cramer,1961,Instrumental,Number 86
Big Bad John,Jimmy Dean,1961,Country,Number 87
Good Time Baby,Bobby Rydell,1961,Rock and Roll,Number 88
Rubber Ball,Bobby Vee,1961,Rock and Roll,Number 89
Missing You,Ray Peterson,1961,Pop,Number 90
Dum Dum,Brenda Lee,1961,Pop,Number 91
I'm Gonna Knock on Your Door,Eddie Hodges,1961,Pop,Number 92
You Can Depend on Me,Brenda Lee,1961,Pop,Number 93
Let's Twist Again,Chubby Checker,1961,Rock and Roll,Number 94
Take Five,The Dave Brubeck Quartet,1961,Jazz,Number 95
Are You Lonesome Tonight,Elvis Presley,1961,Rock and Roll,Number 96
Sea of Heartbreak,Don Gibson,1961,Country,Number 97
More Money for You and Me,The Four Preps,1961,Pop,Number 98
You Must Have Been a Beautiful Baby,Bobby Darin,1961,Rock and Roll,Number 99
Please Stay,The Drifters,1961,Doo-Wap,Number 100

The file has five columns: Song, Artist, Year, Genres, and Number. After the header row, each line holds one song followed by its artist, the year, its genre, and Number. Number is the position the song reached on Billboard's chart for 1961.

Save the file as "oldies.csv".
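Hand-typed csv files often contain small errors. A field can be swapped with its neighbor, or a space can slip in after a comma. A space after a comma makes "Ben E King" and " Ben E King" different values, so a search for the artist silently fails. Here is a minimal sketch of a quick check. It uses a few inline rows in the same layout as oldies.csv, and two of them are deliberately broken for illustration. With the real file you would call pd.read_csv('oldies.csv', skipinitialspace=True) instead.

```python
import io
import pandas as pd

# Three rows in the oldies.csv layout; the last two are broken on purpose
# (a space after a comma, and the Year and Genres fields swapped)
sample = io.StringIO(
    "Song,Artist,Year,Genres,Number\n"
    "Little Sister,Elvis Presley,1961,Rock and Roll,Number 39\n"
    "Stand By Me, Ben E King,1961,Rhythum and Blues,Number 63\n"
    "Hello Walls,Faron Young,Country,1961,Number 45\n"
)

# skipinitialspace=True ignores spaces typed right after a comma
dataset = pd.read_csv(sample, skipinitialspace=True)

# Every song in this file is from 1961, so any other Year value
# means two fields in that row were typed in the wrong order
bad_rows = dataset[dataset.Year.astype(str) != "1961"]
print(bad_rows)
```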

Next you need the Python code shown below. Copy it, paste it into Spyder, and save it with a .py extension in the same folder as oldies.csv so the program can find the file.

A few Python terms will help you understand how the program works. This example uses different code from the earlier lessons. Here are the key terms.

Sets, or itemsets, are unordered collections of items in which every element is unique: there are no duplicates. In Python you create a set by placing the items inside curly braces {}, separated by commas. You can also pass a list to set(), and that list can come from data read in from a csv or Excel file. The column headings above (Song, Artist, Year, Genres, and Number) are an example of a set of five distinct items.
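Here is a short illustration of those set rules, separate from the oldies program. The genre and song values are taken from the song list. The last lines connect sets to the Apriori lessons: one listener's downloads form a "basket", and the <= operator asks whether a smaller itemset is contained in it.

```python
# Curly braces build a set; the repeated "Pop" is stored only once
genres = {"Pop", "Rock and Roll", "Pop", "Jazz"}
print(len(genres))  # 3

# set() turns any list, for example a column read from a csv file,
# into a set of its unique values
genre_column = ["Pop", "Doo-Wap", "Pop", "Country", "Doo-Wap"]
unique_genres = set(genre_column)
print(sorted(unique_genres))  # ['Country', 'Doo-Wap', 'Pop']

# Sets have no order, so "is it in there?" is the natural question
print("Jazz" in unique_genres)  # False

# One listener's downloads as an itemset; <= tests "is a subset of"
basket = {"Blue Moon", "Runaround Sue", "Take Five"}
print({"Blue Moon", "Take Five"} <= basket)  # True
```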

DataFrames are like tables: they are organized in rows and columns. A DataFrame can be built from many different data structures and files, including lists, dictionaries, csv files, Excel files, and database records.
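As another small illustration outside the main program, here is how the same kind of table can be built from a dictionary and from a list, and then combined. The chart positions are stored as plain numbers here for simplicity. In oldies.csv they are text such as "Number 46".

```python
import pandas as pd

# From a dictionary: each key becomes a column heading
from_dict = pd.DataFrame({
    "Song": ["Runaround Sue", "Blue Moon"],
    "Artist": ["Dion", "The Marcels"],
    "Number": [46, 40],
})

# From a list of rows: the column headings are supplied separately
from_list = pd.DataFrame(
    [["Take Five", "The Dave Brubeck Quartet", 95]],
    columns=["Song", "Artist", "Number"],
)

# Stack the two tables; ignore_index renumbers the rows 0, 1, 2
songs = pd.concat([from_dict, from_list], ignore_index=True)
print(songs)
print(songs.shape)  # (3, 3): 3 rows, 3 columns
```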

# -*- coding: utf-8 -*-
"""
Created on Sun Jan 12 10:50:51 2020

@author: jerrybelch
"""

import pandas as pd
import sys
sys.__stdout__ = sys.stdout  # console workaround kept from the original lesson (Spyder)
# Read the song list; skipinitialspace=True ignores a space typed after a comma
dataset = pd.read_csv('oldies.csv', skipinitialspace=True)
print(dataset)
print(dataset.loc[30:69])  # prints rows 30 to 69, the ones missing from the printout above
print(dataset.sample(10))  # prints 10 rows chosen at random
print(dataset[dataset.Song=='Traveling Man'])  # print one song
print(dataset[dataset.Artist=='Chubby Checker'])  # print every song by one artist
print(dataset.loc[:,['Song','Artist']])  # print all songs and their artists
print(dataset.loc[:,['Song','Number']])  # print all songs and their chart positions
print(dataset.loc[:,['Artist','Genres']])  # print all artists and the kind of music they make
print(dataset[dataset.Genres=='Doo-Wap'])  # print all Doo-Wap songs
print(dataset[dataset.Genres=='Rock and Roll'])  # print all rock and roll songs
print(dataset[dataset.Genres=='Rhythum and Blues'])  # print all R&B songs
print(dataset[dataset.Genres=='Jazz'])  # print all jazz songs
print(dataset[dataset.Genres=='Instrumental'])  # print all instrumental songs
print(dataset[dataset.Genres=='Country'])  # print all country songs
print(dataset[dataset.Genres=='Pop'])  # print all pop songs
print(dataset.loc[[0,1,2,3,4,5,6,7,8,9]])  # print specific rows: the top ten

Running the Code

Press F5 to run the whole file. After that, you can highlight a line or section of code and press F9 to run just that selection and see its output.

Look at the first results in the console. They are the output of the line print(dataset).

Answer the following questions.

How many songs are in the dataset?
What song was number 21?

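Notice that the console does not show the entire dataset. You see rows 0-29 at the top and 70-99 at the bottom, with "..." in between, because pandas shortens long tables according to its display settings. The next line prints the rows that were left out:

print(dataset.loc[30:69])  # prints rows 30 to 69, the ones missing above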
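If you would rather see every row at once, pandas lets you lift its display limit. Here is a minimal sketch using a 100-row stand-in for oldies.csv. It also confirms a detail of .loc that surprises students: a label slice includes both ends, so 30:69 returns 40 rows.

```python
import pandas as pd

# Stand-in for oldies.csv: 100 rows labelled "Number 1" to "Number 100"
dataset = pd.DataFrame({"Number": [f"Number {n}" for n in range(1, 101)]})

# The current row limit before pandas starts shortening output with "..."
print(pd.get_option("display.max_rows"))

# Temporarily remove the limit so all 100 rows print; the old setting
# comes back automatically when the with-block ends
with pd.option_context("display.max_rows", None):
    print(dataset)

# .loc slices by label and includes BOTH ends: rows 30 through 69
print(len(dataset.loc[30:69]))  # 40
```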

Answer these questions for rows 30 to 69.

What number was Little Sister?
What song was hit number 49?

dataset.sample(10) picks ten rows at random, so your output will differ each time you run it.
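When a whole class should compare the same "random" rows, you can fix the random seed with the random_state parameter. The seed value 1961 below is an arbitrary choice.

```python
import pandas as pd

dataset = pd.DataFrame({"Song": [f"Song {n}" for n in range(1, 101)]})

# Without random_state, each call draws a different sample
print(dataset.sample(10))

# With the same random_state, the same 10 rows come back every time
first = dataset.sample(10, random_state=1961)
second = dataset.sample(10, random_state=1961)
print(first.index.equals(second.index))  # True
```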

What ten songs did you get when you looked at this block of output in the console?

You can also search the dataset for specific information, such as a particular song.

The line dataset[dataset.Song=='Traveling Man'] searches the dataset and outputs the row on which Traveling Man appears.
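The search works in two steps. First, the comparison produces one True or False per row, often called a mask. Then the rows where the mask is True are kept. The sketch below uses three songs from the list. It also shows str.contains with case=False, which helps when students misspell or miscapitalize a title.

```python
import pandas as pd

dataset = pd.DataFrame({
    "Song": ["Hello Mary Lou", "Surrender", "Little Sister"],
    "Artist": ["Ricky Nelson", "Elvis Presley", "Elvis Presley"],
})

# Step 1: the comparison gives one True/False value per row (the mask)
mask = dataset.Song == "Surrender"
print(mask.tolist())  # [False, True, False]

# Step 2: indexing with the mask keeps only the True rows
print(dataset[mask])

# An exact match fails on any spelling difference; a case-insensitive
# partial match is more forgiving
matches = dataset[dataset.Song.str.contains("sister", case=False)]
print(matches.Song.tolist())  # ['Little Sister']
```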

Who sang the song, and what genre is it?

dataset[dataset.Artist=='Chubby Checker'] produces the following output.

dataset[dataset.Artist=='Chubby Checker']#print out an artist Out[4]:

Song	Artist	Year	Genres	Number
Pony Time	Chubby Checker	1961	Rock and Roll	Number 7
The Fly	Chubby Checker	1961	Rock and Roll	Number 70
Let's Twist Again	Chubby Checker	1961	Rock and Roll	Number 94

The .loc indexer selects rows and columns by label, so you can slice out part of your data for a closer look.

print(dataset.loc[:,['Song','Artist']]) outputs every song with its artist. As before, the console shows rows 0-29 and then 70-99.

The next two lines work the same way: one slices out each song with its chart number, and the other slices out each artist with its genre.
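In .loc[rows, columns], the colon means "every row" and the list names the columns. The sketch below shows that you can also put a True/False test in the row position, which filters and slices in one step. It uses the three Chubby Checker and Marcels rows from the list.

```python
import pandas as pd

dataset = pd.DataFrame({
    "Song": ["Pony Time", "The Fly", "Blue Moon"],
    "Artist": ["Chubby Checker", "Chubby Checker", "The Marcels"],
    "Genres": ["Rock and Roll", "Rock and Roll", "Doo-Wap"],
    "Number": ["Number 7", "Number 70", "Number 40"],
})

# ':' selects every row; the list picks the columns to keep
song_and_number = dataset.loc[:, ["Song", "Number"]]
print(song_and_number)

# A condition in the row position filters rows at the same time
checker_songs = dataset.loc[dataset.Artist == "Chubby Checker", ["Song", "Number"]]
print(checker_songs)
```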

You can also list all songs classified as Doo-Wap with the following code: dataset[dataset.Genres=='Doo-Wap']

Similarly, the next six lines of code list the songs in the other genres.
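You can also cover every genre without writing one line per genre. Here is a sketch using value_counts and groupby on a few rows from the list. The genre labels are spelled exactly as in oldies.csv, because the comparison is exact.

```python
import pandas as pd

dataset = pd.DataFrame({
    "Song": ["Blue Moon", "Runaround Sue", "Take Five", "Daddy's Home"],
    "Genres": ["Doo-Wap", "Rock and Roll", "Jazz", "Doo-Wap"],
})

# How many songs fall in each genre, largest count first
counts = dataset.Genres.value_counts()
print(counts)

# groupby collects the song titles for every genre in one pass
by_genre = dataset.groupby("Genres")["Song"].apply(list)
print(by_genre["Doo-Wap"])  # ['Blue Moon', "Daddy's Home"]
```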

The last line of Python code prints the top ten songs. They are the first ten rows of the dataset because the file lists the songs in chart order.

print(dataset.loc[[0,1,2,3,4,5,6,7,8,9]]) # print specific rows The top ten
Index Song Number
0 Tossing and Turning Number 1
1 I Fall to Pieces Number 2
2 Michael Number 3
3 Crying Number 4
4 Runaway Number 5
5 My True Story Number 6
6 Pony Time Number 7
7 Wheels Number 8
8 Raindrops Number 9
9 Wooden Heart Number 10

[10 rows x 5 columns]
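Selecting rows 0-9 only works because the file happens to be in chart order. The Number column holds text such as "Number 7", and sorting text compares characters rather than values, so "Number 10" comes before "Number 2". The minimal sketch below pulls out the digits into a numeric Rank column, a name invented here for illustration. The top ten can then be found no matter how the rows are ordered.

```python
import pandas as pd

dataset = pd.DataFrame({
    "Song": ["Wooden Heart", "Raindrops", "I Fall to Pieces"],
    "Number": ["Number 10", "Number 9", "Number 2"],
})

# Sorting the text column compares characters, not numbers
text_order = dataset.sort_values("Number").Song.tolist()
print(text_order)  # ['Wooden Heart', 'I Fall to Pieces', 'Raindrops']

# Extract the digits and turn them into a real integer column
dataset["Rank"] = dataset.Number.str.extract(r"(\d+)", expand=False).astype(int)

# Sort by the numeric rank and keep the first ten rows
top = dataset.sort_values("Rank").head(10)
print(top.Song.tolist())  # ['I Fall to Pieces', 'Raindrops', 'Wooden Heart']
```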

Day 4: Creating Your Own Dataset using csv format.

Pick a topic and create a dataset containing at least 100 rows and five columns.
Change the Python code to:

Read your csv file and print the dataset.
Print any rows missing from the printout above.
Print a random sample of 5 items.
Search for a particular item in the first column.
Search for a particular item in the second column.
Print out the first two columns.
Print out the first and last columns of the dataset.
Search for an item in the fourth column.
Print out a series of rows.
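Because the tasks above refer to columns by position (first, second, fourth, last), .iloc is a convenient way to write them without knowing the column names. Here is a starting template built around a hypothetical office-supply inventory; the items, prices, and suppliers are made up. Replace the inline text with pd.read_csv('your_file.csv') once your own 100-row file exists.

```python
import io
import pandas as pd

# Hypothetical stand-in for a student's csv file (five columns)
csv_text = io.StringIO(
    "Item,Category,Price,Supplier,Stock\n"
    "Notebook,Paper,2.50,Acme,40\n"
    "Stapler,Desk,7.99,Acme,12\n"
    "Pens,Writing,3.25,Bolt,60\n"
    "Folder,Paper,1.10,Bolt,75\n"
)
dataset = pd.read_csv(csv_text)

print(dataset)                                   # the whole dataset
# With 100 rows, print the hidden middle with dataset.loc[30:69]
print(dataset.sample(2, random_state=0))         # random sample (use 5 with your file)
first_search = dataset[dataset.iloc[:, 0] == "Stapler"]   # search the first column
second_search = dataset[dataset.iloc[:, 1] == "Paper"]    # search the second column
print(first_search)
print(second_search)
print(dataset.iloc[:, :2])                       # first two columns
first_last = dataset.iloc[:, [0, -1]]            # first and last columns
print(first_last)
fourth_search = dataset[dataset.iloc[:, 3] == "Bolt"]     # search the fourth column
print(fourth_search)
print(dataset.iloc[1:3])                         # a series of rows: positions 1 and 2
```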

Day 5: Example of Dataset created using Excel

Let's assume that you are the Vice President in charge of Human Resources for your company.

You have access to your company's payroll register, an Excel spreadsheet called payroll service.xls.

You are going to work with Python to make the following decisions.

Should the Event Planner, Marilyn Eckert, be given a raise after her yearly evaluation?
Should all married employees be given the opportunity for a new life insurance policy that has been made available to your company?
Research shows that your Marketing Associates make less than their counterparts in other companies. Should they get a $2.00 an hour raise?
Management is concerned about overtime. You need to produce a report showing overtime worked over the last month.

Below is the Python code, using pandas DataFrames, to help you find answers to your questions.

Download the Excel spreadsheet and save it in your working directory with your Python programs.

Payroll Download

Run the Python program to see your results.

You can see how helpful Python can be in analyzing data to make good business decisions.
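The payroll program itself is not reproduced in this section, so here is a minimal sketch of how pandas could answer the four questions. Everything about the data is an assumption: the column names (Name, Title, Married, Rate, Overtime), the other employees, and every number are hypothetical. The real payroll service.xls may be laid out differently.

```python
import pandas as pd

# Hypothetical payroll data. With the real file you would write
#   payroll = pd.read_excel('payroll service.xls')  # .xls files need the xlrd package
payroll = pd.DataFrame({
    "Name": ["Marilyn Eckert", "Sam Ortiz", "Lee Park", "Ana Cruz"],
    "Title": ["Event Planner", "Marketing Associate", "Marketing Associate", "Accountant"],
    "Married": ["No", "Yes", "No", "Yes"],
    "Rate": [24.00, 18.50, 19.00, 30.00],
    "Overtime": [0, 5, 2, 0],
})

# 1. Pull one employee's record for her evaluation
print(payroll[payroll.Name == "Marilyn Eckert"])

# 2. Married employees eligible for the new life insurance policy
married = payroll[payroll.Married == "Yes"]
print(married.Name.tolist())

# 3. Preview a $2.00 an hour raise for Marketing Associates
raise_preview = payroll.loc[payroll.Title == "Marketing Associate", ["Name", "Rate"]].copy()
raise_preview["New Rate"] = raise_preview.Rate + 2.00
print(raise_preview)

# 4. Overtime report: employees who worked overtime, most hours first
overtime = payroll[payroll.Overtime > 0].sort_values("Overtime", ascending=False)
print(overtime[["Name", "Overtime"]])
```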

Day 6: Students Create a Dataset in Excel and Save It as an Excel File

Now it's your turn. Create an Excel spreadsheet of one month's sales data for 24 salespersons, and use it to analyze the data and make sound business decisions.
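If you want practice numbers to check your formulas before typing real ones, pandas can generate a sheet in the same shape. This is a sketch with assumptions. The product names are placeholders, and the sales figures are random whole numbers from 0 to 9 drawn from a seeded generator. The Totals column is computed as the row sum.

```python
import numpy as np
import pandas as pd

# Hypothetical practice data: 24 salespersons x 5 placeholder products
rng = np.random.default_rng(seed=24)
products = ["Product A", "Product B", "Product C", "Product D", "Product E"]
sales = pd.DataFrame(
    rng.integers(0, 10, size=(24, len(products))),  # 0 to 9 units each
    columns=products,
)
sales.insert(0, "Salesperson", [f"Salesperson {i}" for i in range(1, 25)])

# A Totals column makes it easy to rank salespersons later
sales["Totals"] = sales[products].sum(axis=1)
print(sales.head())

# To save it for Excel (requires the openpyxl package):
# sales.to_excel("monthly_sales.xlsx", index=False)
```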

Day 7: Example of file hosted on my Google Drive
In this lesson you will see how to take an Excel spreadsheet and make it available on Google Drive for use with your Python program. You will also convert the Excel spreadsheet to a csv file and upload it to Google Drive.

Let's assume that you are in the IT department and the Sales Manager has asked you to create a Python program to analyze the past month's sales data for the dealership's 24 salespersons and to give them a way to present the results to the company.

There are a number of ways to make the information available to the Sales Manager:

Send an email with the file attached (either an Excel .xlsx file or a csv file).
Host the file on the company website and import it into Python from there.
Upload the file to Google Drive and send the Sales Manager a link to it by email.

Here are the steps to upload a file to Google Drive.

Go to Google's Chrome Browser.
Click on Apps in the upper left-hand corner: a series of multi-colored dots.
Select Google Drive
Click on New

Click on File Upload
Click on your Documents folder
Click on the folder that contains the file you want to upload
Click on the file, Toyota Sales.xlsx
Click Open

Sharing a file from Google Drive

Go to Google's Chrome Browser.
Click on the Apps - Upper left-hand corner: a series of multi-colored dots.
Select Google Drive
Right Click the file or folder you want to share.
Click Share
In the people box, key in the email address of the person you want to have the file.
Add a note to the email explaining what the file is and how you want them to use it.
Click Done

How to Use the shared file in Python

Go to the email account where the file was sent.
Find the email that was sent.
Click on it.
The shared file is labeled with its name; in this case it is Toyota_Sales.xlsx
Open it. It should look like the image below.

Click the download icon in the upper right-hand corner.
The .xlsx spreadsheet file downloads and appears on the taskbar.
Click on it to open it in Excel.
Click on Enable editing
Click File.
Click Save As
Select the working folder that contains your Python programs
Click Save.
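Instead of downloading and re-saving the file by hand, you can sometimes point pandas straight at a Drive file. This sketch relies on assumptions. The share link is assumed to have the common shape .../file/d/FILE_ID/view?usp=sharing, and the uc?export=download address is a widely used pattern, not an official, guaranteed Google API. The file must also be shared as "Anyone with the link". The file id below is made up.

```python
import re

def drive_download_url(share_link):
    """Build a direct-download URL from a Google Drive share link.

    Assumes the .../file/d/<FILE_ID>/... link shape and the commonly used
    uc?export=download pattern (not an officially documented API).
    """
    match = re.search(r"/d/([A-Za-z0-9_-]+)", share_link)
    if match is None:
        raise ValueError("no file id found in link")
    return "https://drive.google.com/uc?export=download&id=" + match.group(1)

# Hypothetical share link with a made-up file id
link = "https://drive.google.com/file/d/1AbC_dEf-123/view?usp=sharing"
url = drive_download_url(link)
print(url)  # https://drive.google.com/uc?export=download&id=1AbC_dEf-123

# With a real link, pandas could then read the file directly:
#   dataset = pd.read_excel(url)   # .xlsx needs openpyxl; use pd.read_csv(url) for a csv
```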

Using the Spreadsheet file in Python

Now let's see how we can analyze the data in this Excel file to make some decisions as to the work of the salespersons at the auto dealership.

Listed in the box below is the Python code. Copy it and paste it into Spyder.

During the presentation, the Sales Manager wants to highlight the following:

Examine Jerry's performance for the month.
Look at sales of Toyota trucks: Tacoma and Tundra.
Check out hybrid sales of Camry and Prius.
Go over SUV sales: Highlander and Sequoia.
Discuss a special on Camrys and see which salesperson sold the most of this model.
Sort the dataset by total sales to rank the salespersons from best to worst.

Here is the code to import the same Excel file directly from a web page into Python.

import pandas as pd
import sys
sys.__stdout__ = sys.stdout  # console workaround kept from the original lesson (Spyder)
# Create a variable for the dataset url (%20 encodes the space in the file name)
salespersons_url = "http://janetbelch.com/Toyota%20Sales.xlsx"
# The ten models sold at the dealership
car_names = ['Corolla', 'Camry', 'Tacoma', 'Highlander', 'LandCruiser', 'Avalon', 'Prius', 'Rav4', 'Tundra', 'Sequoia']
# Load the spreadsheet from the url into a pandas DataFrame.
# An .xlsx file needs read_excel (and the openpyxl package), not read_csv.
# This assumes the first row of the sheet holds the column headings,
# including Salesperson and the model names above.
dataset = pd.read_excel(salespersons_url)
# Add a Totals column if the sheet does not already have one
if 'Totals' not in dataset.columns:
    dataset['Totals'] = dataset[car_names].sum(axis=1)
print(dataset)
print(dataset[dataset.Salesperson=='Jerry_Belch'])  # one salesperson's sales
print(dataset.loc[:,['Salesperson','LandCruiser','Tundra','Tacoma']])  # truck sales
print(dataset.loc[:,['Salesperson','Camry','Prius']])  # hybrid sales
print(dataset.loc[:,['Salesperson','Highlander','Sequoia']])  # SUV sales
print(dataset.sort_values(by='Camry', ascending=False))  # who sold the most Camrys
print(dataset.sort_values(by='Totals', ascending=False))  # who sold the most cars overall
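To try the analysis without a network connection, here is a small offline sketch. The three salespersons other than Jerry_Belch and all of the sales figures are hypothetical. It also shows idxmax, which names the top Camry seller directly instead of making you read a sorted list.

```python
import pandas as pd

# Hypothetical month of sales standing in for Toyota Sales.xlsx
dataset = pd.DataFrame({
    "Salesperson": ["Jerry_Belch", "Pat_Lee", "Kim_Ono"],
    "Camry": [4, 7, 2],
    "Prius": [3, 1, 5],
    "Tacoma": [2, 2, 6],
})
models = ["Camry", "Prius", "Tacoma"]
dataset["Totals"] = dataset[models].sum(axis=1)

# idxmax returns the row label of the largest Camry value
camry_leader = dataset.loc[dataset.Camry.idxmax(), "Salesperson"]
print(camry_leader)  # Pat_Lee

# Rank every salesperson from best to worst by total sales
ranked = dataset.sort_values("Totals", ascending=False)
print(ranked[["Salesperson", "Totals"]])
```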

Day 8: Student exercise

Create an Excel file similar to the one above with 25 salespersons (rows) and 10 items (columns), including totals.
Save it as an Excel spreadsheet and also as a csv file.
Upload both files to Google Drive.
In Google Drive, send a link to yourself.
Copy the file into your Python directory.
Create a program similar to the one above in Python using the file.
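Before writing the analysis program, it can help to check the spreadsheet itself. Here is a minimal sketch of such a check. It assumes a Salesperson column, numeric item columns, and a Totals column that should equal the sum of the item columns; the function name check_sales_file is made up for this example. With your own file you would pass it pd.read_excel('your_file.xlsx') or pd.read_csv('your_file.csv').

```python
import pandas as pd

def check_sales_file(dataset, expected_rows=25):
    """Return a list of problems found in a sales sheet (empty if none).

    Hypothetical helper: assumes a 'Salesperson' column, numeric item
    columns, and a 'Totals' column equal to the sum of the items.
    """
    problems = []
    if len(dataset) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(dataset)}")
    items = [c for c in dataset.columns if c not in ("Salesperson", "Totals")]
    if "Totals" not in dataset.columns:
        problems.append("missing Totals column")
    elif not (dataset[items].sum(axis=1) == dataset["Totals"]).all():
        problems.append("Totals does not match the item columns")
    return problems

# Tiny example with a deliberate mistake: Bo's total should be 6
sample = pd.DataFrame({
    "Salesperson": ["Ann", "Bo"],
    "Item 1": [3, 4],
    "Item 2": [1, 2],
    "Totals": [4, 5],
})
result = check_sales_file(sample, expected_rows=2)
print(result)  # ['Totals does not match the item columns']
```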