In the last article, we talked about the basics of the zip function and delved into an example. In this article, we go through two more examples of the zip function in real scripts. We will also learn about writing CSV files and touch on the pandas dataframe.
Example 2
When creating a script for film data, I wanted to be able to write rows into a CSV file. Please see RealPython’s great article on writing CSV files here before reading on!
We use the csv module to write rows one at a time. This is much easier if we zip the data first, so that each tuple represents one row of data we want to write.
Here is the code:
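import csv

# title, dates, links, new_imdb2 and news are lists built earlier in the script
with open('list.csv', 'w', newline='') as film:
    writer = csv.writer(film)
    # Header row
    writer.writerow(['Title', 'Date', 'link', 'imdb_link', 'synopsis'])
    # Each tuple from zip holds the values for one row
    for row in zip(title, dates, links, new_imdb2, news):
        writer.writerow(row)
print('Successful grab')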
Let’s go through this line by line!
import csv
with open('list.csv', 'w', newline='') as film:
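We import the csv module, which ships with Python. The ‘with’ statement opens a file named list.csv for writing and closes it automatically when the block ends. Notice newline='' inside the parentheses: it stops Python from translating the line endings that the csv module writes itself, which would otherwise leave a blank line between rows on Windows.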
writer = csv.writer(film)
We define the variable writer by calling csv.writer on the open file. It returns a writer object whose methods turn rows of data into lines of the CSV file.
writer.writerow(['Title','Date','link','imdb_link','synopsis'])
The writerow method takes a list (or another sequence, such as a tuple) and writes it to the CSV file as a single row. Here it writes the header row. This is the main way we write data into a CSV.
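To see exactly what writerow produces, here is a minimal sketch that writes into an in-memory buffer instead of a file. The io.StringIO buffer and the sample film rows are assumptions made for illustration; they are not part of the original script.

```python
import csv
import io

# In-memory text buffer standing in for a real file (illustration only).
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)

# Each call to writerow writes one comma-separated line.
writer.writerow(['Title', 'Date'])
writer.writerow(['Alien', '1979'])           # hypothetical sample row
writer.writerow(['Paris, Texas', '1984'])    # a value containing a comma gets quoted

lines = buffer.getvalue().splitlines()
print(lines)
# ['Title,Date', 'Alien,1979', '"Paris, Texas",1984']
```

Note that the writer adds quotes around a value only when it needs to, for example when the value itself contains a comma.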
Say we had lots of titles, dates, links and so on, each held in its own list. We could loop over the lists by index to build each row, but it is simpler to use the zip function to create tuples that can each be written as a row of data.
for row in zip(title, dates, links, new_imdb2, news):
    writer.writerow(row)
We have zipped the lists ‘title’, ‘dates’, ‘links’, ‘new_imdb2’ and ‘news’. Each tuple from the zip function contains the corresponding title, date, link, IMDb link and synopsis.
We then loop over the tuples produced by the zip function and write each one as a row in the CSV file.
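Putting Example 2 together, here is a self-contained sketch you can run. The three short lists and the temporary file location are hypothetical stand-ins for the data the real script collected. It also uses writerows, which accepts the zip object directly and writes one row per tuple, as an alternative to the explicit for loop.

```python
import csv
import os
import tempfile

# Hypothetical sample data standing in for the lists built by the real script.
title = ['Film A', 'Film B', 'Film C']
dates = ['2019', '2020', '2021']
links = ['https://example.com/a', 'https://example.com/b', 'https://example.com/c']

# Write into a temporary folder so the sketch does not clutter the working directory.
path = os.path.join(tempfile.mkdtemp(), 'list.csv')

with open(path, 'w', newline='') as film:
    writer = csv.writer(film)
    writer.writerow(['Title', 'Date', 'link'])
    # writerows consumes the zip object and writes one row per tuple.
    writer.writerows(zip(title, dates, links))

# Read the file back to check what was written.
with open(path, newline='') as film:
    rows = list(csv.reader(film))

print(rows[0])   # ['Title', 'Date', 'link']
print(rows[1])   # ['Film A', '2019', 'https://example.com/a']
```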
Example 3
I needed to clean up some financial statement data for use in a pandas dataframe. The data was a single flat list holding several years of financial statement figures. I wanted to turn it into a list of tuples with 6 items each, so that when I fed the list to the pandas dataframe, each tuple would become a row. The zip function was a great way to create these tuples!
Here is the code. I have only provided a small part of the data list for readability.
data = ['Total Revenue', '129,814,000', '125,843,000', '110,360,000', '89,950,000', '85,320,000', 'Cost of Revenue', '43,411,000', '42,910,000', '38,353,000', '34,261,000', '32,780,000']
income_data = list(zip(*[iter(data)] * 6))
print(income_data)
Output:
[('Total Revenue',
'129,814,000',
'125,843,000',
'110,360,000',
'89,950,000',
'85,320,000'),
('Cost of Revenue',
'43,411,000',
'42,910,000',
'38,353,000',
'34,261,000',
'32,780,000')]
Now let’s unpack this a bit.
We have a flat list of financial data, and we want to group it into tuples of 6 items each.
Remember the iter() function? Let’s take this step by step.
[iter(data)]
Output:
[<list_iterator at 0x2362f646a88>]
Here we create an iterator from the data list and put that iterator into a list.
Next,
[iter(data)]*6
Output:
[<list_iterator at 0x2362f646148>,
<list_iterator at 0x2362f646148>,
<list_iterator at 0x2362f646148>,
<list_iterator at 0x2362f646148>,
<list_iterator at 0x2362f646148>,
<list_iterator at 0x2362f646148>]
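Here we have multiplied the list by 6. Notice that every entry has the same address: the list does not hold six separate iterators but six references to the same iterator.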
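A quick way to convince yourself of this is the small sketch below. The short list of letters is a hypothetical stand-in for the financial data, and the list is multiplied by 2 rather than 6 to keep it brief.

```python
data = ['a', 'b', 'c', 'd']     # hypothetical stand-in for the financial data

iterators = [iter(data)] * 2    # one iterator, referenced twice

# Both slots in the list point at the very same object...
same_object = iterators[0] is iterators[1]

# ...so advancing through one slot also advances the other.
first = next(iterators[0])
second = next(iterators[1])

print(same_object, first, second)   # True a b
```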
Now, this is the confusing thing about iterators: they are also iterables! Because of this, we can pass them as arguments to the zip function. Here is what the zip function gives us.
zip([iter(data)]*6)
Output:
<zip at 0x2362f678588>
Now a zip function returns an iterator. Remember, to get any of the data out of a zip function we need to provide a wrapper to consume this iterator.
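As a small illustration of consuming a zip object, the sketch below zips two short lists (labels and figures taken from the data above) and wraps the result in list(). It also shows that a zip object can only be consumed once.

```python
names = ['Total Revenue', 'Cost of Revenue']
values = ['129,814,000', '43,411,000']

pairs = zip(names, values)    # nothing has been paired up yet

first_pass = list(pairs)      # consumes the zip iterator
second_pass = list(pairs)     # already exhausted, so this is empty

print(first_pass)    # [('Total Revenue', '129,814,000'), ('Cost of Revenue', '43,411,000')]
print(second_pass)   # []
```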
We can apply the asterisk operator to unpack the list, so that each iterator inside it is passed to zip as a separate argument. For simplicity, say we unpacked a list holding only one iterator instead of six. The zip function would then receive one argument only.
list(zip(*[iter(data)]))
The output is the following:
[('Total Revenue',),
('129,814,000',),
('125,843,000',),
('110,360,000',),
('89,950,000',),
('85,320,000',),
('Cost of Revenue',),
('43,411,000',),
('42,910,000',),
('38,353,000',),
('34,261,000',),
('32,780,000',)]
See how each item of the data list is now wrapped in its own tuple. The zip function has had only one iterable passed to it (our iterator), so each tuple holds one item.
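To make the effect of the asterisk concrete, here is a short sketch with two hypothetical lists. It compares unpacking a list of arguments with passing the list itself.

```python
letters = ['a', 'b', 'c']    # hypothetical sample lists
numbers = [1, 2, 3]

args = [letters, numbers]

unpacked = list(zip(*args))              # same as zip(letters, numbers)
explicit = list(zip(letters, numbers))
not_unpacked = list(zip(args))           # zip sees ONE argument: the outer list

print(unpacked)       # [('a', 1), ('b', 2), ('c', 3)]
print(not_unpacked)   # [(['a', 'b', 'c'],), ([1, 2, 3],)]
```

Without the asterisk, zip treats the outer list as its only iterable and wraps each inner list in a one-item tuple.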
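So when we multiply [iter(data)] by 6 and unpack the result, the zip function receives 6 arguments that are all the same iterator. To build each tuple, zip takes one item from each argument in turn, and every one of those requests advances the shared iterator. The first tuple therefore holds the first 6 items of the data list, the second tuple holds the next 6, and so on.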
list(zip(*[iter(data)]*6))
Wrapping this in the list function gives us a list of tuples, each with 6 items inside, as we see below.
The Output is:
[('Total Revenue',
'129,814,000',
'125,843,000',
'110,360,000',
'89,950,000',
'85,320,000'),
('Cost of Revenue',
'43,411,000',
'42,910,000',
'38,353,000',
'34,261,000',
'32,780,000')]
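One thing worth knowing about this trick: zip stops as soon as any argument runs out, so if the length of the list is not a multiple of the group size, the incomplete final group is silently dropped. The sketch below shows this and one possible alternative, itertools.zip_longest, which pads the last group instead. The trailing 'Gross Profit' label is a hypothetical leftover item, and the groups are of 3 rather than 6 only to keep the sample short.

```python
from itertools import zip_longest

data = ['Total Revenue', '129,814,000', '125,843,000',
        'Cost of Revenue', '43,411,000', '42,910,000',
        'Gross Profit']    # hypothetical leftover item with no figures after it

# zip drops the incomplete final group.
strict = list(zip(*[iter(data)] * 3))

# zip_longest keeps it and fills the gaps with fillvalue.
padded = list(zip_longest(*[iter(data)] * 3, fillvalue=None))

print(len(strict))    # 2
print(padded[-1])     # ('Gross Profit', None, None)
```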
Phew! We’ve come a long way in getting to this final bit. Please feel free to read over this a few times until it sinks in.
The reason we went through all this is that a list of tuples is easy to work with in a pandas dataframe: each tuple becomes one row.
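As a rough sketch of that last step, the grouped tuples can be passed straight to pandas. This assumes pandas is installed. The column names are hypothetical placeholders, since the original data does not say which periods the figures cover.

```python
import pandas as pd

data = ['Total Revenue', '129,814,000', '125,843,000', '110,360,000',
        '89,950,000', '85,320,000',
        'Cost of Revenue', '43,411,000', '42,910,000', '38,353,000',
        '34,261,000', '32,780,000']

income_data = list(zip(*[iter(data)] * 6))

# Hypothetical column names, used only for illustration.
columns = ['item', 'period_1', 'period_2', 'period_3', 'period_4', 'period_5']

# Each tuple becomes one row of the dataframe.
df = pd.DataFrame(income_data, columns=columns)

# The figures are still strings; strip the commas to get integers.
period_1 = df['period_1'].str.replace(',', '', regex=False).astype(int)

print(df.shape)           # (2, 6)
print(period_1.tolist())  # [129814000, 43411000]
```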
This wraps up the articles on the zip function! I hope that you have learned something from this and hope to see you soon in the future!
Further reading
1. For more information on the iterator and iterables confusion please see here
2. For more information on looping and what we call the iterator protocol please see here and here
Please see here for further details about what I’m up to project-wise on my blog and other posts. I’d be grateful for any comments or if you want to collaborate or need help with python please do get in touch.
If you want to get in contact with me, please do so at asmith53@ed.ac.uk.