Using the Zip function in Python Part 2

In part 1 we went through the basics of the zip function. Now we’re in a position to dig deeper and look at how the zip function handles different numbers of arguments and how its output can be unpacked. We will also go through an example of how I have used the function in my own scripts.

The zip function with no arguments

Calling the zip function without any arguments produces an empty iterator:

zip()

Output:

<zip at 0x18b85ba4f88>

When we apply the list function to a zip call with no arguments, we get an empty list:

list(zip())

Output:

[]

This is the simplest zip call we can talk about, but it doesn’t really show us much about what you can do with the zip function.

The zip function with one argument

Zipping a single iterable shows us how the zip function works when it has no other iterable to combine with. So how will Python interpret this?

name = 'Aaron'
list(zip(name))

The output:

[('A',), ('a',), ('r',), ('o',), ('n',)]

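We could probably have guessed that Python would interpret it this way: with nothing to pair with, each character ends up in a tuple of its own. The trailing comma in ('A',) is simply how Python writes a one-element tuple; there is no blank second item.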
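As a quick check, here is a minimal sketch that inspects the tuples produced by zipping a single string. The variable names singles, lengths and rebuilt are only for illustration.

```python
name = 'Aaron'
singles = list(zip(name))

# Each item is a tuple holding exactly one character.
lengths = [len(item) for item in singles]
print(lengths)   # [1, 1, 1, 1, 1]

# Taking position 0 of every tuple recovers the original string.
rebuilt = ''.join(item[0] for item in singles)
print(rebuilt)   # Aaron
```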

Importantly, the zip function can accept any number of arguments, which makes it useful when we want to combine data from several sources.

The zip function with more than two arguments

We covered the zip function with two arguments at the start of this series, to give you a sense of what it is all about. But what happens with more than two arguments? How will Python deal with this?

name = 'Aaron'
name2 = 'Marky'
name3 = 'Smith'
list(zip(name, name2, name3))

Output:

[('A', 'M', 'S'), 
('a', 'a', 'm'), 
('r', 'r', 'i'),
('o', 'k', 't'),
('n', 'y', 'h')]

See how the characters at each index of the strings are combined into a tuple? The first tuple holds the first character of each string, the second tuple holds the second character, and so on. This works for any number of iterables.
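When the iterables are already collected in a list, the asterisk operator can pass them all to the zip function in one go. The following is a small sketch of that idea; the fourth name, 'Jones', is made up for the example.

```python
# A list of strings of equal length; 'Jones' is a hypothetical extra entry.
all_names = ['Aaron', 'Marky', 'Smith', 'Jones']

# zip(*all_names) is the same as zip('Aaron', 'Marky', 'Smith', 'Jones').
columns = list(zip(*all_names))

print(len(columns))   # 5, one tuple per character position
print(columns[0])     # ('A', 'M', 'S', 'J')
```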

The zip function with unequal length iterables

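Now that we have gone through the basics of using the zip function, what happens if the iterables are of unequal length? By default the zip function stops as soon as the shortest iterable runs out, so any extra items in the longer ones are dropped. If we want to keep them, we need the itertools package.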

Here we import ‘zip_longest’ from the itertools package (in Python 2 this function was called ‘izip_longest’). We then want to combine three lists of different lengths.

from itertools import zip_longest

a = [1,2,3,4,5]
b = ['a','b','c','d']
c = ['x','y','z']
print(list(zip_longest(a,b,c)))

Output:

[(1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z'), (4, 'd', None), (5, None, None)]

Notice that the None items in the tuples stand in for the missing data from the shorter lists.

You can also specify a different fillvalue (None is the default):

print(list(zip_longest(a, b, c, fillvalue=0)))

Output:

[(1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z'), (4, 'd', 0), (5, 0, 0)]
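For comparison, here is a short sketch of what the built-in zip function does with the same three lists. It stops at the shortest list, so the extra items in a and b are silently dropped.

```python
a = [1, 2, 3, 4, 5]
b = ['a', 'b', 'c', 'd']
c = ['x', 'y', 'z']

# The built-in zip stops when the shortest iterable (c) is exhausted.
truncated = list(zip(a, b, c))
print(truncated)   # [(1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z')]
```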




Unpacking of the zip function

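Remember the * part of the syntax for the zip function in the first article? This is the asterisk (unpacking) operator: placed before an iterable in a function call, it passes each item of that iterable as a separate argument.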

name = 'Aaron'
name2 = 'Smith'
pairs = list(zip(name,name2))

Now we know the output of this:

[('A', 'S'), ('a', 'm'), ('r', 'i'), ('o', 't'), ('n', 'h')]

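We can unpack this using the asterisk operator. zip(*pairs) passes each tuple to the zip function as a separate argument, so all the first items are grouped together and all the second items are grouped together.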

first, last = zip(*pairs)
print(first)
print(last)

Output:

('A', 'a', 'r', 'o', 'n')
('S', 'm', 'i', 't', 'h')

I have only briefly discussed the power of the asterisk operator. If you want to know more please see here.
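The same trick works on any list of equal-length tuples, not just characters. As a small sketch, the rows below (made-up names and ages) are split into columns with zip(*rows) and then zipped back together.

```python
# Hypothetical rows of (name, age) data.
rows = [('Aaron', 31), ('Marky', 28), ('Smith', 45)]

# Unpack the rows into one tuple per column.
names, ages = zip(*rows)
print(names)   # ('Aaron', 'Marky', 'Smith')
print(ages)    # (31, 28, 45)

# Zipping the columns again restores the original rows.
restored = list(zip(names, ages))
print(restored == rows)   # True
```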

So we can now see that the zip function is useful when we want to access two or more variables at the same time, particularly when we want to loop over multiple iterables or write several columns of data into a CSV file.
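To illustrate the CSV case, here is a minimal sketch using the standard csv module. The two lists and the example.com links are made up, and an in-memory io.StringIO buffer stands in for a real file so the example does not touch the disk.

```python
import csv
import io

# Hypothetical data: two lists that belong together item by item.
names = ['report_a', 'report_b']
links = ['https://example.com/a.pdf', 'https://example.com/b.pdf']

# StringIO stands in for a file opened with open(path, 'w', newline='').
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['name', 'link'])      # header row
writer.writerows(zip(names, links))    # one CSV row per zipped pair

csv_text = buffer.getvalue()
print(csv_text)
```

Each tuple produced by the zip function becomes one row, so the two lists end up side by side as columns.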

Here we will put this into practice.

Example 1

The first place I used this function was in a script I wrote to download PDFs from a website.

The script was written to grab multiple PDFs, so I needed to loop over the PDF links and have access to each link’s corresponding filename at the same time. The PDF links were in one list and the corresponding filenames were in another. This was the perfect opportunity to use the zip function! It would allow me to access both the PDF link and its filename at the same time. We could then loop through the zip object to grab each PDF link and its corresponding filename and download the PDF.

Here is the piece of code that is essential for grabbing the pdf.

import requests

folder = 'c:/Users/Aaron/'

def grab_pdf(pdf_links, names):
    # Pair each link with its filename and download one PDF per pair
    for pdf, name in zip(pdf_links, names):
        myfile = requests.get(pdf, allow_redirects=True)
        file_name = folder + name + '.pdf'

        # Write the raw bytes of the response to disk
        with open(file_name, 'wb') as Pypdf:
            Pypdf.write(myfile.content)

Let’s go through this code line by line.

import requests
folder = 'c:/Users/Aaron/'

Here we have imported the requests package (if you’re not familiar with the requests package, please see here) and defined the folder the PDFs will be saved to.

def grab_pdf(pdf_links,names):

Here we have defined the function grab_pdf, which takes the lists pdf_links and names as arguments.

for pdf, name in zip(pdf_links,names):

We start a for loop over the zip object and unpack each tuple it produces into the variables pdf and name. By doing this, we can access each item inside the tuple through those two variables.

If we were to print pdf and name in the first loop iteration, we would see the two items of the first tuple the zip function creates, each in its own variable. This is the key to the zip function: we can access both the individual PDF link and the name of that PDF at the same time and do something with both of those variables.
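To make that concrete, here is a small sketch with two made-up lists (the example.com links and filenames are placeholders, not the ones from my script). It prints every pair and also pulls out just the first one.

```python
# Hypothetical inputs standing in for the real lists.
pdf_links = ['https://example.com/files/a.pdf', 'https://example.com/files/b.pdf']
names = ['report_a', 'report_b']

# Each iteration gives one link together with its filename.
for pdf, name in zip(pdf_links, names):
    print(pdf, '->', name)

# next() returns only the first tuple the zip object produces.
first_pdf, first_name = next(zip(pdf_links, names))
print(first_pdf)    # https://example.com/files/a.pdf
print(first_name)   # report_a
```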

myfile = requests.get(pdf, allow_redirects=True)
file_name = folder + name + '.pdf'

We then fetch the individual PDF and assign the response to the variable myfile. Note that allow_redirects=True means that if the link provided redirects to another URL, the request will follow that redirect. We have also created a file_name variable from the folder, the corresponding PDF title and the .pdf extension.
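As an aside, the file paths can also be built with pathlib instead of string concatenation. This is only an alternative sketch, not what my script does; the links and filenames are placeholders, and nothing is downloaded here.

```python
from pathlib import Path

folder = Path('c:/Users/Aaron/')   # same folder as in the script
# Hypothetical inputs standing in for the real lists.
pdf_links = ['https://example.com/files/a.pdf', 'https://example.com/files/b.pdf']
names = ['report_a', 'report_b']

# Pair every link with the path its PDF would be saved to.
targets = [(pdf, folder / f'{name}.pdf') for pdf, name in zip(pdf_links, names)]

for pdf, path in targets:
    print(pdf, '->', path.name)
```

The / operator on a Path joins the pieces with the right separator, so a missing or doubled slash in the folder name is less likely to cause trouble.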

with open(file_name,'wb') as Pypdf:
    Pypdf.write(myfile.content)

Here we use the with statement, which makes working with the open function simpler because the file is closed for us automatically when the block ends. Please see here for more on the with statement.

The open function is called with our newly created filename in 'wb' (write binary) mode, and we write the binary data grabbed by the requests module to that file. Note that myfile.content holds the body of the response as bytes, which is what we need for a PDF. If you’re not familiar with the open function please see the file handling section on W3Schools.

Also, if you’re not familiar with grabbing PDFs with Python, please see here for a useful article on that.

This is the second article of three, please stick around for the third article which will give some more examples of how the zip function can be used in practice!

Please see here for further details about what I’m up to project-wise on my blog and other posts. I’d be grateful for any comments, and if you want to collaborate or need help with Python, please do get in touch.

If you want to get in contact with me, please do so here asmith53@ed.ac.uk.
