
Python for Data 6: Tuples and Strings

In the last lesson we learned about lists, Python’s jack-of-all-trades sequence data type. In this lesson we’ll take a look at two more Python sequences: tuples and strings.

Tuples

Tuples are an immutable sequence data type commonly used to hold short collections of related data. For instance, if you wanted to store latitude and longitude coordinates for cities, tuples might be a good choice, because the values are related and not likely to change. Like lists, tuples can store objects of different types.

Construct a tuple with a comma-separated sequence of objects within parentheses:

my_tuple = (1,3,5)

print(my_tuple)
(1, 3, 5)
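In Python it is the commas, rather than the parentheses, that create a tuple. This matters mostly for one-element tuples. A minimal sketch (the variable names are illustrative only):

```python
not_a_tuple = (5)   # Parentheses alone just group an expression: this is the int 5
one_tuple = (5,)    # A trailing comma makes a one-element tuple
packed = 1, 3, 5    # Parentheses can often be omitted entirely
empty = ()          # An empty tuple does need parentheses

print(type(not_a_tuple))  # <class 'int'>
print(type(one_tuple))    # <class 'tuple'>
print(packed)             # (1, 3, 5)
print(len(empty))         # 0
```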

Alternatively, you can construct a tuple by passing an iterable into the tuple() function:

my_list = [2,3,1,4]

another_tuple = tuple(my_list)

another_tuple
(2, 3, 1, 4)
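The tuple() function accepts any iterable, not only lists. A short sketch with a string and a range object (names chosen for illustration):

```python
letters = tuple("abc")      # A string is an iterable of characters
numbers = tuple(range(4))   # range() objects are iterable too

print(letters)   # ('a', 'b', 'c')
print(numbers)   # (0, 1, 2, 3)
```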

Tuples generally support the same indexing and slicing operations as lists, and they also support some of the same functions, with the caveat that tuples cannot be changed after they are created. This means we can do things like find the length, min, max or sum of a tuple, but we can’t append new values to them or remove values from them:

another_tuple[2]     # You can index into tuples
1
another_tuple[2:4]   # You can slice tuples
(1, 4)
# You can use common sequence functions on tuples:

print(len(another_tuple))
print(min(another_tuple))
print(max(another_tuple))
print(sum(another_tuple))
4
1
4
10
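Beyond these general functions, tuples have two methods of their own, count() and index(), and they are often unpacked into separate variables. The sketch below uses approximate coordinates for Paris to echo the latitude/longitude example from the start of the lesson; the names are illustrative:

```python
paris = (48.8566, 2.3522)     # Approximate (latitude, longitude) for Paris
latitude, longitude = paris   # Unpack the tuple into two variables
print(latitude)               # 48.8566

repeats = (1, 2, 2, 3, 2)
print(repeats.count(2))       # 3 -> number of times 2 appears
print(repeats.index(3))       # 3 -> index of the first 3
```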
another_tuple.append(1)    # You can't append to a tuple
AttributeError: 'tuple' object has no attribute 'append'
del another_tuple[1]       # You can't delete from a tuple
TypeError: 'tuple' object doesn't support item deletion
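If you need a modified version of a tuple, the usual approach is to build a new one. A sketch of two common options, concatenation/slicing and a round trip through a list (variable names are illustrative):

```python
another_tuple = (2, 3, 1, 4)

extended = another_tuple + (5,)                        # Concatenation builds a new tuple
without_second = another_tuple[:1] + another_tuple[2:]  # Leave out index 1

as_list = list(another_tuple)   # Or convert to a list, edit it, and convert back
as_list.append(6)
rebuilt = tuple(as_list)

print(extended)         # (2, 3, 1, 4, 5)
print(without_second)   # (2, 1, 4)
print(rebuilt)          # (2, 3, 1, 4, 6)
print(another_tuple)    # (2, 3, 1, 4) -> the original is unchanged
```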

You can sort the objects in a tuple using the sorted() function, but doing so creates a new list containing the result rather than sorting the original tuple in place, as the list.sort() method does with lists:

sorted(another_tuple)
[1, 2, 3, 4]
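Since sorted() always returns a list, you can wrap the result in tuple() if you want a tuple back. A small sketch, also showing the reverse parameter of sorted():

```python
another_tuple = (2, 3, 1, 4)

sorted_tuple = tuple(sorted(another_tuple))              # Wrap in tuple() to get a tuple back
descending = tuple(sorted(another_tuple, reverse=True))  # reverse=True sorts high to low

print(sorted_tuple)   # (1, 2, 3, 4)
print(descending)     # (4, 3, 2, 1)
```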
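Although tuples themselves are immutable, they can contain mutable objects such as lists. If a tuple holds a list and that list changes, a shallow copy of the tuple sees the change too, because the copy refers to the same list object:

list1 = [1,2,3]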

tuple1 = ("Tuples are Immutable", list1)

tuple2 = tuple1[:]                       # Make a shallow copy

list1.append("But lists are mutable")

print(tuple2)                            # Print the copy
('Tuples are Immutable', [1, 2, 3, 'But lists are mutable'])

To avoid this behavior, make a deep copy using the deepcopy() function from the copy module:

import copy

list1 = [1,2,3]

tuple1 = ("Tuples are Immutable", list1)

tuple2 = copy.deepcopy(tuple1)           # Make a deep copy

list1.append("But lists are mutable")

print(tuple2)                            # Print the copy
('Tuples are Immutable', [1, 2, 3])
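One way to see the difference between the two kinds of copy is to check object identity with the is operator. This sketch uses copy.copy() for the shallow copy rather than slicing; both share the nested list:

```python
import copy

list1 = [1, 2, 3]
tuple1 = ("Tuples are Immutable", list1)

shallow = copy.copy(tuple1)    # Shallow copy: nested objects are shared
deep = copy.deepcopy(tuple1)   # Deep copy: nested mutable objects are duplicated

print(shallow[1] is list1)   # True  -> the very same list object
print(deep[1] is list1)      # False -> an independent list
print(deep[1] == list1)      # True  -> with equal contents, until list1 changes
```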

Strings

We already learned a little bit about strings in the lesson on basic data types, but strings are technically sequences: immutable sequences of text characters. As sequences, they support indexing operations where the first character of a string is index 0. This means we can get individual letters or slices of letters with indexing:

my_string = "Hello world"

my_string[3]    # Get the character at index 3
'l'

my_string[3:]   # Slice from index 3 to the end
'lo world'

my_string[::-1]  # Reverse the string
'dlrow olleH'
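Negative indexes and slice steps, which the reversal above relies on, work on strings just as they do on lists. A short sketch:

```python
my_string = "Hello world"

last_char = my_string[-1]      # Negative indexes count back from the end
last_word = my_string[-5:]     # The last five characters
every_other = my_string[::2]   # A step of 2 takes every second character

print(last_char)     # d
print(last_word)     # world
print(every_other)   # Hlowrd
```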

In addition, sequence operations like the len() function and the count() method work on strings:

len(my_string)
11
my_string.count("l")  # Count the l's in the string
3
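A related check is the in operator, which tests whether a substring appears anywhere in a string. A sketch (names are illustrative):

```python
my_string = "Hello world"

has_world = "world" in my_string           # Substring membership test
has_capital_world = "World" in my_string   # The check is case-sensitive
lo_count = my_string.count("lo")           # count() also accepts multi-character substrings

print(has_world)           # True
print(has_capital_world)   # False
print(lo_count)            # 1
```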

Because strings are immutable, you can’t change a string itself: every time you transform a string with a function or method, Python makes a new string object rather than altering the original string that exists in your computer’s memory.
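To illustrate, assigning to an index of a string raises a TypeError, so the usual pattern is to build a new string from pieces of the old one. A minimal sketch:

```python
my_string = "Hello world"

try:
    my_string[0] = "J"   # Item assignment is not allowed on strings
    assignment_failed = False
except TypeError:
    assignment_failed = True

new_string = "J" + my_string[1:]   # Build a new string instead

print(assignment_failed)   # True
print(new_string)          # Jello world
print(my_string)           # Hello world -> the original is unchanged
```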

Strings have many associated methods. Some basic string methods include:

# str.lower()     

my_string.lower()   # Make all characters lowercase
'hello world'
# str.upper()     

my_string.upper()   # Make all characters uppercase
'HELLO WORLD'
# str.title()

my_string.title()   # Make the first letter of each word uppercase
'Hello World'

Find the index of the first occurrence of a substring within a string using str.find(). If the substring does not appear, find() returns -1:

my_string.find("W")
-1

Notice that since strings are immutable, we never actually changed the original value of my_string with any of the code above, but instead generated new strings that were printed to the console. This means “W” does not exist in my_string even though our call to str.title() produced the output ‘Hello World’. The original lowercase “w” still exists at index position 6:

my_string.find("w")
6
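str.find() also takes an optional start index, str.rfind() searches from the right, and str.index() behaves like find() except that it raises a ValueError when the substring is missing. A sketch:

```python
my_string = "Hello world"

first_o = my_string.find("o")     # 4 -> first "o"
next_o = my_string.find("o", 5)   # 7 -> search starts at index 5
last_l = my_string.rfind("l")     # 9 -> rfind() returns the last match

try:
    my_string.index("W")          # index() raises instead of returning -1
    found = True
except ValueError:
    found = False

print(first_o, next_o, last_l, found)   # 4 7 9 False
```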

Find and replace a target substring within a string using str.replace():

my_string.replace("world",    # Substring to replace
                  "friend")   # New substring
'Hello friend'
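By default str.replace() substitutes every occurrence; an optional third argument limits how many replacements are made. A sketch with an illustrative word:

```python
word = "banana"

all_replaced = word.replace("a", "o")    # Every occurrence is replaced
first_only = word.replace("a", "o", 1)   # The optional count limits replacements

print(all_replaced)   # bonono
print(first_only)     # bonana
print(word)           # banana -> the original string is unchanged
```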

Split a string into a list of substrings based on a given separating character with str.split():

my_string.split()     # str.split() splits on whitespace by default
['Hello', 'world']
my_string.split("l")  # Supply a substring to split on other values
['He', '', 'o wor', 'd']
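The default split() and split(" ") differ when a string has repeated or surrounding spaces: with no argument, runs of whitespace count as one separator and empty strings are dropped. The optional maxsplit argument caps the number of splits. A sketch (the sample strings are made up):

```python
messy = "  Hello   world "

by_whitespace = messy.split()    # Runs of whitespace act as a single separator
by_space = messy.split(" ")      # Every single space is a separator

csv_row = "Joe,10,Paris"
limited = csv_row.split(",", 1)  # maxsplit=1 stops after the first split

print(by_whitespace)   # ['Hello', 'world']
print(by_space)        # ['', '', 'Hello', '', '', 'world', '']
print(limited)         # ['Joe', '10,Paris']
```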

Split a multi-line string into a list of lines using str.splitlines():

multiline_string = """I am
a multiline 
string!
"""

multiline_string.splitlines()
['I am', 'a multiline ', 'string!']
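Unlike split("\n"), splitlines() does not produce an empty string for a trailing newline, and passing keepends=True keeps the line breaks. A sketch with a hypothetical two-line string:

```python
text = "first line\nsecond line\n"

lines = text.splitlines()                  # Line breaks removed, no trailing ''
with_ends = text.splitlines(keepends=True) # Line breaks kept
by_newline = text.split("\n")              # Trailing newline yields an empty string

print(lines)        # ['first line', 'second line']
print(with_ends)    # ['first line\n', 'second line\n']
print(by_newline)   # ['first line', 'second line', '']
```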

Strip characters from both ends of a string with str.strip():

# str.strip() removes whitespace by default

"    strip white space!   ".strip() 
'strip white space!'

Override the default by supplying a string containing all characters you’d like to strip as an argument to the function:

"xXxxBuyNOWxxXx".strip("xX")
'BuyNOW'

You can strip characters from the left or right sides only with str.lstrip() and str.rstrip() respectively.
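A sketch of the one-sided versions, plus a reminder that the argument is a set of characters rather than a prefix or suffix to remove:

```python
promo = "xXxxBuyNOWxxXx"

left = promo.lstrip("xX")    # Strips from the left only
right = promo.rstrip("xX")   # Strips from the right only

# Any of "a", "b", "c" is stripped; stripping stops at the first other character
set_strip = "abcHelloCba".strip("abc")

print(left)        # BuyNOWxxXx
print(right)       # xXxxBuyNOW
print(set_strip)   # HelloC
```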

To join or concatenate two strings together, you can use the plus (+) operator:

"Hello " + "World"
'Hello World'

Convert a list of strings into a single string separated by a given delimiter with str.join():

" ".join(["Hello", "World!", "Join", "Me!"])
'Hello World! Join Me!'
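str.join() accepts any iterable of strings, but it raises a TypeError if an item is not a string, so numbers need converting first. A sketch (names are illustrative):

```python
numbers = [1, 2, 3]

joined = ", ".join(str(n) for n in numbers)   # Convert each item to str first
dashed = "-".join("abc")                      # Any iterable of strings works, even a string

try:
    ", ".join(numbers)                        # Non-string items raise TypeError
    join_failed = False
except TypeError:
    join_failed = True

print(joined)        # 1, 2, 3
print(dashed)        # a-b-c
print(join_failed)   # True
```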

Although the + operator works for string concatenation, things can get messy if you start trying to join more than a couple of values together with pluses:

name = "Joe"
age = 10
city = "Paris"

"My name is " + name + " I am " + str(age) + " and I live in " + city
'My name is Joe I am 10 and I live in Paris'

For complex string operations of this sort, it is preferable to use the str.format() method or formatted strings. str.format() takes a template string with curly braces as placeholders for the values you pass to the method as arguments. The arguments are then filled into the placeholders in order:

template_string = "My name is {} I am {} and I live in {}"

template_string.format(name, age, city)
'My name is Joe I am 10 and I live in Paris'
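Placeholders can also be numbered, to pick arguments by position, or named, to fill them from keyword arguments. A sketch (the keyword names n and c are arbitrary):

```python
name = "Joe"
city = "Paris"

# Numbers inside the braces pick arguments by position and can be reused
repeated = "{0} and {1} and {0}".format("A", "B")

# Names inside the braces are filled from keyword arguments
named = "My name is {n} and I live in {c}".format(n=name, c=city)

print(repeated)   # A and B and A
print(named)      # My name is Joe and I live in Paris
```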

Formatted string literals, or f-strings for short, are an alternative method for string formatting introduced in Python 3.6. F-strings are strings prefixed with “f” (or “F”) that allow you to insert existing variables into a string by name by placing them within curly braces:

# Remaking the example above using an f-string

f"My name is {name} I am {age} and I live in {city}"
'My name is Joe I am 10 and I live in Paris'
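The braces in an f-string can hold arbitrary expressions, not just variable names, and an optional format specification after a colon controls how the value is displayed. A sketch; price is a hypothetical value used only to show a format spec:

```python
name = "Joe"
age = 10
price = 3.14159   # Hypothetical value for the format-spec example

next_year = f"Next year {name} will be {age + 1}"   # Expressions are evaluated
rounded = f"Price: {price:.2f}"                     # :.2f rounds to two decimal places
shout = f"{name.upper()}!"                          # Method calls work inside braces too

print(next_year)   # Next year Joe will be 11
print(rounded)     # Price: 3.14
print(shout)       # JOE!
```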

As you can see, being able to directly access variable values via variable names can make f-strings more interpretable and intuitive than older formatting methods. For more on f-strings, see the official documentation.

Wrap Up

Basic sequences like lists, tuples and strings appear everywhere in Python code, so it is essential to understand the basics of how they work before we can start using Python for data analysis. We’re almost ready to dive into data structures designed specifically for data analysis, but before we do, we need to cover two more useful built-in Python data structures: dictionaries and sets.