String Operations

In the first lesson you learned about the String data type. A String holds a text value, which can be enclosed in single, double or triple quotes. Triple-quoted strings (three single or three double quotes) are mostly used for strings that span multiple lines.

double_quote_word = "Python"
print(double_quote_word)

single_quote_word = 'Python'
print(single_quote_word)

multi_line_word = '''First line
  Second line''' # the newline and the leading spaces are kept in the string
print(multi_line_word)

multi_line_word = """First line
  Second line""" # same as the last example
print(multi_line_word)

There are many operations that you can invoke on a String object.

Here are some of the operations you can use on a string variable:

word = 'Python'

Operations	Example	Output	Comments
Positive positional index	word[0]	'P'	Positional indexes start at 0, so this returns the first character
Negative positional index	word[-1]	'n'	Negative indexes count from the end; -1 is the last character. Note that -0 is the same as 0
Slicing	word[0:2]	'Py'	Characters from position 0 (included) to 2 (excluded)
Slicing with default start position	word[:2]	'Py'	The start defaults to 0; position 2 is excluded
Slicing with default end position	word[2:]	'thon'	From position 2 to the end of the string
Concatenation	"My " + word + " 3.6"	'My Python 3.6'	You can join multiple strings with the + operator
Built-in function len	len(word)	6	len is a built-in function provided by the Python interpreter, like print(). It returns the length (number of characters) of the string
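To check the table yourself, you could run each example and compare it with the Output column. The sketch below does that with assert statements, using the same word variable; an assert raises AssertionError only if a value differs from what the table says.

```python
word = 'Python'

# Positional indexing: 0 is the first character, -1 the last
assert word[0] == 'P'
assert word[-1] == 'n'
assert word[-0] == word[0]  # -0 is the same as 0

# Slicing: start is included, end is excluded
assert word[0:2] == 'Py'
assert word[:2] == 'Py'     # start defaults to 0
assert word[2:] == 'thon'   # end defaults to the end of the string

# Concatenation and the built-in len()
assert "My " + word + " 3.6" == 'My Python 3.6'
assert len(word) == 6

print("All table examples behave as described")
```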

Points to note

Strings are immutable: once a String is created, it cannot be altered in any way. Each of the operations above returns a new String. Hence assigning a new value to an indexed position raises a TypeError; e.g., word[0] = 'K' is not allowed.
Accessing an index that is out of range raises an IndexError; e.g., word[9] raises IndexError because word has only 6 characters.
However, slicing with out-of-range positions does not raise an IndexError; e.g., word[9:12] simply returns an empty string ''.
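The three points above can be seen in a short script. This is a sketch: the names new_word and empty_slice are illustrative choices, and the exact wording of the error messages may vary between Python versions, so the code prints them rather than relying on them.

```python
word = 'Python'

# Strings are immutable: assigning to an index raises TypeError
try:
    word[0] = 'K'
except TypeError as err:
    print("TypeError:", err)

# To "change" a string, build a new one from pieces of the old one
new_word = 'K' + word[1:]
print(new_word)           # Kython

# Indexing past the end raises IndexError
try:
    word[9]
except IndexError as err:
    print("IndexError:", err)

# Slicing past the end does not raise; it returns an empty string
empty_slice = word[9:12]
print(repr(empty_slice))  # ''
```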

String concatenation

Two strings can be added together with a plus (+) operator.


a = 'abc' + "def"
print(a)

Output:
abcdef

However, you cannot add a string and a value of another data type; Python raises a TypeError. Both of these are invalid:

a = 'abc' + 10     # invalid
a = 'abc' + True   # invalid

Instead, convert the other value to a string first using the built-in str function:


a = 'abc' + str(10) + str(True)
print(a)

Output:
abc10True
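The sketch below shows the invalid and the valid forms side by side. It catches the TypeError so the script keeps running; the results list is an illustrative name, not part of any API.

```python
results = []
for other in (10, True):
    try:
        'abc' + other            # str + int, then str + bool
    except TypeError:
        results.append('TypeError')
    # Converting with str() first always works
    results.append('abc' + str(other))

print(results)  # ['TypeError', 'abc10', 'TypeError', 'abcTrue']
```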

String Special Cases

Sometimes you want the Python interpreter to treat certain characters in a string differently, because those characters have a special meaning to the parser. Here are some examples.

Single or Double Quotes as Part of a String

Suppose you want to include a single or double quote within a String literal:

a = 'I'm having fun!'   # SyntaxError

Because the single quote in I'm is the same character that encloses the literal, the interpreter treats it as the end of the String literal and then reports the rest of the line as a syntax error. You can fix this in two ways:

Enclose the String in double quotes instead of single quotes. The converse also works: a double quote can appear inside a String enclosed in single quotes.
Use an escape sequence, described next.
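Here is a minimal sketch of the first fix and its converse. It also uses a triple-quoted string, which can hold both kinds of quote without escaping; the variable names are illustrative.

```python
# Double quotes around a string that contains an apostrophe
a = "I'm having fun!"

# The converse: double quotes inside a single-quoted string
b = 'She said "hello"'

# Triple quotes can contain both kinds of quote
c = '''He said "I'm fine"'''

print(a)  # I'm having fun!
print(b)  # She said "hello"
print(c)  # He said "I'm fine"
```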

Escape Sequence (\)

Add a backslash (\) before the single quote. The backslash tells the interpreter not to treat the next character in its usual way, but as a literal part of the String.

a = 'I\'m having fun'

The backslash is called an escape character; together with the character after it, it forms an escape sequence, because it causes that character to “escape” its usual meaning.

Here are more examples of escape sequences:


a = 'foo \
bar'      # The backslash escapes the newline, so bar continues on the same line
print(a)

print("a\tb")  # adds a tab between a and b
print("a\nb")  # adds a newline between a and b

Output:
foo bar
a	b
a
b

Raw String

If you do not want backslashes to be treated as escape characters, and want the characters represented as-is, prefix the String with 'r' or 'R' to make it a raw string.

Here is an example:


print(r"a\tb")

Output:
a\tb
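The difference is easiest to see by comparing lengths. In the sketch below (variable names are illustrative), the normal string turns \t into a single tab character, while the raw string keeps the backslash and the t as two separate characters.

```python
normal = "a\tb"   # \t becomes a single tab character
raw = r"a\tb"     # backslash and t stay as two characters

print(len(normal))  # 3
print(len(raw))     # 4

# A raw string is the same as writing each backslash as \\
print(raw == "a\\tb")  # True

# Raw strings are convenient for Windows-style paths
path = r"c:\mydir\numberedDir"
print(path)         # c:\mydir\numberedDir
```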

More Examples

Code	Output	Notes
print("First line \n Second line")	First line / Second line (printed on two lines)	Note the newline character \n, which breaks the line and prints the rest of the sentence on the second line
print(r"c:\mydir\numberedDir")	c:\mydir\numberedDir	Note the use of r to ensure that the \n in \numberedDir is not interpreted as a newline character
print("hi" + " there")	hi there	The + operator concatenates strings
print(3 * "yah! ")	yah! yah! yah!	The * operator repeats a string the given number of times
text = ("multi line" " improves readability"); print(text)	multi line improves readability	Adjacent string literals are joined automatically; wrapping them in parentheses lets you split a long string across lines for readability
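A short sketch of the last two rows: * repeats a string, and adjacent string literals are joined automatically. The joining only applies to literals written next to each other, so combining a variable with a literal still needs +. Variable names are illustrative.

```python
# * repeats a string; the number can be on either side
cheer = 3 * "yah! "
print(repr(cheer))     # 'yah! yah! yah! '

# Adjacent string literals are joined into one string; the parentheses
# let the expression span several lines
text = ("multi line"
        " improves readability")
print(text)            # multi line improves readability

# Adjacent joining works only for literals; use + with a variable
start = "multi line"
joined = start + " improves readability"
print(joined == text)  # True
```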

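Strings are also a type of sequence (an ordered collection of characters), which is why indexing, slicing, len() and the in operator work on them. The built-in function str creates strings; it is also used to get the textual representation of any object.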
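A minimal sketch of both ideas: iterating over a string like any other sequence, and using str() to get the text form of other objects. The variable names are illustrative.

```python
word = 'Python'

# A string is a sequence, so a for loop visits each character in order
letters = []
for ch in word:
    letters.append(ch)
print(letters)  # ['P', 'y', 't', 'h', 'o', 'n']

# str() returns the textual representation of other objects
texts = [str(42), str(3.5), str([1, 2, 3]), str(None)]
print(texts)    # ['42', '3.5', '[1, 2, 3]', 'None']
```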

Here are a few more commonly used convenience methods on Strings.

Operations ('str' represents a String object)	Example	Output	Comments
str.split()	t = 'py,book,3.6'.split(','); print(t)	['py', 'book', '3.6']	Returns a list of the words split at the specified separator (,). You can specify any separator; if none is given, the string is split on whitespace
str.splitlines()	t = 'py\n3.6'.splitlines(); print(t)	['py', '3.6']	Returns a list of the lines, split at line boundaries
str.capitalize()	'python'.capitalize()	'Python'	Returns a copy with the first character in uppercase and the rest in lowercase
str.count(sub[,start[,end]])	'america'.count('a')	2	Counts how many times 'a' occurs in the string
str.endswith(suffix[,start[,end]])	'america'.endswith('ca')	True	Returns True if the string ends with the given suffix, else False. The complementary method is startswith
str.find(sub[,start[,end]])	'america'.find('me')	1	Returns the lowest index at which the given substring is found, or -1 if it is not found
str.lower()	'PYTHON'.lower()	'python'	Creates and returns a new string with all characters converted to lowercase
str.upper()	'python'.upper()	'PYTHON'	Creates and returns a new string with all characters converted to uppercase
str.join(iterable)	'$'.join(['a','b','c'])	'a$b$c'	Creates and returns a new string by joining the elements of the given iterable, separated by the string the method is called on ('$')
in	'ri' in 'america'	True	Returns True because 'america' contains the substring 'ri'. in is an operator rather than a method, but it is the simplest way to check for a substring
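The sketch below runs the examples from the table as assertions and adds a few details the table mentions only briefly: split() with no argument, the optional start position of count(), and find() returning -1. It uses only the built-in str methods listed above; rebuilt is an illustrative name.

```python
t = 'py,book,3.6'.split(',')
assert t == ['py', 'book', '3.6']

# With no argument, split() splits on runs of whitespace
assert '  py   book 3.6 '.split() == ['py', 'book', '3.6']
assert 'py\n3.6'.splitlines() == ['py', '3.6']

# capitalize() upper-cases the first character and lower-cases the rest
assert 'pYTHON'.capitalize() == 'Python'

# count() and find() accept optional start/end positions
assert 'america'.count('a') == 2
assert 'america'.count('a', 1) == 1     # search starts at index 1
assert 'america'.find('me') == 1
assert 'america'.find('xyz') == -1      # not found

assert 'america'.endswith('ca')
assert 'america'.startswith('am')       # the complementary method
assert 'PYTHON'.lower() == 'python'
assert 'python'.upper() == 'PYTHON'
assert '$'.join(['a', 'b', 'c']) == 'a$b$c'
assert 'ri' in 'america'

# join() reverses split() when the same separator is used
rebuilt = ','.join(t)
print(rebuilt)  # py,book,3.6
```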

Using the in keyword with Strings in an 'if' Conditional

There are many ways to find a substring in a string. You can use the find method on the string object, which returns the index position of the first match, or -1 if there is no match. You can also use the startswith or endswith methods to check whether a string begins or ends with a given substring, as their names suggest.

Another convenient way is to use the in operator inside an if statement. Suppose you want to know whether the substring 'cat' is present in 'caterpillar'; you can use the code below:


if 'cat' in 'caterpillar':
    print("Yes, this works too")

When you run this code, the condition 'cat' in 'caterpillar' evaluates to True, so the message is printed.
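The sketch below compares find() with the in operator on the same word. One thing to watch: find() returns 0 when the match is at the very start, and 0 is falsy in an if, so compare its result with -1 instead of using it directly. The variable names are illustrative.

```python
animal = 'caterpillar'

# find() returns the index of the first match, or -1 if there is none
print(animal.find('cat'))   # 0
print(animal.find('dog'))   # -1

# 0 is falsy, so "if animal.find('cat'):" would wrongly skip this case;
# compare with -1 instead
found_with_find = animal.find('cat') != -1

# The in operator returns True or False directly
found_with_in = 'cat' in animal

# "not in" checks that a substring is absent
if 'dog' not in animal:
    print("No dog in", animal)
```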

format function

str.format() is a string formatting method in Python 3. You place one or more placeholders, each defined by a pair of curly braces { }, in a string and then call str.format() on it. The values to put into the placeholders are passed as arguments to the format method.

Syntax


str.format(positional_argument, keyword_argument)

positional_argument can be a value of any data type, or a variable.
keyword_argument is a name=value pair passed in as an argument.

Here are some examples:

Example	Output	Comments
'{}, Welcome to Programming'.format('Hi')	'Hi, Welcome to Programming'	{} is replaced with the argument passed in
'My age is {}'.format(21)	'My age is 21'	A number passed as an argument is converted to a String
'{}, welcome to {} programming'.format('Jo', 'Python')	'Jo, welcome to Python programming'	There can be multiple placeholders anywhere in the string; they are filled in order
"I'm {} and I'm {} years old".format('Jo', 21)	"I'm Jo and I'm 21 years old"	The comma-separated arguments can be of any type. Since the formatter is just a String, you can use double or triple quotes to construct it
"{1} and {0}".format("Java", "Python")	'Python and Java'	You can use index positions to refer to the arguments passed to the formatter
"{} and {lang}".format("Java", 'C++', lang='Python')	'Java and Python'	Keyword arguments (lang='Python') must come after all positional arguments. Unused arguments, such as 'C++' here, are ignored
'{x}{y}'.format(x='abc', y='xyz')	'abcxyz'	The x and y placeholders are replaced with the values of the matching keyword arguments
data = {'first': 'Jane', 'last': 'Doe'}; '{first} {last}'.format(**data)	'Jane Doe'	The ** operator unpacks the dictionary into keyword arguments that fill the placeholders
data = {'first': 'Jane', 'last': 'Doe'}; '{p[first]} {p[last]}'.format(p=data)	'Jane Doe'	Another way to get the same output: index into the dictionary inside the placeholder
data = (10, 12, 17, 18, 25, 42); '{d[4]} {d[5]}'.format(d=data)	'25 42'	Prints the 5th and 6th elements of the tuple. A list works the same way
coord = (5, 8); 'X: {0[0]}; Y: {0[1]}'.format(coord)	'X: 5; Y: 8'	Prints the coordinates from the tuple
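The sketch below checks several of the table rows with assertions. It also shows two related behaviors that are not in the table: a literal brace is written as {{ or }}, and passing fewer arguments than there are placeholders raises IndexError (the exact message differs between Python versions, so it is only printed).

```python
assert '{}, Welcome to Programming'.format('Hi') == 'Hi, Welcome to Programming'
assert 'My age is {}'.format(21) == 'My age is 21'
assert "{1} and {0}".format("Java", "Python") == 'Python and Java'

# Extra positional arguments ('C++' here) are ignored
assert "{} and {lang}".format("Java", 'C++', lang='Python') == 'Java and Python'

data = {'first': 'Jane', 'last': 'Doe'}
assert '{first} {last}'.format(**data) == 'Jane Doe'
assert '{p[first]} {p[last]}'.format(p=data) == 'Jane Doe'

coord = (5, 8)
assert 'X: {0[0]}; Y: {0[1]}'.format(coord) == 'X: 5; Y: 8'

# Doubled braces produce literal braces in the output
literal = '{{}} is a placeholder for {}'.format('x')
print(literal)   # {} is a placeholder for x

# Too few arguments for the placeholders raises IndexError
try:
    '{} {}'.format('only one')
except IndexError as err:
    print("IndexError:", err)
```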

Note: From Python 3.6 onward, you can use f-strings (formatted string literals) wherever applicable, which simplify the format expression. Prefix the String with 'f' or 'F' and write the expressions directly inside the braces instead of calling the format method. Here are some examples:


age = 21
message = f'My age is {age}'
print(message)
print(F'My age is {age}')
data = {'first': 'Jane', 'last': 'Doe'}
message = f'{data["first"]} {data["last"]}'
print(message)

Inside the braces of an f-string you can use any variable defined earlier, as well as more general Python expressions.
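For example, the sketch below (with illustrative variable names) calls a method, does arithmetic, and calls built-in functions inside the braces. The last expression adds a format spec after a colon, which works the same way as with str.format() and is covered in the next section.

```python
name = 'jo'
age = 21
scores = [80, 92, 75]

# Any expression can go inside the braces, not just a variable name
intro = f'{name.capitalize()} is {age} years old'
next_year = f'Next year: {age + 1}'
best = f'Best score: {max(scores)}'

# A format spec after the colon, as with str.format()
average = f'Average: {sum(scores) / len(scores):.2f}'

for line in (intro, next_year, best, average):
    print(line)
```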

Convert numerical data to String using formatter

Sometimes you want to format numerical values in a certain way for readability, e.g., to add a comma as the thousands separator or to round a decimal number to 2 digits. You can use formatters to achieve this.

Syntax:

'{positional_argument_if_any:conversion_code}'.format(value)

Here are some examples:

Code	Example(s)	Output	Comments
'f'	'{:f}'.format(3.141592653589793); f'{3.141593:f}'	'3.141593'	By default, f shows 6 digits after the decimal point. The second example produces the same output with an f-string
'f'	'{1:.2f}'.format(3.14159, 4.347); f'{4.347:.2f}'	'4.35'	Takes the second argument because the positional argument is 1, and rounds it to 2 decimal places because of .2f. The second example produces the same output with an f-string
'f'	'{:06.2f}'.format(3.141592653589793)	'003.14'	The output is at least 6 characters wide, with 2 decimal places, padded with 0 at the beginning
'e'	'{:e}'.format(3141592653589793)	'3.141593e+15'	Represents a very large number in exponent notation
'e'	'{:10.2e}'.format(3141592653589793)	'  3.14e+15'	Exponent notation with 2 decimal places, padded with spaces on the left to at least 10 characters
'd'	'{:04d}'.format(52)	'0052'	Decimal integer printed with at least 4 characters, padded with 0 as required
'd'	'{: d}'.format(-52)	'-52'	A space as the sign option: positive numbers get a leading space and negative numbers start with a minus (-) sign
'd'	'{:=+4d}'.format(52)	'+ 52'	Minimum width 4, always show the sign (+ or -), and = places the padding between the sign and the digits
','	'{:,}'.format(1234567890)	'1,234,567,890'	Uses a comma as the thousands separator
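The rows of the table can be verified in one pass. The sketch below pairs each formatted result with the expected output from the table (the cases list is just an illustrative structure) and adds one extra row showing the space sign option with a positive number.

```python
pi = 3.141592653589793
big = 3141592653589793

# (actual, expected) pairs taken from the table above
cases = [
    ('{:f}'.format(pi), '3.141593'),
    (f'{3.141593:f}', '3.141593'),
    ('{1:.2f}'.format(3.14159, 4.347), '4.35'),
    ('{:06.2f}'.format(pi), '003.14'),
    ('{:e}'.format(big), '3.141593e+15'),
    ('{:10.2e}'.format(big), '  3.14e+15'),   # padded to width 10
    ('{:04d}'.format(52), '0052'),
    ('{: d}'.format(-52), '-52'),
    ('{: d}'.format(52), ' 52'),              # space reserved for the sign
    ('{:=+4d}'.format(52), '+ 52'),
    ('{:,}'.format(1234567890), '1,234,567,890'),
]

for actual, expected in cases:
    assert actual == expected, (actual, expected)
print(f"{len(cases)} formatting examples verified")
```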

Conversion Codes

'f', 'e' and 'd' (and the ',' separator option) shown above are not the only conversion codes. Here are the most commonly used ones:

s – strings
d – decimal integers (base-10)
f – floating point display
c – character (converts an integer to the corresponding Unicode character)
b – binary
o – octal
x – hexadecimal with lowercase letters after 9
X – hexadecimal with uppercase letters after 9
e – exponent notation
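Here is a minimal sketch applying each code to a sample value. It uses the built-in format(value, spec) function, which accepts the same codes as the part after the colon in '{:...}'.format(value); the examples list and formatted dictionary are illustrative names.

```python
examples = [
    ('s', 'hi'),
    ('d', 255),
    ('f', 255),   # 'f' also accepts integers
    ('c', 65),    # Unicode code point 65 is 'A'
    ('b', 255),
    ('o', 255),
    ('x', 255),
    ('X', 255),
    ('e', 255),
]

formatted = {}
for code, value in examples:
    # format(value, code) gives the same result as ('{:' + code + '}').format(value)
    formatted[code] = format(value, code)
    print(code, '->', formatted[code])
```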

Remember: use 'f' for floating point numbers and 'd' for whole numbers. Note that 'f' also accepts integers, but 'd' does not accept floats. You can combine the comma thousands separator with 'f' (e.g. '{:,.2f}') as well as with 'd'.

Errors: A ValueError is raised when the value cannot be formatted with the given conversion code, e.g., '{:d}'.format(3.5).
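The sketch below illustrates both notes: combining the comma separator with 'f' and 'd', and the ValueError raised when 'd' is given a float. The variable names are illustrative; the error message text may vary between Python versions, so it is only printed.

```python
# The comma separator combines with both 'f' and 'd'
money = '{:,.2f}'.format(1234567.891)
count = '{:,d}'.format(1234567)
print(money)   # 1,234,567.89
print(count)   # 1,234,567

# 'd' only accepts integers, so formatting a float with it raises ValueError
caught = False
try:
    '{:d}'.format(3.5)
except ValueError as err:
    caught = True
    print("ValueError:", err)

# Convert explicitly if you really want a whole-number display
whole = '{:d}'.format(int(3.5))
print(whole)   # 3
```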

Official reference

Refer to https://docs.python.org/3/library/string.html#format-string-syntax for more formatting examples.
