String Operations
In the first lesson you learnt about the String data type. A String holds a text value. The value can be enclosed in single, double or triple quotes. Triple-quoted strings (using either three single or three double quotes) are mostly used to represent strings that span multiple lines.
double_quote_word = "Python"
print(double_quote_word)
single_quote_word = 'Python'
print(single_quote_word)
multi_line_word = '''First line
Second line''' # any white spaces are maintained
print(multi_line_word)
multi_line_word = """First line
Second line""" # same as the last example
print(multi_line_word)
There are many operations that you can invoke on a String object.
Here are some of the operations you can use on a string variable:
word = 'Python'
Operations | Example | Output | Comments |
---|---|---|---|
Positive positional index | word[0] | 'P' | positional index starts at 0; returns the first character |
Negative positional index | word[-1] | 'n' | first position from the end. -0 is the same as 0 |
Slicing | word[0:2] | 'Py' | characters from position 0 (included) to 2 (excluded) |
Slicing with default start position | word[:2] | 'Py' | starts from 0 (the default) and stops before position 2 (excluded) |
Slicing with default end position | word[2:] | 'thon' | starts from position 2 and runs to the end of the string |
Concatenation | "My " + word + " 3.6" | 'My Python 3.6' | You can join multiple strings with the + operator |
Built-in function len | len(word) | 6 | len is one of the built-in functions that are part of the Python interpreter, similar to print(). It returns the length of the string |
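The indexing and slicing rows above can be tried out with a short sketch. The variable names (`head`, `tail`, `rebuilt`) are illustrative only and are not part of the lesson.

```python
word = 'Python'

first = word[0]    # 'P'   - positive index counts from the start
last = word[-1]    # 'n'   - negative index counts from the end
head = word[:2]    # 'Py'  - start defaults to 0
tail = word[2:]    # 'thon' - end defaults to the length of the string

# The two slices split the string at position 2, so joining them
# gives back the original text.
rebuilt = head + tail

print(first, last, head, tail)
print(rebuilt, len(rebuilt))
```

Because the end position of a slice is excluded, `word[:n] + word[n:]` always equals `word`, whatever value `n` has.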
Points to note
- Strings are immutable - A String, once created, cannot be altered in any way. Each of the above operations returns a new String rather than modifying the original. Hence assigning a new value to an indexed position results in an error. E.g., word[0] = 'K' is not allowed and raises TypeError.
- Trying to get an indexed position which is out of range results in IndexError. E.g., word[9] will raise IndexError.
- However, slice operations on out-of-range positions do not raise IndexError. E.g., word[9:12] completes without any error and returns an empty string.
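The three points above can be checked with a small sketch. The names `errors`, `empty` and `changed` are made up for this illustration.

```python
word = 'Python'
errors = []

# Strings are immutable: assigning to an index raises TypeError.
try:
    word[0] = 'K'
except TypeError:
    errors.append('TypeError')

# Reading an index beyond the end raises IndexError.
try:
    word[9]
except IndexError:
    errors.append('IndexError')

# Slicing out of range does not raise; it returns an empty string.
empty = word[9:12]

# To "change" a string, build a new one from pieces of the old one.
changed = 'K' + word[1:]

print(errors)
print(repr(empty))
print(changed)
```

The original `word` is untouched throughout; `changed` is a separate String object.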
String concatenation
Two strings can be added together with a plus (+) operator.
a = 'abc' + "def"
print(a)
Output:
abcdef
However, you cannot add a string and a value of any other data type.
a = 'abc' + 10
is invalid (raises TypeError)
a = 'abc' + True
is invalid (raises TypeError)
You can convert other data types first using the 'str' function
a = 'abc' + str(10) + str(True)
print(a)
Output:
abc10True
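As a minimal sketch of the behaviour described in this section, the failing concatenation can be caught and then repeated with an explicit conversion. The name `error_name` is only used for this illustration.

```python
# Adding a string and a number is rejected with a TypeError.
try:
    a = 'abc' + 10
except TypeError as exc:
    error_name = type(exc).__name__

# Converting each value with str() first makes the concatenation valid.
a = 'abc' + str(10) + str(True)

print(error_name)
print(a)
```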
String Special Cases
Sometimes you want the Python interpreter to treat certain characters in a string differently, because these characters have a special meaning to the parser. Here are some examples.
Single or Double Quotes part of String
Suppose you want to include a single or double quote within a String literal:
a = 'I'm having fun!'
In the above String literal, the single quote in I'm causes an error. The interpreter reads the String literal only up to the second single quote and treats the rest of the line as invalid. You can fix this in two ways:
- Enclose the String within double quotes instead of single quotes. The converse also works, i.e., a double quote can appear within a String enclosed in single quotes.
- Use an escape sequence, described next.
Escape Sequence (\)
Add a '\' character before the single quote. The backslash tells the interpreter not to treat the next character in its usual way (here, as the end of the string) but as a literal part of the String.
a = 'I\'m having fun'
The backslash together with the character that follows it is called an escape sequence, because the backslash causes the subsequent character to “escape” its usual meaning.
Here are more examples of applying escape sequences
a = 'foo \
bar' # The escape here is to ignore the newline and consider the bar within the same line.
print(a)
print("a\tb") # adds a tab between a and b
print("a\nb") # adds a newline between a and b
Output of the first print:
foo bar
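The following sketch compares the two ways of putting a quote inside a string, and shows that an escape sequence such as \t or \n counts as a single character. The variable names are illustrative assumptions.

```python
# Both fixes produce exactly the same string.
double_quoted = "I'm having fun!"
escaped = 'I\'m having fun!'
same = double_quoted == escaped

# An escape sequence is two characters in the source code,
# but only one character in the resulting string.
tab_len = len("a\tb")      # 'a', tab, 'b'
newline_len = len("a\nb")  # 'a', newline, 'b'

print(same)
print(tab_len, newline_len)
```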
Raw String
If you do not want backslashes to be treated as escape characters and want them represented as-is, prefix the string literal with 'r' or 'R'. This is called a raw string.
Here is an example
print(r"a\tb")
Output:
a\tb
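To see what the r prefix changes, the sketch below compares a normal and a raw version of the same literal. The names `normal` and `raw` are chosen only for this example.

```python
normal = "a\tb"  # \t is interpreted as a single tab character
raw = r"a\tb"    # the backslash and the t are kept as two characters

print(len(normal))  # 3
print(len(raw))     # 4

# A raw string is equivalent to escaping the backslash itself.
print(raw == "a\\tb")  # True
```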
More Examples
Code | Output | Notes |
---|---|---|
print("First line \n Second line") | First line Second line | note the newline character \n, which breaks the line and prints the part of the sentence after \n on the second line |
print(r"c:\mydir\numberedDir") | c:\mydir\numberedDir | note the use of r to ensure the \n part of \numberedDir is not interpreted as a newline character |
print( "hi" + " there") | hi there | using the + operator we can concatenate strings |
print( 3 * "yah! ") | yah! yah! yah! | using * we can repeat the string multiple times |
text = ("multi line" " improves readability" ); print(text) | multi line improves readability | adjacent string literals are joined automatically; the parentheses let them be spread over several lines |
Strings are also a type of sequence. There is a built-in function str to create strings. str is also used to get the textual representation of any object.
Here are a few more commonly used convenience methods on Strings.
Operations ('str' represents a String object) | Example | Output | Comments |
---|---|---|---|
str.split() | t = 'py,book,3.6'.split(','); print(t) | ['py', 'book', '3.6'] | returns a list with the string split at the specified delimiter (,). You can specify any delimiter. Splits on whitespace if no delimiter is given. |
str.splitlines() | t = 'py\n3.6'.splitlines(); print(t) | ['py', '3.6'] | returns a list with the string split on newlines |
str.capitalize() | 'python'.capitalize() | 'Python' | returns a string with the first letter in upper case |
str.count(sub[,start[,end]]) | 'america'.count('a') | 2 | counts how many times 'a' occurs in the string |
str.endswith(sub[,start[,end]]) | 'america'.endswith('ca') | True | returns True if the string ends with the given substring, else False. The complementary method is startswith |
str.find(sub[,start[,end]]) | 'america'.find('me') | 1 | returns the index of the first occurrence of the given substring, or -1 if it is not found. |
str.lower() | 'PYTHON'.lower() | 'python' | creates and returns a new string by converting the given string to lowercase |
str.upper() | 'python'.upper() | 'PYTHON' | creates and returns a new string by converting the given string to uppercase |
str.join(iterable) | '$'.join(['a','b','c']) | 'a$b$c' | creates and returns a new string by joining each element of the given iterable, separated by the given string |
in | 'ri' in 'america' | True | Returns True if 'ri' is a substring of 'america'. The 'in' operator checks whether one string occurs inside another |
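Several of these methods are often used together. The sketch below splits a comma-separated record, tidies one field and joins the pieces back with a different separator. The `record` value and the other variable names are hypothetical, chosen only to illustrate the calls.

```python
record = 'py,book,3.6'

fields = record.split(',')       # ['py', 'book', '3.6']
title = fields[0].capitalize()   # 'Py'
rejoined = ' | '.join(fields)    # 'py | book | 3.6'

# With no delimiter, split() breaks on any run of whitespace.
words = 'a  b \t c'.split()      # ['a', 'b', 'c']

# find() returns -1 when the substring is not present.
missing = 'america'.find('xyz')  # -1

print(fields, title, rejoined, words, missing)
```

None of these calls modifies `record`; each returns a new object.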
Using the in keyword with Strings in an 'if' Conditional
When it comes to finding substrings in a string, there are many ways. You can use the find method on the string object. This method returns the index position of the first matching substring. If there is no such substring, it returns -1. You can also use the startswith or endswith methods to check for the substring at the beginning or end of the string, as the method names suggest.
However, there is another convenient way, and that is to use the 'in' operator inside an 'if' statement. Suppose you want to know whether the substring 'cat' is present in 'caterpillar'; you can use the code below:
if 'cat' in 'caterpillar':
    print("Yes, this works too")
When you run the above code you will notice that the if conditional is satisfied with this expression.
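The approaches mentioned in this section can be compared side by side. This is a small sketch; the variable names are assumptions made for the example.

```python
text = 'caterpillar'

has_cat = 'cat' in text            # True: only answers yes or no
position = text.find('pillar')     # 5: tells you where the match starts
not_found = text.find('dog')       # -1: find() signals "no match" this way
starts = text.startswith('cat')    # True
ends = text.endswith('cat')        # False

if has_cat:
    print("Yes, this works too")
print(position, not_found, starts, ends)
```

Use `in` when you only need to know whether the substring is present, and `find` when you also need its position.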
format function
str.format() is a string formatting method in Python 3. A formatter works by putting one or more placeholders, each defined by a pair of curly braces { }, into a string and calling str.format() on it. The values you wish to put into the placeholders are passed as arguments to the format function.
Syntax
str.format(positional_argument, keyword_argument)
positional_argument can be a value or variable of any data type. keyword_argument is a name=value pair passed in as an argument.
Here are some examples:
Example | Output | Comments |
---|---|---|
'{}, Welcome to Programming'.format('Hi') | 'Hi, Welcome to Programming' | {} is replaced with what is passed as an argument. |
'My age is {}'.format(21) | 'My age is 21' | An argument passed as a number is converted to a String |
'{}, welcome to {} programming'.format('Jo', 'Python') | 'Jo, welcome to Python programming' | There can be multiple placeholders spread across the string |
"I'm {} and I'm {} years old".format('Jo', 21) | "I'm Jo and I'm 21 years old" | Comma-separated arguments can be of any type. You can use double quotes or triple quotes to construct your format string, as it is a String |
"{1} and {0}".format("Java", "Python") | 'Python and Java' | You can refer to the arguments passed to the formatter by their index positions |
"{} and {lang}".format("Java", 'C++', lang='Python') | 'Java and Python' | You can pass in keyword arguments (lang='Python') after all positional arguments. The unused positional argument 'C++' is ignored |
'{x}{y}'.format(x='abc', y='xyz') | 'abcxyz' | the x and y placeholders are replaced with the keyword argument values |
data = {'first': 'Jane', 'last': 'Doe'}; '{first} {last}'.format(**data) | 'Jane Doe' | placeholders filled from a dictionary object |
data = {'first': 'Jane', 'last': 'Doe'}; '{p[first]} {p[last]}'.format(p=data) | 'Jane Doe' | another way to get the same output, given for convenience |
data = (10, 12, 17, 18, 25, 42); '{d[4]} {d[5]}'.format(d=data) | '25 42' | returns the 5th and 6th elements of the tuple. Can be applied to a list in a similar fashion |
coord = (5, 8); 'X: {0[0]}; Y: {0[1]}'.format(coord) | 'X: 5; Y: 8' | returns the coordinates from the tuple |
Note: From Python 3.6 onwards you should use 'f-strings' wherever applicable, as they simplify the format expression. Prefix the string literal with 'f' or 'F' instead of calling the format function. Here are some examples:
age = 21
message = f'My age is {age}'
print(message)
print(F'My age is {age}')
data = {'first': 'Jane', 'last': 'Doe'}
message = f'{data["first"]} {data["last"]}'
print(message)
With the f-string notation, you can use any previously defined variable within your expressions.
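The braces of an f-string accept expressions, not only plain variable names. The sketch below shows arithmetic and a method call inside the braces, with the str.format() equivalent for comparison. The values and names (`price`, `quantity`, `shout`) are invented for this example.

```python
age = 21
name = 'jo'
price = 4.5
quantity = 3

# Any expression is allowed inside the braces, not just a variable name.
next_year = f'Next year I will be {age + 1}'
shout = f'Hello, {name.upper()}!'
total = f'Total for {quantity} items: {price * quantity}'

# The same result with str.format(), for comparison.
total_with_format = 'Total for {} items: {}'.format(quantity, price * quantity)

print(next_year)
print(shout)
print(total)
```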
Convert numerical data to String using formatter
There are instances when you would like to format numerical values in certain ways for readability. For example, you may want to add a comma as a thousands separator or round a decimal number to 2 digits. You can use formatters to achieve these.
Syntax :
'{positional_argument_if_any:conversion_code}'.format(value)
Here are some examples:
Code | Example(s) | Output | Comments |
---|---|---|---|
'f' | '{:f}'.format(3.141592653589793) and f'{3.141593:f}' | '3.141593' | using f restricts the decimal portion of the output to 6 digits. The second example gives the same output using an f-string |
'f' | '{1:.2f}'.format(3.14159, 4.347) and f'{4.347:.2f}' | '4.35' | Takes the second argument, as the positional_argument is 1, and rounds to 2 decimal places due to .2f. The second example gives the same output using an f-string |
'f' | '{:06.2f}'.format(3.141592653589793) | '003.14' | in this example we want the output to be at least 6 characters wide with 2 decimal places, filled with 0 at the beginning |
'e' | '{:e}'.format(3141592653589793) | '3.141593e+15' | represents a very large number in exponent notation |
'e' | '{:10.2e}'.format(3141592653589793) | '  3.14e+15' | represents a very large number in exponent notation with at least 10 characters, padding with spaces on the left if required. |
'd' | '{:04d}'.format(52) | '0052' | decimal number printed with at least 4 characters, adding 0 as required |
'd' | '{: d}'.format(-52) | '-52' | a space in the format asks for a leading space on positive numbers; negative numbers start with a minus (-) sign |
'd' | '{:=+4d}'.format(52) | '+ 52' | sets the minimum width to 4 and displays a + sign for positive numbers and - for negative; = places the padding between the sign and the digits |
',' | '{:,}'.format(1234567890) | '1,234,567,890' | using a comma as the thousands separator |
Conversion Codes
'f', 'e', ',' and 'd' shown above are not the only conversion codes. Here is a rundown of the commonly used ones.
- s – strings
- d – decimal integers (base-10)
- f – floating point display
- c – character
- b – binary
- o – octal
- x – hexadecimal with lowercase letters after 9
- X – hexadecimal with uppercase letters after 9
- e – exponent notation
Remember: for floating point numbers use 'f', and for whole numbers use 'd'. You can combine the comma thousands separator with 'f', e.g. '{:,.2f}'.
Errors: A ValueError occurs when the conversion is unsuccessful, for example when 'd' is applied to a floating point number.
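A few of the conversion codes listed above, together with the ValueError case, can be sketched as follows. The sample values and variable names are assumptions chosen for the illustration.

```python
amount = 1234567.891

# Thousands separator combined with 'f' and two decimal places.
formatted = '{:,.2f}'.format(amount)   # '1,234,567.89'

# Integer conversion codes.
binary = '{:b}'.format(10)             # '1010'
octal = '{:o}'.format(8)               # '10'
hex_lower = '{:x}'.format(255)         # 'ff'
hex_upper = '{:X}'.format(255)         # 'FF'
char = '{:c}'.format(65)               # 'A'

# 'd' only accepts whole numbers, so a float triggers ValueError.
try:
    '{:d}'.format(3.5)
    raised = False
except ValueError:
    raised = True

print(formatted, binary, octal, hex_lower, hex_upper, char, raised)
```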
Official reference
- Refer to https://docs.python.org/3/library/string.html#format-string-syntax for more formatting examples