Data Types, Variables and Arithmetic Operators
Let us write a simple equation in math to calculate the mean of a set of numbers:
a = 15, b = 35, c = 55
mean = (a+b+c) / 3 = 35
To do this simple calculation, you might use mental math or a calculator. To do it in a Python program, you first have to understand how to declare the variables a, b, c and mean, which data types (integers, real numbers, text, etc.) can be assigned to your variables, and which arithmetic operators you can use. In this lesson we will learn all of these simple concepts.
What is a data type?
In its simplest form, a computer program written in any programming language performs computations on the variables used in the program. In the simple mean calculation above, the variables a, b and c each hold a value. Before the computer can compute with the data stored in a variable, it needs to know what type of data it is. A variable can hold an integer, an ordinary English word, or some other kind of value. You obviously cannot multiply two words, but you can multiply two numbers. How does the program know which variables it can multiply and which it should not even attempt to? It knows from their data types.
The data type also helps the program decide how much memory to allocate for the values assigned to variables. For example, if variables are of type integer, the program can perform all the arithmetic operations on them, and it can allocate enough memory to hold both the operands and the result of those operations.
In some programming languages, such as Java and C, the programmer has to declare the data type of a variable before using it in any expression. If you use a variable without declaring its data type, the compiler reports an error.
In Python, however, you do not explicitly declare the data type; Python works out the type of a variable automatically from the value assigned to it. This is called dynamic typing. A related idea is duck typing, where what matters is how a value behaves rather than its declared type. The name comes from the phrase 'If it looks like a duck and quacks like a duck, it's a duck'.
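As a small sketch of dynamic typing, the cell below assigns three different kinds of literal values and asks Python which type it inferred for each. The variable names count, price and label are made up for this illustration.

```python
# Python infers the type of each variable from the value assigned to it.
count = 15             # a whole number literal
price = 35.5           # a literal with a decimal point
label = "fifty-five"   # quoted text

print(type(count))     # <class 'int'>
print(type(price))     # <class 'float'>
print(type(label))     # <class 'str'>

# Multiplying two numbers works.
print(count * price)   # 532.5

# Multiplying two words does not: Python raises a TypeError.
try:
    print(label * label)
    multiply_failed = False
except TypeError as error:
    multiply_failed = True
    print("TypeError:", error)
```

The try/except lines are only there so the cell keeps running after the error; you will meet them properly in a later lesson.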
Python's common data types
The most popular data types used in Python are given below:
Data type | Name | Example | Allowed Values |
---|---|---|---|
str | String | name = "Joe" | Any text |
bool | Boolean | dogs_bark = True | True/False |
int | Integer | days_in_a_year = 365 | Whole numbers |
float | Float | height = 52.4 | Real numbers |
NoneType | Null equivalent | a = None | None only |
None is Python's equivalent of 'null' in other programming languages. It represents empty or missing data, and its data type is NoneType.
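To check the table for yourself, you could key in the example assignments from it and print the type of each variable. This is only a quick sketch that reuses the names from the table.

```python
name = "Joe"
dogs_bark = True
days_in_a_year = 365
height = 52.4
a = None

print(type(name))            # <class 'str'>
print(type(dogs_bark))       # <class 'bool'>
print(type(days_in_a_year))  # <class 'int'>
print(type(height))          # <class 'float'>
print(type(a))               # <class 'NoneType'>

# The usual way to test for "no data" is the "is" comparison with None.
print(a is None)             # True
```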
Simple Statements
Let us now get our hands dirty by keying in the program shown below in the Code cell, to compute the mean in Python:
a = 15
b = 35
c = 55
mean = (a+b+c) / 3
print(mean)
print(type(mean))
Key in the above statements one at a time in the Code input cell in the firstConcept.ipynb file opened in the previous lesson. Although a Copy button is provided for your convenience, if you are new to programming it is recommended that you key in the statements one at a time instead of using the Copy button. To run this program, ensure that the cursor is inside the Code input cell, press control+enter on Mac or ctrl+enter on Windows, and notice the output.
Notice <class 'float'> printed below 35.0. The division operator / always produces a float, so Python automatically assigned the float data type to the computed answer even though a, b and c are integers.
All of these statements that we executed thus far are simple statements.
Note: While the first 4 lines of code are similar to algebraic expressions, the two new names you may notice are print and type. These are called functions. In very simple terms, a function can be considered a black box that takes zero or more inputs, called arguments, and produces zero or more outputs. In this case, the print function takes in one argument, mean, and shows that value as the output on the screen.
The type function takes in the argument mean and returns the data type of the argument that is passed in. This output from the type function is in turn passed as an argument to the print function, so that print can show it on the screen.
Function arguments are passed within a pair of parentheses ().
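The nesting of type inside print can be unpacked into separate steps. The sketch below uses an extra variable, mean_type, which is a name introduced here only for illustration.

```python
mean = (15 + 35 + 55) / 3

# Step 1: call type and keep its output in a variable.
mean_type = type(mean)

# Step 2: pass that output to print as an argument.
print(mean_type)    # <class 'float'>

# The nested form does both steps in a single statement.
print(type(mean))   # <class 'float'>

# print accepts zero or more arguments.
print()             # prints an empty line
print(15, 35, 55)   # 15 35 55
```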
Rules for variable name declaration
Must start with a letter or underscore
Must contain only letters, digits or underscores
Must not be one of the keywords reserved by Python
Keywords in Python 3 (version 3.7 and later):
False None True and as assert async await break class continue def del elif else except finally for from global if import in is lambda nonlocal not or pass raise return try while with yield
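Rather than memorizing the keyword list, you can ask Python for it. The sketch below uses the standard library keyword module and the string method isidentifier; the candidate names being checked are made up for this example.

```python
import keyword

# The reserved words for the Python version you are running.
print(keyword.kwlist)

# Is a given name a reserved keyword?
print(keyword.iskeyword("class"))      # True
print(keyword.iskeyword("mean"))       # False

# isidentifier checks the character rules (first character, allowed characters).
# It does not check for keywords, so use it together with iskeyword.
print("student_name".isidentifier())   # True
print("_total".isidentifier())         # True
print("2nd_student".isidentifier())    # False: starts with a digit
print("student-name".isidentifier())   # False: hyphen is not allowed
```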
Recommendation for Variable Names
- Give meaningful names for variables instead of using a, b, x, y etc., unless it is a variable declared in a loop or a mathematical equation like in the example shown.
- Start with a lowercase letter and use underscores to separate words.
- Although camel case notation for variable names is in vogue for Object Oriented Programming (OOP), in Data Analytics we rarely create an object, so we will use underscore notation in this book.
- Example of camel case: studentName = "joe". Example of underscore: student_name = "joe"
Points to note
- Python is very picky about indentation. All the simple statements should line up without any indentation. You will learn the compound statement indentation rules later.
- Variable names are case sensitive: mean != Mean
- A variable must be defined before it is used. The code cell below raises an exception:
print(d)
NameError: name 'd' is not defined
- Single quotes ('), double quotes ("), and triple quotes (''' or """) are allowed to enclose a String literal value. Use triple quotes when the String spans multiple lines. Triple quotes are also used for documentation, which you will learn later.
- Literal values for float, int, bool should not be enclosed with any type of quote.
- A variable which is assigned one type first can get reassigned with another type later. Key in the below statements in the code input cell, run the code and watch the output.
weight = 100
weight = "150 pounds"
print(weight)
You will notice that the program runs without any error and the output is 150 pounds.
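To see the type itself change on reassignment, you could print type(weight) after each assignment, as in this short sketch:

```python
weight = 100
print(type(weight))    # <class 'int'>

weight = "150 pounds"
print(type(weight))    # <class 'str'>

# weight now holds text, so arithmetic such as weight + 5
# would raise a TypeError at this point.
print(weight)          # 150 pounds
```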
A few more tips on troubleshooting
- Programming context is shared between the code cells. A variable declared in one cell is available to any cell that is executed after the declaring cell has run. The order of the cells in the notebook does not matter; only the order of execution does. However, it is good practice to write the code cells in the order in which they should be executed.
- Sometimes you may lose track of the variables active in your context and see results that you did not anticipate. In such cases it is a good idea to restart your Kernel and start your executions with a clean slate. To start a clean run of all the code cells, use Notebook --> Restart Kernel, then Notebook --> Run All Cells.
- If there are persistent issues that are not resolved by the above procedure, shut down the Kernel for the notebook, close the notebook file and reopen it.
Arithmetic Operators
You have already used the addition (+) operator in the example above. The other arithmetic operators in Python are listed below:
If x=3
then,
Operator name | Notation | Short Notation | Result |
---|---|---|---|
Addition | x = x + 1 | x += 1 | 4 |
Subtraction | x = x - 1 | x -= 1 | 2 |
Multiplication | x = x * 2 | x *= 2 | 6 |
Division | x = x / 2 | x /= 2 | 1.5 |
Integer Division | x = x // 2 | x //= 2 | 1 (decimal part is floored) |
Modulo (gets the remainder after division) | x = x % 2 | x %= 2 | 1 |
Exponent | x = x ** 2 | x **= 2 | 9 |
Note: Each row above starts from x = 3. In every example, the expression is evaluated and the result is assigned back to the variable x. This may or may not be the case in your own solutions.
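The results in the table can be reproduced in a code cell. In this sketch the first group of statements only prints each result and leaves x at 3, while the last few lines show the short notation changing x itself.

```python
x = 3

# Each expression is evaluated with x = 3; x itself is not changed here.
print(x + 1)    # 4
print(x - 1)    # 2
print(x * 2)    # 6
print(x / 2)    # 1.5
print(x // 2)   # 1
print(x % 2)    # 1
print(x ** 2)   # 9

# The short notation evaluates the expression and assigns it back to x.
x += 1
print(x)        # 4
x *= 2
print(x)        # 8
```

Because the short notation updates x each time, the second result (8) builds on the first (4) rather than starting again from 3.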
Ceiling and Floor
- Ceiling is a type of rounding in which a number with a non-zero decimal part is rounded up to the next higher whole number. A whole number is left unchanged.
- Floor is a type of rounding in which a number with a non-zero decimal part is rounded down to the nearest whole number below it. A whole number is left unchanged.
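Python's standard library math module provides ceil and floor functions, so the two definitions above can be tried out directly. The sample numbers below are arbitrary; note in particular how negative numbers behave.

```python
import math

print(math.ceil(2.1))     # 3
print(math.floor(2.9))    # 2

# For negative numbers, "higher" and "lower" still refer to the number line.
print(math.ceil(-2.1))    # -2
print(math.floor(-2.1))   # -3

# Integer division floors its result, which matters for negative operands.
print(7 // 2)             # 3
print(-7 // 2)            # -4
```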
Points to note
- Use the Short Notation wherever possible instead of the longer Notation statements. Both achieve the same result, but one is shorter to write.
- All the operations are very similar to standard algebraic results.
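- The modulo operator returns the remainder after division. The remainder is an integer when both operands are integers.
- The result of division with the / operator is always a floating point number, even when both operands are integers.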
- Recommended style guide for Python is PEP 8 - https://www.python.org/dev/peps/pep-0008/
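The order of operations is very similar to the algebraic rule PEMDAS, which stands for Parentheses, Exponents, Multiplication, Division, Addition, Subtraction. Multiplication and division share the same precedence and are evaluated from left to right, as are addition and subtraction.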
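A few one-line experiments make the order of operations concrete. The numbers here are arbitrary and chosen only so that each rule changes the result.

```python
# Multiplication happens before addition.
print(2 + 3 * 4)      # 14

# Parentheses are evaluated first.
print((2 + 3) * 4)    # 20

# Multiplication and division have equal precedence: left to right.
print(8 / 4 * 2)      # 4.0
print(8 / (4 * 2))    # 1.0

# Exponents are the exception: they are evaluated right to left.
print(2 ** 3 ** 2)    # 512, the same as 2 ** (3 ** 2)

# The exponent binds tighter than the minus sign in front.
print(-3 ** 2)        # -9
print((-3) ** 2)      # 9
```

When in doubt, adding parentheses costs nothing and makes the intended order obvious to the reader.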
More On Assignments
a = .4e7 # assigns an exponential value to a
b = 3 + 4j # Python supports complex data type with real and imaginary parts
print(a)
print(b)
Output:
4000000.0
(3+4j)
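To look a little closer at these two values, the sketch below prints their types and, for the complex number, its real and imaginary parts through the built-in real and imag attributes.

```python
a = .4e7      # scientific notation for 0.4 * 10**7
b = 3 + 4j    # complex number: real part 3, imaginary part 4

print(type(a))        # <class 'float'>
print(a == 4000000)   # True

print(type(b))        # <class 'complex'>
print(b.real)         # 3.0
print(b.imag)         # 4.0
print(abs(b))         # 5.0, the magnitude of the complex number
```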
Using Semicolon
Python does not require semicolons to terminate statements. However, if you wish to put multiple statements on the same line, semicolons can be used to separate them. Check out the code below for an example:
a = 3; b = 5; c = 10
print(a, b, c)
The variables a, b and c are initialized on the same line, with semicolons separating the statements.