import math
print(math.sqrt(25))
5.0
Much of Python’s strength comes from reusing code that others have written. A module is a single file (ending in .py) containing functions, classes, and variables. A package is a collection of related modules. You access both with the import statement.
There are several ways to import:
Import a module and use its contents with dot notation:
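Here is a minimal sketch using the standard-library math module. The particular functions called are just illustrative choices:

```python
import math  # load the whole module

# Everything inside the module is reached through the module name
print(math.sqrt(16))  # 4.0
print(math.pi)        # 3.141592653589793, a constant defined in the module
```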
Import specific items with from:
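This sketch pulls a single name out of math, so you can use it without the module prefix:

```python
from math import sqrt  # import only the sqrt function

print(sqrt(16))  # 4.0, no "math." prefix needed
```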
Use an alias with as:
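Here is a sketch with an alias. `m` is an arbitrary short name picked for this example. numpy and pandas, by contrast, have the conventional aliases np and pd:

```python
import math as m  # "m" is a hypothetical alias chosen for illustration

print(m.sqrt(16))  # 4.0, the module is now reached through the alias
```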
Import multiple items in one line:
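This sketch imports several names from math in one statement. Which names to import is an illustrative choice:

```python
from math import sqrt, pi, floor  # several names, separated by commas

print(sqrt(9))    # 3.0
print(floor(pi))  # 3, pi rounded down
```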
Import everything with * (avoid this, as it can cause name conflicts):
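The following illustrates the kind of name conflict a star import can cause, using math as the example module. math defines its own pow, so the star import quietly replaces Python's built-in pow:

```python
from math import *  # pulls every public name from math into this namespace

print(sqrt(16))  # 4.0; it works, but it is no longer obvious where sqrt came from

# The built-in pow(2, 3) returns the int 8, but after the star import
# "pow" now refers to math.pow, which always returns a float
print(pow(2, 3))  # 8.0
```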
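Python modules come from two places:
- The standard library, which ships with Python and needs no installation. Examples: math, os, datetime, json, random.
- Third-party packages, which you install with pip install package-name. Examples: numpy, pandas, geopandas.

A few examples from the standard library: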
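Here is a minimal sketch touching each listed module. The specific calls are illustrative choices. The final line converts a dictionary to a JSON string, which prints as shown:

```python
import datetime
import json
import math
import os
import random

print(math.floor(3.7))        # 3: round down to a whole number
print(os.getcwd())            # the current working directory
print(random.randint(1, 6))   # a random whole number from 1 to 6 inclusive
print(datetime.date.today())  # today's date, printed as YYYY-MM-DD

person = {"name": "Alice", "age": 25}
print(json.dumps(person))     # convert a Python dict to a JSON string
```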
{"name": "Alice", "age": 25}
Third-party packages extend Python into nearly every domain: data analysis, web development, machine learning, geospatial work.
The numpy and pandas examples below are a starting point. The syntax becomes familiar with practice, so follow along without worrying about memorising everything.
numpy handles arrays and mathematical operations on them efficiently. It appears everywhere in Python’s data science and scientific computing ecosystem.
To use a third-party package, you first install it with pip (Python’s package installer). In a notebook like Colab, prefix the command with ! to run it as a shell command:
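The shell command is not Python, so this sketch shows it as comments. The import after it is the conventional way to load numpy:

```python
# In a Colab or Jupyter cell, run the install as a shell command:
#   !pip install numpy
# In a regular terminal, drop the "!":
#   pip install numpy

import numpy as np  # "np" is the conventional alias

print(np.__version__)  # confirms the package is installed and importable
```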
numpy arrays work like Python lists but are optimised for numerical operations. You create one by passing a list to np.array():
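This minimal sketch uses list values that match the printed output. numpy prints the elements separated by spaces rather than commas:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])  # build an array from a Python list
print(arr)
```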
[1 2 3 4 5]
Arithmetic applies to every element at once. Two arrays of the same length are combined element-by-element:
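Here is a sketch with two hypothetical three-element arrays. The last line contrasts this with plain Python lists, where + means concatenation:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

print(a + b)  # [11 22 33]: first with first, second with second, ...
print(a * b)  # [10 40 90]

# Plain lists do not do element-wise arithmetic; + joins them instead
print([1, 2, 3] + [10, 20, 30])  # [1, 2, 3, 10, 20, 30]
```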
Arithmetic with a single number (a scalar) applies that number to every element:
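This sketch reuses the five-element array from earlier:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr * 2)   # each element doubled: 2, 4, 6, 8, 10
print(arr + 10)  # [11 12 13 14 15]
```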
Arrays also have built-in statistical methods:
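Here is a sketch of a few common methods on the same array. Note that std() computes the population standard deviation by default:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr.mean())            # 3.0
print(arr.sum())             # 15
print(arr.min(), arr.max())  # 1 5
print(arr.std())             # population standard deviation, about 1.414
```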
pandas is built on numpy and adds the DataFrame: a table with named columns, similar to a spreadsheet. It is the standard tool for data manipulation and analysis in Python.
One way to create a DataFrame is from a dictionary: each key becomes a column name, and each value is a list holding that column's entries, one per row:
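This minimal sketch builds the DataFrame shown in the output. The data values are read directly off that printed table:

```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
}
df = pd.DataFrame(data)  # keys -> column names, lists -> column values
print(df)
```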
| | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 35 | Chicago |
You can access individual columns by name. Numeric columns behave like numpy arrays:
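Here is a sketch of both operations. It rebuilds the same three-row DataFrame so the block runs on its own, and its two printed results are shown:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

print(df["Age"])      # select one column by name (a pandas Series)
print(df["Age"] + 1)  # arithmetic applies to every value, as with numpy arrays
```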
0 25
1 30
2 35
Name: Age, dtype: int64
0 26
1 31
2 36
Name: Age, dtype: int64
This means you can use methods like .mean() directly on columns:
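This sketch uses the same three-row DataFrame. The Age values 25, 30, and 35 average to 30:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

print(df["Age"].mean())  # 30.0
print(df["Age"].max())   # 35
```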
You can filter rows with conditions. A condition like df['Age'] >= 30 produces a boolean Series (one True/False value per row):
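Here is a sketch that evaluates the condition on the same three-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

print(df["Age"] >= 30)  # False for Alice (25), True for Bob and Charlie
```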
Pass that boolean Series back into the DataFrame to keep only the rows where it is True:
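This sketch stores the condition in a variable first. The name `mask` is just an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

mask = df["Age"] >= 30  # boolean Series: one True/False per row
print(df[mask])         # keeps only the rows where mask is True
```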
This is equivalent to passing an explicit list of booleans, one per row:
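Here is the explicit-list version on the same three-row DataFrame. Its result is the table shown:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

print(df[[False, True, True]])  # one boolean per row, in row order
```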
| | Name | Age | City |
|---|---|---|---|
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 35 | Chicago |
In practice, you usually combine both steps into one line:
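This sketch shows the one-line form on the same DataFrame. The condition is written directly inside the square brackets:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

print(df[df["Age"] >= 30])  # build the mask and filter in a single step
```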
You can also add a new column by assigning values to a new column name:
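This sketch adds two new columns. Both column names (`Age_in_10_years`, `Country`) and their contents are hypothetical, chosen only for illustration. One column is computed from an existing column; the other repeats a single value down every row:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

df["Age_in_10_years"] = df["Age"] + 10  # hypothetical column computed from Age
df["Country"] = "USA"                   # hypothetical column; one value fills every row
print(df)
```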
Imports let you build on existing code so you can focus on the parts unique to your problem. The import syntax and common aliases (np, pd) will become second nature quickly.