Python

In Python, libraries are collections of pre-written code that let you solve common tasks without reinventing the wheel. In data science and engineering, NumPy and Pandas are two of the most widely used libraries.

1. NumPy (Numerical Python)

Theory

NumPy is the fundamental package for scientific computing in Python. It provides a high-performance multidimensional array object called an ndarray.

  • Why use NumPy over Lists?

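    • Speed: NumPy arrays store elements of a single type in contiguous memory blocks, and operations on them run in compiled code, making numerical work significantly faster than with Python lists.

    • Vectorization: You can perform operations on entire arrays without writing explicit for loops.

    • Memory Efficiency: Arrays store raw values rather than references to separate Python objects, so they usually consume less space than standard Python lists.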
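
The three points above can be illustrated with a small sketch. The array size and the use of `sys.getsizeof` for a rough estimate of the list's footprint are choices made here for illustration. Timings and byte counts vary by machine and Python version, so no expected output is shown.

```python
import sys
import time

import numpy as np

n = 1_000_000  # illustrative size, chosen arbitrarily
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Vectorization: one expression replaces an explicit Python loop
start = time.perf_counter()
doubled_list = [x * 2 for x in py_list]
list_seconds = time.perf_counter() - start

start = time.perf_counter()
doubled_arr = np_arr * 2
numpy_seconds = time.perf_counter() - start

print(f"list comprehension: {list_seconds:.4f} s")
print(f"NumPy vectorized:   {numpy_seconds:.4f} s")

# Memory: the array stores raw 8-byte integers in one contiguous buffer,
# while the list stores pointers to separate Python int objects
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print(f"list (approx.): {list_bytes} bytes")
print(f"ndarray data:   {np_arr.nbytes} bytes")
```

Both approaches produce the same values; the difference lies in how long the work takes and how much memory the container needs.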

Code Implementation

Python
import numpy as np

# 1. Creating an array
arr = np.array([1, 2, 3, 4, 5])

# 2. Basic Math (Vectorized)
print(arr * 2)      # Multiplies every element by 2
print(arr + 10)     # Adds 10 to every element

# 3. Multi-dimensional Arrays (Matrix)
matrix = np.array([[1, 2], [3, 4]])
print(matrix.shape) # Output: (2, 2)

# 4. Useful Functions
print(np.mean(arr)) # Average
print(np.zeros((2, 3))) # Creates a 2x3 matrix of zeros

2. Pandas (Panel Data)

Theory

Pandas is built on top of NumPy and provides high-level data structures designed to make working with "relational" or "labeled" data easy and intuitive.

  • Core Data Structures:

    1. Series: A 1D labeled array (like a single column in Excel).

    2. DataFrame: A 2D labeled data structure (like a complete table/spreadsheet).

  • Key Features:

    • Easily handles missing data (NaN).

    • Powerful tools for reading data from CSV files, Excel files, and SQL databases.

    • Advanced filtering, grouping, and merging capabilities.
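
As a rough illustration of the structures and features listed above, the sketch below builds a Series and a DataFrame, fills a missing value, then groups and merges. The column names, the sample values, and the `city_info` lookup table are invented for this example.

```python
import numpy as np
import pandas as pd

# Series: a 1D labeled array; the index supplies the labels
ages = pd.Series([25, 30, 22], index=['Amit', 'Sanya', 'Raj'], name='Age')
print(ages['Sanya'])            # look up a value by its label

# DataFrame: a 2D table whose columns are each a Series
df = pd.DataFrame({
    'Name': ['Amit', 'Sanya', 'Raj', 'Neha'],
    'City': ['Delhi', 'Mumbai', 'Delhi', 'Mumbai'],
    'Score': [80.0, np.nan, 90.0, 70.0],   # np.nan marks a missing value
})
print(type(df['Score']))        # a single column is a Series

# Missing data: count the gaps, then fill them with the column mean
print(df['Score'].isna().sum())
df['Score'] = df['Score'].fillna(df['Score'].mean())

# Grouping: average score per city
city_means = df.groupby('City')['Score'].mean()
print(city_means)

# Merging: join a second (hypothetical) table on the shared 'City' column
city_info = pd.DataFrame({
    'City': ['Delhi', 'Mumbai'],
    'State': ['Delhi', 'Maharashtra'],
})
merged = df.merge(city_info, on='City', how='left')
print(merged)
```

Filling with the column mean is only one option; dropping incomplete rows with `dropna()` is another, and which is appropriate depends on the data.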

Code Implementation

Python
import pandas as pd

# 1. Creating a DataFrame from a Dictionary
data = {
    'Name': ['Amit', 'Sanya', 'Raj'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore']
}

df = pd.DataFrame(data)

# 2. Viewing Data
print(df.head(2))   # Shows first 2 rows
df.info()           # Prints a summary of columns and data types (info() prints directly and returns None)

# 3. Selecting Data
print(df['Name'])           # Select a column
print(df[df['Age'] > 24])   # Filtering rows where Age > 24

# 4. Adding a Column
df['Is_Senior'] = df['Age'] > 28
print(df)

3. Comparison Table

Feature     | NumPy                              | Pandas
Data Type   | Homogeneous (all items same type)  | Heterogeneous (different types allowed)
Structure   | Arrays (ndarrays)                  | Series and DataFrames
Primary Use | Mathematical/Vector operations     | Data manipulation and analysis
Index       | Integer-based indexing             | Label-based indexing
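
The Data Type and Index rows can be checked directly. In this sketch the sample values and the row labels are made up. It shows that a NumPy array settles on one dtype for all elements while a DataFrame keeps a dtype per column, and that arrays are indexed by position while Pandas objects can be indexed by label with `.loc` (positional access remains available through `.iloc`).

```python
import numpy as np
import pandas as pd

# NumPy: mixing an int and a float yields a single float dtype
arr = np.array([1, 2.5, 3])
print(arr.dtype)      # one dtype shared by every element
print(arr[0])         # access by integer position

# Pandas: each column keeps its own dtype
df = pd.DataFrame(
    {'Name': ['Amit', 'Sanya'], 'Age': [25, 30]},
    index=['a', 'b'],   # custom row labels, chosen for illustration
)
print(df.dtypes)

print(df.loc['b', 'Age'])   # access by row and column label
print(df.iloc[1, 1])        # the same cell by integer position
```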