In Python, libraries are collections of pre-written code that you can use to solve common tasks without reinventing the wheel. In data science and engineering, NumPy and Pandas are two of the most widely used.
NumPy is the fundamental package for scientific computing in Python. It provides a high-performance multidimensional array object called an ndarray.
Why use NumPy over Lists?
Speed: NumPy arrays store their elements in one contiguous block of memory, so bulk operations run in compiled code and are typically much faster than the equivalent loop over a Python list.
Vectorization: You can perform operations on entire arrays without writing for loops.
Memory Efficiency: An array stores raw values of a single type, so it usually takes less space than a Python list holding the same numbers.
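The points above can be checked with a small sketch. The array size of 1,000 and the variable names are arbitrary choices for illustration, and the memory figure for the list is only a rough estimate that adds the size of the list to the size of each integer object it points to.

```python
import sys
import numpy as np

values = list(range(1_000))   # plain Python list
arr = np.arange(1_000)        # equivalent NumPy array

# Loop-based doubling on the list versus a vectorized multiply on the array.
doubled_list = [v * 2 for v in values]
doubled_arr = arr * 2

# Both approaches produce the same numbers.
same_result = doubled_arr.tolist() == doubled_list

# Rough memory comparison: the list stores pointers to separate int objects,
# while the array stores fixed-size integers in one buffer.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
array_bytes = arr.nbytes

print(same_result)
print(list_bytes, array_bytes)
```

Exact byte counts depend on the Python version and platform, so treat the printed numbers as indicative rather than fixed.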
    import numpy as np

    # 1. Creating an array
    arr = np.array([1, 2, 3, 4, 5])

    # 2. Basic Math (Vectorized)
    print(arr * 2)   # Multiplies every element by 2
    print(arr + 10)  # Adds 10 to every element

    # 3. Multi-dimensional Arrays (Matrix)
    matrix = np.array([[1, 2], [3, 4]])
    print(matrix.shape)  # Output: (2, 2)

    # 4. Useful Functions
    print(np.mean(arr))      # Average
    print(np.zeros((2, 3)))  # Creates a 2x3 matrix of zeros
Pandas is built on top of NumPy and provides high-level data structures designed to make working with "relational" or "labeled" data easy and intuitive.
Core Data Structures:
Series: A 1D labeled array (like a single column in Excel).
DataFrame: A 2D labeled data structure (like a complete table/spreadsheet).
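As a minimal sketch of how the two structures relate, the example below builds a Series and a DataFrame with the same row labels and shows that selecting one column of a DataFrame gives back a Series. The names and values are made up for illustration.

```python
import pandas as pd

# A Series: one-dimensional values with an index of labels.
ages = pd.Series([25, 30, 22], index=["Amit", "Sanya", "Raj"], name="Age")
print(ages["Sanya"])  # look up a value by its label

# A DataFrame: a table whose columns share one row index.
people = pd.DataFrame(
    {"Age": [25, 30, 22], "City": ["Delhi", "Mumbai", "Bangalore"]},
    index=["Amit", "Sanya", "Raj"],
)

# Selecting a single column returns a Series.
city_column = people["City"]
print(type(city_column).__name__)
print(people.shape)  # (rows, columns)
```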
Key Features:
Easily handles missing data (NaN).
Powerful tools for reading CSV and Excel files and for querying SQL databases.
Advanced filtering, grouping, and merging capabilities.
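The features listed above can be sketched in one short example. The CSV text and the `regions` table are hypothetical, and `io.StringIO` stands in for a real file on disk so the snippet runs on its own. Filling the missing age with the column mean is just one possible choice, not a recommendation.

```python
import io
import pandas as pd

# Hypothetical CSV text; io.StringIO stands in for a file on disk.
csv_text = """Name,Age,City
Amit,25,Delhi
Sanya,,Mumbai
Raj,22,Delhi
"""
people = pd.read_csv(io.StringIO(csv_text))

# Missing data: the empty Age field is read as NaN.
missing_ages = int(people["Age"].isna().sum())
people["Age"] = people["Age"].fillna(people["Age"].mean())

# Grouping: average age per city.
mean_age_by_city = people.groupby("City")["Age"].mean()

# Merging: join a second (made-up) table on the shared City column.
regions = pd.DataFrame({"City": ["Delhi", "Mumbai"], "Region": ["North", "West"]})
merged = people.merge(regions, on="City", how="left")

print(missing_ages)
print(mean_age_by_city)
print(merged)
```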
    import pandas as pd

    # 1. Creating a DataFrame from a Dictionary
    data = {
        'Name': ['Amit', 'Sanya', 'Raj'],
        'Age': [25, 30, 22],
        'City': ['Delhi', 'Mumbai', 'Bangalore']
    }
    df = pd.DataFrame(data)

    # 2. Viewing Data
    print(df.head(2))  # Shows first 2 rows
    df.info()          # Prints a summary of the data types (info() prints by itself and returns None)

    # 3. Selecting Data
    print(df['Name'])           # Select a column
    print(df[df['Age'] > 24])   # Filtering rows where Age > 24

    # 4. Adding a Column
    df['Is_Senior'] = df['Age'] > 28
    print(df)
| Feature | NumPy | Pandas |
| --- | --- | --- |
| Data Type | Homogeneous (all items same type) | Heterogeneous (each column can hold a different type) |
| Structure | Arrays (ndarrays) | Series and DataFrames |
| Primary Use | Mathematical/Vector operations | Data manipulation and analysis |
| Index | Integer position-based indexing | Label-based indexing (`.loc`), with positional access via `.iloc` |
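The data type and indexing rows of the comparison can be illustrated with a short sketch. The values and the row labels `"a"` and `"b"` are made up for the example.

```python
import numpy as np
import pandas as pd

# NumPy: mixing an int and a float upcasts everything to one dtype.
arr = np.array([1, 2.5, 3])
print(arr.dtype)  # a single float dtype for the whole array

# Pandas: each column keeps its own dtype.
df = pd.DataFrame(
    {"Name": ["Amit", "Sanya"], "Age": [25, 30]},
    index=["a", "b"],
)
print(df.dtypes)

# Indexing: positions in NumPy, labels (or positions) in Pandas.
print(arr[1])               # second element by position
print(df.loc["b", "Age"])   # row label "b", column label "Age"
print(df.iloc[1, 1])        # same cell by integer position
```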