Sort Arrays in NumPy
Hi Enthusiastic Learners! Have you ever dealt with jumbled numbers and wished you could remove the randomness and see them in a clean sequential order? I think we all have. NumPy provides several built-in functions for exactly that. Sorting with them has a big advantage: as shown in our previous articles, NumPy's built-in functions are very efficient (Why use Universal Function in NumPy?). Let’s begin with Sorting Arrays!
NumPy provides two major functions for sorting arrays:
- sort()
- argsort()
Sort arrays in NumPy using SORT()
The sort() function syntax is as follows:
np.sort(arr, axis=-1, kind='quicksort', order=None)
Where:
arr is the array to be sorted.
axis is the axis along which to sort. The default, -1, sorts along the last axis; axis=None flattens the array before sorting. We will discuss it with examples below.
kind allows us to choose the sorting algorithm we want to use.
By default it is set to 'quicksort'. You can choose from the following algorithms: ['quicksort', 'mergesort', 'heapsort', 'stable'].
order takes a string or a list of strings. When 'arr' is a structured array, it specifies which fields to compare 1st, 2nd, and so on.
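The order and axis parameters are easier to see on a tiny example. The sketch below is only an illustration: the structured array `people`, its field names and the small `grid` array are made up for this purpose and are not part of the article's data.

```python
import numpy as np

# Hypothetical structured array: each record has a name and an age.
people = np.array(
    [("Asha", 31), ("Ben", 25), ("Cara", 31), ("Dev", 25)],
    dtype=[("name", "U10"), ("age", "i4")],
)

# Compare the 'age' field first; ties on age are broken by 'name'.
by_age_then_name = np.sort(people, order=["age", "name"])
print(by_age_then_name["name"].tolist())  # ['Ben', 'Dev', 'Asha', 'Cara']

# axis=None flattens the array before sorting.
grid = np.array([[3, 1], [2, 0]])
flat_sorted = np.sort(grid, axis=None)
print(flat_sorted.tolist())  # [0, 1, 2, 3]
```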
Let’s begin by creating an array and sorting it.
import numpy as np
arr = np.array([18, 9, 3, 4, 2, 8, 11, 6, 1, 4, 2, 43, 57, 3, 6, 22, 17])
arr
Let’s sort this array using the sort() function.
sorted_arr = np.sort(arr)
# by default: quicksort algorithm and last axis; order is optional
print("-- Sorted Array --")
print(sorted_arr)
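One detail worth checking for yourself: np.sort() returns a sorted copy and leaves the original array untouched, while the array's own sort() method sorts in place. A minimal sketch, using a short made-up array named `values`:

```python
import numpy as np

values = np.array([18, 9, 3, 4, 2])

# np.sort() returns a new, sorted array; 'values' is not modified.
sorted_copy = np.sort(values)
print(values.tolist())       # [18, 9, 3, 4, 2]
print(sorted_copy.tolist())  # [2, 3, 4, 9, 18]

# ndarray.sort() sorts the array in place and returns None.
result = values.sort()
print(result)                # None
print(values.tolist())       # [2, 3, 4, 9, 18]
```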
Choosing algorithm for sorting
Now, let’s take a huge array of random values, sort it with each algorithm, and time how long each one takes.
This will help us decide whether to let np.sort() use its default algorithm or choose something else.
List of algorithms:
- Quicksort (default algo for np.sort())
- Mergesort
- Heapsort
Creating a huge array in NumPy.
huge_array = np.random.randint(1, 100, size=10000000)
print("Number of elements in huge array = " + str(len(huge_array)) )
print("Printing some values of array --> " + str(huge_array))
QuickSort
Sorting the array using the default algorithm 'quicksort'.
Function call –> np.sort(huge_array, kind='quicksort')
OR simply use np.sort(huge_array)
# Calculating time consumed by quicksort to sort an array
%timeit np.sort(huge_array)
Mergesort
Sorting the array using the 'mergesort' algorithm.
Function call –> np.sort(huge_array, kind='mergesort')
# Calculating time consumed by mergesort to sort an array
%timeit np.sort(huge_array, kind='mergesort')
Heapsort
Sorting the array using the 'heapsort' algorithm.
Function call –> np.sort(huge_array, kind='heapsort')
# Calculating time consumed by heapsort to sort an array
%timeit np.sort(huge_array, kind='heapsort')
From the above results, the algorithms rank as follows by time taken, fastest first:
QuickSort <
MergeSort <
HeapSort
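%timeit only works inside IPython or Jupyter. If you want to repeat the comparison in a plain Python script, a rough sketch with the standard timeit module could look like the one below. The array size, the seed and the names `sample` and `timings` are arbitrary choices for illustration, and the numbers you get will depend on your machine, the data type and your NumPy version. Also note that, according to the NumPy documentation, the kind names are not literal: 'quicksort' is implemented as introsort, and 'mergesort' selects a stable algorithm (timsort or radix sort, depending on the data type).

```python
import timeit
import numpy as np

# Seeded generator so the same data is sorted on every run.
rng = np.random.default_rng(0)
sample = rng.integers(1, 100, size=1_000_000)

timings = {}
for kind in ("quicksort", "mergesort", "heapsort"):
    # Take the best of 3 single runs to reduce noise from other processes.
    runs = timeit.repeat(lambda: np.sort(sample, kind=kind), number=1, repeat=3)
    timings[kind] = min(runs)
    print(f"{kind:>9}: {timings[kind]:.4f} s")
```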
Sort arrays in NumPy using ARGSORT()
Unlike np.sort(), the np.argsort() function returns the indices (positions) of the elements in the order they should be placed to get a sorted array.
Just like the sort() function, it takes the same set of arguments.
np.argsort(arr, axis=-1, kind='quicksort', order=None)
Using argsort() to sort an array ‘arr’.
sorted_arr_indices = np.argsort(arr)
sorted_arr_indices
To check the values of the array at those positions, index the array with them:
arr[np.argsort(arr)]
We can clearly see that this also gives us a sorted array. However, it adds the overhead of an extra indexing step to fetch the sorted values, so it is slower than np.sort().
Use np.argsort() only when you need the indices of the sorted values.
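A typical case where the indices are what you need is reordering one array according to the values of another. The sketch below uses two made-up parallel arrays, `names` and `prices`, purely as an illustration:

```python
import numpy as np

# Hypothetical parallel arrays: names[i] belongs to prices[i].
names = np.array(["pear", "apple", "fig", "kiwi"])
prices = np.array([3.5, 1.2, 4.0, 2.8])

# Positions of the prices from smallest to largest.
order = np.argsort(prices)
print(order.tolist())          # [1, 3, 0, 2]

# The same indices reorder both arrays consistently.
print(prices[order].tolist())  # [1.2, 2.8, 3.5, 4.0]
print(names[order].tolist())   # ['apple', 'kiwi', 'pear', 'fig']
```

np.sort(prices) alone would give the sorted prices, but it would not tell us how to rearrange names to match.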
Sort arrays in NumPy across different Axis
An axis can be the vertical axis, the horizontal axis or, in a 3-D array, a third axis.
For better understanding we will use only 2-D arrays in this article. Sorting along the vertical axis sorts the values within each column; sorting along the horizontal axis sorts the values within each row.
Sort arrays in NumPy across Columns
For columns (the vertical axis) we set axis=0.
Let’s create a 2-D array and sort it along columns only.
arr_2d = np.random.randint(5, 15, (4, 4))
arr_2d
# Sort values column wise only
sorted_arr_axis_0 = np.sort(arr_2d, axis=0)
print("-- Sorted values per column --")
print(sorted_arr_axis_0)
From the above example, it is clear that the values in each column have been sorted in ascending order, and no sorting has been done within the rows of the array.
Sort arrays in NumPy across Rows
For rows (the horizontal axis) we set axis=1.
# Sort values row wise only
sorted_arr_axis_1 = np.sort(arr_2d, axis=1)
print("-- Sorted values per row --")
print(sorted_arr_axis_1)
The values within each row have been sorted, and no sorting has been done down the columns.
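Because arr_2d is random, its output changes on every run. To compare the two axes side by side on fixed numbers, here is a small sketch with a hand-written 3x3 array (`grid` is a made-up name, not used elsewhere in this article):

```python
import numpy as np

grid = np.array([[9, 2, 7],
                 [4, 8, 1],
                 [6, 3, 5]])

# axis=0: each column is sorted from top to bottom.
by_column = np.sort(grid, axis=0)
print(by_column.tolist())  # [[4, 2, 1], [6, 3, 5], [9, 8, 7]]

# axis=1: each row is sorted from left to right.
by_row = np.sort(grid, axis=1)
print(by_row.tolist())     # [[2, 7, 9], [1, 4, 8], [3, 5, 6]]
```

Note that in both cases values never move to a different column (axis=0) or a different row (axis=1); each column or row is sorted independently.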
In our next articles we will cover more topics related to sorting in NumPy, such as partial sorting, indirect sorting and many more. To learn about indirect sorting, please check this article: lexsort() – Indirect Sort in NumPy
Stay tuned & keep learning!