1.0 Installing NumPy

!pip install numpy

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: numpy in /root/anaconda3/envs/pyspark/lib/python3.8/site-packages (1.24.4)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

2.0 Fundamentals

2.1 Arrays

2.1.1 Creating a NumPy ndarray object

NumPy is used to work with arrays. The array object in NumPy is called ndarray.

We can create a NumPy ndarray object with the array() function.

import numpy as np

arr = np.array([1,2,3,4,5])

print(arr)

print(type(arr))

[1 2 3 4 5]
<class 'numpy.ndarray'>

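type(): this built-in Python function tells us the type of the object passed to it. In the code above, it shows that arr is of type numpy.ndarray.

To create an ndarray, we can pass a list, a tuple, or any other array-like object to the array() function, and it will be converted into an ndarray:

Sequences such as lists, tuples, and ranges (including nested ones) can all be converted. A plain iterator such as a generator is not expanded; it ends up wrapped in a 0-D object array.

# Create a NumPy array from a tuple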
arr = np.array((1,3,2,4,5,6,7))

print(arr)

[1 3 2 4 5 6 7]
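As a quick check of which inputs array() expands, here is a minimal sketch. The variable names are illustrative only, and np.fromiter is one possible way to consume a generator, not something the text above prescribes.

```python
import numpy as np

# Sequences are converted element by element
from_list = np.array([1, 2, 3])
from_range = np.array(range(3))
print(from_list, from_range)

# A generator is not a sequence, so NumPy wraps the generator object itself
from_gen = np.array(x for x in range(3))
print(from_gen.ndim, from_gen.dtype)

# Two ways to get the values instead: build a list first, or use np.fromiter
via_list = np.array(list(x for x in range(3)))
via_fromiter = np.fromiter((x for x in range(3)), dtype=int)
print(via_list, via_fromiter)
```

If the input comes from a generator, materializing it first is usually the simplest route.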

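2.1.2 Dimensions in arrays

A dimension in an array is one level of array depth (nested arrays).

Nested array: an array that has arrays as its elements.

2.1.2.1 0-D arrays

0-D arrays, or scalars, are the elements in an array. Each value in an array is a 0-D array.

# Create a 0-D array with the value 61: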
arr = np.array(61)
print(arr)

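2.1.2.2 1-D arrays

An array whose elements are 0-D arrays is called a one-dimensional, or 1-D, array.

These are the most common and basic arrays.

# Create a 1-D array containing the values 1, 2, 3, 4, 5, 6: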
arr = np.array([1,2,3,4,5,6])

print(arr)

[1 2 3 4 5 6]

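2.1.2.3 2-D arrays

An array whose elements are 1-D arrays is called a 2-D array.

These are often used to represent matrices or second-order tensors.

NumPy also has a dedicated numpy.matrix class for matrix operations (np.mat is a function that builds one), but ordinary 2-D ndarrays combined with the @ operator and numpy.linalg are the generally recommended choice.

# Create a 2-D array from the two arrays 1, 2, 3 and 4, 5, 6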
arr = np.array([[1,2,3],[4,5,6]])

print(arr)

[[1 2 3]
 [4 5 6]]

2.1.2.4 3-D arrays

An array whose elements are 2-D arrays is called a 3-D array.

arr = np.array([[[1,2,3],[4,5,6]],[[1,2,3],[4,5,6]]])

print(arr)

[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]

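2.1.2.5 Checking the number of dimensions

NumPy arrays provide the ndim attribute, which returns an integer telling us how many dimensions the array has.

# Check the number of dimensions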

a = np.array(24)
b = np.array([1,2,3,4,5,6])
c = np.array([[1,2,3],[4,5,6]])
d = np.array([[[1,2,3],[4,5,6]],[[1,2,3],[4,5,6]]])
e = np.array([[[[1,2,3],[4,5,6]],[[1,2,3],[4,5,6]]],[[[1,2,3],[4,5,6]],[[1,2,3],[4,5,6]]]])

print('a`D is', a.ndim, '|', \
      'b`D is', b.ndim, '|', \
      'c`D is', c.ndim, '|', \
      'd`D is', d.ndim, '|', \
      'e`D is', e.ndim, \
     )

a`D is 0 | b`D is 1 | c`D is 2 | d`D is 3 | e`D is 4

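2.1.2.6 Higher-dimensional arrays

An array can have many dimensions; NumPy itself caps the count (at 32 in the 1.x series used here).

When creating an ndarray, the ndmin argument sets the minimum number of dimensions.

# Create an array with 5 dimensions and verify that it has 5 dimensions: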

arr = np.array([1,2,3,4],ndmin = 5)

print(arr , '\n' , 'arr.ndim is ',arr.ndim)

[[[[[1 2 3 4]]]]] 
 arr.ndim is  5
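To see where the extra dimensions go, the sketch below looks at shape as well as ndim. It assumes nothing beyond np.array and its ndmin argument; the variable names are illustrative.

```python
import numpy as np

# ndmin pads the shape with leading axes of length 1
padded = np.array([1, 2, 3, 4], ndmin=5)
print(padded.shape)   # (1, 1, 1, 1, 4)

# ndmin is only a minimum: an input that already has enough dimensions is unchanged
already_2d = np.array([[1, 2], [3, 4]], ndmin=1)
print(already_2d.ndim, already_2d.shape)   # 2 (2, 2)
```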

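2.1.3 Array indexing

2.1.3.1 Accessing array elements

Array indexing is the same as accessing an array element.

You can access an array element by referring to its index number.

Indexes in NumPy arrays start at 0: the first element has index 0, the second has index 1, and so on.

arr = np.array([1,2,3,4])

# Get elements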
print(arr[0] , arr[1] , arr[2]+arr[3])

1 2 7

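2.1.3.2 Multi-dimensional arrays

To access an element of a multi-dimensional array, use comma-separated integers, one index per dimension.

For example, [1,2,3] selects index 1 in the first dimension (its second element), index 2 in the second dimension (its third element), and index 3 in the third dimension (its fourth element).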

arr = np.array([[[[1,1,1],[4,4,4]],[[2,2,2],[5,5,5]]], 

                # arr[1]
                [[[3,3,3],[6,6,6]], 

                 # arr[1,1]
                 [[7,7,7], 

                  # arr[1,1,1]
                  [8,8,8]]]])

print('arr[1,1,1] = ' , arr[1,1,1],'\n')

print(arr)

arr[1,1,1] =  [8 8 8] 

[[[[1 1 1]
   [4 4 4]]

  [[2 2 2]
   [5 5 5]]]


 [[[3 3 3]
   [6 6 6]]

  [[7 7 7]
   [8 8 8]]]]
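The comma-separated form can be checked on a small array whose values make the position obvious. This is a sketch; the array cube and the other names are made up for illustration.

```python
import numpy as np

# Values 0..23 laid out as 2 blocks of 3 rows with 4 columns each
cube = np.arange(24).reshape(2, 3, 4)

# One index per dimension: block 1, row 2, column 3
value = cube[1, 2, 3]
print(value)   # 23, i.e. 1*12 + 2*4 + 3

# Chained indexing reaches the same element, one dimension at a time
chained = cube[1][2][3]
print(chained == value)

# Supplying fewer indexes than dimensions returns a sub-array
row = cube[1, 2]
print(row)   # [20 21 22 23]
```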

2.1.3.3 Negative indexing

Use negative indexes to access an array from the end.

arr = np.array([[[[1,1,1],[4,4,4]],[[2,2,2],[5,5,5]]], [[[3,3,3],[6,6,6]], [[7,7,7], [8,6,7]]]])

print('arr[1,1,1,-1] = ' , arr[1,1,1,-1],'\n')
print('arr[1,1,1,1] = ' , arr[1,1,1,1])

arr[1,1,1,-1] =  7 

arr[1,1,1,1] =  6

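2.1.4 Array slicing

2.1.4.1 Slicing arrays

Slicing in Python means taking the elements from one given index up to another given index.

We pass a slice instead of an index, like this: [start:end]. The element at index end is not included.

We can also define the step, like this: [start:end:step].

If we don't pass start, it is taken as 0.

If we don't pass end, it is taken as the length of the array in that dimension.

If we don't pass step, it is taken as 1.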

arr = np.array([1,2,3,4,5,6,7])

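# index 1 up to (not including) index 5, step 1
print(arr[1:5])

# index 4 to the end, step 1
print(arr[4:])

# start up to (not including) index 4, step 1
print(arr[:4])

# index 1 up to (not including) index 5, step 2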
print(arr[1:5:2])

[2 3 4 5]
[5 6 7]
[1 2 3 4]
[2 4]
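The default values for start, end and step can be verified by writing each slice out in full. A minimal sketch, with illustrative variable names:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

# Omitted parts fall back to start=0, end=len(arr), step=1
same_start = np.array_equal(arr[:4], arr[0:4:1])
same_end = np.array_equal(arr[4:], arr[4:len(arr):1])
print(same_start, same_end)

# The bracket syntax is shorthand for a slice object
explicit = arr[slice(1, 5, 2)]
print(explicit)   # [2 4]
```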

2.1.4.2 Negative slicing and step

This works the same way as slicing Python sequences such as lists.

arr = np.array([1,2,3,4,5,6,7])

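# index 2 up to (not including) index -1, the last element; step 1
print(arr[2:-1])

# From index -1 (the last element) towards index 2 with step 1:
# with a positive step, a start that lies at or after end
# selects nothing, so the output is the empty array [].
print(arr[-1:2])

# With step = -1 the slice walks from start towards the left.
# The end index is still excluded, so index 2 (the value 3)
# is not part of the result.
print(arr[-1:2:-1])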

[3 4 5 6]
[]
[7 6 5 4]
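A few more negative slices, as a sketch that makes the excluded end index visible. The variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

# Walking backwards: index 2 (the value 3) is the excluded end
backwards = arr[-1:2:-1]
print(backwards)   # [7 6 5 4]

# Positive step with start after end selects nothing
empty = arr[-1:2]
print(empty.size)  # 0

# Common idioms: the last three elements, and the whole array reversed
tail = arr[-3:]
reversed_arr = arr[::-1]
print(tail, reversed_arr)
```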

2.1.4.3 Slicing and step in multi-dimensional arrays

arr = np.array([[[[1,1,1],[4,4,4]],[[2,2,2],[5,5,5]]], [[[3,3,3],[6,6,6]], [[7,7,7], [8,6,7]]]])

print('arr[1,1,1,::-1] = ' , arr[1,1,1,::-1],'\n')
print('arr[1,1,::-1] = ' , arr[1,1,::-1],'\n')
print('arr[1,1,1,::2] = ' , arr[1,1,1,::2],'\n')

arr[1,1,1,::-1] =  [7 6 8] 

arr[1,1,::-1] =  [[8 6 7]
 [7 7 7]] 

arr[1,1,1,::2] =  [8 7]

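2.2 Data types

2.2.1 NumPy data types

2.2.1.1 Data types in Python

By default, Python has these data types:

strings - used to represent text data; the text is given in quotes, e.g. "ABCD".
integer - used to represent whole numbers, e.g. -1, -2, -3.
float - used to represent real numbers, e.g. 1.2, 42.42.
boolean - used to represent True or False.
complex - used to represent numbers in the complex plane, e.g. 1.0+2.0j, 1.5+2.5j.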

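2.2.1.2 Data types in NumPy

NumPy has some extra data types and refers to each kind of data type with a single character, for example i for integers and u for unsigned integers. Below is the list of data type kinds in NumPy and the characters used to represent them.

i - integer
b - boolean
u - unsigned integer
f - float
c - complex float
m - timedelta
M - datetime
O - object (can hold any Python object)
S - string (bytes)
U - unicode string
V - fixed chunk of memory for other types (void)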
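The characters in this list match what NumPy reports in the dtype.kind attribute. The sketch below builds one small array per kind and checks the reported character; the samples dictionary is purely illustrative. One caveat: when passed as a dtype string, 'b' means a signed 8-bit integer, not boolean, so the letters are safest read as kind codes.

```python
import numpy as np

# One sample array per kind character (illustrative values only)
samples = {
    "i": np.array([1, 2], dtype=np.int32),
    "u": np.array([1, 2], dtype=np.uint8),
    "f": np.array([1.5]),
    "c": np.array([1 + 2j]),
    "b": np.array([True, False]),
    "S": np.array([b"ab"]),
    "U": np.array(["ab"]),
    "O": np.array([{1}, "x"], dtype=object),
    "M": np.array(["2024-01-01"], dtype="datetime64[D]"),
    "m": np.array([1], dtype="timedelta64[D]"),
}

for kind, sample in samples.items():
    print(kind, sample.dtype.kind, sample.dtype)

# As a dtype *string*, 'b' is a signed byte rather than a boolean
byte_dtype = np.dtype("b")
print(byte_dtype, byte_dtype.kind)
```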

2.2.1.3 Checking the data type of an array

A NumPy array object has an attribute named dtype, which returns the data type of the array:

import numpy as np

arr = np.array([1,2,3,4])

print(arr.dtype)

int64

arr = np.array(['apple','apples'])

# In <Un, n is the length of the longest string in the array
def getmaxlen(arr:list) ->int:
    ans = 0
    for i in arr:
        ans = max(len(i) ,ans)
    return ans

print('<U%d'%getmaxlen(arr) ,end = '|')
print(arr.dtype)

<U6|<U6

2.2.1.4 Creating arrays with a defined data type

When creating an ndarray, pass the dtype argument to set the expected data type of the elements.

arr = np.array([1,2,3,40] , dtype = 'S')

# Each element is stored as a byte string (numpy.bytes_)
print(type(arr[0]))

print(arr)

# In |Sn, n is again the length of the longest element
print(arr.dtype)

<class 'numpy.bytes_'>
[b'1' b'2' b'3' b'40']
|S2

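2.2.1.5 Arrays are strictly typed by dtype

Every element of an array shares the array's single dtype. If the elements have unrelated types and no dtype is given, NumPy falls back to the object dtype; if a dtype is given and a value cannot be converted to it, a ValueError is raised.

arr = np.array(['1',{1}])

# With the object dtype, NumPy stores references to the original Python objects, so each element keeps its own type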
print(type(arr[1]))
print(type(arr[0]))
print(arr.dtype)

<class 'set'>
<class 'str'>
object

# With dtype = 'S', every element is converted to and stored as a byte string
arr = np.array([1,2.123,'ccc'] , dtype = 'S')
print(type(arr[0]))
print(type(arr[1]))
print(type(arr[2]))
print(arr.dtype)

<class 'numpy.bytes_'>
<class 'numpy.bytes_'>
<class 'numpy.bytes_'>
|S5

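When dtype is specified, NumPy no longer infers the data type; every element is cast to that dtype, and an error is raised if a value cannot be converted.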

arr = np.array([1,2.123,'ccc'] , dtype = 'i')
print(arr)

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Cell In[66], line 1
----> 1 arr = np.array([1,2.123,'ccc'] , dtype = 'i')
      2 print(arr)


ValueError: invalid literal for int() with base 10: 'ccc'
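If input values may not be convertible, the error can be caught instead of stopping the notebook. A minimal sketch: the helper name to_int_array is hypothetical, and returning None on failure is just one possible design.

```python
import numpy as np

def to_int_array(values):
    """Hypothetical helper: return an integer array, or None if a value cannot be cast."""
    try:
        return np.array(values, dtype='i')
    except ValueError as exc:
        print("conversion failed:", exc)
        return None

good = to_int_array(['1', '2'])    # numeric strings can be parsed as integers
bad = to_int_array([1, 'ccc'])     # 'ccc' cannot, so the helper returns None
print(good, bad)
```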

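2.2.1.6 Converting the data type of an existing array

The best way to change the data type of an existing array is to make a copy of it with the astype() method.

The astype() method creates a copy of the array and lets you specify the data type as an argument.

The data type can be specified with a string, such as 'f' for float or 'i' for integer, or you can use the data type directly, such as float for float and int for integer.

# Change the data type from float to integer by passing 'i' as the argument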

arr = np.array([1.1,2.1,3.1])

coparr = arr.astype('i')

print(coparr)
print(coparr.dtype)

[1 2 3]
int32

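# Change the data type from float to integer by passing int as the argument
# (int maps to the platform default integer, int64 here, while 'i' gave int32)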

arr = np.array([1.1,2.1,3.1])

coparr = arr.astype(int)

print(coparr)
print(coparr.dtype)

[1 2 3]
int64

# Change the data type from float to bool by passing bool as the argument

arr = np.array([0,2.1,3.1])

coparr = arr.astype(bool)

print(coparr)
print(coparr.dtype)

[False  True  True]
bool
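Two details of astype() are easy to miss: the float-to-integer conversion drops the fractional part rather than rounding, and the result is a separate array. A small sketch with illustrative names:

```python
import numpy as np

arr = np.array([1.9, -1.9, 0.0])

as_int = arr.astype(int)     # fractional part is dropped (truncation toward zero)
as_bool = arr.astype(bool)   # only exact zeros become False

print(as_int)
print(as_bool)
print(arr)                             # the original array is unchanged
print(np.shares_memory(arr, as_int))   # the converted array has its own data
```

If rounding is wanted, one option is to call np.round on the array before converting.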

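2.2.2 NumPy array copy vs view

2.2.2.1 The difference between copy and view

The main difference between a copy and a view of an array is that the copy is a new array, while the view is just a view of the original array.

The copy owns its data: changes made to the copy do not affect the original array, and changes made to the original array do not affect the copy.

The view does not own its data: changes made to the view affect the original array, and changes made to the original array affect the view.

2.2.2.2 Copy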

arr = np.array([1,2,3,4,5])
x = arr.copy()
arr[0] = 61

print(arr)
print(x)

[61  2  3  4  5]
[1 2 3 4 5]

2.2.2.3 View

arr = np.array([1,2,3,4,5])
x = arr.view()
arr[0] = 61

print(arr)
print(x)

[61  2  3  4  5]
[61  2  3  4  5]

arr = np.array([1,2,3,4,5])
x = arr.view()
x[0] = 31

print(arr)
print(x)

[31  2  3  4  5]
[31  2  3  4  5]

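2.2.2.4 Checking whether an array owns its data

As mentioned above, copies own their data and views do not, but how can we check this?

Every NumPy array has the attribute base, which returns None if the array owns its data.

Otherwise, the base attribute refers to the original object.

base is an ordinary reference to the array whose memory is being used; it is not a unique identifier.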

arr = np.array([1,2,3,4,5])
x = arr.copy()
y = arr.view()

print(x.base)
print(y.base)
print(arr.base)

# A slice is a view: it points into the original array's memory instead of storing its own data
print(arr[::].base)

None
[1 2 3 4 5]
None
[1 2 3 4 5]
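Besides printing base, the ownership relation can be checked with identity tests and with np.shares_memory. A minimal sketch, with illustrative variable names:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
copied = arr.copy()
viewed = arr.view()
sliced = arr[1:4]

print(copied.base is None)   # the copy owns its data
print(viewed.base is arr)    # the view refers back to arr
print(sliced.base is arr)    # so does a slice
print(np.shares_memory(arr, sliced), np.shares_memory(arr, copied))

# Writing through the slice changes the original array
sliced[0] = 99
print(arr)   # [ 1 99  3  4  5]
```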

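2.3 Shape

2.3.1 NumPy array shape

2.3.1.1 Shape of an array

The shape of an array is the number of elements in each dimension.

2.3.1.2 Getting the shape of an array

NumPy arrays have an attribute named shape that returns a tuple, where each index holds the number of elements in the corresponding dimension.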

import numpy as np

arr = np.array([[1,2,3,4],[5,6,7,8],[5,6,7,8]])

# Returns (3, 4), which means the array has 2 dimensions:
# the first dimension has 3 elements
# and the second dimension has 4 elements
print(arr.shape)

(3, 4)

arr = np.array([1,2,3,4], ndmin = 5)
print(arr)
print(arr[0,0].shape)

[[[[[1 2 3 4]]]]]
(1, 1, 4)
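The shape tuple ties together two other attributes: its length is ndim, and the product of its entries is the total element count, size. A short sketch:

```python
import math
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [5, 6, 7, 8]])

print(arr.shape, arr.ndim, arr.size)      # (3, 4) 2 12
print(len(arr.shape) == arr.ndim)         # one shape entry per dimension
print(math.prod(arr.shape) == arr.size)   # product of the entries = number of elements
```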

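2.3.2 NumPy array reshaping

2.3.2.1 Reshaping arrays

Reshaping means changing the shape of an array.

The shape of an array is the number of elements in each dimension.

By reshaping we can add or remove dimensions, or change the number of elements in each dimension.

2.3.2.2 Reshaping from 1-D to 3-D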

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
newarr = arr.reshape(4, 2, 1)

print("Original array:")
print(arr)

print("\nReshaped array:")
print(newarr)

print("\nShape of newarr:", newarr.shape)
print("Number of dimensions of newarr:", newarr.ndim)  # ndim gives the number of dimensions

Original array:
[1 2 3 4 5 6 7 8]

Reshaped array:
[[[1]
  [2]]

 [[3]
  [4]]

 [[5]
  [6]]

 [[7]
  [8]]]

Shape of newarr: (4, 2, 1)
Number of dimensions of newarr: 3

2.3.2.3 The rule for reshaping

ndarray.reshape(x1, x2, ..., xi)

x1 * x2 * ... * xi must equal the total number of elements in arr (arr.size)

arr = np.array([[1, 2, 3,4],
                [5, 6, 7, 8]])
newarr = arr.reshape(4, 2, 1)

print("Original array:")
print(arr)

print("\nReshaped array:")
print(newarr)

print("\nShape of newarr:", newarr.shape)
print("Number of dimensions of newarr:", newarr.ndim)  # ndim gives the number of dimensions

Original array:
[[1 2 3 4]
 [5 6 7 8]]

Reshaped array:
[[[1]
  [2]]

 [[3]
  [4]]

 [[5]
  [6]]

 [[7]
  [8]]]

Shape of newarr: (4, 2, 1)
Number of dimensions of newarr: 3
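When the product of the requested dimensions does not match the element count, reshape raises a ValueError. The sketch below shows both outcomes; the helper try_reshape is hypothetical and exists only to keep the example short.

```python
import numpy as np

arr = np.arange(1, 9)   # 8 elements: 1..8

def try_reshape(a, shape):
    """Hypothetical helper: reshape, or report why the shape does not fit."""
    try:
        return a.reshape(shape)
    except ValueError as exc:
        print("cannot reshape to", shape, ":", exc)
        return None

ok = try_reshape(arr, (2, 2, 2))   # 2 * 2 * 2 == 8, so this works
bad = try_reshape(arr, (3, 3))     # 3 * 3 == 9 != 8, so this fails
print(ok.shape, bad)
```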

2.3.2.4 A reshaped array is a view

arr = np.array([1,2,3,4,5,6,7,8])

# The returned array is a view here (reshape only copies when a view is not possible)
print(arr.reshape(2,4).base)

[1 2 3 4 5 6 7 8]
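reshape returns a view whenever the memory layout allows it and falls back to a copy otherwise. A sketch of both cases, using a transposed array as one example where a copy is needed; the names are illustrative.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# Contiguous data: the flattened result is a view on the same memory
flat = arr.reshape(-1)
print(np.shares_memory(arr, flat))

# A transposed array is not laid out in row order, so flattening it needs a copy
transposed = arr.T
flat_t = transposed.reshape(-1)
print(np.shares_memory(transposed, flat_t))
print(flat_t)   # [0 3 1 4 2 5]
```

Code that relies on writes propagating back to the original may want to check np.shares_memory first.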

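2.3.2.5 Unknown dimension

You are allowed to have one "unknown" dimension.

This means you do not have to specify an exact number for one of the dimensions in the reshape method.

Pass -1 as the value, and NumPy will calculate this number for you.

In the runs below any negative number is treated the same way, but -1 is the documented value and the one to prefer.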

arr = np.array([1,2,3,4,5,6,7,8])

newarr = arr.reshape(2,-1,-1)
print(newarr)

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Cell In[139], line 1
----> 1 newarr = arr.reshape(2,-1,-1)
      2 print(newarr)


ValueError: can only specify one unknown dimension

newarr = arr.reshape(2,-1,2)
print(newarr)

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]

newarr = arr.reshape(2,1,-3)
print(newarr)

[[[1 2 3 4]]

 [[5 6 7 8]]]
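The inferred value is the element count divided by the product of the known dimensions, so it only exists when that division is exact. A minimal sketch with illustrative names:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# The unknown dimension works out to size // (product of the known dimensions)
reshaped = arr.reshape(2, -1, 2)
inferred = arr.size // (2 * 2)
print(reshaped.shape, inferred)   # (2, 2, 2) 2

# 8 is not divisible by 3, so no value fits the unknown dimension
failed = False
try:
    arr.reshape(3, -1)
except ValueError as exc:
    failed = True
    print("no valid size:", exc)
```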

2.3.2.6 Flattening arrays

Flattening an array means converting a multi-dimensional array into a 1-D array.

We can use reshape(-1) to do this.
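A short sketch of flattening with reshape(-1). For comparison it also calls ravel() and flatten(), two ndarray methods that the text above does not cover; they differ in whether the result shares memory with the original.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

flat = arr.reshape(-1)
print(flat)   # [1 2 3 4 5 6]

raveled = arr.ravel()       # like reshape(-1): a view when possible
flattened = arr.flatten()   # always returns a copy

print(np.shares_memory(arr, flat))
print(np.shares_memory(arr, raveled))
print(np.shares_memory(arr, flattened))
```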