About the pandas version: this article is based on pandas 2.1.2.
About updates: this article is revised and expanded as new stable pandas releases come out.
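Contents of this section
- pandas.Series.describe()
- Syntax:
- Returns:
- Parameters:
- include: dtype whitelist
- exclude: dtype blacklist
- percentiles: custom percentiles
- Related methods:
- Examples:
- Example 1: custom percentiles
- Example 2: descriptive statistics of complex numbers
- Example 2-1: build a Series containing complex numbers
- Example 2-2: describing complex numbers raises an error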
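pandas.Series.describe()
pandas.Series.describe generates descriptive statistics for a Series. It returns a summary with one row per statistic: count, mean, standard deviation, minimum, quartiles, maximum, and so on.
- Missing values (NaN) in the data are excluded from the calculation.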
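To illustrate how missing values are handled, here is a minimal sketch with a small made-up Series (the values are assumptions chosen for the demonstration). The count row reflects only the non-missing entries, and the mean is computed over those entries alone.

```python
import numpy as np
import pandas as pd

# A small illustrative Series with one missing value
s = pd.Series([1.0, 2.0, np.nan, 4.0])

desc = s.describe()
print(desc)

# count covers only the 3 non-missing values, although len(s) is 4
print(desc["count"], len(s))
```

Comparing `desc["count"]` with `len(s)` is a quick way to see how many values were skipped.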
Syntax:
Series.describe(percentiles=None, include=None, exclude=None)
Returns:
- Series or DataFrame
The return type matches the caller: Series.describe returns a Series, and DataFrame.describe returns a DataFrame.
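The return types can be checked directly. The following sketch uses an illustrative Series named "x" (the name and values are assumptions) and its single-column DataFrame counterpart.

```python
import pandas as pd

# Illustrative data: the same values as a Series and as a one-column DataFrame
s = pd.Series([1, 2, 3], name="x")
df = s.to_frame()

s_desc = s.describe()
df_desc = df.describe()

print(type(s_desc).__name__)   # type of the Series summary
print(type(df_desc).__name__)  # type of the DataFrame summary
```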
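Parameters:
include: dtype whitelist
- include : 'all', list-like of dtypes or None (default), optional
The include parameter specifies which column dtypes take part in the summary. A column is described if its dtype appears in the whitelist.
- For a Series, this parameter has no effect.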
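exclude: dtype blacklist
- exclude : list-like of dtypes or None (default), optional
The exclude parameter specifies which dtypes to leave out of the summary. A column whose dtype appears in the blacklist is not described.
- For a Series, this parameter has no effect.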
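Since include and exclude have no effect on a Series, passing them should leave the result unchanged. The sketch below checks this on an illustrative integer Series; the particular dtype lists are arbitrary choices for the demonstration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4])

plain = s.describe()
# These filters would matter for a DataFrame, but a Series ignores them
with_include = s.describe(include=["object"])
with_exclude = s.describe(exclude=[np.number])

print(plain.equals(with_include))
print(plain.equals(with_exclude))
```

To filter by dtype, call describe on a DataFrame instead, or select the columns first with DataFrame.select_dtypes.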
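percentiles: custom percentiles
- percentiles : list-like of numbers, optional
The percentiles parameter sets which percentiles appear in the output:
- list-like: pass the percentiles as a list-like object. Every element must be between 0 and 1. By default only [0.25, 0.5, 0.75] are returned (the first to third quartiles).
⚠️ Note:
You can specify several percentiles at once, see Example 1. In pandas 2.1.2 the median (50%) appears in the output even when 0.5 is not in the list, as the output of Example 1 shows.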
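A common mistake is to pass percentiles on a 0-100 scale. The sketch below, using an illustrative Series of the integers 1 to 100, shows the row labels produced by valid fractions and catches the ValueError raised for out-of-range values.

```python
import pandas as pd

s = pd.Series(range(1, 101))

# Valid: fractions between 0 and 1 become percentage row labels
desc = s.describe(percentiles=[0.05, 0.95])
print(desc)

# Invalid: values outside the interval [0, 1] are rejected
error_message = None
try:
    s.describe(percentiles=[5, 95])
except ValueError as err:
    error_message = str(err)
print(error_message)
```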
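⚠️ Note:
Although numpy.number includes the complex type np.complexfloating, pandas.DataFrame.describe only supports real numbers. If a DataFrame contains complex columns that are not excluded, it raises TypeError: a must be an array of real numbers. The same error occurs for a complex Series, see Example 2.

For numeric data, the result's index includes count, mean, std, min, max, and the lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50th percentile is the same as the median.

For object data, such as strings or timestamps, the result's index includes count, unique, top and freq. top is the most common value and freq is how often it occurs. The pandas docstring also says that timestamps include the first and last items; note that since pandas 2.0, datetime data is always summarized as numeric data.

If several object values share the highest count, the count and top results are chosen arbitrarily from among them.

For mixed data types provided via a DataFrame, only the numeric columns are analyzed by default. If the DataFrame contains only object and categorical ('category') data and no numeric columns, the object and categorical columns are analyzed by default. With include='all', the result contains the union of the attributes of each type.

The include and exclude parameters can be used to limit which columns of a DataFrame are analyzed. They are ignored when analyzing a Series.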
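For comparison with the numeric case, here is a small sketch of describe on string data. The Series contents are made up for illustration; "a" is deliberately the single most frequent value so that top is unambiguous.

```python
import pandas as pd

# Illustrative string data with one clear most-frequent value
s = pd.Series(["a", "b", "a", "c", "a"])

desc = s.describe()
print(desc)

# The index holds count, unique, top and freq instead of mean/std/percentiles
print(list(desc.index))
```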
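Related methods:
➡️ Related methods
- DataFrame.count: count of non-NA cells
- DataFrame.max: maximum value
- DataFrame.min: minimum value
- DataFrame.mean: mean value
- DataFrame.std: standard deviation (sample or population, depending on ddof)
- DataFrame.select_dtypes: select columns by dtype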
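The methods above are listed for DataFrame, but Series has equivalents with the same names. As a rough sketch, the numeric summary can be rebuilt by hand from those Series methods plus Series.quantile (the data values here are assumptions for illustration).

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, 4.0, 4.0, 5.0, 10.0])

# Rebuild the default numeric summary from the individual methods
manual = pd.Series(
    {
        "count": s.count(),
        "mean": s.mean(),
        "std": s.std(),  # sample standard deviation (ddof=1) by default
        "min": s.min(),
        "25%": s.quantile(0.25),
        "50%": s.quantile(0.50),
        "75%": s.quantile(0.75),
        "max": s.max(),
    }
)

desc = s.describe()
print(np.allclose(manual.to_numpy(dtype=float), desc.to_numpy(dtype=float)))
```

Calling the individual methods is useful when only one or two statistics are needed, or when a non-default option such as ddof=0 is required.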
Examples:

Example 1: custom percentiles
import pandas as pd
import numpy as np
s = pd.Series(np.arange(1, 10, 1))
# include is ignored for a Series; only percentiles changes the output
s.describe(include=[np.number], percentiles=[0.1, 0.4, 0.7, 0.8, 0.85])
count 9.000000
mean 5.000000
std 2.738613
min 1.000000
10% 1.800000
40% 4.200000
50% 5.000000
70% 6.600000
80% 7.400000
85% 7.800000
max 9.000000
dtype: float64
Example 2: descriptive statistics of complex numbers
Example 2-1: build a Series containing complex numbers
import numpy as np
import pandas as pd
# Build the demo data
s = pd.Series([1 + 1j, 2 + 2j, 3 + 3j])
s
0 1.0+1.0j
1 2.0+2.0j
2 3.0+3.0j
dtype: complex128
Example 2-2: describing complex numbers raises an error
s.describe()
D:\miniconda3\envs\python3.12\Lib\site-packages\numpy\core\_methods.py:49: ComplexWarning: Casting complex values to real discards the imaginary part
  return umr_sum(a, axis, dtype, out, keepdims, initial, where)
D:\miniconda3\envs\python3.12\Lib\site-packages\pandas\core\nanops.py:944: RuntimeWarning: invalid value encountered in sqrt
  result = np.sqrt(nanvar(values, axis=axis, skipna=skipna, ddof=ddof, mask=mask))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[59], line 1
----> 1 df.describe()

File D:\miniconda3\envs\python3.12\Lib\site-packages\pandas\core\generic.py:11544, in NDFrame.describe(self, percentiles, include, exclude)
  11302 @final
  11303 def describe(
  11304     self,
   (...)
  11307     exclude=None,
  11308 ) -> Self:
  11309     """
  11310     Generate descriptive statistics.
  11311
   (...)
  11542     max NaN 3.0
  11543     """
> 11544     return describe_ndframe(
  11545         obj=self,
  11546         include=include,
  11547         exclude=exclude,
  11548         percentiles=percentiles,
  11549     ).__finalize__(self, method="describe")

File D:\miniconda3\envs\python3.12\Lib\site-packages\pandas\core\methods\describe.py:97, in describe_ndframe(obj, include, exclude, percentiles)
     90 else:
     91     describer = DataFrameDescriber(
     92         obj=cast("DataFrame", obj),
     93         include=include,
     94         exclude=exclude,
     95     )
---> 97 result = describer.describe(percentiles=percentiles)
     98 return cast(NDFrameT, result)

File D:\miniconda3\envs\python3.12\Lib\site-packages\pandas\core\methods\describe.py:170, in DataFrameDescriber.describe(self, percentiles)
    168 for _, series in data.items():
    169     describe_func = select_describe_func(series)
--> 170     ldesc.append(describe_func(series, percentiles))
    172 col_names = reorder_columns(ldesc)
    173 d = concat(
    174     [x.reindex(col_names, copy=False) for x in ldesc],
    175     axis=1,
    176     sort=False,
    177 )

File D:\miniconda3\envs\python3.12\Lib\site-packages\pandas\core\methods\describe.py:232, in describe_numeric_1d(series, percentiles)
    227 formatted_percentiles = format_percentiles(percentiles)
    229 stat_index = ["count", "mean", "std", "min"] + formatted_percentiles + ["max"]
    230 d = (
    231     [series.count(), series.mean(), series.std(), series.min()]
--> 232     + series.quantile(percentiles).tolist()
    233     + [series.max()]
    234 )
    235 # GH#48340 - always return float on non-complex numeric data
    236 dtype: DtypeObj | None

File D:\miniconda3\envs\python3.12\Lib\site-packages\pandas\core\series.py:2769, in Series.quantile(self, q, interpolation)
   2765 # We dispatch to DataFrame so that core.internals only has to worry
   2766 # about 2D cases.
   2767 df = self.to_frame()
-> 2769 result = df.quantile(q=q, interpolation=interpolation, numeric_only=False)
   2770 if result.ndim == 2:
   2771     result = result.iloc[:, 0]

File D:\miniconda3\envs\python3.12\Lib\site-packages\pandas\core\frame.py:11831, in DataFrame.quantile(self, q, axis, numeric_only, interpolation, method)
  11827     raise ValueError(
  11828         f"Invalid method: {method}. Method must be in {valid_method}."
  11829     )
  11830 if method == "single":
> 11831     res = data._mgr.quantile(qs=q, interpolation=interpolation)
  11832 elif method == "table":
  11833     valid_interpolation = {"nearest", "lower", "higher"}

File D:\miniconda3\envs\python3.12\Lib\site-packages\pandas\core\internals\managers.py:1508, in BlockManager.quantile(self, qs, interpolation)
   1504 new_axes = list(self.axes)
   1505 new_axes[1] = Index(qs, dtype=np.float64)
   1507 blocks = [
-> 1508     blk.quantile(qs=qs, interpolation=interpolation) for blk in self.blocks
   1509 ]
   1511 return type(self)(blocks, new_axes)

File D:\miniconda3\envs\python3.12\Lib\site-packages\pandas\core\internals\blocks.py:1587, in Block.quantile(self, qs, interpolation)
   1584 assert self.ndim == 2
   1585 assert is_list_like(qs) # caller is responsible for this
-> 1587 result = quantile_compat(self.values, np.asarray(qs._values), interpolation)
   1588 # ensure_block_shape needed for cases where we start with EA and result
   1589 # is ndarray, e.g. IntegerArray, SparseArray
   1590 result = ensure_block_shape(result, ndim=2)

File D:\miniconda3\envs\python3.12\Lib\site-packages\pandas\core\array_algos\quantile.py:39, in quantile_compat(values, qs, interpolation)
     37     fill_value = na_value_for_dtype(values.dtype, compat=False)
     38     mask = isna(values)
---> 39     return quantile_with_mask(values, mask, fill_value, qs, interpolation)
     40 else:
     41     return values._quantile(qs, interpolation)

File D:\miniconda3\envs\python3.12\Lib\site-packages\pandas\core\array_algos\quantile.py:97, in quantile_with_mask(values, mask, fill_value, qs, interpolation)
     95     result = np.repeat(flat, len(values)).reshape(len(values), len(qs))
     96 else:
---> 97     result = _nanpercentile(
     98         values,
     99         qs * 100.0,
    100         na_value=fill_value,
    101         mask=mask,
    102         interpolation=interpolation,
    103     )
    105     result = np.array(result, copy=False)
    106     result = result.T

File D:\miniconda3\envs\python3.12\Lib\site-packages\pandas\core\array_algos\quantile.py:218, in _nanpercentile(values, qs, na_value, mask, interpolation)
    216     return result
    217 else:
--> 218     return np.percentile(
    219         values,
    220         qs,
    221         axis=1,
    222         # error: No overload variant of "percentile" matches argument types
    223         # "ndarray[Any, Any]", "ndarray[Any, dtype[floating[_64Bit]]]",
    224         # "int", "Dict[str, str]" [call-overload]
    225         method=interpolation, # type: ignore[call-overload]
    226     )

File D:\miniconda3\envs\python3.12\Lib\site-packages\numpy\lib\function_base.py:4277, in percentile(a, q, axis, out, overwrite_input, method, keepdims, interpolation)
   4275 a = np.asanyarray(a)
   4276 if a.dtype.kind == "c":
-> 4277     raise TypeError("a must be an array of real numbers")
   4279 q = np.true_divide(q, 100)
   4280 q = asanyarray(q) # undo any decay that the ufunc performed (see gh-13105)

TypeError: a must be an array of real numbers
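The error comes from the percentile step, which only accepts real numbers. One possible workaround, offered here as a sketch rather than an official recommendation, is to describe real-valued views of the data: the real parts, the imaginary parts, or the magnitudes. Which view is meaningful depends on the data.

```python
import pandas as pd

s = pd.Series([1 + 1j, 2 + 2j, 3 + 3j])
values = s.to_numpy()

# Describe the real and imaginary parts separately
real_desc = pd.Series(values.real, index=s.index).describe()
imag_desc = pd.Series(values.imag, index=s.index).describe()

# Or describe the magnitudes |z|, which are real-valued
abs_desc = s.abs().describe()

print(real_desc)
print(imag_desc)
print(abs_desc)
```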