How to Handle Missing Data with Python

The complete notebook and required datasets can be found in the git repo here

Real-world data often has missing values.

Data can have missing values for a number of reasons, such as observations that were not recorded or measured, or data that was corrupted.

Handling missing data is important because many machine learning algorithms do not support data with missing values.

In this notebook, you will discover how to handle missing data for machine learning with Python.

Specifically, after completing this tutorial you will know:

  • How to mark invalid or corrupt values as missing in your dataset.
  • How to remove rows with missing data from your dataset.
  • How to impute missing values with mean values in your dataset.

Let's get started.

"How to Handle Missing Data with Python"

Overview

This tutorial is divided into 6 parts:

  1. Diabetes Dataset: where we look at a dataset that has known missing values.
  2. Mark Missing Values: where we learn how to mark missing values in a dataset.
  3. Missing Values Cause Problems: where we see how a machine learning algorithm can fail when the data contains missing values.
  4. Remove Rows With Missing Values: where we see how to remove rows that contain missing values.
  5. Impute Missing Values: where we replace missing values with sensible values.
  6. Algorithms that Support Missing Values: where we learn about algorithms that support missing values.

First, let's take a look at our sample dataset with missing values.

1. Diabetes Dataset

The Diabetes Dataset involves predicting the onset of diabetes within 5 years, given medical details.

  • Dataset File.
  • Dataset Details.

Both files are available in the same folder as this notebook.

It is a binary (2-class) classification problem. The number of observations for each class is not balanced. There are 768 observations with 8 input variables and 1 output variable. The variable names are as follows:

  1. Number of times pregnant.
  2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test.
  3. Diastolic blood pressure (mm Hg).
  4. Triceps skinfold thickness (mm).
  5. 2-Hour serum insulin (mu U/ml).
  6. Body mass index (weight in kg/(height in m)²).
  7. Diabetes pedigree function.
  8. Age (years).
  9. Class variable (0 or 1).

The baseline performance of predicting the most prevalent class is a classification accuracy of approximately 65%. Top results achieve a classification accuracy of approximately 77%.
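The baseline figure can be reproduced with a short calculation. The sketch below is illustrative only: the helper name majority_baseline_accuracy is hypothetical, and the class counts (500 negative, 268 positive) are an assumption used to build a label vector of 768 observations without loading the file.

```python
import numpy as np

def majority_baseline_accuracy(y):
    """Accuracy obtained by always predicting the most frequent class in y."""
    _, counts = np.unique(y, return_counts=True)
    return counts.max() / counts.sum()

# Assumed class counts for illustration: 500 negative (0) and 268 positive (1)
y = np.array([0] * 500 + [1] * 268)
baseline = majority_baseline_accuracy(y)
print('Baseline accuracy: %.3f' % baseline)
```

Any model evaluated later in the tutorial should beat this number to be considered useful.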

A sample of the first 5 rows is listed below.

# load and summarize the dataset
import numpy as np
import pandas as pd
# load the dataset
dataset = pd.read_csv('pima-indians-diabetes.csv', header=None)
# look at the first few rows of the dataset
dataset.head()

This dataset is known to have missing values.

Specifically, missing observations in some columns are recorded as a zero value. (This is a very poor way to represent missing values.)

We can corroborate this from the definition of those columns and the domain knowledge that a zero value is invalid for those measures, e.g. a zero for body mass index or blood pressure is invalid.

Note: Here a zero value (0) indicates a missing value only for a few predictors/features, namely columns 1, 2, 3, 4 and 5, and not for the target/response variable.

2. Mark Missing Values

Most data has missing values, and the likelihood of having missing values increases with the size of the dataset.

Missing data are not rare in real data sets. In fact, the chance that at least one data point is missing increases as the data set size increases.

— Page 187, Feature Engineering and Selection, 2019.

A note on this book: I received it just two days ago and am enjoying reading it.

In this section, we will look at how we can identify and mark values as missing.

We can use summary statistics to help identify missing or corrupt data.

We can load the dataset as a Pandas DataFrame and print summary statistics on each attribute.

dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
0 768 non-null int64
1 768 non-null int64
2 768 non-null int64
3 768 non-null int64
4 768 non-null int64
5 768 non-null float64
6 768 non-null float64
7 768 non-null int64
8 768 non-null int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB

dataset.describe()

# example of summarizing the number of missing values for each variable
# count the number of missing values for each column
num_missing = (dataset[[1,2,3,4,5]] == 0).sum()
# report the results
num_missing

1 5
2 35
3 227
4 374
5 11
dtype: int64

# replace '0' values with 'nan'
dataset[[1,2,3,4,5]] = dataset[[1,2,3,4,5]].replace(0, np.nan)
# count the number of nan values in each column
dataset.isnull().sum()

0 0
1 5
2 35
3 227
4 374
5 11
6 0
7 0
8 0
dtype: int64

dataset.head()
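The same marking step can be tried without the CSV file. The sketch below is a minimal, self-contained illustration on a toy DataFrame; the values are made up and the list name invalid_zero_cols is hypothetical. It shows that zeros are converted to NaN only in the selected columns, while a legitimate zero in column 0 is left alone.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values and integer column labels
toy = pd.DataFrame({
    0: [6, 1, 8, 0],       # a zero here is a valid value
    1: [148, 0, 183, 89],  # a zero here means "not recorded"
    2: [72, 66, 0, 0],     # a zero here means "not recorded"
})
# Columns in which a zero is known to be invalid
invalid_zero_cols = [1, 2]
# Replace zeros with NaN in those columns only
toy[invalid_zero_cols] = toy[invalid_zero_cols].replace(0, np.nan)
# Count the NaN values in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)
```

Restricting the replacement to a column list is what keeps valid zeros, such as zero pregnancies or a class label of 0, from being treated as missing.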

3. Missing Values Cause Problems

Having missing values in a dataset can cause errors with some machine learning algorithms.

Missing values are common occurrences in data. Unfortunately, most predictive modeling techniques cannot handle any missing values. Therefore, this problem must be addressed prior to modeling.

— Page 203, Feature Engineering and Selection, 2019.

In this section, we will try to evaluate the Linear Discriminant Analysis (LDA) algorithm on the dataset with missing values.

This is an algorithm that does not work when there are missing values in the dataset.

The example below uses the dataset with missing values marked as in the previous section, then attempts to evaluate LDA using 3-fold cross-validation and print the mean accuracy.

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
# split dataset into inputs and outputs
values = dataset.values
X = values[:,0:8]
y = values[:,8]
# define the model
model = LinearDiscriminantAnalysis()
# define the model evaluation procedure
cv = KFold(n_splits=3, shuffle=True, random_state=1)
# evaluate the model
result = cross_val_score(model, X, y, cv=cv, scoring='accuracy')
# report the mean performance
print('Accuracy: %.3f' % result.mean())

Accuracy: nan
(FitFailedWarning)

Running the example fails: every model fit raises an error, so the reported accuracy is NaN and a FitFailedWarning is issued, as shown in the output above.

This is as we expect.

We are prevented from evaluating an LDA algorithm (and other algorithms) on the dataset with missing values.

Many popular predictive models such as support vector machines, the glmnet, and neural networks, cannot tolerate any amount of missing values.

— Page 195, Feature Engineering and Selection, 2019.
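To see the underlying failure without cross-validation in the way, the model can be fit directly on an array that contains a NaN. The sketch below is a minimal illustration on made-up data; the variable fit_failed is hypothetical and only records whether the fit raised an error.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Made-up inputs with a single missing value
X = np.array([[1.0, 2.0], [2.0, np.nan], [3.0, 1.0],
              [4.0, 5.0], [5.0, 4.0], [6.0, 7.0]])
y = np.array([0, 0, 0, 1, 1, 1])

try:
    LinearDiscriminantAnalysis().fit(X, y)
    fit_failed = False
except ValueError as err:
    # scikit-learn validates its inputs and rejects NaN values
    fit_failed = True
    print('Fit failed:', err)
```

A single NaN is enough to stop the fit, which is why the missing values must be handled before modeling.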

Now, we can look at methods to handle the missing values.

4. Remove Rows With Missing Values

The simplest strategy for handling missing data is to remove records that contain a missing value.

The simplest approach for dealing with missing values is to remove entire predictor(s) and/or sample(s) that contain missing values.

— Page 196, Feature Engineering and Selection, 2019.

We can do this by creating a new Pandas DataFrame with the rows containing missing values removed.

Pandas provides the dropna() function that can be used to drop either columns or rows with missing data. We can use dropna() to remove all rows with missing data, as follows:

Current dataset shape:

dataset.shape

(768, 9)

X.shape

(768, 8)

# drop rows with missing values
dataset.dropna(inplace=True)
# summarize the shape of the data with missing rows removed
print(dataset.shape)

(392, 9)

In this example, we can see that the number of rows has been aggressively cut from 768 in the original dataset to 392 with all rows containing a NaN removed.
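Since dropna() can drop either rows or columns, it may help to compare the variants side by side. The sketch below uses a toy DataFrame with made-up values and hypothetical column names; it is an illustration of the axis and subset arguments, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; columns 'a' and 'b' each contain one NaN
toy = pd.DataFrame({
    'a': [1.0, 2.0, np.nan, 4.0],
    'b': [5.0, np.nan, 7.0, 8.0],
    'c': [9.0, 10.0, 11.0, 12.0],
})
# Drop every row that contains at least one NaN (the default behaviour)
rows_kept = toy.dropna(axis=0)
# Drop every column that contains at least one NaN
cols_kept = toy.dropna(axis=1)
# Drop rows only when the NaN is in column 'a'
subset_kept = toy.dropna(subset=['a'])
print(rows_kept.shape, cols_kept.shape, subset_kept.shape)
```

None of these calls uses inplace=True, so each returns a new DataFrame and leaves toy unchanged.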

# split dataset into inputs and outputs
values = dataset.values
X = values[:,0:8]
y = values[:,8]
# define the model
model = LinearDiscriminantAnalysis()
# define the model evaluation procedure
cv = KFold(n_splits=3, shuffle=True, random_state=1)
# evaluate the model
result = cross_val_score(model, X, y, cv=cv, scoring='accuracy')
# report the mean performance
print('Accuracy: %.3f' % result.mean())

Accuracy: 0.781

Removing rows with missing values can be too limiting on some predictive modeling problems; an alternative is to impute missing values.

5. Impute Missing Values

Imputing refers to using a model to replace missing values.

… missing data can be imputed. In this case, we can use information in the training set predictors to, in essence, estimate the values of other predictors.

— Page 42, Applied Predictive Modeling, 2013.

There are many options we could consider when replacing a missing value, for example:

  • A constant value that has meaning within the domain, such as 0, distinct from all other values.
  • A value from another randomly selected record.
  • A mean, median or mode value for the column.
  • A value estimated by another predictive model.

Any imputing performed on the training dataset will have to be performed on new data in the future when predictions are needed from the finalized model. This needs to be taken into consideration when choosing how to impute the missing values.

For example, if you choose to impute with mean column values, these mean column values will need to be stored to file for later use on new data that has missing values.
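One way to store the mean values and reuse them later can be sketched as follows. This is only an illustration under stated assumptions: the column names, the file name impute_means.json, and the choice of JSON as the storage format are all hypothetical, and a temporary directory stands in for wherever the file would really be kept.

```python
import json
import os
import tempfile

import numpy as np
import pandas as pd

# Toy training data with hypothetical column names
train = pd.DataFrame({'glucose': [100.0, np.nan, 140.0],
                      'bmi': [30.0, 26.0, np.nan]})
# Learn the column means from the training data only (NaN values are skipped)
train_means = {name: float(value) for name, value in train.mean().items()}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'impute_means.json')
    # Store the means to file
    with open(path, 'w') as handle:
        json.dump(train_means, handle)
    # Later, load them back before making predictions on new data
    with open(path) as handle:
        stored_means = json.load(handle)

# New data is filled with the stored training means, not its own means
new_data = pd.DataFrame({'glucose': [np.nan, 90.0], 'bmi': [np.nan, 22.0]})
new_filled = new_data.fillna(value=stored_means)
print(new_filled)
```

Note that JSON turns dictionary keys into strings, so a dataset with integer column labels, like the one in this tutorial, would need its keys converted back to integers after loading.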

Pandas provides the fillna() function for replacing missing values with a specific value.

For example, we can use fillna() to replace missing values with the mean value for each column, as follows:

# manually impute missing values with pandas
from pandas import read_csv
from numpy import nan
# load the dataset
dataset = read_csv('pima-indians-diabetes.csv', header=None)
# mark zero values as missing or NaN
dataset[[1,2,3,4,5]] = dataset[[1,2,3,4,5]].replace(0, nan)
# fill missing values with mean column values
dataset.fillna(dataset.mean(), inplace=True)
# count the number of NaN values in each column
print(dataset.isnull().sum())

0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
dtype: int64

The scikit-learn library provides the SimpleImputer pre-processing class that can be used to replace missing values.

It is a flexible class that allows you to specify the value to replace (it can be something other than NaN) and the technique used to replace it (such as mean, median, or mode). In the example, SimpleImputer is applied to the NumPy array of values rather than to the DataFrame, and by default it returns a NumPy array.

The example below uses the SimpleImputer class to replace missing values with the mean of each column, then prints the number of NaN values in the transformed matrix.

import numpy as np
from pandas import read_csv
from sklearn.impute import SimpleImputer
# load the dataset
dataset = read_csv('pima-indians-diabetes.csv', header=None)
# mark zero values as missing or NaN
dataset[[1,2,3,4,5]] = dataset[[1,2,3,4,5]].replace(0, np.nan)
# retrieve the numpy array
values = dataset.values
# define the imputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
# transform the dataset
transformed_values = imputer.fit_transform(values)
# count the total number of NaN values in the transformed array
print('Missing: %d' % np.isnan(transformed_values).sum())

Missing: 0
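The other replacement techniques mentioned above can be compared on a small array. The sketch below is a minimal illustration on made-up values; the dictionary name filled and the constant fill value of -1 are arbitrary choices for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up array with one missing value in each column
X = np.array([[1.0, 10.0],
              [np.nan, 10.0],
              [3.0, np.nan],
              [8.0, 40.0]])

# Impute with a statistic computed per column
filled = {}
for strategy in ('mean', 'median', 'most_frequent'):
    imputer = SimpleImputer(missing_values=np.nan, strategy=strategy)
    filled[strategy] = imputer.fit_transform(X)

# Impute with a fixed constant instead of a column statistic
constant = SimpleImputer(strategy='constant', fill_value=-1).fit_transform(X)

print(filled['mean'])
print(filled['median'])
print(constant)
```

The first column holds 1, 3 and 8, so the mean fill is 4 while the median fill is 3, which shows how a single large value pulls the mean but not the median.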

In either case, algorithms that are sensitive to NaN values, such as LDA, can now be trained on the transformed dataset.

The example below shows the LDA algorithm trained on the SimpleImputer-transformed dataset.

We use a Pipeline to define the modeling pipeline, where data is first passed through the imputer transform, then provided to the model. This ensures that, within each cross-validation fold, the imputer and model are both fit only on the training portion and then evaluated on the test portion. This is important to avoid data leakage.
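What the Pipeline does inside each fold can be sketched by hand on a tiny example. The arrays below are made up, and the name leaky is hypothetical; it marks the variant that should be avoided, where the imputer is fit on training and test data together.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up single-column data split into a training and a test portion
X_train = np.array([[1.0], [np.nan], [3.0]])
X_test = np.array([[np.nan], [100.0]])

# Correct: learn the mean from the training portion only
imputer = SimpleImputer(strategy='mean')
imputer.fit(X_train)
X_test_filled = imputer.transform(X_test)
print('Mean learned from training data:', imputer.statistics_[0])

# Leaky: the test value 100.0 influences the mean used for imputation
leaky = SimpleImputer(strategy='mean').fit(np.vstack([X_train, X_test]))
print('Mean learned with leakage:', leaky.statistics_[0])
```

The training-only mean is 2.0, while the leaky mean is pulled far upward by the test value, so the two approaches would fill the same missing entry with very different numbers.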

The complete final code is here:

# example of evaluating a model after an imputer transform
from numpy import nan
from pandas import read_csv
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
# load the dataset
dataset = read_csv('pima-indians-diabetes.csv', header=None)
# mark zero values as missing or NaN
dataset[[1,2,3,4,5]] = dataset[[1,2,3,4,5]].replace(0, nan)
# split dataset into inputs and outputs
values = dataset.values
X = values[:,0:8]
y = values[:,8]
# define the imputer
imputer = SimpleImputer(missing_values=nan, strategy='mean')
# define the model
lda = LinearDiscriminantAnalysis()
# define the modeling pipeline
pipeline = Pipeline(steps=[('imputer', imputer),('model', lda)])
# define the cross validation procedure
kfold = KFold(n_splits=3, shuffle=True, random_state=1)
# evaluate the model
result = cross_val_score(pipeline, X, y, cv=kfold, scoring='accuracy')
# report the mean performance
print('Accuracy: %.3f' % result.mean())

Accuracy: 0.762

Try replacing the missing values with other values and see if you can lift the performance of the model.

Maybe missing values have meaning in the data.
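If missingness itself may carry meaning, one option is to keep a record of which values were missing alongside the imputed data. The sketch below uses the add_indicator argument of SimpleImputer on made-up values; whether the extra columns help on the diabetes dataset is not something the original text reports, so treat it as an experiment to try.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up array with one missing value in each column
X = np.array([[1.0, 5.0],
              [np.nan, 6.0],
              [3.0, np.nan]])

# Impute with the column mean and append one 0/1 indicator column
# for each input column that contained missing values
imputer = SimpleImputer(strategy='mean', add_indicator=True)
X_out = imputer.fit_transform(X)
print(X_out)
```

The first two output columns hold the imputed data and the last two flag where a value was originally missing, so a downstream model can use that information.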

6. Algorithms that Support Missing Values

Not all algorithms fail when there is missing data.

There are algorithms that can be made robust to missing data, such as k-Nearest Neighbors, which can ignore a column in the distance measure when a value is missing. Naive Bayes can also support missing values when making a prediction.
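The idea of ignoring a column in the distance measure can be sketched in a few lines. The function below is a hypothetical helper written for illustration, not a scikit-learn API: it computes a Euclidean distance over only those columns where both rows have a value.

```python
import numpy as np

def distance_ignoring_missing(a, b):
    """Euclidean distance over the columns present in both a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Keep only the columns where neither row has a missing value
    present = ~(np.isnan(a) | np.isnan(b))
    if not present.any():
        # No shared columns, so the distance is undefined
        return float('nan')
    diff = a[present] - b[present]
    return float(np.sqrt(np.sum(diff ** 2)))

# The third column is missing in the first row and is therefore ignored
d = distance_ignoring_missing([0.0, 0.0, np.nan], [3.0, 4.0, 5.0])
print('Distance: %.1f' % d)
```

A practical implementation would usually also rescale the result by the number of columns used, so that rows with many missing values do not appear artificially close.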

One of the really nice things about Naive Bayes is that missing values are no problem at all.

— Page 100, Data Mining: Practical Machine Learning Tools and Techniques, 2016.

There are also algorithms that can use the missing value as a unique and different value when building the predictive model, such as classification and regression trees.

… a few predictive models, especially tree-based techniques, can specifically account for missing data.

— Page 42, Applied Predictive Modeling, 2013.

Sadly, at the time of writing, the scikit-learn implementations of naive Bayes, decision trees and k-Nearest Neighbors are not robust to missing values, although support is being considered for future versions of scikit-learn.

Nevertheless, this remains an option if you consider using another algorithm implementation (such as xgboost).

Translated from: https://medium.com/@kvssetty/how-to-handel-missing-data-71a3eb89ef91
