# Scikit-Learn Basics Tutorial

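Scikit-Learn (sklearn) is a widely used machine learning library for Python that provides a rich set of tools for data preprocessing, model training, and evaluation. This tutorial starts from the basics and walks step by step through the core workflow and methods of machine learning with Scikit-Learn.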

### I. Installing Scikit-Learn

```bash
pip install scikit-learn
```
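
A quick way to confirm the installation worked is to import the package and print its version. This is only a minimal sanity check; the version string you see depends on your environment.

```python
# Confirm that the package imports and report the installed version.
import sklearn

version = sklearn.__version__
print("scikit-learn version:", version)
```

If the import fails, the package was most likely installed into a different Python environment than the one running the script.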

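### II. Data Preprocessing

#### 1. Loading Data

Scikit-Learn ships with several built-in datasets that can be loaded directly for experimentation and learning. Taking the Iris dataset as an example:

```python
from sklearn.datasets import load_iris
# Load the Iris dataset, then split it into features and labels
iris = load_iris()
X, y = iris.data, iris.target
```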
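
Before preprocessing, it usually helps to look at what was loaded. The sketch below prints the array shapes and the feature and class names of the Iris dataset; it uses only attributes of the object returned by `load_iris`.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# X holds one row per flower and one column per measurement
print("Feature matrix shape:", X.shape)
print("Target vector shape:", y.shape)
print("Feature names:", iris.feature_names)
print("Class names:", list(iris.target_names))
```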

#### 2. Standardizing Data

```python
from sklearn.preprocessing import StandardScaler
# Rescale each feature to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```
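
To see what standardization does, the following sketch checks the column means and standard deviations after scaling, and reproduces the transformation by hand from the statistics the scaler learned. The variable names `col_means`, `col_stds`, and `manual` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Statistics learned by fit(), one value per feature
print("Learned means:", scaler.mean_)
print("Learned standard deviations:", scaler.scale_)

# After scaling, every column has mean close to 0 and standard deviation close to 1
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)
print("Scaled means:", np.round(col_means, 6))
print("Scaled standard deviations:", np.round(col_stds, 6))

# The transformation is simply (x - mean) / standard deviation
manual = (X - scaler.mean_) / scaler.scale_
print("Matches manual computation:", np.allclose(manual, X_scaled))
```

Note that fitting the scaler on the full dataset lets test-set statistics influence preprocessing. A common alternative is to fit it on the training split only and apply `transform` to the test split.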

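### III. Splitting the Dataset

```python
from sklearn.model_selection import train_test_split
# Hold out 20% of the samples for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
```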
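
For classification tasks, one optional refinement is a stratified split, which keeps class proportions the same in the training and test sets. The sketch below uses the `stratify` parameter of `train_test_split`; this parameter is not part of the original example and is shown here only as an assumption about what you might want.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the three classes balanced across both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print("Samples per class (train):", train_counts)
print("Samples per class (test):", test_counts)
```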

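### IV. Training Models

Scikit-Learn provides many machine learning algorithms. The following are usage examples for several common ones.

#### 1. Logistic Regression

```python
from sklearn.linear_model import LogisticRegression
# Create the classifier with default settings and fit it to the training data
model = LogisticRegression()
model.fit(X_train, y_train)
```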

#### 2. Support Vector Machine

```python
from sklearn.svm import SVC
# SVC uses an RBF kernel by default
model = SVC()
model.fit(X_train, y_train)
```

#### 3. Decision Tree

```python
from sklearn.tree import DecisionTreeClassifier
# Each example reassigns `model`, so it now refers to the decision tree
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
```
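
Because all three estimators share the same `fit` / `score` interface, they can be trained and compared in a single loop. The following is a minimal sketch under the same preprocessing and split as above; the `models` and `scores` dictionaries are illustrative names, and `random_state=42` on the decision tree is an added assumption to make the run repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)

# Candidate classifiers, keyed by a display name
models = {
    "LogisticRegression": LogisticRegression(),
    "SVC": SVC(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=42),
}

# Fit each model on the training split and record its test accuracy
scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```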

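### V. Model Evaluation

```python
from sklearn.metrics import accuracy_score, classification_report
# Predict on the held-out test set with the most recently trained model
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
# Per-class precision, recall, and F1 score
print("Classification Report:\n", classification_report(y_test, y_pred))
```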
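
Accuracy on a single test split can be noisy on a dataset this small. As a complementary sketch, the code below adds a confusion matrix and 5-fold cross-validation using `confusion_matrix` and `cross_val_score`. The choice of a decision tree and of `random_state=42` here is an assumption made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print("Confusion matrix:\n", cm)

# Accuracy over 5 folds gives a more stable estimate than one split
cv_scores = cross_val_score(DecisionTreeClassifier(random_state=42), X_scaled, y, cv=5)
print("Cross-validation accuracy: %.3f (+/- %.3f)" % (cv_scores.mean(), cv_scores.std()))
```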

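### VI. Hyperparameter Tuning

#### 1. Grid Search

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
# Try every combination of C and kernel with 5-fold cross-validation
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
grid_search = GridSearchCV(SVC(), param_grid, cv=5)
grid_search.fit(X_train, y_train)
print("Best Parameters:", grid_search.best_params_)
# best_estimator_ is refit on the whole training set with the best parameters
model = grid_search.best_estimator_
```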
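
One variation worth knowing is to wrap the scaler and the classifier in a `Pipeline` and run the grid search over the pipeline, so that the scaler is refit inside each cross-validation fold. This is a sketch of an alternative, not the approach used in the example above. The step names `"scaler"` and `"svc"` are arbitrary choices, and pipeline parameters are addressed as `<step name>__<parameter>`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# Split the raw features; scaling happens inside the pipeline
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}

grid_search = GridSearchCV(pipe, param_grid, cv=5)
grid_search.fit(X_train, y_train)

print("Best Parameters:", grid_search.best_params_)
print("Best cross-validation accuracy:", grid_search.best_score_)

# score() evaluates the refit best pipeline on the held-out test data
test_accuracy = grid_search.score(X_test, y_test)
print("Test accuracy:", test_accuracy)
```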

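#### 2. Randomized Search

```python
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
from scipy.stats import uniform
# uniform(loc, scale) samples from [loc, loc + scale], here [0.1, 10.1]
param_dist = {'C': uniform(0.1, 10), 'kernel': ['linear', 'rbf']}
# Evaluate 100 randomly sampled parameter combinations with 5-fold cross-validation
random_search = RandomizedSearchCV(SVC(), param_dist, n_iter=100, cv=5, random_state=42)
random_search.fit(X_train, y_train)
print("Best Parameters:", random_search.best_params_)
model = random_search.best_estimator_
```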
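
Regularization parameters such as `C` are often searched on a logarithmic scale, so that small and large values are sampled equally often. The sketch below swaps in `scipy.stats.loguniform` for that purpose. The range `1e-2` to `1e2` and `n_iter=20` are assumptions chosen for illustration, not values from the example above.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)

# loguniform(a, b) samples values between a and b, uniformly in log space
param_dist = {"C": loguniform(1e-2, 1e2), "kernel": ["linear", "rbf"]}

random_search = RandomizedSearchCV(
    SVC(), param_dist, n_iter=20, cv=5, random_state=42
)
random_search.fit(X_train, y_train)

best_C = random_search.best_params_["C"]
best_kernel = random_search.best_params_["kernel"]
print("Best C:", best_C)
print("Best kernel:", best_kernel)
```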

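### VII. Saving and Loading Models

#### 1. Saving a Model

```python
import joblib
# Serialize the trained model to disk
joblib.dump(model, 'model.pkl')
```

#### 2. Loading a Model

```python
# Restore the model from disk; only load files from sources you trust
model = joblib.load('model.pkl')
```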
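
A saved classifier expects inputs preprocessed the same way as during training, so it can be convenient to persist the scaler together with the model. The sketch below does this by saving a single pipeline built with `make_pipeline` and checking that the reloaded object produces identical predictions. The file name `iris_pipeline.joblib` is hypothetical, and a temporary directory is used so that nothing is left on disk.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Scaler and classifier travel together as one object
pipeline = make_pipeline(StandardScaler(), SVC()).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "iris_pipeline.joblib")  # hypothetical file name
    joblib.dump(pipeline, path)
    restored = joblib.load(path)

# The restored pipeline should behave exactly like the original
same_predictions = np.array_equal(pipeline.predict(X), restored.predict(X))
print("Predictions identical after reload:", same_predictions)
```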

### VIII. Example: A Complete Machine Learning Workflow with Scikit-Learn

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

# Load the data
iris = load_iris()
X, y = iris.data, iris.target

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Train the model and tune hyperparameters
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
grid_search = GridSearchCV(SVC(), param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Evaluate the best model
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
print("Best Parameters:", grid_search.best_params_)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

# Save the model
joblib.dump(best_model, 'best_model.pkl')
```
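
Once a model has been trained this way, using it on new data means applying the same fitted scaler before calling `predict`. The sketch below repeats the training steps in compact form and then classifies one new measurement. The values in `new_sample` are a hypothetical input chosen for illustration, and `predicted_name` is an illustrative variable name.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Reproduce the training workflow
iris = load_iris()
X, y = iris.data, iris.target
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)
grid_search = GridSearchCV(
    SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}, cv=5
)
grid_search.fit(X_train, y_train)
best_model = grid_search.best_estimator_

# Hypothetical new flower: sepal length, sepal width, petal length, petal width (cm)
new_sample = [[5.1, 3.5, 1.4, 0.2]]

# Use transform, not fit_transform, so the training statistics are reused
new_scaled = scaler.transform(new_sample)
pred = best_model.predict(new_scaled)

# Map the numeric label back to the species name
predicted_name = str(iris.target_names[pred[0]])
print("Predicted species:", predicted_name)
```

Because only `best_model` is saved in the workflow above, the fitted `scaler` would also need to be saved (or bundled into a pipeline) to make predictions in a separate session.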

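### Conclusion

Scikit-Learn is a powerful machine learning library that provides tools for the entire workflow, from data preprocessing to model evaluation, and suits a wide range of machine learning tasks. By mastering its basic usage and core components, developers can quickly build and optimize machine learning models to solve real problems. If you have any questions or suggestions, feel free to leave a comment. Thanks for reading, and best of luck on your machine learning journey!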

