


[Machine Learning - 5] Ridge Regression (L2 Regularization) and a Python Implementation

    • Preface
    • The Problem
    • Ridge Regression
    • k-fold validation
    • sklearn ridge
    • Implementation (1)
    • Implementation (2)
    • Implementation (3)
    • Closing

Preface

!! If you just want the code, jump straight to the implementation sections.
This post was originally meant to follow the one on overfitting, but that post still isn't finished, so this one comes first.
Briefly, overfitting can be understood as catering too much to the data at hand: the trained model performs perfectly on the training set, then falls apart when it has to generalize. The cause is that the training set may contain incidental structural quirks, and the training process ends up fitting those as well.
The examples below all show the overfitting phenomenon clearly.
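As a quick, self-contained illustration of overfitting (a synthetic sketch with made-up data, not the figures from the original post): fit a degree-7 polynomial to 8 noisy points and the training error drops to essentially zero, while the error on the true underlying curve stays much larger.

```python
import numpy as np

# Made-up 1-D regression data: a noisy sine sampled at 8 points.
rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 8)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=8)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

def fit_and_score(deg):
    """Fit a degree-`deg` polynomial; return (train MSE, test MSE)."""
    coefs = np.polyfit(x_train, y_train, deg)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_mse, test_mse

tr1, te1 = fit_and_score(1)  # a straight line: noticeable train error
tr7, te7 = fit_and_score(7)  # degree 7 interpolates all 8 points: train error ~ 0
```

The degree-7 fit "takes the noise personally": its training error is near machine precision, yet its test error remains large, which is exactly the gap regularization is meant to close.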

The Problem

While working through my instructor's problem set today, I ran into the following problem.

In short, we need to train a model that predicts a city's crime rate. Let's first look at the README file the instructor provided.

The first column of the training set is the crime rate; the remaining columns are the various factors (unemployment rate and so on), i.e. the features. After fitting on the training set, we evaluate the trained model on the test set.

Ridge Regression

As is well known, L1 (LASSO) and L2 (Ridge) regularization are both standard ways to guard against overfitting.
So how do we solve a problem like the one below? We have an objective function together with a constraint, namely a bound on the L2 norm of the parameter vector; combining the two, we want the parameters that minimize the objective while satisfying the constraint.

We can link the two by adding a penalty term. The L2 norm of the parameters can then no longer grow too large, since the larger it gets, the heavier the penalty. The parameters lambda and C are related in some definite way (even though we don't know it explicitly), so choosing C turns into choosing lambda.
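Written out (the standard ridge formulation; X is the feature matrix, y the targets, w the parameter vector):

```latex
% constrained form: bound the L2 norm of the parameters
\min_{w}\ \|Xw - y\|_2^2 \quad \text{subject to} \quad \|w\|_2^2 \le C

% penalized form, equivalent for some \lambda \ge 0 that depends on C
\min_{w}\ \|Xw - y\|_2^2 + \lambda \|w\|_2^2

% closed-form solution of the penalized problem
w^{*} = (X^{\top} X + \lambda I)^{-1} X^{\top} y
```

The larger the penalty weight lambda, the smaller the norm of the resulting w, which is the "heavier penalty" described above.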



Different L2 regularization strengths

k-fold validation

Since the assignment asks for it, let me say a few words about it.
k-fold validation splits the data into K parts; for each i (i = 1, …, K), part i serves as the validation set, the rest as the training set, and the model trained that way is evaluated on the validation set.

In the end, we pick the model with the smallest MSE across these runs. Note the distinction between the validation set and the test set.
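The splitting scheme just described can be sketched with plain NumPy (a minimal illustration, not the code used later in the post; the folds here are consecutive blocks, like the even split below):

```python
import numpy as np

def kfold_indices(n, k):
    """Yield (train_idx, val_idx) pairs: fold i is the validation set, the rest train."""
    folds = np.array_split(np.arange(n), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx
```

Each index lands in exactly one validation fold, so every sample gets validated on exactly once.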

sklearn ridge

Here I use the ridge regression class that ships with sklearn; it comes with a number of built-in attributes and methods.
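For reference, a tiny example of the pieces this post relies on, `fit`, `predict`, `coef_`, and `intercept_`, on made-up data following y = 2x + 1:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Made-up data: y = 2x + 1, fit with a small amount of regularization.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

reg = Ridge(alpha=0.01)  # sklearn's alpha plays the role of lambda
reg.fit(X, y)

print(reg.coef_)             # slope, shrunk slightly below 2 by the penalty
print(reg.intercept_)        # close to 1
print(reg.predict([[4.0]]))  # close to 9
```

Increasing `alpha` shrinks `coef_` toward zero, which is the whole point of the penalty.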


Implementation (1)

```python
import pandas as pd
import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import math


def E_in(theta, phi, y):
    # in-sample error (defined for reference; not used below)
    return 0.5 * np.linalg.norm(np.dot(theta, phi) - y)


def E_aug(lambda_1, y, X, w):
    # augmented error: in-sample error plus the L2 penalty on w
    return 0.5 * np.linalg.norm(np.dot(w, X) - y) + lambda_1 * np.linalg.norm(w)


if __name__ == '__main__':
    df_train = pd.read_table("crime-train.txt")
    df_test = pd.read_table("crime-test.txt")
    column = df_train.shape[1]  # 96
    row = df_train.shape[0]     # 1595

    alpha_set = []
    MSE_SET = []
    for i in range(10):  # candidate lambdas 10^0 ... 10^-9
        alpha_val = math.pow(10, -i)
        alpha_set.append(alpha_val)
        MSE_min = 0
        for j in range(10):  # 10-fold
            # val_set = df_train.sample(frac=0.1, axis=0)  # random 10% instead
            val_set = df_train[159 * j:159 * (j + 1)]       # even, consecutive folds
            train_set = df_train[~df_train.index.isin(val_set.index)]
            Y_val = val_set['ViolentCrimesPerPop']
            feature_val = val_set.drop('ViolentCrimesPerPop', axis=1)
            Y_train = train_set['ViolentCrimesPerPop']
            feature_train = train_set.drop('ViolentCrimesPerPop', axis=1)
            reg = linear_model.Ridge(alpha=alpha_val)
            reg.fit(feature_train, Y_train)
            predict = reg.predict(feature_val)
            MSE_buffer = mean_squared_error(Y_val, predict)
            if j == 0 or MSE_buffer < MSE_min:
                MSE_min = MSE_buffer
                Y = Y_train        # remember the split that gave the best fold
                X = feature_train
        MSE_SET.append(MSE_min)

    plt.figure(figsize=(8, 6))
    plt.axes(xscale="log")
    plt.plot(alpha_set, MSE_SET)
    plt.xlabel('lambda')
    plt.ylabel('MSE')
    plt.show()
    # the best hypothesis can be recovered by fitting a fresh Ridge on X, Y
```

First, load the data with pandas.

Then split out the folds. Here I take consecutive slices for an even split; the commented-out line uses the DataFrame's built-in sample function for a random split instead. Neither produces duplicate rows, but the first alternative has more randomness.
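The two splitting strategies can be compared side by side on a toy DataFrame (hypothetical data; `random_state` added for reproducibility):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'v': np.arange(100)})  # hypothetical 100-row dataset

# Strategy 1: random 10% hold-out via DataFrame.sample (no duplicate rows)
random_hold = df.sample(frac=0.1, random_state=0)

# Strategy 2: a consecutive 10% slice, the "even" split
even_hold = df[0:10]

# Either way, the remaining rows form the training set
train_rand = df[~df.index.isin(random_hold.index)]
```

Both hold out exactly 10% of the rows; the slice-based split additionally guarantees the folds are disjoint across iterations without any index bookkeeping.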

Finally we get two plots (the first from the even split, the second from the random split).

Implementation (2)

Here there is no need for anything that elaborate: just train on the full training set and evaluate on the test set.

```python
import pandas as pd
import math
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

if __name__ == '__main__':
    df_train = pd.read_table("crime-train.txt")
    df_test = pd.read_table("crime-test.txt")

    Y_train = df_train['ViolentCrimesPerPop']
    feature_train = df_train.drop('ViolentCrimesPerPop', axis=1)
    Y_test = df_test['ViolentCrimesPerPop']
    feature_test = df_test.drop('ViolentCrimesPerPop', axis=1)

    alpha_set = []
    MSE_SET = []
    for i in range(10):
        alpha_val = math.pow(10, -i)  # can also use -i + 5 to shift the exponent range
        alpha_set.append(alpha_val)
        reg = linear_model.Ridge(alpha=alpha_val)
        reg.fit(feature_train, Y_train)
        predict = reg.predict(feature_test)
        MSE_SET.append(mean_squared_error(Y_test, predict))

    plt.figure(figsize=(8, 6))
    plt.axes(xscale="log")
    plt.plot(alpha_set, MSE_SET)
    plt.xlabel('lambda')
    plt.ylabel('MSE')
    plt.show()
```

Output

Zooming in (you can also shift the exponent range), the lambda with the best performance turns out to be 10.

Implementation (3)

We set a threshold and count how many entries of the trained w fall below it.

```python
import pandas as pd
import math
from sklearn import linear_model
import matplotlib.pyplot as plt

if __name__ == '__main__':
    df_train = pd.read_table("crime-train.txt")
    Y_train = df_train['ViolentCrimesPerPop']
    feature_train = df_train.drop('ViolentCrimesPerPop', axis=1)

    threshold = 2e-02
    alpha_set = []
    coef_set = []
    for i in range(10):
        alpha_val = math.pow(10, -i + 5)  # lambdas 10^5 ... 10^-4
        alpha_set.append(alpha_val)
        reg = linear_model.Ridge(alpha=alpha_val)
        reg.fit(feature_train, Y_train)
        small = [c for c in reg.coef_ if c < threshold]  # entries of w below the threshold
        coef_set.append(len(small))

    plt.figure(figsize=(8, 6))
    plt.axes(xscale="log")
    plt.plot(alpha_set, coef_set)
    plt.xlabel('lambda')
    plt.ylabel('coef')
    plt.show()
```

Closing

That's all for this post. For further detail, see the official sklearn documentation, and get plenty of practice.
