Implementing a Multilayer Neural Network in Python

Kellyyyy 2020. 10. 12. 08:00

In this post, we implement in Python a multilayer neural network that learns using the weight and bias (intercept) derivatives covered in the previous post.

Dataset Load & Preprocessing

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

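# Load the Wisconsin breast cancer dataset (binary classification)
cancer = load_breast_cancer()
x = cancer.data
y = cancer.target
# Split off the test set first, then split the rest into train and validation sets.
# stratify keeps the class ratio the same in every split.
x_train_all, x_test, y_train_all, y_test = train_test_split(x,y,stratify=y, test_size=0.2, random_state=42)
x_train, x_val, y_train, y_val = train_test_split(x_train_all, y_train_all, stratify=y_train_all, test_size=0.2, random_state=42)

# Fit the scaler on the training set only, then apply the same transform to the validation set
scaler = StandardScaler()
scaler.fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_val_scaled = scaler.transform(x_val)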

Model Building

class DualLayer() :
  def __init__(self,l1=0, l2=0, learning_rate=0.1, units=10) :
    self.w1 = None
    self.b1 = None
    self.w2 = None
    self.b2 = None
    self.a1 = None
    self.l1 = l1
    self.l2 = l2
    self.lr = learning_rate
    self.losses = []
    self.units = units
    self.val_losses = []

  def forpass(self, x) :
    z1 = np.dot(x,self.w1) + self.b1
    self.a1 = self.activation(z1)
    z2 = np.dot(self.a1, self.w2) + self.b2
    return z2
  
  def activation(self, z) :
    a = 1 / (1+np.exp(-z))
    return a
  
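  def backprop(self,x,err) :
    m = len(x)
    # Gradients for the output layer
    w2_grad = np.dot(self.a1.T,err) / m
    b2_grad = np.sum(err) / m
    # Propagate the error back through the sigmoid of the hidden layer
    err_to_hidden = np.dot(err,self.w2.T) * self.a1 * (1-self.a1)
    # Gradients for the hidden layer; sum over samples (axis=0) so each hidden unit gets its own bias gradient
    w1_grad = np.dot(x.T, err_to_hidden) / m
    b1_grad = np.sum(err_to_hidden, axis=0) / m
    return w1_grad, b1_grad, w2_grad, b2_grad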

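  def fit(self, x, y, x_val=None, y_val=None,epochs=100) :
    y = y.reshape(-1,1)
    m = len(x)
    self.init_weights(x.shape[1])
    # Validation data is optional, so reshape it only when it was passed in
    if y_val is not None :
      y_val = y_val.reshape(-1,1)

    for i in range(epochs) :
      a = self.training(x,y,m)
      # Clip to avoid log(0) when computing the cross-entropy loss
      a = np.clip(a, 1e-10, 1-1e-10)
      loss = np.sum(-(y*np.log(a) + (1-y)*np.log(1-a)))
      self.losses.append((loss + self.reg_loss()) / m)
      if x_val is not None and y_val is not None :
        self.update_val_loss(x_val, y_val)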

  def update_val_loss(self, x_val, y_val) :
    z = self.forpass(x_val)
    a = self.activation(z)
    a = np.clip(a, 1e-10, 1-1e-10)
    val_loss = np.sum(-(y_val*np.log(a) + (1-y_val) * np.log(1-a)))
    self.val_losses.append((val_loss + self.reg_loss())/len(y_val))

  def reg_loss(self) :
    return self.l1 * (np.sum(np.abs(self.w1)) + np.sum(np.abs(self.w2))) + self.l2/2 * (np.sum(self.w1**2) + np.sum(self.w2**2))

  def init_weights(self, n_features) :
    self.w1 = np.ones((n_features, self.units))
    self.b1 = np.zeros(self.units)
    self.w2 = np.ones((self.units, 1))
    self.b2 = 0

  def training(self,x,y,m) :
    z = self.forpass(x)
    a = self.activation(z)
    err = -(y-a)
    w1_grad, b1_grad, w2_grad, b2_grad = self.backprop(x,err)
    
    w1_grad += (self.l1 * np.sign(self.w1) + self.l2 * self.w1) / m
    w2_grad += (self.l1 * np.sign(self.w2) + self.l2 * self.w2) / m

    self.w1 -= self.lr * w1_grad
    self.b1 -= self.lr * b1_grad
    self.w2 -= self.lr * w2_grad
    self.b2 -= self.lr * b2_grad

    return a

  def predict(self, x) :
    z = self.forpass(x)
    return z > 0

  def score(self,x,y) :
    return np.mean(self.predict(x) == y.reshape(-1,1))

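When L1 and L2 regularization are applied, the penalty gradient is divided by m because the loss itself is divided by m: fit records (loss + reg_loss) / m, so the derivative of the penalty term must carry the same 1/m factor.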
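The 1/m factor can be checked numerically. The sketch below is not part of the DualLayer class: it uses a simplified single-layer logistic model on toy data, and the helper names (total_loss, analytic_grad, numeric_grad) are made up for this illustration. It compares the analytic gradient, including the (l1 * sign(w) + l2 * w) / m term, against a central-difference estimate of the averaged loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def total_loss(w, x, y, l1, l2):
    # (cross-entropy sum + penalty) / m, mirroring how the post averages the loss
    m = len(x)
    a = np.clip(sigmoid(x @ w), 1e-10, 1 - 1e-10)
    ce = np.sum(-(y * np.log(a) + (1 - y) * np.log(1 - a)))
    penalty = l1 * np.sum(np.abs(w)) + l2 / 2 * np.sum(w ** 2)
    return (ce + penalty) / m

def analytic_grad(w, x, y, l1, l2):
    # Data gradient plus the penalty gradient, both divided by m
    m = len(x)
    err = sigmoid(x @ w) - y
    return x.T @ err / m + (l1 * np.sign(w) + l2 * w) / m

def numeric_grad(w, x, y, l1, l2, eps=1e-6):
    # Central-difference estimate of d(total_loss)/dw, one weight at a time
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step.flat[i] = eps
        up = total_loss(w + step, x, y, l1, l2)
        down = total_loss(w - step, x, y, l1, l2)
        grad.flat[i] = (up - down) / (2 * eps)
    return grad

# Toy data (assumption: 8 samples, 3 features, random binary labels)
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
y = rng.integers(0, 2, size=(8, 1)).astype(float)
w = rng.normal(size=(3, 1))

max_diff = np.max(np.abs(analytic_grad(w, x, y, 0.01, 0.01)
                         - numeric_grad(w, x, y, 0.01, 0.01)))
print(max_diff)  # should be very close to zero
```

If the penalty gradient were not divided by m, the two gradients would disagree by roughly a factor of m in the regularization term.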

Model Fitting

duallayer = DualLayer(l2=0.01)
duallayer.fit(x_train_scaled, y_train, x_val_scaled, y_val, epochs=20000)
duallayer.score(x_val_scaled, y_val)
# 0.978021978021978

Loss Graph

plt.ylim(0,0.3)
plt.plot(duallayer.losses)
plt.plot(duallayer.val_losses)
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train_loss','val_loss'])
plt.show()

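The graph shows that the loss curve is very unstable in the early epochs. This is closely related to weight initialization: so far, training has started with every weight set to 1. This time, let's initialize the weights with random numbers drawn from a normal distribution.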
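One known drawback of all-ones initialization is symmetry: every hidden unit starts with the same weights, so every unit produces the same activation and receives the same gradient. The sketch below shows this on toy data with a small network (the sizes and data are assumptions, not the breast cancer setup). It illustrates one problem with constant initialization and is not claimed to be the full explanation of the unstable curve.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data (assumption: 5 samples, 4 features)
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
y = np.array([[0.], [1.], [1.], [0.], [1.]])

# All-ones initialization, as in DualLayer.init_weights, with 3 hidden units
w1 = np.ones((4, 3))
b1 = np.zeros(3)
w2 = np.ones((3, 1))
b2 = 0.0

# Forward pass
a1 = sigmoid(x @ w1 + b1)
a2 = sigmoid(a1 @ w2 + b2)

# Backward pass for the hidden-layer weights
err = a2 - y
err_to_hidden = (err @ w2.T) * a1 * (1 - a1)
w1_grad = x.T @ err_to_hidden / len(x)

# Every column (one per hidden unit) is identical to the first column
activations_identical = np.allclose(a1, a1[:, [0]])
columns_identical = np.allclose(w1_grad, w1_grad[:, [0]])
print(activations_identical, columns_identical)
```

Because the gradients are identical, the hidden units stay identical after every update, so the hidden layer behaves like a single unit. Random initialization breaks this symmetry.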

Init Weights

class RandomInitNetwork(DualLayer) :
  def init_weights(self, n_features) :
    np.random.seed(42)
    self.w1 = np.random.normal(0,1, (n_features, self.units))
    self.b1 = np.zeros(self.units)
    self.w2 = np.random.normal(0,1,(self.units,1))
    self.b2 = 0

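Conceptually, the bias matrices b1 and b2 have shapes (number of samples × number of hidden units) and (number of samples × 1). With NumPy, however, there is no need to build a matrix with one row per sample just to add the bias: broadcasting adds the same bias to every row automatically. That is why b1 is a 1-D array of length units and b2 is a scalar.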
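A small sketch of this broadcasting behavior, using made-up shapes (3 samples, 2 features, 4 hidden units) rather than the real dataset: adding a length-4 bias vector to a (3, 4) matrix gives the same result as explicitly stacking the bias into a (3, 4) matrix first.

```python
import numpy as np

x = np.arange(6, dtype=float).reshape(3, 2)   # 3 samples, 2 features
w1 = np.ones((2, 4))                          # 2 features -> 4 hidden units
b1 = np.array([0.1, 0.2, 0.3, 0.4])           # one bias per hidden unit, shape (4,)

z_broadcast = x @ w1 + b1                     # b1 is broadcast across the 3 rows
z_explicit = x @ w1 + np.tile(b1, (3, 1))     # explicitly build the (3, 4) bias matrix

print(z_broadcast.shape)                        # (3, 4)
print(np.array_equal(z_broadcast, z_explicit))  # True
```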

Model Re-Fitting

random_init_net = RandomInitNetwork(l2=0.01) 
random_init_net.fit(x_train_scaled, y_train, x_val_scaled, y_val, epochs=500)
plt.ylim(0,0.7)
plt.plot(random_init_net.losses)
plt.plot(random_init_net.val_losses)
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train_loss','val_loss'])
plt.show()

Loss Graph After Random Weight Initialization

With randomly initialized weights, the loss curve decreases much more smoothly, and the loss drops faster than when the weights were initialized to 1.

Reference.

Book: <Do it! 정직하게 코딩하며 배우는 딥러닝 입문>, 박해선, 이지스퍼블리싱

End.
