[PyTorch] Initializing Model Parameters (parameter initialization)

Introduction

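I'm going to start posting about deep learning frameworks as well.

The framework I mainly use is PyTorch, and I plan to write up useful features one at a time as I come across them.

These posts are meant both to organize what I've newly learned and to share it, so please read them casually.

This post covers how to initialize a model's parameters in PyTorch.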

1. Weight Initialization Mechanism

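In fact, PyTorch already initializes the parameters appropriately whenever you construct one of the basic module classes (Linear, ConvNd, and so on). As an example, let's look at the initializer of the Linear class in torch.nn.modules.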

class Linear(Module):

    def __init__(self, in_features: int, out_features: int, bias: bool = True) -> None:
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.empty(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.empty(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

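It first initializes the Module superclass, defines the required module parameters, and then calls self.reset_parameters(). Note that torch.empty() in the code returns a tensor of the given shape without initializing its values: the contents are whatever was already in that memory, which is often but not always zeros. The output below happened to be all zeros on this run.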

import torch

print(repr(torch.empty(3, 3)))

>>
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

However, if you print the weight of a Linear instance right after constructing it, you can see that it has already been properly initialized.

import torch

layer = torch.nn.Linear(3, 3)

print(repr(layer.weight))

>>
Parameter containing:
tensor([[-0.2833, -0.3393, -0.5608],
        [ 0.1036,  0.2007, -0.1846],
        [-0.3049, -0.2763, -0.5660]], requires_grad=True)

In other words, the actual parameter initialization happens in the reset_parameters() member function defined in each module class. For torch modules that define it, simply calling reset_parameters() is enough to re-initialize the module's parameters.
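As a quick check of this, here is a minimal sketch that wipes a Linear layer's weight and then calls reset_parameters() on it. The flag names (weights_zeroed, weights_reset) are just ones I made up for this illustration.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable

layer = torch.nn.Linear(3, 3)

# Overwrite the weight with zeros to simulate a layer we want to reset.
with torch.no_grad():
    layer.weight.zero_()
weights_zeroed = bool((layer.weight == 0).all())

# reset_parameters() re-runs the layer's built-in initialization in place.
layer.reset_parameters()
weights_reset = not bool((layer.weight == 0).all())

print(weights_zeroed, weights_reset)
```

The layer object stays the same; only the values stored in its parameters are redrawn.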

So what does this function look like? Let's open it up.

def reset_parameters(self) -> None:
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)

For reference, the Conv2d class uses essentially the same reset_parameters() function.

Looking at the code, the weight is initialized with the Kaiming initializer defined in the torch.nn.init module, and the bias is drawn from a uniform distribution whose bound is 1 / sqrt(fan_in).
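One thing that is easy to miss: with a = sqrt(5), the Kaiming bound gain * sqrt(3 / fan_in) works out to 1 / sqrt(fan_in), the same bound used for the bias. The sketch below checks this numerically. It assumes the installed PyTorch version initializes Linear as in the code quoted above; the layer sizes are arbitrary values I picked.

```python
import math
import torch

in_features, out_features = 100, 50  # arbitrary sizes for illustration
layer = torch.nn.Linear(in_features, out_features)

# kaiming_uniform_ samples from U(-bound, bound) with
#   bound = gain * sqrt(3 / fan_in),  gain = sqrt(2 / (1 + a^2)) for leaky_relu
a = math.sqrt(5)
gain = torch.nn.init.calculate_gain("leaky_relu", a)
bound = gain * math.sqrt(3.0 / in_features)

# With a = sqrt(5): gain = sqrt(1/3), so bound = 1 / sqrt(fan_in).
expected = 1.0 / math.sqrt(in_features)
print(bound, expected)

# Freshly constructed parameters should lie inside that bound.
max_abs_weight = layer.weight.abs().max().item()
max_abs_bias = layer.bias.abs().max().item()
print(max_abs_weight, max_abs_bias)
```

For a Linear layer, fan_in is simply in_features, which is why the sketch uses it directly.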

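As the code above suggests, unless you need something special, you can also re-initialize the parameters of a model component yourself with the initialization functions defined in nn.init. Most of them take the weight tensor as their first argument. (The Parameter class that appeared above is a subclass of torch.Tensor.)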

2. Examples

Let's look at a simple example. It takes a Conv2d module that has already been initialized and re-initializes its parameters with the random uniform function.

import torch

layer = torch.nn.Conv2d(1, 1, 2)
print(repr(layer.weight))

>>
Parameter containing:
tensor([[[[-0.4406,  0.1819],
          [-0.1550,  0.1970]]]], requires_grad=True)

torch.nn.init.uniform_(layer.weight, 0.5, 0.5)
print(repr(layer.weight))

>>
Parameter containing:
tensor([[[[0.5000, 0.5000],
          [0.5000, 0.5000]]]], requires_grad=True)

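To check that it works, I initialized the Conv2d layer's weight from a uniform distribution whose lower and upper bounds are the same. The output shows that it behaves as intended (to assign a single value, you could use constant_ instead). Note that the first printed result is what the module's built-in initialization produced.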
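For completeness, the constant_ variant mentioned above might look like the following minimal sketch. Zeroing the bias with zeros_ is my own addition and was not part of the original example.

```python
import torch

layer = torch.nn.Conv2d(1, 1, 2)

# constant_ fills the tensor in place with a single value.
torch.nn.init.constant_(layer.weight, 0.5)

# zeros_ is the shorthand for filling with 0 (here applied to the bias).
torch.nn.init.zeros_(layer.bias)

print(repr(layer.weight))
print(repr(layer.bias))
```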

The other initialization functions defined in the nn.init module are used in the same way.

import torch

layer = torch.nn.Conv2d(1, 1, 2)

# Normal distribution
torch.nn.init.normal_(layer.weight)

# Xavier initialization
torch.nn.init.xavier_uniform_(layer.weight)

# Kaiming initialization
torch.nn.init.kaiming_uniform_(layer.weight)

Alternatively, if you want to set the weight to specific values, you can write a tensor into the weight directly, as shown below.

import torch

layer = torch.nn.Conv2d(1, 1, 2)

# Copy the values in place; no_grad() keeps autograd from tracking the write
with torch.no_grad():
    layer.weight.copy_(torch.tensor([[[[1.0, 2.0],
                                       [3.0, 4.0]]]]))
print(repr(layer.weight))

>>
Parameter containing:
tensor([[[[1., 2.],
          [3., 4.]]]], requires_grad=True)

3. Recursive Weight Initialization

Finally, if you want to initialize the weights of every element inside an nn.Sequential or nn.Module recursively, use torch.nn.Module.apply().

Usage is simple: define a weight initialization function to be applied to every submodule, then pass it as the argument to the apply() member function. See the example below.

import torch


def weight_init_xavier_uniform(submodule):
    if isinstance(submodule, torch.nn.Conv2d):
        torch.nn.init.xavier_uniform_(submodule.weight)
        submodule.bias.data.fill_(0.01)
    elif isinstance(submodule, torch.nn.BatchNorm2d):
        submodule.weight.data.fill_(1.0)
        submodule.bias.data.zero_()


SequentialModel = torch.nn.Sequential(
        torch.nn.Conv2d(1, 1, 2),
        torch.nn.Conv2d(1, 1, 2),
        torch.nn.BatchNorm2d(1),
    )
SequentialModel.apply(weight_init_xavier_uniform)

for layer in SequentialModel:
    print(repr(layer.weight))

>>
Parameter containing:
tensor([[[[-0.7131, -0.0996],
          [ 0.3072, -0.6323]]]], requires_grad=True)
Parameter containing:
tensor([[[[ 0.1358,  0.0232],
          [-0.5050, -0.7775]]]], requires_grad=True)
Parameter containing:
tensor([1.], requires_grad=True)

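The function defined at the top takes a module and initializes its parameters, and it is passed as the argument to the apply() method called on the sequential model. As in this example, you can branch on the submodule's type to apply a different initialization scheme per module type.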

In practice, writing an initialization function like this to suit your needs makes it easy to initialize model parameters however you want.
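The same idea carries over to nested custom modules. The sketch below uses a hypothetical TinyNet class and init_weights function (both made up for this illustration) and also guards against layers created with bias=False, where bias is None. It records the order in which apply() visits the submodules.

```python
import torch


def init_weights(module):
    # Hypothetical init function; the choice of schemes is only illustrative.
    if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
        torch.nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:  # layers built with bias=False have no bias
            torch.nn.init.zeros_(module.bias)
    elif isinstance(module, torch.nn.BatchNorm2d):
        torch.nn.init.ones_(module.weight)
        torch.nn.init.zeros_(module.bias)


class TinyNet(torch.nn.Module):
    # Hypothetical model used only to show nesting; forward() is omitted.
    def __init__(self):
        super().__init__()
        self.features = torch.nn.Sequential(
            torch.nn.Conv2d(1, 4, 3, bias=False),
            torch.nn.BatchNorm2d(4),
            torch.nn.ReLU(),
        )
        self.head = torch.nn.Linear(4, 2)


model = TinyNet()

# Record the class name of every module that apply() passes to the function.
visited = []
model.apply(lambda m: visited.append(type(m).__name__))

model.apply(init_weights)
print(visited)
```

apply() calls the function on every child first and on the module itself last, so the function also receives containers such as Sequential and the top-level model. That is why the isinstance checks matter.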

Summary

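In summary:

The initialization functions defined in the torch.nn.init module let you initialize the parameters of a specific model component the way you want.
To apply initialization recursively to every submodule of an nn.Module or nn.Sequential, write a suitable initialization function and pass it to torch.nn.Module.apply().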
If the particular initialization scheme doesn't matter, simply call the module's reset_parameters() member function.
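Container modules such as nn.Sequential do not define reset_parameters() themselves, so resetting a whole model takes a small loop. Below is a minimal sketch; reset_all_parameters is a helper name I made up, not a PyTorch function.

```python
import torch


def reset_all_parameters(model):
    # Hypothetical helper: call reset_parameters() on every submodule that has one.
    for module in model.modules():
        reset = getattr(module, "reset_parameters", None)
        if callable(reset):
            reset()


model = torch.nn.Sequential(
    torch.nn.Conv2d(1, 1, 2),
    torch.nn.BatchNorm2d(1),
)

# Zero out the conv weight, then restore the default initialization.
with torch.no_grad():
    model[0].weight.zero_()
reset_all_parameters(model)

conv_weight_nonzero = bool((model[0].weight != 0).any())
print(conv_weight_nonzero)
```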

Here are some pages worth consulting.

References

Xavier Initialization

Kaiming Initialization

Pytorch torch.nn.init documentation