Mastering Multi-Database Strategies in Django Applications

Introduction

Web applications are expected to handle ever-growing user load and data volume. A single monolithic database often becomes the bottleneck, hurting performance, scalability, and even availability. As an application grows, developers run into slow queries, resource contention, and the inability to scale reads and writes independently. At that point, strategies such as read replicas and data sharding become necessary rather than merely useful. This article looks at how Django, a popular Python web framework, lets you configure and use multiple databases, with a focus on read-write splitting and data partitioning.

Core Database Concepts in Django

Before getting into implementation details, let's clarify a few concepts that underpin Django's multi-database support.

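Database Routers

A Django database router is a class that implements up to four methods: db_for_read, db_for_write, allow_relation, and allow_migrate. These methods tell Django which database to use for a given operation, so queries can be routed by application logic, model type, or other criteria. A method may return None to express no opinion, in which case Django asks the next router listed in DATABASE_ROUTERS. Routers are the foundation for managing multiple databases programmatically in a Django project.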
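
How several routers combine is easier to see in code. The following is a simplified sketch of the "first router with an opinion wins" idea, not Django's actual implementation: it assumes routers receive an app label rather than a model class, and AnalyticsRouter, CatchAllReadRouter, and resolve are hypothetical names used only for illustration.

```python
from typing import Optional, Sequence


class AnalyticsRouter:
    """Hypothetical router that only has an opinion about the 'analytics' app."""

    def db_for_read(self, app_label: str, **hints) -> Optional[str]:
        return 'analytics_db' if app_label == 'analytics' else None


class CatchAllReadRouter:
    """Hypothetical router that sends every remaining read to 'replica'."""

    def db_for_read(self, app_label: str, **hints) -> Optional[str]:
        return 'replica'


def resolve(routers: Sequence[object], method: str, app_label: str,
            fallback: str = 'default', **hints) -> str:
    """Ask each router in order and return the first non-None answer."""
    for router in routers:
        handler = getattr(router, method, None)
        if handler is None:
            continue  # A router may omit methods it does not care about.
        chosen = handler(app_label, **hints)
        if chosen is not None:
            return chosen
    return fallback  # No router had an opinion.
```

The order of the router list matters in this sketch: placing CatchAllReadRouter first would hide AnalyticsRouter entirely.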

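Read Replication (Read-Write Splitting)

This strategy uses a primary database that handles all writes (INSERT, UPDATE, DELETE) and one or more replicas that synchronize from the primary and serve reads (SELECT). In read-heavy applications, where reads make up most of the traffic, offloading them to separate servers reduces load on the primary and improves overall performance and availability.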
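
With more than one replica, reads have to be spread across them. One simple approach, also used in the Django documentation's primary/replica example, is to pick a replica at random for each read. The sketch below assumes two aliases named replica_1 and replica_2 and a hypothetical class name; neither appears elsewhere in this article.

```python
import random

# Assumed aliases; they would have to match keys in settings.DATABASES.
REPLICA_ALIASES = ['replica_1', 'replica_2']


class RandomReplicaRouter:
    """Send reads to a randomly chosen replica and writes to the primary."""

    def db_for_read(self, model, **hints):
        return random.choice(REPLICA_ALIASES)

    def db_for_write(self, model, **hints):
        return 'default'
```

Random choice spreads load evenly on average but ignores replica health and replication lag, so it is a starting point rather than a complete load-balancing policy.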

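Data Sharding (Data Partitioning)

Sharding splits a large database into smaller, more manageable pieces called shards. Each shard is a separate database instance that holds a subset of the total data. Rows are distributed across shards according to a shard key (for example, a user ID or a geographic region). The strategy is used for very large data sets that do not fit on a single server, in order to scale horizontally, spread load, and avoid a single point of failure.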
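
As a rough illustration of mapping a shard key to a shard, the sketch below shows two hypothetical helpers: one for integer keys using modulo, and one for string keys such as a region name. The string version uses zlib.crc32 because Python's built-in hash() for strings is randomized per process and would not give a stable mapping. The shard_NNN alias format is an assumption chosen to match the aliases used later in this article.

```python
import zlib


def shard_for_user_id(user_id: int, num_shards: int) -> str:
    """Map an integer key to an alias such as 'shard_001' using modulo."""
    return f'shard_{user_id % num_shards + 1:03d}'


def shard_for_key(key: str, num_shards: int) -> str:
    """Map a string key to a shard alias using a checksum that is stable across processes."""
    checksum = zlib.crc32(key.encode('utf-8'))
    return f'shard_{checksum % num_shards + 1:03d}'
```

For example, shard_for_user_id(7, 2) returns 'shard_002' and shard_for_user_id(8, 2) returns 'shard_001'. With any modulo-based scheme, changing the number of shards remaps most keys, so the shard count needs to be planned up front or combined with a resharding process.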

Implementing Multi-Database Strategies in Django

Django's database routers are flexible enough to support both read replication and sharding.

1. Read-Write Splitting

Let's start with the common scenario: the default database handles writes and a replica database handles reads.

Step 1: Configure the databases in settings.py

First, define the databases in your settings.py file.

# myproject/settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'my_primary_db',
        'USER': 'db_user',
        'PASSWORD': 'db_password',
        'HOST': 'primary_db_host',
        'PORT': '5432',
    },
    'replica': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'my_primary_db', # Often the same name as primary
        'USER': 'db_user_read_only',
        'PASSWORD': 'db_password_read',
        'HOST': 'replica_db_host',
        'PORT': '5432',
        'OPTIONS': {'options': '-c default_transaction_read_only=on'}, # Optional: have PostgreSQL open read-only transactions on this connection
    }
}

Step 2: Create the database router

Next, create a router that sends writes to default and reads to replica.

# myapp/db_routers.py
class PrimaryReplicaRouter:
    """
    A router to control all database operations for models.
    """
    route_app_labels = {'my_app', 'another_app'} # Define which apps this router considers

    def db_for_read(self, model, **hints):
        """
        Attempts to read my_app models go to replica.
        """
        if model._meta.app_label in self.route_app_labels:
            return 'replica'
        return 'default' # All other apps default to primary

    def db_for_write(self, model, **hints):
        """
        Writes always go to default, whichever app the model belongs to.
        """
        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        """
        Allow relations between objects loaded from default or replica.
        Both hold the same data, so an object read from replica may be
        related to one saved through default.
        """
        db_set = {'default', 'replica'}
        if obj1._state.db in db_set and obj2._state.db in db_set:
            return True
        return None # Return None to defer to other routers

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        """
        Make sure the my_app apps only appear in the 'default' database.
        """
        if app_label in self.route_app_labels:
            return db == 'default' # Migrations for specified apps only on default
        return None # Return None to defer to other routers

Step 3: Register the router in settings.py

Finally, tell Django to use the new router.

# myproject/settings.py
DATABASE_ROUTERS = ['myapp.db_routers.PrimaryReplicaRouter']

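With this setup, queries against models in the routed apps (my_app and another_app) read from the replica database, and changes are written to default. To force a read or write onto a specific database, use Model.objects.using('database_name') for querysets or instance.save(using='database_name') for saves.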
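
Replicas usually trail the primary by a short delay, so a read issued right after a write may need to go to the primary. Besides calling using('default') at each call site, one possible approach is to let the router consult a context flag. The sketch below is an assumption-laden illustration: use_primary and PinnablePrimaryReplicaRouter are hypothetical names, and the flag is kept in a contextvars.ContextVar so that it stays local to the current thread or async task.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical flag: True while the current context wants reads from the primary.
_force_primary = contextvars.ContextVar('force_primary', default=False)


@contextmanager
def use_primary():
    """Route reads to 'default' for the duration of the with-block."""
    token = _force_primary.set(True)
    try:
        yield
    finally:
        _force_primary.reset(token)  # Restore the previous value, even on error.


class PinnablePrimaryReplicaRouter:
    """Reads go to 'replica' unless the caller is inside use_primary()."""

    def db_for_read(self, model, **hints):
        return 'default' if _force_primary.get() else 'replica'

    def db_for_write(self, model, **hints):
        return 'default'
```

A view that has just saved an object could wrap its follow-up query in "with use_primary():" so that the freshly written row is visible.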

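2. Data Sharding

Sharding usually requires more involved logic to decide which shard a piece of data belongs to. Consider a simple example in which users are sharded by user ID.

Step 1: Configure the shard databases

Define several databases, one per shard.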

# myproject/settings.py
DATABASES = {
    'default': { # Used for some global configurations or as a fallback
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'my_global_db',
        # ...
    },
    'shard_001': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'user_shard_1',
        # ...
    },
    'shard_002': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'user_shard_2',
        # ...
    },
    # ... potentially more shards
}

Step 2: Create the sharding router

This router needs a strategy for determining the correct shard for a given user.

# myapp/db_routers.py
NUM_SHARDS = 2 # Define the number of shards
SHARD_MODELS = {'User', 'UserProfile', 'Order'} # Models to be sharded

class ShardRouter:
    """
    A router to control database operations for sharded models.
    """
    def _get_shard_for_user_id(self, user_id):
        """
        Simple sharding logic: user_id % NUM_SHARDS
        """
        return f'shard_{user_id % NUM_SHARDS + 1:03d}'

    def db_for_read(self, model, **hints):
        if model.__name__ in SHARD_MODELS:
            # How to get user_id here? This is the tricky part for sharding.
            # Often, you'll pass a 'shard_key' or 'user_id' via hints,
            # or rely on context in a request-response cycle.
            # For simplicity, let's assume `hints` might contain `instance`
            # or `shard_key` when called explicitly.
            # If not provided, you might default to a specific shard or raise an error.

            # Example: Explicitly passing shard_key when querying
            if 'shard_key' in hints:
                return self._get_shard_for_user_id(hints['shard_key'])
            
            # Example: If a model instance is passed (e.g., during save)
            if 'instance' in hints and hasattr(hints['instance'], 'user_id'):
                 return self._get_shard_for_user_id(hints['instance'].user_id)
            
            # Fallback or error if shard_key cannot be determined
            print(f"Warning: Shard key not provided for {model.__name__} in read operation. Defaulting to shard_001.")
            return 'shard_001' # Consider a more robust fallback or raise an exception

        return None # Defer to other routers or default

    def db_for_write(self, model, **hints):
        if model.__name__ in SHARD_MODELS:
            if 'shard_key' in hints:
                return self._get_shard_for_user_id(hints['shard_key'])

            if 'instance' in hints and hasattr(hints['instance'], 'user_id'):
                return self._get_shard_for_user_id(hints['instance'].user_id)
            
            print(f"Warning: Shard key not provided for {model.__name__} in write operation. Defaulting to shard_001.")
            return 'shard_001'
        return None

    def allow_relation(self, obj1, obj2, **hints):
        # Allow relations only if both objects are on the same shard or are not sharded models
        if obj1._meta.model.__name__ in SHARD_MODELS or obj2._meta.model.__name__ in SHARD_MODELS:
            return obj1._state.db == obj2._state.db
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Migrations for sharded models should only run on their respective shards.
        # This is highly dependent on how you manage schema.
        # Often, you'll run migrations globally or specifically for each shard's schema.
        # For simplicity, let's assume we run migrations on all shards that should contain these models.
        # Django passes model_name in lowercase, so compare against lowercased names.
        if model_name is not None and model_name in {name.lower() for name in SHARD_MODELS}:
            return db.startswith('shard_') or db == 'default' # For models that might also live on default
        return None
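
Django's migrate command operates on one database per invocation, selected with the --database option, so each shard has to be migrated in its own run. A minimal sketch of generating those invocations follows; migrate_commands is a hypothetical helper, and the manage.py path and the alias list are assumptions.

```python
import sys
from typing import Iterable, List


def migrate_commands(aliases: Iterable[str],
                     manage_py: str = 'manage.py') -> List[List[str]]:
    """Build one 'manage.py migrate --database=<alias>' command per alias."""
    return [
        [sys.executable, manage_py, 'migrate', f'--database={alias}']
        for alias in aliases
    ]
```

Each returned list can be passed to subprocess.run(command, check=True) from the project directory. The router's allow_migrate then decides which models are actually created on each alias.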

Step 3: Register the router

As with the read replica setup, register the sharding router.

# myproject/settings.py
DATABASE_ROUTERS = ['myapp.db_routers.ShardRouter']

Using the sharding router:

The tricky part of sharding is getting the shard key to the router. Typically you modify the view or service layer to supply it explicitly.

# myapp/views.py
from django.db import router
from django.shortcuts import render
from .models import User

def get_user_data(request, user_id):
    # using() only accepts a database alias, so ask the router chain for the
    # alias first, passing user_id as the shard_key hint.
    read_db = router.db_for_read(User, shard_key=user_id)
    user = User.objects.using(read_db).get(id=user_id)
    # ... and for writes
    # user.name = "New Name"
    # user.save(using=router.db_for_write(User, shard_key=user_id))
    return render(request, 'user_detail.html', {'user': user})

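Whatever serves as the shard key, here user_id, must map deterministically to a shard alias. On its own, Django passes only an instance hint to db_for_read and db_for_write, and only when one is available (for example when saving an object or following a relation). It has no mechanism for attaching custom hints to a queryset, and using() and save(using=...) take a database alias and bypass the router. Custom hints such as shard_key therefore reach the router only when your own code calls it, for example through django.db.router.db_for_read(User, shard_key=user_id).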
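
An alternative to passing the key at every call site is to keep it in request-scoped context and let the router look it up. The sketch below assumes a middleware or view would enter shard_context for the duration of a request. shard_context, current_shard_alias, and MissingShardKey are hypothetical names, and the lookup raises rather than silently defaulting to shard_001.

```python
import contextvars
from contextlib import contextmanager

NUM_SHARDS = 2  # Same assumption as the router above.

# Hypothetical holder for the shard key of the current request.
_current_user_id = contextvars.ContextVar('current_user_id', default=None)


class MissingShardKey(RuntimeError):
    """Raised when a sharded query runs without a shard key in context."""


@contextmanager
def shard_context(user_id: int):
    """Make user_id the active shard key inside the with-block."""
    token = _current_user_id.set(user_id)
    try:
        yield
    finally:
        _current_user_id.reset(token)


def current_shard_alias() -> str:
    """Return the alias for the active shard key, or raise if none is set."""
    user_id = _current_user_id.get()
    if user_id is None:
        raise MissingShardKey('no shard key set for this context')
    return f'shard_{user_id % NUM_SHARDS + 1:03d}'
```

A router's db_for_read and db_for_write could call current_shard_alias() when no shard_key or instance hint is present. Failing loudly makes a missing key show up in testing instead of quietly writing rows to the wrong shard.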

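Conclusion

Implementing multi-database strategies such as read replication and data sharding in Django is an effective way to improve the scalability, performance, and resilience of a growing application. Django's database router system gives developers precise control over where data is stored and where it is read from. Read-write splitting is relatively simple to implement, whereas sharding adds data-routing and schema-management complexity that calls for careful design. Applied correctly, these approaches turn a potential database bottleneck into a robust, scalable solution.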