Extending MLflow Autolog: Managing Detailed Experiment Data with Custom Logging


Data Jun 2025. 10. 21. 21:42

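In this post, we use MLflow to train a random forest classification model,
track the model and its results automatically with autolog(), and then log the test metrics that autolog() does not cover.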

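1. Managing experiments with the with statement

When tracking an experiment in MLflow, you typically use the with mlflow.start_run(): statement.
The context manager handles the start and end of the run,
so there is no need to call mlflow.end_run() yourself.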

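import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# X_train, X_test, y_train, y_test are assumed to be prepared beforehand.
n_estimator = 80
random_state = 1234

with mlflow.start_run():
    # Train the model and evaluate it on the test set
    model = RandomForestClassifier(n_estimators=n_estimator, random_state=random_state)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    # prf is a tuple: (precision, recall, f1-score, support)
    prf = precision_recall_fscore_support(y_test, y_pred, average="binary")

    # Log the hyperparameter under sklearn's own name so runs are easy to compare
    mlflow.log_param("n_estimators", n_estimator)
    mlflow.log_metric("accuracy_on_test", accuracy)
    mlflow.log_metric("precision_on_test", prf[0])
    mlflow.log_metric("recall_on_test", prf[1])
    mlflow.log_metric("f1score_on_test", prf[2])
    mlflow.sklearn.log_model(model, "model")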

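✅ Key points

When the with block ends, MLflow calls end_run() automatically.
The run is closed even if an exception is raised inside the block, so no run is left open by accident.
The code is easier to read, and run management stays tidy.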
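
To see why a with block guarantees cleanup, here is a minimal pure-Python sketch of the context-manager pattern. The names tracked_run and events are hypothetical and exist only for this illustration; this is a simplified stand-in, not MLflow's actual implementation.

```python
from contextlib import contextmanager

events = []  # records what happened, in order


@contextmanager
def tracked_run(name):
    """Toy stand-in for a run: always records an end event, even on errors."""
    events.append(f"start:{name}")
    status = "FINISHED"
    try:
        yield name
    except Exception:
        status = "FAILED"
        raise  # re-raise so the caller still sees the error
    finally:
        events.append(f"end:{name}:{status}")


# A block that completes normally
with tracked_run("ok"):
    pass

# A block that fails partway through
try:
    with tracked_run("broken"):
        raise ValueError("training failed")
except ValueError:
    pass

print(events)
```

mlflow.start_run() follows the same idea: a run that leaves the block normally is marked FINISHED, and one that leaves through an exception is marked FAILED.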

2. MLflow sklearn autolog

With mlflow.sklearn.autolog(), the training process and results of a scikit-learn model are recorded automatically.
You no longer need to call log_param() or log_metric() by hand for these values, so the code becomes much shorter.

# Enable MLflow's automatic logging
mlflow.sklearn.autolog()

# Initialize the model
n_estimator = 77
random_state = 2222

model = RandomForestClassifier(n_estimators=n_estimator, random_state=random_state)

# Train the model
model.fit(X_train, y_train)

# Predict on the test data and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
prf = precision_recall_fscore_support(y_test, y_pred, average='binary')

Information recorded automatically by autolog()

Model hyperparameters (n_estimators, random_state, max_depth, and so on)
The fitted model object (saved as an sklearn model artifact)
Training metrics (for example, training accuracy)

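3. Logging additional metrics that autolog does not collect

autolog() is convenient, but it does not record the detailed test-set metrics used here under names you choose.
Depending on the MLflow version and the log_post_training_metrics option, some scalar metrics computed through sklearn.metrics after training may be captured under auto-generated names, but logging them explicitly keeps the names and contents under your control.
The additional metrics are therefore logged by hand, as shown below.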

mlflow.sklearn.autolog()

with mlflow.start_run():
    n_estimator = 400
    random_state = 7777
    max_depth = 2

    # Train the model
    model = RandomForestClassifier(
        n_estimators=n_estimator, random_state=random_state, max_depth=max_depth
    )
    model.fit(X_train, y_train)

    # Predict on the test data
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    prf = precision_recall_fscore_support(y_test, y_pred, average="binary")

    # Log the test metrics that autolog does not collect
    mlflow.log_metric("precision_on_test", prf[0])
    mlflow.log_metric("recall_on_test", prf[1])
    mlflow.log_metric("f1score_on_test", prf[2])
    mlflow.log_metric("accuracy_on_test", accuracy)

✅ Key points

In addition to what autolog() records automatically,
test-set performance (precision, recall, f1-score, accuracy) should be logged explicitly with log_metric().
This keeps training and test performance clearly separated in the MLflow UI.
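
The four log_metric() calls can also be collapsed into one dictionary. The helper below is a small sketch; named_test_metrics and its suffix parameter are hypothetical names introduced here, and the sample values in the usage lines are made up purely to show the output shape.

```python
def named_test_metrics(accuracy, prf, suffix="on_test"):
    """Turn accuracy and a (precision, recall, f1, support) tuple into a named dict."""
    precision, recall, f1, _support = prf  # support is None when average="binary"
    return {
        f"accuracy_{suffix}": float(accuracy),
        f"precision_{suffix}": float(precision),
        f"recall_{suffix}": float(recall),
        f"f1score_{suffix}": float(f1),
    }


# Made-up values, only to show the shape of the result
metrics = named_test_metrics(0.9, (0.8, 0.7, 0.75, None))
print(metrics)
```

Inside the with mlflow.start_run(): block, the result can be passed to mlflow.log_metrics(), which accepts a dictionary and logs every entry in one call.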

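4. Checking results in the MLflow UI

Once training is complete, open the MLflow Tracking server (http://127.0.0.1:5000) to inspect the results visually.

Items you can check in the UI:

Automatically logged hyperparameters and model information
Performance metrics on the training and test data
Model files and detailed logs for each run
Performance comparison across runs (for example, how accuracy changes with n_estimators)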

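Summary

With mlflow.sklearn.autolog(),
a single line automates parameter tracking, training-metric logging, and model saving.
Add explicit logging for the test-set metrics,
and the MLflow UI holds a complete record of each experiment that is easy to compare and reproduce.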
