본문 바로가기
Career/TIL

2024-07-14

by 5ole 2024. 7. 14.

 

<데이터 가공을 위한 SQL 값 조작>

 

1. 코드 값을 레이블로 변환 : CASE 식

SELECT
	user_id
    , CASE
    	WHEN register_device = 1 THEN '데스크톱'
        WHEN register_device = 2 THEN '스마트폰'
        WHEN register_device = 3 THEN '애플리케이션'
        ELSE '' -- 디폴트값 의미
        END AS device_name
 FROM mst_users
 ;

 

 

2. 지정한 값의 날짜/시각 데이터 추출하기

-> 날짜 자료형과 타임스탬프 자료형으로 변환

 

미들웨어에 따라 다양한 방법 : CAST 함수가 가장 범용적

 

SELECT

-- PostgreSQL, Hive, Redshift, Bigquery, SparkSQL 모두 CAST(value AS type) 사용 가능
	CAST('2016-01-30' AS date) AS dt
	, CAST('2016-01-30 12:00:00' AS timestamp) AS stamp

-- PostgreSQL, Hive, Redshift, Bigquery, SparkSQL 모두 type value 사용 가능
	date '2016-01-30' AS dt
	, timestamp '2016-01-30 12:00:00' AS stamp
    
-- Hive, Bigquery 사용
	date('2016-01-30') AS dt
    , timestamp('2016-01-30 12:00:00') AS stamp
    
-- PostgreSQL, Redshift 'value::type' 사용
	'2016-01-30'::date AS dt
    , '2016-01-30 12:00:00' ::timestamp AS stamp

 

 

타임스탬프 자료형에서 연,월,일,시간 값을 추출할 때 -> EXTRACT 함수 사용 (hive, SparkSQL 제외)

SELECT
-- PostgreSQL, Redshift, Bigquery : EXTRACT 함수 사용하기
	EXTRACT(YEAR FROM stamp) AS year
    , EXTRACT(MONTH FROM stamp) AS month
    , EXTRACT(DAY FROM stamp) AS day
    , EXTRACT(HOUR FROM stamp) AS hour
    
-- Hive, SparkSQL 함수 사용하기
	YEAR(stamp) AS year
    , MONTH(stamp) AS month
    , DAY(stamp) AS day
    , HOUR(stamp) AS hour
    
 FROM 
 	(SELECT CAST('2016-01-30 12:00:00' AS timestamp) AS stamp) AS t
  ;

 

 

문자열에서 연,월,일,시간 추출 -> SUBSTRING

SELECT
	stamp
-- PostgreSQL, Hive, Redshift, Bigquery : SUBSTRING 함수 사용하기
	substring(stamp, 1, 4) AS year
    , substring(stamp, 6, 2) AS month
    , substring(stamp, 9, 2) AS day
    , substring(stamp, 12, 2) AS hour
    , substring(stamp, 1, 7) AS year_month

-- PostgreSQL, Hive, Bigquery, SparkSQL : SUBSTR 함수 사용하기
	SUBSTR(stamp, 1, 4) AS year
    , SUBSTR(stamp, 6, 2) AS month
    , SUBSTR(stamp, 9, 2) AS day
    , SUBSTR(stamp, 12, 2) AS hour
    , SUBSTR(stamp, 1, 7) AS year_month
    
 FROM 
-- PostgreSQL, Redshift CAST 방식 : text
 	(SELECT CAST('2016-01-30 12:00:00' AS text) AS stamp) AS t
-- Hive, Bigquery, SparkSQL CAST 방식 : string
 	(SELECT CAST('2016-01-30 12:00:00' AS string) AS stamp) AS t
;

 

 

3. NULL 값 대체하기 : COALESCE 함수

 

- 문자열 + NULL = NULL

- 숫자 + NULL = NULL

--NULL값을 방지하기 위해 0으로 대치

SELECT
	purchase_id
    , amount - COALESCE(coupon,0) AS discount_amount2
FROM
	purchase_log_with_coupon
;

 

 

 

출처

[한빛미디어] 데이터 분석을 위한 SQL 레시피 - 가사키 나가토, 다미야 나오토, 윤인성 옮김

 

'Career > TIL' 카테고리의 다른 글

2024-07-12  (0) 2024.07.13
2024-07-08  (0) 2024.07.09

댓글