Chapter 6. Deploying your simple app

Posted by 박스님 on 이야기박스, 2020. 7. 30. 12:17

This chapter builds on what was covered in [Chapter 5. Building a simple app for deployment].

Deployment is covered this early in the book to match today's CI/CD-oriented way of working.

You only need to remember three modes:

* Local mode, which you are already familiar with through the examples in previous chapters
* Cluster mode (more than one computer or node)
* Interactive mode (through a shell)
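Which of the first two modes an application runs in is driven mostly by the master URL it is given (through `spark-submit --master` or the session builder's `master()` setting). Spark accepts URLs such as `local`, `local[4]`, `local[*]`, `spark://host:7077`, `yarn`, and `k8s://https://host:443`. The helper below is a hypothetical sketch, not part of Spark, that classifies a master URL as local or cluster. Interactive mode is orthogonal: `spark-shell` and `pyspark` accept a `--master` too. Also note that the book's "cluster mode" (running on several nodes) is not the same thing as `spark-submit --deploy-mode cluster`, which only decides where the driver runs.

```python
import re

# Hypothetical helper, not part of Spark: classify a Spark master URL.
# local, local[N], local[*], local[N,F] and local[*,F] all mean local mode.
_LOCAL = re.compile(r"local(\[(\*|\d+)(,\d+)?\])?")
# Master URLs that hand the job to a separate cluster manager.
_CLUSTER_PREFIXES = ("spark://", "mesos://", "k8s://")


def deployment_mode(master: str) -> str:
    """Return "local" or "cluster" for a Spark master URL."""
    master = master.strip()
    if _LOCAL.fullmatch(master):
        # Driver and executor run inside a single JVM on this machine.
        return "local"
    if master == "yarn" or master.startswith(_CLUSTER_PREFIXES):
        # Executors are requested from a cluster manager on other nodes.
        return "cluster"
    raise ValueError(f"unrecognized master URL: {master!r}")
```

For example, `deployment_mode("local[*]")` returns `"local"` and `deployment_mode("yarn")` returns `"cluster"`.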

For each of these modes, you also need to understand the underlying infrastructure architecture.

Let's look at how Spark's components fit together.

# Overview

The driver is connected to the rest of the cluster through the following links:

| Link | Origin | Destination | Care level |
|---|---|---|---|
| 1 | Driver | Cluster manager/master | You do care about this link; your application connects to the master or cluster manager this way. |
| 2 | Cluster manager/master | Executor | This link establishes a connection between the workers and the master. The workers initiate the connection, but data is passed from the master to the workers. If this link is broken, your cluster manager will not be able to communicate with the executors. |
| 3 | Executor | Executor | Internal link between the executors; as developers, we do not care that much about it. |
| 4 | Executor | Driver | The executor needs to be able to get back to the driver, which means the driver cannot be behind a firewall (which is a rookie mistake when your first application tries to connect to a cluster in the cloud). If the executors cannot communicate with the driver, they will not be able to send data back. |
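Link 4 is usually the first one to cause trouble: executors must be able to open connections back to the driver. Spark lets you pin the address and port the driver uses with the `spark.driver.host` and `spark.driver.port` settings (the port is random by default), so you can open exactly that port in the firewall. The function below is a hypothetical standard-library helper, not a Spark tool, showing the kind of check you could run from a worker node to confirm that the driver's port is reachable.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Run it from a worker node against the driver's host and port to test
    link 4 (executor -> driver). False often means a firewall is in the way.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and host names that do not resolve.
        return False
```

For example, call `can_reach("driver-host", 40000)` on a worker, where `driver-host` and `40000` are placeholders for your own `spark.driver.host` and `spark.driver.port` values.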

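The driver (that is, your application) is a process independent of the cluster.

Within that process, the SparkSession is unique, whether you run in local mode or in cluster mode.

When the driver connects to the cluster this way, Spark requests executors.

The master that receives the request splits the work across the nodes that have available resources (the code is deployed to the executors).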
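The "one SparkSession per process" rule is what the session builder's `getOrCreate()` method gives you: if a session already exists in the process, you get that one back instead of a new one. The class below is a hypothetical toy model of that get-or-create pattern in plain Python, not Spark's implementation, to make the behavior concrete.

```python
import threading


class ToySession:
    """Hypothetical stand-in for a SparkSession; not Spark's real class."""

    _instance = None
    _lock = threading.Lock()

    def __init__(self, app_name: str, master: str):
        self.app_name = app_name
        self.master = master

    @classmethod
    def get_or_create(cls, app_name: str, master: str = "local[*]") -> "ToySession":
        # Only the first call in the process creates a session; every later
        # call gets the same object back (one session per driver process).
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls(app_name, master)
            return cls._instance
```

The toy simply ignores the arguments of later calls, which is a simplification; the point is that a second `get_or_create()` returns the object created by the first.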

## Constraints

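1. The firewall between the executors and the driver must be open: the driver has to accept and keep connections from its executors.

2. Each application gets its own executor processes.

* Scheduling side: each driver schedules its own tasks.

* Executor side: tasks from different applications run in different JVMs.

--> Different Spark applications cannot share data with each other while they run, unless they go through external storage.

3. Spark is agnostic to the underlying cluster manager (Spark != cluster manager).

4. Keep the driver close to the worker nodes (preferably on the same local network), because it schedules the tasks on the cluster.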
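The second constraint follows from each application having its own executor JVMs: two applications never see each other's memory, so the only way to pass data between them is through storage outside both, such as files, a database, or a distributed file system. As an analogy only (two Python interpreters stand in for two Spark applications, and a temporary local file stands in for external storage), the sketch below shows that an in-memory variable does not cross the process boundary while a file does.

```python
import os
import subprocess
import sys
import tempfile


def run_app(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a fresh Python interpreter (our stand-in for a separate application)."""
    return subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)


def isolation_demo() -> tuple[str, str]:
    """Return (what app B gets from memory, what app B reads from storage)."""
    with tempfile.TemporaryDirectory() as tmp:
        shared = os.path.join(tmp, "result.txt")  # stand-in for external storage

        # "Application A": computes a value in memory and persists it.
        writer = f"result = 42\nwith open({shared!r}, 'w') as f:\n    f.write(str(result))\n"
        run_app(writer).check_returncode()

        # "Application B": A's in-memory variable does not exist here...
        from_memory = run_app("print(result)")
        # ...but what A wrote to storage can be read.
        from_storage = run_app(f"with open({shared!r}) as f:\n    print(f.read())\n")

        memory_view = from_memory.stderr.strip().splitlines()[-1]  # last traceback line
        return memory_view, from_storage.stdout.strip()
```

Calling `isolation_demo()` returns a `NameError` message for the in-memory attempt and the string `"42"` for the storage read.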

# Deploy

You only need to deploy once; Spark then distributes the application to the worker nodes on its own.
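In practice, deploying once usually means a single `spark-submit` call. You point it at the application JAR, the main class, and the master, and Spark transfers the application JAR (plus any JARs passed with `--jars`) to the cluster. The helper below is a hypothetical sketch that only assembles the command line and does not run Spark. `--class`, `--master`, `--deploy-mode`, and `--jars` are real `spark-submit` options; the class names, JAR paths, and master URLs in the test are placeholders.

```python
def spark_submit_command(app_jar: str, main_class: str, master: str,
                         deploy_mode: str = "client",
                         extra_jars: tuple[str, ...] = ()) -> list[str]:
    """Assemble (but do not run) a spark-submit command line as an argument list."""
    cmd = ["spark-submit",
           "--class", main_class,         # entry point of the application
           "--master", master,            # e.g. local[*], spark://host:7077, yarn
           "--deploy-mode", deploy_mode]  # client: driver runs where you submit; cluster: on the cluster
    if extra_jars:
        # --jars takes a comma-separated list of extra JARs shipped with the app.
        cmd += ["--jars", ",".join(extra_jars)]
    cmd.append(app_jar)  # the application JAR comes after all spark-submit options
    return cmd
```

Passing the returned list to `subprocess.run` would launch the job on a machine where `spark-submit` is on the `PATH`.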

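< Options for a JAR-based application >

* An uber JAR (your application and all of its dependencies in a single JAR)

* The application JAR, with its dependencies already present on every worker node

* Clone and pull from a repository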
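An uber JAR (also called a fat JAR) is a single JAR containing your classes plus the classes of every dependency. It is normally produced by a build tool, for example Maven's Shade or Assembly plugin, or sbt-assembly. A JAR is a ZIP archive, so the core idea can be sketched with Python's `zipfile` module. This hypothetical helper copies the entries of the application JAR first and then those of each dependency, keeping the first copy of any duplicate path. It deliberately skips what real tools also handle, such as merging `META-INF/services` files, dropping signature files, and relocating packages.

```python
import zipfile


def build_uber_jar(app_jar: str, dependency_jars: list[str], output_jar: str) -> list[str]:
    """Copy every entry of app_jar and its dependencies into output_jar.

    Returns the duplicate entry paths that were skipped (first copy wins).
    """
    seen: set[str] = set()
    skipped: list[str] = []
    with zipfile.ZipFile(output_jar, "w") as out:
        # The application JAR comes first so its own META-INF/MANIFEST.MF
        # (the one carrying Main-Class) is the copy that survives.
        for jar_path in [app_jar, *dependency_jars]:
            with zipfile.ZipFile(jar_path) as jar:
                for info in jar.infolist():
                    if info.filename in seen:
                        skipped.append(info.filename)
                        continue
                    seen.add(info.filename)
                    # Reusing the ZipInfo keeps each entry's name, timestamp and compression method.
                    out.writestr(info, jar.read(info.filename))
    return skipped
```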
