Posts in the '펭귄컴퓨팅' (Penguin Computing) category

How to Install OpenWrt on the Turris Omnia, and How to Recover It

Penguin Computing/Embedded Linux / 2024. 2. 19. 20:18

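The Turris Omnia, made by the Czech company CZ.NIC, is an excellent router built on open-source OpenWrt. However, it stopped receiving updates some years ago (did the company go under?), so in the end you have to switch it to OpenWrt firmware.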

https://openwrt.org/toh/turris/turris_omnia

[OpenWrt Wiki] Turris CZ.NIC Omnia


0. First, update the device's U-Boot (if needed).

Update U-Boot if needed

Log into the factory OS (TurrisOS), and take note of the U-Boot version installed on your device: strings /dev/mtd0 | grep 'U-Boot 20'. Alternatively, watch the serial console when booting.

Only if you have a very old Turris Omnia with U-Boot 2015.10-rc2:

Make sure that you are running TurrisOS >= 5.2, and install the turris-nor-update package.
Execute nor-update, to bring U-Boot to a more recent version (which supports OpenWrt's boot script).
After rebooting, check the U-Boot version again. It should be at least U-Boot 2019.07.

https://repo.turris.cz/hbs/medkit/omnia-medkit-latest.tar.gz

※ Omnia rescue modes explained:

https://docs.turris.cz/hw/omnia/rescue-modes/

1. Restoring the stock firmware from USB (rescue mode)

https://www.youtube.com/watch?v=ZrWzpsxqaRU

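Copy the latest(?) stock firmware file, omnia-medkit-latest.tar.gz, to the root of a FAT32-formatted USB drive.

Plug the drive into the router's USB port. Then, as shown in the recovery video above, press and hold the reset button on the back, and release it the moment four front LEDs are lit (when the LEDs up to the one labeled 2 are on). The router then restores itself automatically from the firmware on the USB drive.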
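Before copying, it can help to confirm that the download is intact. The sketch below is an illustration, not part of the official Turris procedure. The function name stage_medkit and the USB mount point are hypothetical. It opens the archive as a gzip-compressed tar and walks its member list, which usually exposes a truncated or corrupted download. It then copies the file to the root of the USB drive.

```python
import shutil
import tarfile
from pathlib import Path


def stage_medkit(archive: str, usb_root: str) -> Path:
    """Sanity-check a medkit .tar.gz, then copy it to the root of a mounted USB drive.

    Hypothetical helper: `usb_root` is wherever your OS mounted the FAT32 drive
    (for example /Volumes/USB on macOS or /media/$USER/USB on Linux).
    """
    # Listing every member decompresses the archive up to its end-of-archive
    # marker, so a badly truncated or corrupted download normally raises here.
    with tarfile.open(archive, mode="r:gz") as tar:
        if not tar.getmembers():
            raise ValueError(f"{archive} contains no files")

    # Copy into the root of the drive (not a subfolder), as the steps above describe.
    dest = Path(usb_root) / Path(archive).name
    shutil.copyfile(archive, dest)
    return dest
```

For example: stage_medkit("omnia-medkit-latest.tar.gz", "/Volumes/USB").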

2. Installing OpenWrt firmware via Rescue Shell mode

https://downloads.openwrt.org/releases/23.05.2/targets/mvebu/cortexa9/openwrt-23.05.2-mvebu-cortexa9-cznic_turris-omnia-sysupgrade.img.gz

1) Download the latest firmware file, such as the one linked above, to your computer. Then decompress it with gunzip to get openwrt-23.05.2-mvebu-cortexa9-cznic_turris-omnia-sysupgrade.img.
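On a machine without gunzip (for example Windows), the decompression step can be done in Python with the standard zlib module. This is a sketch, and the function name gunzip_image is hypothetical.

OpenWrt images may carry extra metadata appended after the gzip stream; gunzip then reports "trailing garbage ignored". Python's gzip module may reject such trailing bytes. For that reason, the sketch decompresses only the first gzip stream and ignores whatever follows it.

```python
import zlib


def gunzip_image(src: str, dst: str, chunk_size: int = 1 << 20) -> int:
    """Decompress the gzip stream at the start of `src` into `dst`.

    Returns the number of bytes written. Bytes after the end of the gzip
    stream (e.g. appended image metadata) are ignored, much like gunzip's
    "decompression OK, trailing garbage ignored".
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while not decomp.eof:
            chunk = fin.read(chunk_size)
            if not chunk:
                raise ValueError(f"{src}: gzip stream ends prematurely")
            data = decomp.decompress(chunk)
            fout.write(data)
            written += len(data)
        tail = decomp.flush()
        fout.write(tail)
        written += len(tail)
    # Whatever is left in decomp.unused_data (or still unread) is trailing data.
    return written
```

For example: gunzip_image("openwrt-23.05.2-mvebu-cortexa9-cznic_turris-omnia-sysupgrade.img.gz", "openwrt-23.05.2-mvebu-cortexa9-cznic_turris-omnia-sysupgrade.img").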

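2) Copy the .img file to the root of a FAT32-formatted USB drive and plug it into the router's USB port. Then press and hold the reset button on the back, and release it the moment five front LEDs are lit (when the LEDs up to the one labeled 3 are on).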

3) Once step 2) has put the router into rescue shell mode, set your laptop's IP address to 192.168.1.2/24 and connect to the router at 192.168.1.1. The LAN cable must be plugged into LAN port 4 on the router. (※ Do not use any other port!!)

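If ssh root@192.168.1.1 fails with a host-key negotiation error, connect with the command below instead. The rescue shell apparently offers only the legacy ssh-rsa host key type, which OpenSSH 8.8 and later disable by default.

ssh -v -oHostKeyAlgorithms=+ssh-rsa root@192.168.1.1

The -v flag only adds verbose output.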

4) After logging in to the rescue shell, first mount the USB drive with the commands below.

mkdir -p /mnt

mount /dev/sda1 /mnt

5) Install the OpenWrt firmware with the following command.

dd if=/mnt/openwrt-23.05.2-mvebu-cortexa9-cznic_turris-omnia-sysupgrade.img of=/dev/mmcblk0 bs=4096 conv=fsync

Posted by 훅크선장


Web Scraping with Python + Playwright: Example (Async Version)

Penguin Computing/Programming / 2023. 8. 29. 16:15

import os
import asyncio
from playwright.async_api import async_playwright
from datetime import datetime

# Get the current working directory.
current_dir = os.getcwd()
# Make sure the directory path always ends with a '/' separator.
if current_dir[-1] != '/':
    current_dir = current_dir + '/'
# Print the current working directory.
#print(current_dir)

# Main coroutine
async def main():
    # Build a "YYYY-MM-DD HH-MM-SS" timestamp string (strftime already returns a str).
    str_current_datetime = datetime.now().strftime("%Y-%m-%d %H-%M-%S")
    #print("Current date & time : ", str_current_datetime)

    # Run with an async Playwright instance
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)  # launch the browser
        page = await browser.new_page()  # open a page (tab) in the browser

        await page.goto("https://hook.tistory.com/", timeout=10000)  # ※ raises a timeout error if the site is unreachable within 10 s; this still needs error handling!!

        # Take a screenshot of the loaded page and save it under capture/ in the working directory.
        await page.screenshot(path=current_dir + 'capture/' + f'example-chromium-{str_current_datetime}.png')
        print(await page.title())

        #print(await page.locator(
        #    "//div[@class='entry']/div[@class='titleWrap']/div[@class='info']/span[@class='date']"
        #    ).all_text_contents())
        #print(await page.locator(
        #    "//div[@class='entry']/div[@class='titleWrap']/h2/a"
        #    ).all_text_contents())
        #print([await a.get_attribute('href') for a in await page.locator(
        #    "//div[@class='entry']/div[@class='titleWrap']/h2/a"
        #    ).all()])
        for a_loc in await page.locator("//div[@class='entry']/div[@class='titleWrap']").all():
            print(await a_loc.locator("//h2/a").text_content())
            print(await a_loc.locator("//div[@class='info']/span[@class='date']").text_content())
            print(await a_loc.locator("//h2/a").get_attribute('href'))
            print("----------------------------------------------------------------------------------------")

        await browser.close()  # close the browser

# Run the coroutine
asyncio.run(main())
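The comment on page.goto() notes that an unreachable site raises a timeout error that still needs handling. One option is a small retry wrapper. This is a sketch, and the helper name goto_with_retry and its parameters are hypothetical. It takes the exception types to retry on as an argument, so it can be exercised without a browser. With Playwright you would pass the library's own TimeoutError.

```python
import asyncio


async def goto_with_retry(page, url, *, attempts=3, timeout_ms=10_000,
                          delay_s=2.0, retry_on=(Exception,)):
    """Call page.goto(url, timeout=timeout_ms), retrying on the given exception types.

    Re-raises the last exception once all `attempts` tries have failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await page.goto(url, timeout=timeout_ms)
        except retry_on as exc:
            if attempt == attempts:
                raise
            print(f"goto {url} failed ({exc!r}); retrying ({attempt}/{attempts - 1})")
            await asyncio.sleep(delay_s)
```

In main(), the call could become: await goto_with_retry(page, "https://hook.tistory.com/", retry_on=(PlaywrightTimeoutError,)). For that, import the exception with: from playwright.async_api import TimeoutError as PlaywrightTimeoutError. Catching only that exception type means other errors, such as a bad URL, still fail immediately.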

Posted by 훅크선장


Web Scraping with Python + Playwright: Example (Sync Version)

Penguin Computing/Programming / 2023. 8. 29. 16:07

"""
XPath that extracts the titles of the blog posts
page.locator("//div[@class='entry']/div[@class='titleWrap']/h2/a").all_text_contents()

XPath that extracts the dates of the blog posts
page.locator("//div[contains(@class,'entry')]/div[@class='titleWrap']/div[@class='info']/span[@class='date']").first.text_content()
page.locator("//div[@class='entry']/div[@class='titleWrap']/div[@class='info']/span[@class='date']").last.text_content()
page.locator("//div[@class='entry']/div[@class='titleWrap']/div[@class='info']/span[@class='date']").all_text_contents()

CSS selector that extracts a post date
page.locator('div:nth-child(3) > div.titleWrap > div > span.date').text_content()

XPath that extracts the link URL of each post
link_locators = page.locator("//div[@class='entry']/div[@class='titleWrap']/h2/a").all()
for l_loc in link_locators:
    print(l_loc.get_attribute('href'))
    print(l_loc.text_content())

※ You can explicitly prefix the selector inside locator() with css= or xpath=, but it is not required.
page.locator("xpath=/html/body/div[1]/div/div[2]/div[2]/div/div[3]/div[1]/div/span[2]").text_content()
"""

from playwright.sync_api import Playwright, sync_playwright, expect
import os
from datetime import datetime

def run(playwright: Playwright) -> None:

    # Get Current Working Directory
    current_dir = os.getcwd()
    if current_dir[-1] != '/':
        current_dir = current_dir + '/'
    #print(current_dir)

    # Get Current Date and Time as a "YYYY-MM-DD HH-MM-SS" string (strftime already returns a str)
    str_current_datetime = datetime.now().strftime("%Y-%m-%d %H-%M-%S")
    #print("Current Date & Time : ", str_current_datetime)

    ## Launch Chromium with headless=True so that no browser window appears.
    ## With headless=False, the browser window is shown on screen.
    browser = playwright.chromium.launch(headless=True, channel="chromium")
    context = browser.new_context()

    ## Open a web page in the browser
    page = context.new_page()

    ## Navigate to the URL below.
    page.goto("https://hook.tistory.com/")

    ## Take a screenshot of the page; the path= parameter sets where it is saved.
    #page.screenshot(path=current_dir + 'capture/' + f'screenshot-{str_current_datetime}.png')

    entry_locators = page.locator("//div[@class='entry']/div[@class='titleWrap']").all()
    for a_loc in entry_locators:
        print(a_loc.locator("//h2/a").text_content())
        print(a_loc.locator("//div[@class='info']/span[@class='date']").text_content())
        print(a_loc.locator("//h2/a").get_attribute('href'))
        print("----------------------------------------------------------------------------------------")

    ## Pause execution (for debugging)
    #page.pause()

    # Close the browser
    context.close()
    browser.close()

# Main entry point
with sync_playwright() as playwright:
    run(playwright)
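To see what the XPath expressions above select without launching a browser, the same structure can be queried with Python's built-in xml.etree.ElementTree on a small sample. The markup below is a hypothetical, simplified stand-in for the blog page; real pages are HTML, which ElementTree cannot always parse. ElementTree also supports only a subset of XPath: paths must start with ".//" rather than "//", and functions such as contains() are not available.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modelled on the structure the XPath
# expressions above assume: div.entry > div.titleWrap > h2 > a and span.date.
SAMPLE = """<html><body>
  <div class="entry">
    <div class="titleWrap">
      <h2><a href="/100">First post</a></h2>
      <div class="info"><span class="date">2023. 8. 29. 16:15</span></div>
    </div>
  </div>
  <div class="entry">
    <div class="titleWrap">
      <h2><a href="/99">Second post</a></h2>
      <div class="info"><span class="date">2023. 8. 29. 16:07</span></div>
    </div>
  </div>
</body></html>"""


def extract_entries(markup: str) -> list[dict]:
    """Return title, date and href for every entry/titleWrap block."""
    root = ET.fromstring(markup)
    entries = []
    for wrap in root.iterfind(".//div[@class='entry']/div[@class='titleWrap']"):
        link = wrap.find("./h2/a")
        date = wrap.find("./div[@class='info']/span[@class='date']")
        entries.append({"title": link.text, "date": date.text, "href": link.get("href")})
    return entries


for entry in extract_entries(SAMPLE):
    print(entry["title"], entry["date"], entry["href"], sep=" | ")
```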

Posted by 훅크선장


FTP Server Restored

Penguin Computing/Live CD / 2023. 1. 6. 21:28

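In today's IT environment, where virtual machines are everywhere, I wondered whether the Live CDs I made several years ago would still be of any use...

Still, they have some value of their own, so I restored the FTP server.

There are many ISO files, so please don't download them all at once; take only the ones you need.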

FTP server: hook7346.ignorelist.com

FTP port: 21

FTP account: kali2ko

The FTP password is a quiz. => Type the Korean word with the keyboard set to the English layout.

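1) The password consists of 4 Hangul (한글) characters and 1 special character.

2) The first two characters are the name of Korea's writing system. The word already appeared in sentence 1).

3) The third and fourth characters are two characters from the phrase people shouted on March 1st (Independence Movement Day).

They are the word that fills OO in “대한 독립 OO” ("Korean independence, OO!").

4) The final special character is the one that often follows the word from 3). Getting the feeling? (People often get this one wrong: it's not the dot (쩜). It's the "feeling" (느낌) mark.)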

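Once again: type it in Korean with the keyboard set to the English layout.

It consists of 4 Hangul characters and 1 special character (12 characters when typed as English letters).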
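Typing Korean while the keyboard is on the English layout produces the QWERTY keys of the standard 2-set (Dubeolsik) Korean layout. To illustrate that mapping, here is a sketch that converts Hangul syllables into the keys you would press. The function and table names are my own and hypothetical.

The sketch relies on Unicode syllable arithmetic: syllable = 0xAC00 + (initial × 21 + medial) × 28 + final. The example uses 안녕 rather than the password.

```python
# Dubeolsik (standard Korean 2-set) keys for each jamo, in Unicode order.
INITIALS = ["r", "R", "s", "e", "E", "f", "a", "q", "Q", "t", "T",
            "d", "w", "W", "c", "z", "x", "v", "g"]                      # 19 initial consonants
MEDIALS = ["k", "o", "i", "O", "j", "p", "u", "P", "h", "hk", "ho",
           "hl", "y", "n", "nj", "np", "nl", "b", "m", "ml", "l"]        # 21 vowels
FINALS = ["", "r", "R", "rt", "s", "sw", "sg", "e", "f", "fr", "fa",
          "fq", "ft", "fx", "fv", "fg", "a", "q", "qt", "t", "T",
          "d", "w", "c", "z", "x", "v", "g"]                             # 28 finals ("" = none)

HANGUL_BASE = 0xAC00    # first precomposed syllable
HANGUL_COUNT = 11172    # 19 * 21 * 28 syllables


def to_qwerty(text: str) -> str:
    """Return the QWERTY keystrokes that type `text` on a Dubeolsik keyboard."""
    out = []
    for ch in text:
        index = ord(ch) - HANGUL_BASE
        if 0 <= index < HANGUL_COUNT:
            initial, rest = divmod(index, 21 * 28)
            medial, final = divmod(rest, 28)
            out.append(INITIALS[initial] + MEDIALS[medial] + FINALS[final])
        else:
            out.append(ch)  # ASCII, punctuation and bare jamo pass through unchanged
    return "".join(out)


print(to_qwerty("안녕"))  # dkssud
```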

If you can't solve the quiz, look closely at the command typed in the terminal window in the screenshot below.

--------------------------------------------------------------------------------------------------------------

Posted by 훅크선장


Using a Remote Anaconda Environment as the Interpreter in PyCharm

Penguin Computing/Programming / 2022. 9. 30. 16:33

As the issue below shows, PyCharm directly supports local virtual environments such as Anaconda, as well as remote interpreters reached over SSH. However, it reportedly has no built-in way to use an Anaconda environment on a remote SSH host.

https://youtrack.jetbrains.com/issue/PY-35978

No way to activate conda environment for remote interpreter : PY-35978


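However, as suggested in that issue thread, you can use a remote Anaconda environment as the interpreter through a separate wrapper script.

On the remote SSH host, create a shell script like the one below. You can name it python or anything else you like; here the script is named python_env.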

#!/usr/bin/env bash

source /home/jetbrains/miniconda3/etc/profile.d/conda.sh
conda activate py_35978
python "$@"

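In the path above, change /home/jetbrains/miniconda3 to match your own SSH login environment.

Likewise, change the environment name py_35978 to your own.

※ For reference, I also changed the line python "$@" to python3 "$@".

While logged in over SSH, make the script executable with chmod +x python_env.

Running ./python_env should drop you into a Python shell; if it does not, something is wrong.

In PyCharm, create a new project and choose an SSH interpreter on the remote SSH server. For the interpreter path, enter the exact path and filename of the shell script created above, e.g. /home/jetbrains/python_env, instead of the default /usr/bin/python.

In the new project, open the Python Console (or run the two lines below as a script) and enter the following Python code: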
import os
print(os.environ.get('CONDA_PREFIX'))

If the output looks like this, the setup is correct:
/home/jetbrains/miniconda3/envs/py_35978
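To check more than CONDA_PREFIX, a sketch like the following also prints the interpreter path and sys.prefix. The name conda_env_report is a hypothetical helper. Assuming the wrapper activated py_35978, you should see the following:

- sys.prefix matches CONDA_PREFIX.
- sys.executable points inside that environment rather than at python_env.
- CONDA_DEFAULT_ENV, which conda sets on activation, shows the environment name.

```python
import os
import sys


def conda_env_report() -> dict:
    """Collect the values that show which interpreter and conda env are active."""
    return {
        "executable": sys.executable,  # the real python binary, not the wrapper script
        "prefix": sys.prefix,          # root directory of the running environment
        "CONDA_PREFIX": os.environ.get("CONDA_PREFIX"),
        "CONDA_DEFAULT_ENV": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in conda_env_report().items():
    print(f"{key:<18} {value}")
```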

Posted by 훅크선장


Apache Ambari 2.7.5 Build on Mac OS X (building Ambari 2.7.5 on a Mac)

Penguin Computing/Programming / 2021. 8. 5. 16:16

1. Install Vagrant: with Brew on Mac OS X
user@Mac ~ % /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

user@Mac ~ % brew tap hashicorp/tap
user@Mac ~ % brew install vagrant

※ Installing Vagrant: CentOS 7 Linux
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
sudo yum -y install vagrant

2. ambari-vagrant git clone
user@Mac ~ % git clone https://github.com/u39kun/ambari-vagrant.git

3. Add hosts to /etc/hosts (only the section for the OS version you will install)
user@Mac ~ % cat ambari-vagrant/append-to-etc-hosts.txt
※ This file lists every Vagrant host; only the # centos 7.4 section is needed here.

user@Mac ~ % sudo vi /etc/hosts
...
# centos 7.4
192.168.74.101 c7401.ambari.apache.org c7401
192.168.74.102 c7402.ambari.apache.org c7402
192.168.74.103 c7403.ambari.apache.org c7403
192.168.74.104 c7404.ambari.apache.org c7404
192.168.74.105 c7405.ambari.apache.org c7405
192.168.74.106 c7406.ambari.apache.org c7406
192.168.74.107 c7407.ambari.apache.org c7407
192.168.74.108 c7408.ambari.apache.org c7408
192.168.74.109 c7409.ambari.apache.org c7409
192.168.74.110 c7410.ambari.apache.org c7410
...
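Instead of copying the section by hand, you can pull the # centos 7.4 block out of append-to-etc-hosts.txt with a few lines of Python. This is a sketch, and hosts_block is a hypothetical name. It assumes the file groups hosts under "# <os>" comment headers, as in the excerpt above.

```python
def hosts_block(text: str, header: str = "# centos 7.4") -> list[str]:
    """Return the host lines listed under `header` in append-to-etc-hosts.txt.

    The section ends at the first blank line or at the next "# ..." header.
    """
    lines = [line.strip() for line in text.splitlines()]
    if header not in lines:
        raise ValueError(f"section {header!r} not found")
    block = []
    for line in lines[lines.index(header) + 1:]:
        if not line or line.startswith("#"):
            break
        block.append(line)
    return block
```

For example, print("\n".join(hosts_block(open("ambari-vagrant/append-to-etc-hosts.txt").read()))) prints just those lines. You can then append them with sudo tee -a /etc/hosts instead of editing the file in vi.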

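4. Create the Vagrant private key (Vagrant creates ~/.vagrant.d/insecure_private_key the first time it runs; make sure the file exists)
user@Mac ~ % vagrant box list
user@Mac ~ % ls -l ~/.vagrant.d/insecure_private_key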

5. Use the pre-configured VMs
user@Mac ~ % cd ambari-vagrant/centos7.4/
user@Mac ~ % vi Vagrantfile

(Uncomment line 29, the line that references dev-bootstrap.sh)
...

c7401.vm.provision :shell, :path => "dev-bootstrap.sh"

...

6. Edit dev-bootstrap.sh
※ Use software versions suitable for building Ambari
user@Mac ~ % vi dev-bootstrap.sh

...
wget https://nodejs.org/dist/v8.11.2/node-v8.11.2-linux-x64.tar.gz
tar zxvf node-v8.11.2-linux-x64.tar.gz
mv node-v8.11.2-linux-x64 /usr/share/node
/usr/share/node/bin/npm install -g brunch
...
wget https://archive.apache.org/dist/maven/maven-3/3.5.3/binaries/apache-maven-3.5.3-bin.tar.gz
tar zxvf apache-maven-3.5.3-bin.tar.gz
mv apache-maven-3.5.3 /usr/share/maven
...

7. Create the VM
user@Mac ~ % cd

user@Mac ~ % cd ambari-vagrant/centos7.4/
user@Mac ~ % cp ~/.vagrant.d/insecure_private_key .
user@Mac ~ % vagrant up c7401

※ To destroy all the Vagrant VMs:
user@Mac ~ % vagrant destroy -f

8. Connect to the VM
user@Mac ~ % vagrant ssh c7401

9. Switch to the root user
[vagrant@c7401 ~]$ sudo su -

10. Download Ambari 2.7.5.0 and set the Maven versions
# wget https://archive.apache.org/dist/ambari/ambari-2.7.5/apache-ambari-2.7.5-src.tar.gz
# tar xfvz apache-ambari-2.7.5-src.tar.gz
# cd apache-ambari-2.7.5-src
# mvn versions:set -DnewVersion=2.7.5.0.0
# pushd ambari-metrics
# mvn versions:set -DnewVersion=2.7.5.0.0
# popd

11. Ambari rpm build

1) Install the development packages

# yum install python-devel.x86_64

2) Edit pom.xml in the ambari-admin folder

# cd ambari-admin/

# vi pom.xml

...

...

# cd ..

3) Edit pom.xml in the ambari-metrics folder (lines 42-50)

# cd ambari-metrics/

# vi pom.xml

...

    <hbase.tar>https://archive.apache.org/dist/hbase/2.0.2/hbase-2.0.2-bin.tar.gz</hbase.tar>
    <hbase.folder>hbase-2.0.2</hbase.folder>
    <hadoop.tar>https://archive.apache.org/dist/hadoop/common/hadoop-3.1.4/hadoop-3.1.4.tar.gz</hadoop.tar>
    <hadoop.folder>hadoop-3.1.4</hadoop.folder>
    <hadoop.version>3.1.4</hadoop.version>
    <grafana.folder>grafana-6.7.4</grafana.folder>
    <grafana.tar>https://dl.grafana.com/oss/release/grafana-6.7.4.linux-amd64.tar.gz</grafana.tar>
    <phoenix.tar>https://archive.apache.org/dist/phoenix/apache-phoenix-5.0.0-HBase-2.0/bin/apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz</phoenix.tar>
    <phoenix.folder>apache-phoenix-5.0.0-HBase-2.0-bin</phoenix.folder>

...

# cd ..

4) Edit pom.xml in the ambari-metrics/ambari-metrics-timelineservice folder (line 929)

# cd ambari-metrics/ambari-metrics-timelineservice/

# vi pom.xml

...

file="${project.build.directory}/embedded/${phoenix.folder}/phoenix-5.0.0-HBase-2.0-server.jar"

...

# cd ../..

5) Build

# mvn -B clean install rpm:rpm -DnewVersion=2.7.5.0.0 -DbuildNumber=5895e4ed6b30a2da8a90fee2403b6cab91d19972 -DskipTests -Dpython.ver="python >= 2.6" -Drat.numUnapprovedLicenses=600

[INFO] Scanning for projects...
...

...

[INFO] Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.W4O5wJ
[INFO] + umask 022
[INFO] + cd /root/apache-ambari-2.7.5-src/ambari-infra/ambari-infra-manager-it/target/rpm/ambari-infra-manager-it/BUILD
[INFO] + /usr/bin/rm -rf /root/apache-ambari-2.7.5-src/ambari-infra/ambari-infra-manager-it/target/rpm/ambari-infra-manager-it/buildroot
[INFO] + exit 0
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Ambari Main 2.7.5.0.0 .............................. SUCCESS [  4.795 s]
[INFO] Apache Ambari Project POM .......................... SUCCESS [  0.172 s]
[INFO] Ambari Web ......................................... SUCCESS [ 54.736 s]
[INFO] Ambari Views ....................................... SUCCESS [  2.374 s]
[INFO] Ambari Admin View .................................. SUCCESS [  6.410 s]
[INFO] ambari-utility 1.0.0.0-SNAPSHOT .................... SUCCESS [ 33.500 s]
[INFO] ambari-metrics ..................................... SUCCESS [  1.068 s]
[INFO] Ambari Metrics Common .............................. SUCCESS [  9.894 s]
[INFO] Ambari Metrics Hadoop Sink ......................... SUCCESS [ 10.069 s]
[INFO] Ambari Metrics Flume Sink .......................... SUCCESS [  2.123 s]
[INFO] Ambari Metrics Kafka Sink .......................... SUCCESS [  2.115 s]
[INFO] Ambari Metrics Storm Sink .......................... SUCCESS [  3.453 s]
[INFO] Ambari Metrics Storm Sink (Legacy) ................. SUCCESS [  3.210 s]
[INFO] Ambari Metrics Collector ........................... SUCCESS [07:29 min]
[INFO] Ambari Metrics Monitor ............................. SUCCESS [  2.178 s]
[INFO] Ambari Metrics Grafana ............................. SUCCESS [  9.604 s]
[INFO] Ambari Metrics Host Aggregator ..................... SUCCESS [ 22.365 s]
[INFO] Ambari Metrics Assembly ............................ SUCCESS [03:35 min]
[INFO] Ambari Service Advisor 1.0.0.0-SNAPSHOT ............ SUCCESS [ 27.628 s]
[INFO] Ambari Server ...................................... SUCCESS [  01:15 h]
[INFO] Ambari Functional Tests ............................ SUCCESS [  2.111 s]
[INFO] Ambari Agent ....................................... SUCCESS [-58.-229 s]
[INFO] ambari-logsearch ................................... SUCCESS [  1.735 s]
[INFO] Ambari Logsearch Appender .......................... SUCCESS [  4.152 s]
[INFO] Ambari Logsearch Config Api ........................ SUCCESS [  0.403 s]
[INFO] Ambari Logsearch Config JSON ....................... SUCCESS [  0.398 s]
[INFO] Ambari Logsearch Config Solr ....................... SUCCESS [ 12.952 s]
[INFO] Ambari Logsearch Config Zookeeper .................. SUCCESS [  1.690 s]
[INFO] Ambari Logsearch Config Local ...................... SUCCESS [  0.197 s]
[INFO] Ambari Logsearch Log Feeder Plugin Api ............. SUCCESS [  9.075 s]
[INFO] Ambari Logsearch Log Feeder Container Registry ..... SUCCESS [  9.950 s]
[INFO] Ambari Logsearch Log Feeder ........................ SUCCESS [01:01 min]
[INFO] Ambari Logsearch Web ............................... SUCCESS [01:14 min]
[INFO] Ambari Logsearch Server ............................ SUCCESS [  02:27 h]
[INFO] Ambari Logsearch Assembly .......................... SUCCESS [  4.325 s]
[INFO] Ambari Logsearch Integration Test .................. SUCCESS [02:39 min]
[INFO] ambari-infra ....................................... SUCCESS [  1.036 s]
[INFO] Ambari Infra Solr Client ........................... SUCCESS [ 12.429 s]
[INFO] Ambari Infra Solr Plugin ........................... SUCCESS [ 29.353 s]
[INFO] Ambari Infra Manager ............................... SUCCESS [01:34 min]
[INFO] Ambari Infra Assembly .............................. SUCCESS [  1.477 s]
[INFO] Ambari Infra Manager Integration Tests 2.7.5.0.0 ... SUCCESS [ 51.914 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:53 h
[INFO] Finished at: 2021-06-30T04:58:54Z
[INFO] ------------------------------------------------------------------------

Posted by 훅크선장


How to Make Scrapy Download Files Automatically

Penguin Computing/Programming / 2021. 4. 12. 15:05

Reference 1: www.programmersought.com/article/36393758843/
Reference 2: stackoverflow.com/questions/47031394/scrapy-file-download-how-to-use-custom-filename
https://stackoverflow.com/questions/54737465/unable-to-rename-downloaded-images-through-pipelines-without-the-usage-of-item-p/54741148#54741148
Reference 3: docs.scrapy.org/en/latest/topics/media-pipeline.html
Reference 4: coderecode.com/download-files-scrapy/
Reference 5: heodolf.tistory.com/19

www.hanumoka.net/2020/07/11/python-20200711-python-scrapy-selenium/

iosroid.tistory.com/28

After creating the Scrapy project, go through the steps below.

$ scrapy startproject apt_collection

$ scrapy genspider -t basic reports_spider github.com

1. Add items to items.py.
You can keep using the default item class or create a new one.
The default item class usually has no fields:

class AptCollectionItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    pass

So, create a new item class.
The required fields are files and file_urls.
FilesPipeline downloads regular files, and ImagesPipeline downloads images.
Since we are downloading regular files here, the rest of this post uses FilesPipeline.

class reportItem(scrapy.Item):
    title = scrapy.Field()      # title
    date = scrapy.Field()       # date
    publisher = scrapy.Field()  # publishing company
    url = scrapy.Field()        # original URL
    # Download results for the files listed in file_urls: one dict per file with url, path, checksum and status,
    # e.g. [{'url': 'http://.....', 'path': 'abcdefg.pdf', 'checksum': '2f88dd877d61e287fc1135a7ec0a2fa5', 'status': 'downloaded'}]
    files = scrapy.Field()
    file_urls = scrapy.Field()  # URLs of the files Scrapy should download and save ※ must be passed as a list!!

Simply define files and file_urls as scrapy.Field().

※ To use different names for files and file_urls, set the following in settings.py (the values shown are the defaults):

#FILES_URLS_FIELD = 'file_urls'
#FILES_RESULT_FIELD = 'files'

With ImagesPipeline, use the names images and image_urls instead.

image_urls = scrapy.Field()
images = scrapy.Field()

To use different names for images and image_urls, set the following in settings.py:
IMAGES_URLS_FIELD = 'field_name_for_your_images_urls'
IMAGES_RESULT_FIELD = 'field_name_for_your_processed_images'

2. Configure ITEM_PIPELINES in settings.py.

The site may disallow crawlers in robots.txt, so don't forget to set

ROBOTSTXT_OBEY = False

The default ITEM_PIPELINES looks like this:

#ITEM_PIPELINES = {
#    'apt_collection.pipelines.AptCollectionPipeline': 300,
#}

or:

ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
    'scrapy.pipelines.files.FilesPipeline': 1,
}

Change this so that files are downloaded through your own pipeline:

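ITEM_PIPELINES = {
    'apt_collection.pipelines.AptCollectionPipeline': 1
}

FILES_STORE = r'./download'  # directory where downloaded files are saved

Change it like this, or:

ITEM_PIPELINES = {
    'my.pipelines.MyFilesPipeline': 200,
    'scrapy.pipelines.files.FilesPipeline': None,
}

It is very important to set every other FilesPipeline entry to None or remove it altogether. Otherwise the stock FilesPipeline also runs and stores each file a second time under its default name.
FILES_STORE sets the directory where downloaded files are saved. Always set it: without FILES_STORE, the files pipeline stays disabled even if it is listed in ITEM_PIPELINES.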

3. Redefine the pipeline in pipelines.py.
The generated default pipeline looks like this:

class AptCollectionPipeline:
    def process_item(self, item, spider):
        return item

Rewrite it so that it inherits from FilesPipeline or ImagesPipeline.
On the class line, (FilesPipeline) or (ImagesPipeline) must follow the pipeline class name.
Import FilesPipeline and ImagesPipeline as well.

from scrapy.pipelines.files import FilesPipeline
from scrapy.pipelines.images import ImagesPipeline

class AptCollectionPipeline(FilesPipeline):
    # Override file_path() so that files are saved under names chosen by our own rule instead of the default
    def file_path(self, request, response=None, info=None, *, item=None):
        file_name: str = request.url.split("/")[-1]  # when the download URL contains the file name, take it from the URL

        #file_name: str = request.meta['filename']  # or receive the file name passed via meta by the earlier request
        #file_name: str = item['filename']  # or, if the scraped item has its own file-name field, use it as the saved name (the storage directory is set in settings.py!)
        # print("filename :  ", file_name)
        return file_name
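Note that request.url.split("/")[-1] keeps any query string or percent-encoding, e.g. report%20Q1.pdf?download=1. A slightly more defensive variant is sketched below using the standard library's urllib.parse. It strips the query and fragment, decodes %XX escapes, and falls back to a default name when the URL ends in a slash. The names filename_from_url and download.bin are hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str, default: str = "download.bin") -> str:
    """Last path segment of `url`, percent-decoded, without query string or fragment."""
    path = urlsplit(url).path                  # drops ?query and #fragment
    name = posixpath.basename(unquote(path))   # decode %20 etc., keep only the last segment
    if name in ("", ".", ".."):
        return default                         # URL ends in "/" or has no usable name
    return name
```

Inside file_path(), you would then return filename_from_url(request.url) instead of the split() expression. Decoding before taking the basename also means an encoded "%2F" cannot smuggle a directory into the saved name.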

When you redefine the pipeline, you must override the method that decides the file name, as above.

The spider can pass the file name as an item field or put it in meta.
Alternatively, you can parse request.url.
Or, where the spider requests the target file URL, pass meta like this:

meta = {'filename': item['name']}
yield scrapy.Request(url=file_url, meta=meta)

※ Some say get_media_requests must be overridden as well. In my tests, it made no difference when each item has a single file. Even with several files per item, it was fine as long as the names are derived from the URL, as in file_path above. However, when one item has several files that each need an explicitly assigned name, get_media_requests has to pass the name on to file_path. In that case, add an overridden get_media_requests like the one below.
※ In v2.3 the item did not seem to reach the pipeline properly, so I had to define get_media_requests and pass the name through the Request's meta. In v2.4, overriding only file_path (without get_media_requests) worked fine. Scrapy 2.4 added the item keyword argument to file_path(), which likely explains the difference.
def get_media_requests(self, item, info):
    # The URLs of the files to collect always arrive as a list.
    # Assumes the item also carries a file_names list aligned with file_urls.
    for idx, file_url in enumerate(item['file_urls']):
        #print("filename : ", item['file_names'][idx])
        # Request each file URL and pass the matching name from file_names in meta;
        # file_path() can then read it with request.meta['filename']. This is why the method is overridden.
        yield scrapy.Request(file_url, meta={'filename': item['file_names'][idx]})

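※ There is also a method that runs after the files have been downloaded (probably useful for checking whether a download went wrong):
from itemadapter import ItemAdapter
from scrapy.exceptions import DropItem

def item_completed(self, results, item, info):
    # results is a list of (success, file_info_or_failure) tuples, one per requested file
    file_paths = [x['path'] for ok, x in results if ok]
    if not file_paths:
        raise DropItem("Item contains no files")
    adapter = ItemAdapter(item)
    adapter['file_paths'] = file_paths  # the item class must declare a file_paths field
    return item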

4. When the spider yields an item, Scrapy downloads the files automatically.

            item['file_urls'] = [url_1, url_2, url_3]  # URLs of the files Scrapy should download and save ※ must be passed as a list!!
            yield item

5. If files are not downloaded during the crawl and a 302 status appears, the file download URL is being redirected.

Add the following setting to settings.py to fix it:

MEDIA_ALLOW_REDIRECTS = True

Posted by 훅크선장


CodeWarrior TAP , Pin map of CWH-CTP-COP-YE

Penguin Computing/Embedded Linux / 2020. 12. 23. 14:14

www.nxp.com/design/software/development-software/codewarrior-development-tools/run-control-devices/codewarrior-tap:CW_TAP

CodeWarrior® TAP | NXP


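NXP's JTAG debugger, the CodeWarrior TAP, can only debug a target when a dedicated cable is attached.

That is because the CodeWarrior TAP has a 30-pin connector, while typical debug targets (standard development boards) use standard JTAG headers such as 20-pin or 16-pin.

For whatever reason, the dedicated CodeWarrior TAP cables must be bought separately from the unit. At around 70 US dollars each, you can't avoid buying one, which is a nasty surprise.

If NXP published the 30-pin pin map the way standard JTAG pinouts are published, you could connect directly to targets with non-standard JTAG ports without the cable... Perhaps to get rich selling cables, NXP does not publish it. (No matter how hard I searched, I couldn't find it!!)

So I got NXP's dedicated 16-pin JTAG cable, the CWH-CTP-COP-YE, and drew its pin map myself.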

www.nxp.com/design/software/development-software/codewarrior-development-tools/run-control-devices/power-architecture-processor-cop-probe-tips-for-codewarrior-tap:CWH-CTP-COP-YE

Pin Map of CWH-CTP-COP-YE :

Posted by 훅크선장


ESP32 Programming

Penguin Computing/Programming / 2020. 9. 11. 21:38

I'm learning the ESP32, which is said to perform much better than the Arduino.

Here are some references worth keeping.

1. Setting up the programming environment on a Mac

docs.espressif.com/projects/esp-idf/en/latest/esp32/get-started/macos-setup.html#install-prerequisites

Standard Setup of Toolchain for Mac OS - ESP32 - — ESP-IDF Programming Guide latest documentation


2. A hub of information on building IoT devices with the ESP32

esp32.net/

The Internet of Things with ESP32


3. A good place to start learning the ESP32 development board

randomnerdtutorials.com/getting-started-with-esp32/

Getting Started with the ESP32 Development Board | Random Nerd Tutorials


4. A reference site (in Japanese) that correctly describes the emulator settings that cause trouble when trying UART_ECHO, the ESP32 example that sends and receives over a separate UART

tech.yoshimax.net/archives/107

[ESP32-EVB][esp-idf][UART] サンプルの uart_echo を試してみる – yoshimax::tech


5. The BAND group where materials are shared from the online workshop that got me into ESP32 training

band.us/band/80474328/attachment

(Online workshop) ESP32 (beginner + intermediate course) | BAND

6. Where I bought the ESP32 board and sensor shield used in the exercises

mechasolution.com/shop/goods/goods_view.php?goodsno=577245&category=

mechasolution.com/shop/goods/goods_view.php?goodsno=583929&category=

(You only need to buy the sensor shield; there is no need to buy the core board!!)

Mechasolution, an online shop specializing in electronic components

Posted by 훅크선장


Setting Up a Go Environment on CentOS 7 (Vim 8 + Vim-go + YCM)

Penguin Computing/Programming / 2020. 5. 7. 13:21

This post sets up a Go environment on CentOS 7 Linux for practicing the Go language.

0. Install Go (GoLang)

# cd

# wget https://dl.google.com/go/go1.14.2.linux-amd64.tar.gz

# tar -zxvf go1.14.2.linux-amd64.tar.gz

# mv go/ /usr/local/

Set the environment variable for using Go

# cd

# vi .bashrc

Add the following line (then run source ~/.bashrc or log in again so that it takes effect):

export PATH=/usr/local/go/bin:$PATH

Try out Go

# vi helloworld.go

Create the example file with the following content:

package main

import "fmt"

func main() {
	fmt.Println("Hello World")
}

# go build helloworld.go

# ./helloworld

Hello World

1. Remove the existing vim packages

# yum remove vim-common vim-enhanced vim-filesystem vim-minimal

2. Install the required libraries and tools

# yum install gcc make ncurses ncurses-devel cmake

# yum install ctags git tcl-devel ruby ruby-devel lua lua-devel luajit luajit-devel python python-devel perl perl-devel perl-ExtUtils-ParseXS perl-ExtUtils-XSpp perl-ExtUtils-CBuilder perl-ExtUtils-Embed

3. Download, build, and install Vim 8

# cd

# git clone https://github.com/vim/vim.git

# cd vim

# ./configure --with-features=huge --enable-multibyte --enable-rubyinterp --enable-python3interp --enable-perlinterp --enable-luainterp

# make

# make install

When the installation finishes, check the version:

# vim --version

4. Pathogen is a Vim environment manager that makes it easy to manage Vim plugins and runtime files. Install Pathogen first.
Clone vim-pathogen into the ~/.vim/autoload directory.

# mkdir -p ~/.vim/autoload

# cd ~/.vim/autoload
# git clone https://github.com/tpope/vim-pathogen.git
Copy pathogen.vim into the ~/.vim/autoload directory.

# cp vim-pathogen/autoload/pathogen.vim ./

Download the vim-go scripts as follows.

# mkdir -p ~/.vim/bundle

# cd ~/.vim/bundle
# git clone https://github.com/fatih/vim-go.git

Create a .vimrc file in your home directory with the following content.

# vi ~/.vimrc
execute pathogen#infect()
syntax on
filetype plugin indent on

Start vim and run :GoInstallBinaries; the helper binaries that vim-go relies on are then installed automatically.

5. YouCompleteMe (YCM) is a code-completion engine for Vim. YCM provides completion for many languages, including C, C++, Objective-C, Objective-C++, CUDA, Python 2, Python 3, C#, and Go.
Clone and compile YCM. To enable Go completion, pass --go-completer as a build option.

# cd ~/.vim/bundle
# git clone https://github.com/Valloric/YouCompleteMe.git
# cd ~/.vim/bundle/YouCompleteMe
# git submodule update --init --recursive
# python3 install.py --go-completer

6. Tagbar
The Tagbar plugin lists the tags in the current file, so you can quickly get an overview of the code's structure.

# cd ~/.vim/bundle
# git clone https://github.com/majutsushi/tagbar.git

7. NERDTree
NERDTree is a file explorer for Vim. It shows the directory structure as a tree, which makes it easy to browse and edit files.

# cd ~/.vim/bundle
# git clone https://github.com/scrooloose/nerdtree.git

References:

https://golang.org/dl/

Downloads - The Go Programming Language


https://www.joinc.co.kr/w/man/12/golang/Start

Getting started with golang - setting up a development environment

http://www.programmersought.com/article/6244238683/

Centos7 install vim8.0 + YouCompleteMe + support python 3.6 - Programmer Sought


https://phoenixnap.com/kb/how-to-install-vim-centos-7

How to Install Vim 8.2 on CentOS 7? (Latest Version)


Posted by 훅크선장


훅크선장의 전함


120 posts in the '펭귄컴퓨팅' category
