All posts

Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference. - b4rtaz/distributed-llama

github.com

PyTorch and CUDA compatibility of the GeForce 940MX used in 7th-generation machines

나스닥171819 2025. 10. 24. 03:56

(e:\conda\aa) E:\aabb>python ddddd11cuda.py
e:\conda\aa\lib\site-packages\torch\cuda\__init__.py:283: UserWarning:
    Found GPU0 NVIDIA GeForce 940MX which is of cuda capability 5.0.
    Minimum and Maximum cuda capability supported by this version of PyTorch is
    (6.1) - (9.0)

  warnings.warn(
e:\conda\aa\lib\site-packages\torch\cuda\__init__.py:304: UserWarning:
    Please install PyTorch with a following CUDA
    configurations:  12.6 following instructions at
    https://pytorch.org/get-started/locally/

  warnings.warn(matched_cuda_warn.format(matched_arches))
e:\conda\aa\lib\site-packages\torch\cuda\__init__.py:326: UserWarning:
NVIDIA GeForce 940MX with CUDA capability sm_50 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
If you want to use the NVIDIA GeForce 940MX GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(
Traceback (most recent call last):
  File "E:\aabb\ddddd11cuda.py", line 61, in <module>
    loss = criterion(output, y)
  File "e:\conda\aa\lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "e:\conda\aa\lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "e:\conda\aa\lib\site-packages\torch\nn\modules\loss.py", line 1310, in forward
    return F.cross_entropy(
  File "e:\conda\aa\lib\site-packages\torch\nn\functional.py", line 3462, in cross_entropy
    return torch._C._nn.cross_entropy_loss(
torch.AcceleratorError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
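
The warnings above say that this PyTorch build ships kernels for sm_61 through sm_90, while the 940MX reports capability 5.0, which is why the first CUDA kernel launch fails with "no kernel image is available". The following is a minimal sketch of checking for that mismatch before training and falling back to the CPU. `parse_arch` and `is_supported` are hypothetical helper names, and the matching rule (same major version with an equal or lower minor version, or embedded PTX at or below the device's capability) is an assumption based on CUDA's general compatibility rules rather than anything PyTorch prints. `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are existing PyTorch calls.

```python
def parse_arch(tag):
    """Split a tag such as 'sm_61' or 'compute_90' into (kind, (major, minor))."""
    kind, _, rest = tag.partition("_")
    digits = "".join(ch for ch in rest if ch.isdigit())
    return kind, (int(digits[:-1]), int(digits[-1]))


def is_supported(capability, arch_list):
    """Return True if a device with `capability` can run kernels from `arch_list`."""
    for tag in arch_list:
        kind, arch = parse_arch(tag)
        # Assumption: compiled kernels run on the same major version
        # when the device's minor version is equal or higher.
        if kind == "sm" and arch[0] == capability[0] and arch[1] <= capability[1]:
            return True
        # Assumption: embedded PTX can be JIT-compiled for any newer device.
        if kind == "compute" and arch <= tuple(capability):
            return True
    return False


# Architecture list copied from the warning above.
BUILD_ARCHES = ["sm_61", "sm_70", "sm_75", "sm_80", "sm_86", "sm_90"]
print(is_supported((5, 0), BUILD_ARCHES))  # prints False for the GeForce 940MX

try:
    import torch
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available():
    cap = torch.cuda.get_device_capability(0)
    # Use the GPU only when the installed build has kernels for it.
    device = "cuda" if is_supported(cap, torch.cuda.get_arch_list()) else "cpu"
    print(cap, device)
```

With the values from the log, the check returns False, so a script written this way would run on the CPU instead of crashing inside `cross_entropy`.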

PyTorch course (multiclass classification takes the longest to compute)

나스닥171819 2025. 10. 24. 03:12

PyTorch course

Python PyTorch course: Lecture 1 - Introduction to PyTorch and installation - YUN DAE HEE

076923.github.io

Multiclass Classification

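Multiclass classification is the task of assigning an input to one of three or more groups according to a rule.

For example, the data is divided into outcomes such as group A, group B, and group C.

Values computed from one or more features are used to estimate the probability that the input belongs to each class.

The basic model for this task is often called softmax regression: it uses the softmax function to compute the probability of belonging to each class.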
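
To make the softmax step concrete, here is a minimal pure-Python sketch. The function name `softmax` and the input scores are illustrative choices, not taken from the course; a PyTorch model would normally use the library's own softmax instead.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative scores for three classes (made-up values).
probs = softmax([2.0, 1.0, 0.1])
predicted = probs.index(max(probs))  # class with the highest probability
print([round(p, 2) for p in probs], predicted)
```

The class with the largest raw score always receives the largest probability, so the predicted class can be read off either list.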

The multiclass classification example takes the longest to compute.

(sadtalker) C:\A>python ddddd11cuda.py
Epoch : 1000, Cost : 0.210
Epoch : 2000, Cost : 0.135
Epoch : 3000, Cost : 0.098
Epoch : 4000, Cost : 0.079
Epoch : 5000, Cost : 0.064
Epoch : 6000, Cost : 0.055
Epoch : 7000, Cost : 0.047
Epoch : 8000, Cost : 0.042
Epoch : 9000, Cost : 0.037
Epoch : 10000, Cost : 0.034
---------
tensor([[ 10.6537,   3.8069, -10.9961],
        [  9.7564,   3.1990,  -9.9373],
        [ -1.0024,   2.6155,  -0.7469],
        [ -0.5724,   2.0802,  -0.7521],
        [ -9.1612,   2.4963,   7.2355],
        [-17.3300,   1.4733,  15.3350],
        [  6.2296,   2.4642,  -6.8609],
        [ -1.3416,   3.0682,  -0.7330],
        [ -6.5406,   2.3072,   4.8822]], device='cuda:0')
tensor([[1.0000, 0.0000, 0.0000],
        [1.0000, 0.0000, 0.0000],
        [0.0300, 0.9400, 0.0300],
        [0.0600, 0.8900, 0.0500],
        [0.0000, 0.0100, 0.9900],
        [0.0000, 0.0000, 1.0000],
        [0.9800, 0.0200, 0.0000],
        [0.0100, 0.9700, 0.0200],
        [0.0000, 0.0700, 0.9300]], device='cuda:0')
tensor([0, 0, 1, 1, 2, 2, 0, 1, 2], device='cuda:0')
['acute triangle', 'acute triangle', 'right triangle', 'right triangle', 'obtuse triangle', 'obtuse triangle', 'acute triangle', 'right triangle', 'obtuse triangle']
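
The output above can be read in four steps: the first tensor holds the raw scores (logits), the second the softmax probabilities, the third the index of the largest value in each row, and the final list the class names for those indices. The sketch below reproduces the last two steps for three of the printed rows. The index-to-name mapping in `CLASS_NAMES` is inferred from the printed indices and labels, and `argmax` is a hypothetical helper; the original script is not shown, so this is an illustration rather than its actual code.

```python
# First, third and fifth logit rows from the output above.
logits = [
    [10.6537, 3.8069, -10.9961],
    [-1.0024, 2.6155, -0.7469],
    [-9.1612, 2.4963, 7.2355],
]
# Index-to-name mapping inferred from the printed indices and labels.
CLASS_NAMES = ["acute triangle", "right triangle", "obtuse triangle"]


def argmax(row):
    """Index of the largest value in the row."""
    return max(range(len(row)), key=row.__getitem__)


# Softmax preserves ordering, so the argmax of the logits is the predicted class.
indices = [argmax(row) for row in logits]
labels = [CLASS_NAMES[i] for i in indices]
print(indices, labels)
```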

Demonstrating the neural 3DMM

나스닥171819 2025. 10. 24. 01:20

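Demonstrating the neural 3DMM

Special thanks go to the authors of NPHM for demonstrating the neural 3DMM and to the authors of NerSemble for their help in building the dataset. Finally, we would like to thank the authors of Faceformer for providing their pretrained model.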

tarun738/i3DMM

GitHub - tarun738/i3DMM

Contribute to tarun738/i3DMM development by creating an account on GitHub.

github.com

Using multiple GPUs in PyTorch (1)

나스닥171819 2025. 10. 23. 23:38

Using multiple GPUs in PyTorch (1)

The batch size must be larger than the number of GPUs. DistributedDataParallel is considerably faster than torch.nn.DataParallel. To use DistributedDataParallel on a host with N GPUs, there must be N processes ...
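
The batch-size requirement is easier to see with numbers. The sketch below is a pure-Python model of splitting one batch across several GPUs along its first dimension. It assumes chunking in the style of `torch.chunk` (each chunk holds ceil(batch_size / num_gpus) samples until the batch runs out); `split_batch` is a hypothetical helper and is not taken from the linked post.

```python
import math


def split_batch(batch_size, num_gpus):
    """Per-GPU sample counts when a batch is chunked along its first dimension."""
    chunk = math.ceil(batch_size / num_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    # GPUs that receive no samples are reported with a count of 0.
    return sizes + [0] * (num_gpus - len(sizes))


print(split_batch(10, 4))  # every GPU gets work
print(split_batch(2, 4))   # two GPUs sit idle
```

Under this assumption, a batch smaller than the number of GPUs leaves some devices with nothing to compute, so the extra hardware brings no speedup.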

velog.io
