    A Coding Implementation to Master GPU Computing with CuPy, Custom CUDA Kernels, Streams, Sparse Matrices, and Profiling

    By Naveed Ahmad | 15/05/2026 | Updated: 15/05/2026


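    import cupy as cp
    import matplotlib.pyplot as plt
    # header() is the banner-printing helper used throughout this tutorial; it is not defined in this excerpt.
    header("6. RAW CUDA KERNEL — MANDELBROT")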
    mandel = cp.RawKernel(r'''
    extern "C" __global__
    void mandel(float xmin, float xmax, float ymin, float ymax,
               int W, int H, int max_iter, int* out) {
       int ix = blockDim.x * blockIdx.x + threadIdx.x;
       int iy = blockDim.y * blockIdx.y + threadIdx.y;
       if (ix >= W || iy >= H) return;
       float cx = xmin + (xmax - xmin) * ix / (W - 1);
       float cy = ymin + (ymax - ymin) * iy / (H - 1);
       float zx = 0.f, zy = 0.f;
       int it = 0;
       while (zx*zx + zy*zy < 4.f && it < max_iter) {
           float t = zx*zx - zy*zy + cx;
           zy = 2.f*zx*zy + cy;
           zx = t; ++it;
       }
       out[iy*W + ix] = it;
    }
    ''', 'mandel')
    W, H, ITER = 1024, 1024, 400
    img = cp.zeros((H, W), dtype=cp.int32)
    threads = (16, 16)
    blocks = ((W + 15)//16, (H + 15)//16)
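    # Wrap scalars so their widths match the kernel's float/int parameters
    # (a bare Python int would be passed as a 64-bit value).
    mandel(blocks, threads,
          (cp.float32(-2.0), cp.float32(1.0),
           cp.float32(-1.5), cp.float32(1.5),
           cp.int32(W), cp.int32(H), cp.int32(ITER), img))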
    cp.cuda.Stream.null.synchronize()
    print(f"Mandelbrot finished. max iter reached={int(img.max())}")
    plt.figure(figsize=(6,6))
    plt.imshow(cp.asnumpy(cp.log1p(img)), cmap='twilight_shifted', extent=[-2,1,-1.5,1.5])
    plt.title("Mandelbrot set — computed with a CuPy RawKernel")
    plt.axis('off'); plt.show()

    header("7. CUDA STREAMS")
    s1, s2 = cp.cuda.Stream(non_blocking=True), cp.cuda.Stream(non_blocking=True)
    with s1:
       a1 = cp.random.rand(2000, 2000, dtype=cp.float32)
       b1 = cp.random.rand(2000, 2000, dtype=cp.float32)
       c1 = a1 @ b1
    with s2:
       a2 = cp.random.rand(2000, 2000, dtype=cp.float32)
       b2 = cp.random.rand(2000, 2000, dtype=cp.float32)
       c2 = a2 @ b2
    s1.synchronize(); s2.synchronize()
    print(f"Stream-1 mean={float(c1.mean()):.4f}")
    print(f"Stream-2 mean={float(c2.mean()):.4f}")
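
A quick way to sanity-check the RawKernel from section 6 is to rerun the same escape-time loop on the CPU and compare iteration counts. The sketch below is not part of the original tutorial: `mandel_reference` is a hypothetical helper name, it uses NumPy only, and it mirrors the kernel's float32 arithmetic and coordinate mapping. Because the GPU compiler may fuse multiply-adds, a few pixels near the set boundary can differ slightly, so treat this as an approximate cross-check rather than a bit-exact one.

```python
import numpy as np

def mandel_reference(xmin, xmax, ymin, ymax, W, H, max_iter):
    """CPU escape-time counts using the same update rule as the CUDA kernel.

    Hypothetical helper for illustration; not part of the original listing.
    """
    ix = np.arange(W, dtype=np.float32)
    iy = np.arange(H, dtype=np.float32)
    # Same pixel-to-plane mapping as the kernel: c = min + (max - min) * i / (N - 1)
    cx = (np.float32(xmin) + np.float32(xmax - xmin) * ix / np.float32(W - 1))[None, :]
    cy = (np.float32(ymin) + np.float32(ymax - ymin) * iy / np.float32(H - 1))[:, None]

    zx = np.zeros((H, W), dtype=np.float32)
    zy = np.zeros((H, W), dtype=np.float32)
    it = np.zeros((H, W), dtype=np.int32)

    for _ in range(max_iter):
        # A pixel keeps iterating only while |z|^2 < 4, like the kernel's while-condition.
        active = zx * zx + zy * zy < np.float32(4.0)
        if not active.any():
            break
        t = zx * zx - zy * zy + cx
        # Escaped pixels keep their last z so they stay inactive on later passes.
        zy = np.where(active, np.float32(2.0) * zx * zy + cy, zy)
        zx = np.where(active, t, zx)
        it += active
    return it

# Small grid: the CPU version makes up to max_iter passes over the whole array.
ref = mandel_reference(-2.0, 1.0, -1.5, 1.5, 64, 64, 100)
print(ref.shape, int(ref.max()))
```

On a machine with CuPy, one could launch the kernel on the same small grid and compare `cp.asnumpy(img)` against `ref`, counting mismatching pixels instead of demanding exact equality.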


