    Articles Stock

    A Coding Implementation to Master GPU Computing with CuPy, Custom CUDA Kernels, Streams, Sparse Matrices, and Profiling

    By Naveed Ahmad | 15/05/2026 | Updated: 15/05/2026


    header("6. RAW CUDA KERNEL — MANDELBROT")
    mandel = cp.RawKernel(r'''
    extern "C" __global__
    void mandel(float xmin, float xmax, float ymin, float ymax,
               int W, int H, int max_iter, int* out) {
       int ix = blockDim.x * blockIdx.x + threadIdx.x;
       int iy = blockDim.y * blockIdx.y + threadIdx.y;
       if (ix >= W || iy >= H) return;
       float cx = xmin + (xmax - xmin) * ix / (W - 1);
       float cy = ymin + (ymax - ymin) * iy / (H - 1);
       float zx = 0.f, zy = 0.f;
       int it = 0;
       while (zx*zx + zy*zy < 4.f && it < max_iter) {
           float t = zx*zx - zy*zy + cx;
           zy = 2.f*zx*zy + cy;
           zx = t; ++it;
       }
       out[iy*W + ix] = it;
    }
    ''', 'mandel')
    W, H, ITER = 1024, 1024, 400
    img = cp.zeros((H, W), dtype=cp.int32)
    threads = (16, 16)
    blocks = ((W + 15)//16, (H + 15)//16)
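    # CuPy passes plain Python ints as 64-bit values, so cast them to
    # int32 to match the kernel's `int` parameters.
    mandel(blocks, threads,
          (cp.float32(-2.0), cp.float32(1.0),
           cp.float32(-1.5), cp.float32(1.5),
           cp.int32(W), cp.int32(H), cp.int32(ITER), img))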
    cp.cuda.Stream.null.synchronize()
    print(f"Mandelbrot finished. max iter reached={int(img.max())}")
    plt.figure(figsize=(6,6))
    plt.imshow(cp.asnumpy(cp.log1p(img)), cmap='twilight_shifted', extent=[-2,1,-1.5,1.5])
    plt.title("Mandelbrot set — computed with a CuPy RawKernel")
    plt.axis('off'); plt.show()
    header("7. CUDA STREAMS")
    s1, s2 = cp.cuda.Stream(non_blocking=True), cp.cuda.Stream(non_blocking=True)
    with s1:
       a1 = cp.random.rand(2000, 2000, dtype=cp.float32)
       b1 = cp.random.rand(2000, 2000, dtype=cp.float32)
       c1 = a1 @ b1
    with s2:
       a2 = cp.random.rand(2000, 2000, dtype=cp.float32)
       b2 = cp.random.rand(2000, 2000, dtype=cp.float32)
       c2 = a2 @ b2
    s1.synchronize(); s2.synchronize()
    print(f"Stream-1 mean={float(c1.mean()):.4f}")
    print(f"Stream-2 mean={float(c2.mean()):.4f}")
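
    To sanity-check the Mandelbrot kernel from section 6 without a GPU, it can help to have a slow CPU reference for a handful of pixels. The sketch below is not part of the original tutorial: the helper names `escape_count`, `pixel_to_c`, and `grid_dims` are hypothetical, and the code uses Python's double-precision floats, whereas the kernel uses float32, so counts for points very close to the set's boundary may differ slightly.

```python
def escape_count(cx, cy, max_iter):
    """Escape-time iteration count for the point c = cx + i*cy."""
    zx = zy = 0.0
    it = 0
    # Same loop as the kernel: iterate z <- z^2 + c while |z|^2 < 4.
    while zx * zx + zy * zy < 4.0 and it < max_iter:
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
        it += 1
    return it


def pixel_to_c(ix, iy, W, H, xmin=-2.0, xmax=1.0, ymin=-1.5, ymax=1.5):
    """Map pixel (ix, iy) to the complex plane the same way the kernel does."""
    cx = xmin + (xmax - xmin) * ix / (W - 1)
    cy = ymin + (ymax - ymin) * iy / (H - 1)
    return cx, cy


def grid_dims(W, H, block=(16, 16)):
    """Ceiling division: the number of blocks needed to cover every pixel."""
    bx, by = block
    return ((W + bx - 1) // bx, (H + by - 1) // by)


if __name__ == "__main__":
    W, H, ITER = 1024, 1024, 400
    print("grid:", grid_dims(W, H))
    # Spot-check a few pixels; compare these against img[iy, ix] from the GPU run.
    for ix, iy in [(0, 0), (512, 512), (682, 512)]:
        cx, cy = pixel_to_c(ix, iy, W, H)
        print((ix, iy), escape_count(cx, cy, ITER))
```

    Because the kernel writes to `out[iy*W + ix]`, the value for pixel `(ix, iy)` lives at `img[iy, ix]` on the GPU side. The `grid_dims` helper generalizes the `(W + 15)//16` expression used for the launch configuration, which is why the kernel needs its `ix >= W || iy >= H` bounds check when the image size is not a multiple of the block size.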




    Naveed Ahmad is a technology journalist and AI writer at ArticlesStock, covering artificial intelligence, machine learning, and emerging tech policy.
