Current problem statement i am working on is as follows:
I will give an example of graph dataset to understand problem statement.
Graph has V (vertices) and E(set of edges).
ex. livejournal has #V = 4.8M, #E=57.7M (M stands for Million) Size=597MB/1.1GB (unweighted/weighted)
1GiB = 1024^3 B
1GB = 1000^3
Now fastest supercomputer today as of 2022 is Japnese Fugaku by Fujitsu.
It is exascale that means it can compute 10^18 FLOPS (1 exaFLOPS).
It has around 158k nodes and 32GiB per node of SDRAM memory. (roughly 34.35GB)
Now these implies cache will be lot less than memory capcity. L1-d=64KiB, L2=8MiB.
Point is CPU needs data in cache hierarchy and memory. Farther away the data, more time will be spent on bringing requested data closer to CPU. To compute any graph algorithm on given dataset size, in no way it will always present in cache hierarhcy at the same time (untill now these is our regular problem in computer architecture).
Now, Graphs are interesting and adds more complexity here. Graph accesses are very random. Let's say your cache can hold only 100 nodes currently, upon request to 10^5 node number, you will have to make space to accomodate memory block containing these requested node number. And lets say you evicted cache line containing node 1. And at very next request from cpu, node 1 is requested. But right now it is not in cache. May be it is not even in DRAM. We will have to request it from storage. Hence adds extra latency. So these randomness is not predictable to make use of prefetched data. And these is the problem, graph computation becomes more costly.
Hope you enjoyed it.
I will try to post it more. If you find information in blog to be impricise please help me improve.
That's it for today.
Link to fugaku specification: https://www.fujitsu.com/global/about/innovation/fugaku/specifications/
I am looking for research position in computer architecture in industrial/academic labs.
Please let me know if you or someone in your contact is hiring.
Thank you,
-Pravesh