
Academic Talks

Efficiently Running AI Workloads Using Long SIMD and Matrix ISAs


Speaker: Marc Casas Guix, Barcelona Supercomputing Center

Time: October 7, 2024, 9:30-11:30

Venue: Main Building, Room B1421

Host: Liu Weifeng (劉偉峰)


Speaker Bio:

Marc Casas is a technical research lead at the Barcelona Supercomputing Center (BSC) and a lecturer at the Universitat Politècnica de Catalunya (UPC). His research lies between computer architecture (e.g., memory address translation and vector architectures) and high-performance computing (e.g., sparse linear algebra and parallel deep learning). He is the technical lead of the SONAR (parallel SOftware and New ARchitectures) research group, composed of PhD students, engineers, and postdocs. Marc has led BSC contributions to several European projects (Mont-Blanc 2020, European Processor Initiative, etc.) and research collaborations with Intel and IBM.

Marc has been at BSC since 2013. He was a postdoctoral research scholar at the Lawrence Livermore National Laboratory (LLNL) from 2010 to 2013. He received the Marie Curie and Ramón y Cajal Fellowships in 2014 and 2018, respectively. He obtained a 5-year degree in mathematics in 2004 and a PhD degree in Computer Science in 2010 from the Universitat Politècnica de Catalunya (UPC).


Abstract:

This talk will show how state-of-the-art proposals to compute convolutions on architectures with CPUs supporting SIMD instructions deliver poor performance for long SIMD lengths due to frequent cache conflict misses. The talk will propose new algorithmic approaches to mitigate the limitations of state-of-the-art proposals via the adaptation of the amount of computation exposed to the microarchitecture to mitigate cache misses, and the redefinition of the activation memory layout to improve the memory access pattern. These algorithmic approaches will motivate the Matrix Tile Extension (MTE), a novel matrix Instruction-Set Architecture (ISA) that completely decouples the instruction set architecture from the microarchitecture and seamlessly interacts with existing vector ISAs. MTE incurs minimal implementation overhead since it only requires a few additional instructions and a 64-bit Control Status Register (CSR) to keep its state, and beats the best state-of-the-art matrix ISA by 1.20x.
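
As background on the cache conflict misses mentioned in the abstract, the following is a minimal sketch of how strided accesses can collide in a set-associative cache. It is not the speaker's method or benchmark. The cache geometry (64-byte lines, 64 sets), the function names, and the strides are all assumptions chosen for illustration.

```python
# Toy model of set-associative cache indexing (illustrative assumptions only:
# 64-byte cache lines and 64 sets, e.g. a 32 KiB 8-way cache).

def cache_set_index(addr, line_bytes=64, num_sets=64):
    """Return the cache set that a byte address maps to."""
    return (addr // line_bytes) % num_sets

def distinct_sets(stride_bytes, n_accesses, line_bytes=64, num_sets=64):
    """Count how many different sets are touched by n strided accesses."""
    return len({
        cache_set_index(i * stride_bytes, line_bytes, num_sets)
        for i in range(n_accesses)
    })

if __name__ == "__main__":
    # A power-of-two stride equal to num_sets * line_bytes sends every
    # access to the same set, so lines evict each other (conflict misses).
    print(distinct_sets(4096, 16))
    # Padding the stride by one cache line spreads accesses across sets.
    print(distinct_sets(4096 + 64, 16))
```

In this toy model, changing the stride (that is, the memory layout) is what removes the collisions, which is the general intuition behind redefining a data layout to improve the access pattern.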

