Haskellの場合、エイリアス関数がパフォーマンスを低下させる理由

Question

[私の質問を明確にする]：State Monad。この質問は私のコードを最適化するためのものではありません。このコードは、ステートフルモナドのベンチマーク用です。 State- freeのピュアバージョンとSTモナドバージョンのコード、および私のリポジトリのマスターブランチのベンチマーク結果を確認できます。

コードで速記しようとすると、速記関数によって予期しない結果が生じます。（- フォーカスされたベンチマークおよび全体的なベンチマークを参照してください

このリポジトリのLazinessTestブランチでは、

私は略記しようとしたこのコード、

runTimeSlot' :: [Int] -> Int -> Int -> State Data [Int] runTimeSlot' (target : idx : rest) inst operand = do d <- get case inst of -- this code ↓↓↓↓ 0 -> case (rem target sizeOfTarget) of 0 -> state $ \s -> ((idx : rest), setTime operand d) ... 1 -> case (rem target sizeOfTarget) of 0 -> state $ \s -> ((idx : rest), modifyTime (\x -> rem x operand) d) ...

次のように

runTimeSlot'' :: [Int] -> Int -> Int -> State Data [Int] runTimeSlot'' (target : idx : rest) inst operand = do d <- get case inst of -- as like as ↓↓↓↓ 0 -> case targetInData of 0 -> state $ \s -> ((idx : rest), setTime operand d) ... 1 -> case targetInData of 0 -> state $ \s -> ((idx : rest), modifyTime (\x -> rem x operand) d) ... where targetInData = rem target sizeOfTarget

そして、622μsから1.767 msに大幅にパフォーマンスが低下することを示しています。

値targetInDataも次のステップcaseで評価されますが、targetInDataを厳密に次のようにすることで計算できると思いました

runTimeSlot''' :: [Int] -> Int -> Int -> State Data [Int] runTimeSlot''' (target : idx : rest) inst operand = do d <- get -- evaluate it ↓↓ here before it used targetInData `seq` case inst of 0 -> case targetInData of 0 -> state $ \s -> ((idx : rest), setTime operand d) ... 1 -> case targetInData of 0 -> state $ \s -> ((idx : rest), modifyTime (\x -> rem x operand) d) ... where targetInData = rem target sizeOfTarget

しかし、これも機能しません。（1.758ミリ秒かかります）

@AndrásKovácsのコメントに基づいて（ありがとう、@AndrásKovács）BangPatternsをtargetInDataに次のように追加しました

runTimeSlot''b :: [Int] -> Int -> Int -> State Data [Int] runTimeSlot''b (target : idx : rest) inst operand = do d <- get case inst of 0 -> case targetInData of -- Set 0 -> state $ \s -> ((idx : rest), setTime operand d) ... 1 -> case targetInData of -- Mod 0 -> state $ \s -> ((idx : rest), modifyTime (\x -> rem x operand) d) ... where !targetInData = rem target sizeOfTarget runTimeSlot'''b :: [Int] -> Int -> Int -> State Data [Int] runTimeSlot'''b (target : idx : rest) inst operand = do d <- get -- evaluate it ↓↓ here before it used targetInData `seq` case inst of 0 -> case targetInData of 0 -> state $ \s -> ((idx : rest), setTime operand d) ... 1 -> case targetInData of 0 -> state $ \s -> ((idx : rest), modifyTime (\x -> rem x operand) d) ... where !targetInData = rem target sizeOfTarget

そしてそれは少しは役立ちますが、予期しない状況を完全に解決するわけではありません。

1.767 ms @ runTimeSlot''-> 1.527 ms @ runTimeSlot''b
1.758 ms @ runTimeSlot'''-> 1.503 ms @ runTimeSlot'''b

622μs@ runTimeSlot ???

私はこの状況を怠惰で自分で説明することはできませんでした。

省略形のコードとして(rem target sizeOfTaregt)を置き換えるだけでパフォーマンスが低下する理由を説明できますか？

これが単一のコンパイル可能なサンプルコードとベンチマークの結果：（私は不必要なコードを十分に減らすことができなかったのは残念です）

-- dependencies: base, containers, criterion, deepseq, mtl, splitmix {-# LANGUAGE BangPatterns #-} module Main where import Criterion.Main import Criterion.Types import Control.DeepSeq import Control.Monad.State.Strict import Data.Bifunctor import Data.Maybe import qualified Data.IntMap as IM import Data.List import System.Random.SplitMix myConfig60s = defaultConfig { timeLimit = 60.0, resamples = 10000, verbosity = Verbose } randomInts :: SMGen -> [Int] randomInts = unfoldr (Just . (first fromIntegral . bitmaskWithRejection64 64)) main :: IO () main = do putStrLn "Initialize" let size = 10000 let instSize = 2 let targetSize = 16 let operandSize = 256 let i0Gen = (mkSMGen . fromIntegral) 0 let (targetGen, i1Gen) = splitSMGen i0Gen let (instGen, i2Gen) = splitSMGen i1Gen let (operGen, iGen) = splitSMGen i2Gen let infTargetList = map (\x -> rem x targetSize) $ randomInts targetGen let infInstList = map (\x -> rem x instSize) $ randomInts instGen let infOperandList = map (\x -> rem x operandSize + 1) $ randomInts operGen let (iTime : iBalance : iStatus : _) = randomInts iGen let targetList = take (size * 2) infTargetList let instList = take size infInstList let operandList = take size infOperandList targetList `deepseq` instList `deepseq` operandList `deepseq` putStrLn "Evaluated" let iData = Data iTime iBalance iStatus IM.empty let ssBench = bgroup "SingleState Simulation" $ [ bench "SingleState.StrictPure'" $ nf ( runSimulatorPure' size targetList instList operandList ) iData , bench "SingleState.StrictPure''" $ nf ( runSimulatorPure'' size targetList instList operandList ) iData , bench "SingleState.StrictState'" $ nf ( runState $ runSimulator' size targetList instList operandList ) iData , bench "SingleState.StrictState''" $ nf ( runState $ runSimulator'' size targetList instList operandList ) iData , bench "SingleState.StrictState''b" $ nf ( runState $ runSimulator''b size targetList instList operandList ) iData , bench "SingleState.StrictState'''" $ nf ( runState $ runSimulator''' size targetList instList operandList ) iData , bench "SingleState.StrictState'''b" $ nf ( runState $ runSimulator'''b size targetList instList operandList ) iData , bench "SingleState.StrictState''''" $ nf ( runState $ runSimulator'''' size targetList instList operandList ) iData , bench "SingleState.StrictState'''''" $ nf ( runState $ runSimulator''''' size targetList instList operandList ) iData ] putStrLn "Do bench" defaultMainWith myConfig60s [ssBench] -- from SingleState.StrictPure of the repo runSimulatorPure' :: Int -> [Int] -> [Int] -> [Int] -> Data -> Data runSimulatorPure' 0 _ _ _ d = d runSimulatorPure' size tList (i : iList) (o : oList) d = restTList `seq` newData `seq` runSimulatorPure' (size - 1) restTList iList oList newData where (restTList, newData) = runTimeSlotPure' tList i o d runTimeSlotPure' :: [Int] -> Int -> Int -> Data -> ([Int], Data) runTimeSlotPure' (target : idx : rest) inst operand d = case inst of 0 -> case (rem target sizeOfTarget) of -- Set 0 -> ((idx : rest), setTime operand d) 1 -> ((idx : rest), setBalance operand d) 2 -> ((idx : rest), setStatus operand d) 3 -> (rest, setEntry idx operand d) 1 -> case (rem target sizeOfTarget) of -- Mod 0 -> ((idx : rest), modifyTime (\x -> rem x operand) d) 1 -> ((idx : rest), modifyBalance (\x -> rem x operand) d) 2 -> ((idx : rest), modifyStatus (\x -> rem x operand) d) 3 -> (rest, modifyEntry (\x -> rem x operand) idx d) runSimulatorPure'' :: Int -> [Int] -> [Int] -> [Int] -> Data -> Data runSimulatorPure'' 0 _ _ _ d = d runSimulatorPure'' size tList (i : iList) (o : oList) d = restTList `seq` newData `seq` runSimulatorPure'' (size - 1) restTList iList oList newData where (restTList, newData) = runTimeSlotPure'' tList i o d runTimeSlotPure'' :: [Int] -> Int -> Int -> Data -> ([Int], Data) runTimeSlotPure'' (target : idx : rest) inst operand d = case inst of 0 -> case targetInData of -- Set 0 -> ((idx : rest), setTime operand d) 1 -> ((idx : rest), setBalance operand d) 2 -> ((idx : rest), setStatus operand d) 3 -> (rest, setEntry idx operand d) 1 -> case targetInData of -- Mod 0 -> ((idx : rest), modifyTime (\x -> rem x operand) d) 1 -> ((idx : rest), modifyBalance (\x -> rem x operand) d) 2 -> ((idx : rest), modifyStatus (\x -> rem x operand) d) 3 -> (rest, modifyEntry (\x -> rem x operand) idx d) where targetInData = rem target sizeOfTarget -- from SingleState.StrictState of the repo runSimulator :: Int -> [Int] -> [Int] -> [Int] -> State Data Data runSimulator 0 _ _ _ = get runSimulator size tList (i : iList) (o : oList) = do restTList <- runTimeSlot tList i o runSimulator (size - 1) restTList iList oList runTimeSlot :: [Int] -> Int -> Int -> State Data [Int] runTimeSlot (target : idx : rest) inst operand = do d <- get case inst of 0 -> case (rem target sizeOfTarget) of -- Set 0 -> state $ \s -> ((idx : rest), setTime operand d) 1 -> state $ \s -> ((idx : rest), setBalance operand d) 2 -> state $ \s -> ((idx : rest), setStatus operand d) 3 -> state $ \s -> (rest, setEntry idx operand d) 1 -> case (rem target sizeOfTarget) of -- Mod 0 -> state $ \s -> ((idx : rest), modifyTime rF d) 1 -> state $ \s -> ((idx : rest), modifyBalance rF d) 2 -> state $ \s -> ((idx : rest), modifyStatus rF d) 3 -> state $ \s -> (rest, modifyEntry rF idx d) -- 2 -> Add -- 3 -> Div where rF x = rem x operand runSimulator' :: Int -> [Int] -> [Int] -> [Int] -> State Data Data runSimulator' 0 _ _ _ = get runSimulator' size tList (i : iList) (o : oList) = do restTList <- runTimeSlot' tList i o runSimulator' (size - 1) restTList iList oList runTimeSlot' :: [Int] -> Int -> Int -> State Data [Int] runTimeSlot' (target : idx : rest) inst operand = do d <- get case inst of 0 -> case (rem target sizeOfTarget) of -- Set 0 -> state $ \s -> ((idx : rest), setTime operand d) 1 -> state $ \s -> ((idx : rest), setBalance operand d) 2 -> state $ \s -> ((idx : rest), setStatus operand d) 3 -> state $ \s -> (rest, setEntry idx operand d) 1 -> case (rem target sizeOfTarget) of -- Mod 0 -> state $ \s -> ((idx : rest), modifyTime (\x -> rem x operand) d) 1 -> state $ \s -> ((idx : rest), modifyBalance (\x -> rem x operand) d) 2 -> state $ \s -> ((idx : rest), modifyStatus (\x -> rem x operand) d) 3 -> state $ \s -> (rest, modifyEntry (\x -> rem x operand) idx d) -- 2 -> Add -- 3 -> Div runSimulator'' :: Int -> [Int] -> [Int] -> [Int] -> State Data Data runSimulator'' 0 _ _ _ = get runSimulator'' size tList (i : iList) (o : oList) = do restTList <- runTimeSlot'' tList i o runSimulator'' (size - 1) restTList iList oList runTimeSlot'' :: [Int] -> Int -> Int -> State Data [Int] runTimeSlot'' (target : idx : rest) inst operand = do d <- get case inst of 0 -> case targetInData of -- Set 0 -> state $ \s -> ((idx : rest), setTime operand d) 1 -> state $ \s -> ((idx : rest), setBalance operand d) 2 -> state $ \s -> ((idx : rest), setStatus operand d) 3 -> state $ \s -> (rest, setEntry idx operand d) 1 -> case targetInData of -- Mod 0 -> state $ \s -> ((idx : rest), modifyTime (\x -> rem x operand) d) 1 -> state $ \s -> ((idx : rest), modifyBalance (\x -> rem x operand) d) 2 -> state $ \s -> ((idx : rest), modifyStatus (\x -> rem x operand) d) 3 -> state $ \s -> (rest, modifyEntry (\x -> rem x operand) idx d) -- 2 -> Add -- 3 -> Div where targetInData = rem target sizeOfTarget runSimulator''b :: Int -> [Int] -> [Int] -> [Int] -> State Data Data runSimulator''b 0 _ _ _ = get runSimulator''b size tList (i : iList) (o : oList) = do restTList <- runTimeSlot''b tList i o runSimulator''b (size - 1) restTList iList oList runTimeSlot''b :: [Int] -> Int -> Int -> State Data [Int] runTimeSlot''b (target : idx : rest) inst operand = do d <- get case inst of 0 -> case targetInData of -- Set 0 -> state $ \s -> ((idx : rest), setTime operand d) 1 -> state $ \s -> ((idx : rest), setBalance operand d) 2 -> state $ \s -> ((idx : rest), setStatus operand d) 3 -> state $ \s -> (rest, setEntry idx operand d) 1 -> case targetInData of -- Mod 0 -> state $ \s -> ((idx : rest), modifyTime (\x -> rem x operand) d) 1 -> state $ \s -> ((idx : rest), modifyBalance (\x -> rem x operand) d) 2 -> state $ \s -> ((idx : rest), modifyStatus (\x -> rem x operand) d) 3 -> state $ \s -> (rest, modifyEntry (\x -> rem x operand) idx d) -- 2 -> Add -- 3 -> Div where !targetInData = rem target sizeOfTarget runSimulator''' :: Int -> [Int] -> [Int] -> [Int] -> State Data Data runSimulator''' 0 _ _ _ = get runSimulator''' size tList (i : iList) (o : oList) = do restTList <- runTimeSlot''' tList i o runSimulator''' (size - 1) restTList iList oList runTimeSlot''' :: [Int] -> Int -> Int -> State Data [Int] runTimeSlot''' (target : idx : rest) inst operand = do d <- get targetInData `seq` case inst of 0 -> case targetInData of -- Set 0 -> state $ \s -> ((idx : rest), setTime operand d) 1 -> state $ \s -> ((idx : rest), setBalance operand d) 2 -> state $ \s -> ((idx : rest), setStatus operand d) 3 -> state $ \s -> (rest, setEntry idx operand d) 1 -> case targetInData of -- Mod 0 -> state $ \s -> ((idx : rest), modifyTime (\x -> rem x operand) d) 1 -> state $ \s -> ((idx : rest), modifyBalance (\x -> rem x operand) d) 2 -> state $ \s -> ((idx : rest), modifyStatus (\x -> rem x operand) d) 3 -> state $ \s -> (rest, modifyEntry (\x -> rem x operand) idx d) -- 2 -> Add -- 3 -> Div where targetInData = rem target sizeOfTarget runSimulator'''b :: Int -> [Int] -> [Int] -> [Int] -> State Data Data runSimulator'''b 0 _ _ _ = get runSimulator'''b size tList (i : iList) (o : oList) = do restTList <- runTimeSlot'''b tList i o runSimulator'''b (size - 1) restTList iList oList runTimeSlot'''b :: [Int] -> Int -> Int -> State Data [Int] runTimeSlot'''b (target : idx : rest) inst operand = do d <- get targetInData `seq` case inst of 0 -> case targetInData of -- Set 0 -> state $ \s -> ((idx : rest), setTime operand d) 1 -> state $ \s -> ((idx : rest), setBalance operand d) 2 -> state $ \s -> ((idx : rest), setStatus operand d) 3 -> state $ \s -> (rest, setEntry idx operand d) 1 -> case targetInData of -- Mod 0 -> state $ \s -> ((idx : rest), modifyTime (\x -> rem x operand) d) 1 -> state $ \s -> ((idx : rest), modifyBalance (\x -> rem x operand) d) 2 -> state $ \s -> ((idx : rest), modifyStatus (\x -> rem x operand) d) 3 -> state $ \s -> (rest, modifyEntry (\x -> rem x operand) idx d) -- 2 -> Add -- 3 -> Div where !targetInData = rem target sizeOfTarget runSimulator'''' :: Int -> [Int] -> [Int] -> [Int] -> State Data Data runSimulator'''' 0 _ _ _ = get runSimulator'''' size tList (i : iList) (o : oList) = do restTList <- runTimeSlot'''' tList i o runSimulator'''' (size - 1) restTList iList oList runTimeSlot'''' :: [Int] -> Int -> Int -> State Data [Int] runTimeSlot'''' (target : idx : rest) inst operand = do d <- get case inst of 0 -> case targetInData of -- Set 0 -> state $ \s -> ((idx : rest), setTime operand d) 1 -> state $ \s -> ((idx : rest), setBalance operand d) 2 -> state $ \s -> ((idx : rest), setStatus operand d) 3 -> state $ \s -> (rest, setEntry idx operand d) 1 -> case targetInData of -- Mod 0 -> state $ \s -> ((idx : rest), modifyTime rF d) 1 -> state $ \s -> ((idx : rest), modifyBalance rF d) 2 -> state $ \s -> ((idx : rest), modifyStatus rF d) 3 -> state $ \s -> (rest, modifyEntry rF idx d) -- 2 -> Add -- 3 -> Div where targetInData = rem target sizeOfTarget rF x = rem x operand runSimulator''''' :: Int -> [Int] -> [Int] -> [Int] -> State Data Data runSimulator''''' 0 _ _ _ = get runSimulator''''' size tList (i : iList) (o : oList) = do restTList <- runTimeSlot''''' tList i o runSimulator''''' (size - 1) restTList iList oList runTimeSlot''''' :: [Int] -> Int -> Int -> State Data [Int] runTimeSlot''''' (target : idx : rest) inst operand = do d <- get targetInData `seq` case inst of 0 -> case targetInData of -- Set 0 -> state $ \s -> ((idx : rest), setTime operand d) 1 -> state $ \s -> ((idx : rest), setBalance operand d) 2 -> state $ \s -> ((idx : rest), setStatus operand d) 3 -> state $ \s -> (rest, setEntry idx operand d) 1 -> case targetInData of -- Mod 0 -> state $ \s -> ((idx : rest), modifyTime rF d) 1 -> state $ \s -> ((idx : rest), modifyBalance rF d) 2 -> state $ \s -> ((idx : rest), modifyStatus rF d) 3 -> state $ \s -> (rest, modifyEntry rF idx d) -- 2 -> Add -- 3 -> Div where targetInData = rem target sizeOfTarget rF x = rem x operand type Balance = Int type Time = Int type Status = Int type Idx = Int type Datum = Int data Data = Data { time :: Time , balance :: Balance , status :: Status , aMap :: IM.IntMap Datum } deriving (Show,Eq) sizeOfTarget :: Int sizeOfTarget = 4 instance NFData Data where rnf (Data t b s m) = rnf t `seq` rnf b `seq` rnf s `seq` rnf m getTime = time getBalance = balance getStatus = status getEntry idx = fromMaybe 0 . IM.lookup idx . aMap setTime newTime d = d { time = newTime } setBalance newBalance d = d { balance = newBalance } setStatus newStatus d = d { status = newStatus } setEntry idx aDatum d = d { aMap = IM.insert idx aDatum (aMap d) } modifyTime f d = d { time = f (time d) } modifyBalance f d = d { balance = f (balance d) } modifyStatus f d = d { status = f (status d) } modifyEntry f idx d = d { aMap = IM.adjust f idx (aMap d) }

更新

@AndrásKovácsのコメントに基づいて、新しいテスト関数を追加しました。
新しい機能で測定時間ごとに更新されます。
State- freeバージョンとSTモナドバージョンを masterブランチから参照してください

P.S。

このコードを -threadedなしの-O2 を指定して実行しました。
ベンチマークの結果全体は here と here で確認できます。
Stateモナドバージョンのない純粋な関数は、この簡略化されたコードによってパフォーマンスの変化を示しません。
このコードから、この問題に関するベンチマーク全体を構築できます。

Andr&#225;s Kov&#225;cs · Answer

ghc -O2 -ddump-simpl -dsuppress-allを介してコア出力を確認する限り、GHCは次の場合、Stateタプルのボックス化解除とワーカーラッピングを単純に実行しません。

runTimeSlot2 :: [Int] -> Int -> Int -> State Data [Int] runTimeSlot2 (target : idx : rest) inst operand = do d <- get case inst of 0 -> ... 1 -> .. where targetInData = rem target sizeOfTarget

ただし、以下の場合に動作します。 targetInDataの前のletにcaseを置くこともできます。

runTimeSlot2 :: [Int] -> Int -> Int -> State Data [Int] runTimeSlot2 (target : idx : rest) inst operand = do d <- get case inst of 0 -> ... 1 -> .. where targetInData = rem target sizeOfTarget

どういう理由ですか？何も思いつきません。しかし、これはとにかく私たちがGHCに少し信頼しすぎている例であり、プログラムは最初から最適とはほど遠いものです。まず、Dataを厳密にし、whnfではなくnfをベンチマークで使用します。

data Data = Data { time :: !Time , balance :: !Balance , status :: !Status , aMap :: !(IM.IntMap Datum) } deriving (Show,Eq)

第2に、この特定の例では、Stateが私たちを大いに買うとは思わず、末尾再帰関数を作成するだけです。

runSimulator1 :: Int -> [Int] -> [Int] -> [Int] -> Data -> Data runSimulator1 = go where go 0 _ _ _ d = d go size (target : (idx : rest)) (i : iList) (o : oList) d = let targetInData = rem target sizeOfTarget in case i of 0 -> case targetInData of 0 -> go (size - 1) (idx : rest) iList oList (setTime o d) 1 -> go (size - 1) (idx : rest) iList oList (setBalance o d) 2 -> go (size - 1) (idx : rest) iList oList (setStatus o d) 3 -> go (size - 1) rest iList oList (setEntry idx o d) 1 -> case targetInData of 0 -> go (size - 1) (idx : rest) iList oList (modifyTime (\x -> rem x o) d) 1 -> go (size - 1) (idx : rest) iList oList (modifyBalance (\x -> rem x o) d) 2 -> go (size - 1) (idx : rest) iList oList (modifyStatus (\x -> rem x o) d) 3 -> go (size - 1) rest iList oList (modifyEntry (\x -> rem x o) idx d)

これは、私のコンピューターでは、元のベンチマークのパフォーマンスの高いバリアントよりも2倍以上速く実行されます。

元のコードにパフォーマンスの問題があることに気付きました：

... 0 -> case targetInData of 0 -> state $ \s -> ((idx : rest), setTime operand d) 1 -> state $ \s -> ((idx : rest), setBalance operand d) 2 -> state $ \s -> ((idx : rest), setStatus operand d) 3 -> state $ \s -> (rest, setEntry idx operand d) ...

上記では、setTime operand dのような返されるすべての状態は遅延です。したがって、多数のサンクを取得します。代わりにできます：

0 -> case targetInData of -- Set 0 -> (idx : rest) <$ (put $! setTime operand d) 1 -> (idx : rest) <$ (put $! setBalance operand d) 2 -> (idx : rest) <$ (put $! setStatus operand d) 3 -> rest <$ (put $! setEntry idx operand d)

これによりパフォーマンスが向上しますが、GHCはStateを単純な関数の引数または結果としてunboxできますが、Data Tuple内のDataをunboxできないので、私のState- freeバージョンよりも少し遅くなります。

一般に、本当に最適化したい場合、最も堅牢なソリューションは、純粋（非モナド）で厳密なプレーン関数、できれば末尾再帰です。その程度まで最適化する努力に値するかどうかは、開発状況に依存します。