すべての因子を返すためのR関数

Question

私の通常の検索fooは私を失敗させています。整数のすべての因子を返すR関数を見つけようとしています。 factorize()関数を持つパッケージが少なくとも2つあります：gmpとconf.designですが、これらの関数は素因数のみを返します。すべての要素を返す関数が欲しいのですが。

Rにはファクターと呼ばれる構造があり、検索に多くのノイズがかかるため、明らかにこれを検索することは困難です。

Chase · Accepted Answer

私のコメントをフォローアップするために（私のタイプミスを@Ramnathに感謝します）、ブルートフォース方式は私の64ビット8ギグマシンでかなりうまく機能しているようです：

FUN <- function(x) { x <- as.integer(x) div <- seq_len(abs(x)) factors <- div[x %% div == 0L] factors <- list(neg = -factors, pos = factors) return(factors) }

いくつかの例：

> FUN(100) $neg [1] -1 -2 -4 -5 -10 -20 -25 -50 -100 $pos [1] 1 2 4 5 10 20 25 50 100 > FUN(-42) $neg [1] -1 -2 -3 -6 -7 -14 -21 -42 $pos [1] 1 2 3 6 7 14 21 42 #and big number > system.time(FUN(1e8)) user system elapsed 1.95 0.18 2.14

Richie Cotton · Answer

素因数からすべての因数を得ることができます。 gmpはこれらを非常に迅速に計算します。

library(gmp) library(plyr) get_all_factors <- function(n) { prime_factor_tables <- lapply( setNames(n, n), function(i) { if(i == 1) return(data.frame(x = 1L, freq = 1L)) plyr::count(as.integer(gmp::factorize(i))) } ) lapply( prime_factor_tables, function(pft) { powers <- plyr::alply(pft, 1, function(row) row$x ^ seq.int(0L, row$freq)) power_grid <- do.call(expand.grid, powers) sort(unique(apply(power_grid, 1, prod))) } ) } get_all_factors(c(1, 7, 60, 663, 2520, 75600, 15876000, 174636000, 403409160000))

Joseph Wood · Answer

更新

アルゴリズムは完全に更新され、複数の多項式と、数百万のチェックを排除するいくつかの巧妙なふるい分け手法が実装されています。元のリンクに加えて、この論文とこの投稿から primo は、この最後の段階で非常に役に立ちました（primoへの多くの称賛）。 Primoは、比較的短いスペースでQSの内臓を説明する素晴らしい仕事をしており、かなり驚くべきアルゴリズムも作成しました（2秒以内に下部の38！+ 1の数を因数分解します!!非常識です!!）。

約束通り、以下は Quadratic Sieve の私の謙虚なR実装です。 1月下旬に約束して以来、私はこのアルゴリズムに散発的に取り組んできました。それは非常に複雑であり、うまくいけば、私の関数名がそれ自体を物語っているので、私はそれを完全に説明しようとはしません（要求されない限り...また、以下のリンクは非常に良い仕事をします）。これは、プログラマーの観点からも数学的にも要求が厳しいため、私がこれまで実行しようとした中で最も困難なアルゴリズムの1つであることが証明されています。私は数え切れないほどの論文を読みましたが、最終的には、これら5つが最も役立つことがわかりました（ QSieve1 、 QSieve2 、 QSieve 、 QSieve4 、 QSieve5 ）。

N.B.このアルゴリズムは、現状では、一般的な素因数分解アルゴリズムとしてはあまり機能しません。さらに最適化された場合は、小さい素数（つまり、この投稿で提案されているように10 ^ 5未満）を除外するコードのセクションを添付する必要があります。次に、QuadSieveAllを呼び出し、確認してください。これらが素数である場合、そうでない場合は、すべての素数が残るまで、これらの要素の両方でQuadSieveAllを呼び出します（これらの手順はすべてそれほど難しくありません）。ただし、この投稿の主なポイントは、二次ふるい法の心臓部を強調することです。したがって、以下の例はすべて半素数です（正方形を含まないほとんどの奇数を因数分解しますが、…また、例を見たことがありません。非半素数を示さなかったQS）。 OPが素因数分解ではなくすべての因数分解を返す方法を探していたのは知っていますが、このアルゴリズム（さらに最適化されている場合）と上記のアルゴリズムの1つを組み合わせると、一般的な因数分解アルゴリズムとして考慮する必要があります（特にOPは Project Euler のために何かを必要としていました。これは通常、力ずくの方法よりもはるかに多くのものを必要とします）。ちなみに、MyIntToBit関数は this 回答のバリエーションであり、PrimeSieveはDontas氏が登場した post からのものですしばらく前に（それについても称賛）。

QuadSieveMultiPolysAll <- function(MyN, fudge1=0L, fudge2=0L, LenB=0L) { ### 'MyN' is the number to be factored; 'fudge1' is an arbitrary number ### that is used to determine the size of your prime base for sieving; ### 'fudge2' is used to set a threshold for sieving; ### 'LenB' is a the size of the sieving interval. The last three ### arguments are optional (they are determined based off of the ### size of MyN if left blank) ### The first 8 functions are helper functions PrimeSieve <- function(n) { n <- as.integer(n) if (n > 1e9) stop("n too large") primes <- rep(TRUE, n) primes[1] <- FALSE last.prime <- 2L fsqr <- floor(sqrt(n)) while (last.prime <= fsqr) { primes[seq.int(last.prime^2, n, last.prime)] <- FALSE sel <- which(primes[(last.prime + 1):(fsqr + 1)]) if (any(sel)) { last.prime <- last.prime + min(sel) } else { last.prime <- fsqr + 1 } } MyPs <- which(primes) rm(primes) gc() MyPs } MyIntToBit <- function(x, Dig) { i <- 0L string <- numeric(Dig) while (x > 0) { string[Dig - i] <- x %% 2L x <- x %/% 2L i <- i + 1L } string } ExpBySquaringBig <- function(x, n, p) { if (n == 1) { MyAns <- mod.bigz(x,p) } else if (mod.bigz(n,2)==0) { MyAns <- ExpBySquaringBig(mod.bigz(pow.bigz(x,2),p),div.bigz(n,2),p) } else { MyAns <- mod.bigz(mul.bigz(x,ExpBySquaringBig(mod.bigz( pow.bigz(x,2),p), div.bigz(sub.bigz(n,1),2),p)),p) } MyAns } TonelliShanks <- function(a,p) { P1 <- sub.bigz(p,1); j <- 0L; s <- P1 while (mod.bigz(s,2)==0L) {s <- s/2; j <- j+1L} if (j==1L) { MyAns1 <- ExpBySquaringBig(a,(p+1L)/4,p) MyAns2 <- mod.bigz(-1 * ExpBySquaringBig(a,(p+1L)/4,p),p) } else { n <- 2L Legendre2 <- ExpBySquaringBig(n,P1/2,p) while (Legendre2==1L) {n <- n+1L; Legendre2 <- ExpBySquaringBig(n,P1/2,p)} x <- ExpBySquaringBig(a,(s+1L)/2,p) b <- ExpBySquaringBig(a,s,p) g <- ExpBySquaringBig(n,s,p) r <- j; m <- 1L Test <- mod.bigz(b,p) while (!(Test==1L) && !(m==0L)) { m <- 0L Test <- mod.bigz(b,p) while (!(Test==1L)) {m <- m+1L; Test <- ExpBySquaringBig(b,pow.bigz(2,m),p)} if (!m==0) { x <- mod.bigz(x * ExpBySquaringBig(g,pow.bigz(2,r-m-1L),p),p) g <- ExpBySquaringBig(g,pow.bigz(2,r-m),p) b <- mod.bigz(b*g,p); r <- m }; Test <- 0L }; MyAns1 <- x; MyAns2 <- mod.bigz(p-x,p) } c(MyAns1, MyAns2) } SieveLists <- function(facLim, FBase, vecLen, sieveD, MInt) { vLen <- ceiling(vecLen/2); SecondHalf <- (vLen+1L):vecLen MInt1 <- MInt[1:vLen]; MInt2 <- MInt[SecondHalf] tl <- vector("list",length=facLim) for (m in 3:facLim) { st1 <- mod.bigz(MInt1[1],FBase[m]) m1 <- 1L+as.integer(mod.bigz(sieveD[[m]][1] - st1,FBase[m])) m2 <- 1L+as.integer(mod.bigz(sieveD[[m]][2] - st1,FBase[m])) sl1 <- seq.int(m1,vLen,FBase[m]) sl2 <- seq.int(m2,vLen,FBase[m]) tl1 <- list(sl1,sl2) st2 <- mod.bigz(MInt2[1],FBase[m]) m3 <- vLen+1L+as.integer(mod.bigz(sieveD[[m]][1] - st2,FBase[m])) m4 <- vLen+1L+as.integer(mod.bigz(sieveD[[m]][2] - st2,FBase[m])) sl3 <- seq.int(m3,vecLen,FBase[m]) sl4 <- seq.int(m4,vecLen,FBase[m]) tl2 <- list(sl3,sl4) tl[[m]] <- list(tl1,tl2) } tl } SieverMod <- function(facLim, FBase, vecLen, SD, MInt, FList, LogFB, Lim, myCol) { MyLogs <- rep(0,nrow(SD)) for (m in 3:facLim) { MyBool <- rep(FALSE,vecLen) MyBool[c(FList[[m]][[1]][[1]],FList[[m]][[2]][[1]])] <- TRUE MyBool[c(FList[[m]][[1]][[2]],FList[[m]][[2]][[2]])] <- TRUE temp <- which(MyBool) MyLogs[temp] <- MyLogs[temp] + LogFB[m] } MySieve <- which(MyLogs > Lim) MInt <- MInt[MySieve]; NewSD <- SD[MySieve,] newLen <- length(MySieve); GoForIT <- FALSE MyMat <- matrix(integer(0),nrow=newLen,ncol=myCol) MyMat[which(NewSD[,1L] < 0),1L] <- 1L; MyMat[which(NewSD[,1L] > 0),1L] <- 0L if ((myCol-1L) - (facLim+1L) > 0L) {MyMat[,((facLim+2L):(myCol-1L))] <- 0L} if (newLen==1L) {MyMat <- matrix(MyMat,nrow=1,byrow=TRUE)} if (newLen > 0L) { GoForIT <- TRUE for (m in 1:facLim) { vec <- rep(0L,newLen) temp <- which((NewSD[,1L]%%FBase[m])==0L) NewSD[temp,] <- NewSD[temp,]/FBase[m]; vec[temp] <- 1L test <- temp[which((NewSD[temp,]%%FBase[m])==0L)] while (length(test)>0L) { NewSD[test,] <- NewSD[test,]/FBase[m] vec[test] <- (vec[test]+1L) test <- test[which((NewSD[test,]%%FBase[m])==0L)] } MyMat[,m+1L] <- vec } } list(MyMat,NewSD,MInt,GoForIT) } reduceMatrix <- function(mat) { tempMin <- 0L; n1 <- ncol(mat); n2 <- nrow(mat) mymax <- 1L for (i in 1:n1) { temp <- which(mat[,i]==1L) t <- which(temp >= mymax) if (length(temp)>0L && length(t)>0L) { MyMin <- min(temp[t]) if (!(MyMin==mymax)) { vec <- mat[MyMin,] mat[MyMin,] <- mat[mymax,] mat[mymax,] <- vec } t <- t[-1]; temp <- temp[t] for (j in temp) {mat[j,] <- (mat[j,]+mat[mymax,])%%2L} mymax <- mymax+1L } } if (mymax<n2) {simpMat <- mat[-(mymax:n2),]} else {simpMat <- mat} lenSimp <- nrow(simpMat) if (is.null(lenSimp)) {lenSimp <- 0L} mycols <- 1:n1 if (lenSimp>1L) { ## "Diagonalizing" Matrix for (i in 1:lenSimp) { if (all(simpMat[i,]==0L)) {simpMat <- simpMat[-i,]; next} if (!simpMat[i,i]==1L) { t <- min(which(simpMat[i,]==1L)) vec <- simpMat[,i]; tempCol <- mycols[i] simpMat[,i] <- simpMat[,t]; mycols[i] <- mycols[t] simpMat[,t] <- vec; mycols[t] <- tempCol } } lenSimp <- nrow(simpMat); MyList <- vector("list",length=n1) MyFree <- mycols[which((1:n1)>lenSimp)]; for (i in MyFree) {MyList[[i]] <- i} if (is.null(lenSimp)) {lenSimp <- 0L} if (lenSimp>1L) { for (i in lenSimp:1L) { t <- which(simpMat[i,]==1L) if (length(t)==1L) { simpMat[ ,t] <- 0L MyList[[mycols[i]]] <- 0L } else { t1 <- t[t>i] if (all(t1 > lenSimp)) { MyList[[mycols[i]]] <- MyList[[mycols[t1[1]]]] if (length(t1)>1) { for (j in 2:length(t1)) {MyList[[mycols[i]]] <- c(MyList[[mycols[i]]], MyList[[mycols[t1[j]]]])} } } else { for (j in t1) { if (length(MyList[[mycols[i]]])==0L) {MyList[[mycols[i]]] <- MyList[[mycols[j]]]} else { e1 <- which(MyList[[mycols[i]]]%in%MyList[[mycols[j]]]) if (length(e1)==0) { MyList[[mycols[i]]] <- c(MyList[[mycols[i]]],MyList[[mycols[j]]]) } else { e2 <- which(!MyList[[mycols[j]]]%in%MyList[[mycols[i]]]) MyList[[mycols[i]]] <- MyList[[mycols[i]]][-e1] if (length(e2)>0L) {MyList[[mycols[i]]] <- c(MyList[[mycols[i]]], MyList[[mycols[j]]][e2])} } } } } } } TheList <- lapply(MyList, function(x) {if (length(x)==0L) {0} else {x}}) list(TheList,MyFree) } else { list(NULL,NULL) } } else { list(NULL,NULL) } } GetFacs <- function(vec1, vec2, n) { x <- mod.bigz(prod.bigz(vec1),n) y <- mod.bigz(prod.bigz(vec2),n) MyAns <- c(gcd.bigz(x-y,n),gcd.bigz(x+y,n)) MyAns[sort.list(asNumeric(MyAns))] } SolutionSearch <- function(mymat, M2, n, FB) { colTest <- which(apply(mymat, 2, sum) == 0) if (length(colTest) > 0) {solmat <- mymat[ ,-colTest]} else {solmat <- mymat} if (length(nrow(solmat)) > 0) { nullMat <- reduceMatrix(t(solmat %% 2L)) listSol <- nullMat[[1]]; freeVar <- nullMat[[2]]; LF <- length(freeVar) } else {LF <- 0L} if (LF > 0L) { for (i in 2:min(10^8,(2^LF + 1L))) { PosAns <- MyIntToBit(i, LF) posVec <- sapply(listSol, function(x) { t <- which(freeVar %in% x) if (length(t)==0L) { 0 } else { sum(PosAns[t])%%2L } }) ansVec <- which(posVec==1L) if (length(ansVec)>0) { if (length(ansVec) > 1L) { myY <- apply(mymat[ansVec,],2,sum) } else { myY <- mymat[ansVec,] } if (sum(myY %% 2) < 1) { myY <- as.integer(myY/2) myY <- pow.bigz(FB,myY[-1]) temp <- GetFacs(M2[ansVec], myY, n) if (!(1==temp[1]) && !(1==temp[2])) { return(temp) } } } } } } ### Below is the main portion of the Quadratic Sieve BegTime <- Sys.time(); MyNum <- as.bigz(MyN); DigCount <- nchar(as.character(MyN)) P <- PrimeSieve(10^5) SqrtInt <- .mpfr2bigz(trunc(sqrt(mpfr(MyNum,sizeinbase(MyNum,b=2)+5L)))) if (DigCount < 24) { DigSize <- c(4,10,15,20,23) f_Pos <- c(0.5,0.25,0.15,0.1,0.05) MSize <- c(5000,7000,10000,12500,15000) if (fudge1==0L) { LM1 <- lm(f_Pos ~ DigSize) m1 <- summary(LM1)$coefficients[2,1] b1 <- summary(LM1)$coefficients[1,1] fudge1 <- DigCount*m1 + b1 } if (LenB==0L) { LM2 <- lm(MSize ~ DigSize) m2 <- summary(LM2)$coefficients[2,1] b2 <- summary(LM2)$coefficients[1,1] LenB <- ceiling(DigCount*m2 + b2) } LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum))))) B <- P[P<=LimB]; B <- B[-1] facBase <- P[which(sapply(B, function(x) ExpBySquaringBig(MyNum,(x-1)/2,x)==1L))+1L] LenFBase <- length(facBase)+1L } else if (DigCount < 67) { ## These values were obtained from "The Multiple Polynomial ## Quadratic Sieve" by Robert D. Silverman DigSize <- c(24,30,36,42,48,54,60,66) FBSize <- c(100,200,400,900,1200,2000,3000,4500) MSize <- c(5,25,25,50,100,250,350,500) LM1 <- loess(FBSize ~ DigSize) LM2 <- loess(MSize ~ DigSize) if (fudge1==0L) { fudge1 <- -0.4 LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum))))) myTarget <- ceiling(predict(LM1, DigCount)) while (LimB < myTarget) { LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum))))) fudge1 <- fudge1+0.001 } B <- P[P<=LimB]; B <- B[-1] facBase <- P[which(sapply(B, function(x) ExpBySquaringBig(MyNum,(x-1)/2,x)==1L))+1L] LenFBase <- length(facBase)+1L while (LenFBase < myTarget) { fudge1 <- fudge1+0.005 LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum))))) myind <- which(P==max(B))+1L myset <- tempP <- P[myind] while (tempP < LimB) { myind <- myind + 1L tempP <- P[myind] myset <- c(myset, tempP) } for (p in myset) { t <- ExpBySquaringBig(MyNum,(p-1)/2,p)==1L if (t) {facBase <- c(facBase,p)} } B <- c(B, myset) LenFBase <- length(facBase)+1L } } else { LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum))))) B <- P[P<=LimB]; B <- B[-1] facBase <- P[which(sapply(B, function(x) ExpBySquaringBig(MyNum,(x-1)/2,x)==1L))+1L] LenFBase <- length(facBase)+1L } if (LenB==0L) {LenB <- 1000*ceiling(predict(LM2, DigCount))} } else { return("The number you've entered is currently too big for this algorithm!!") } SieveDist <- lapply(facBase, function(x) TonelliShanks(MyNum,x)) SieveDist <- c(1L,SieveDist); SieveDist[[1]] <- c(SieveDist[[1]],1L); facBase <- c(2L,facBase) Lower <- -LenB; Upper <- LenB; LenB2 <- 2*LenB+1L; MyInterval <- Lower:Upper M <- MyInterval + SqrtInt ## Set that will be tested SqrDiff <- matrix(sub.bigz(pow.bigz(M,2),MyNum),nrow=length(M),ncol=1L) maxM <- max(MyInterval) LnFB <- log(facBase) ## N.B. primo uses 0.735, as his siever ## is more efficient than the one employed here if (fudge2==0L) { if (DigCount < 8) { fudge2 <- 0 } else if (DigCount < 12) { fudge2 <- .7 } else if (DigCount < 20) { fudge2 <- 1.3 } else { fudge2 <- 1.6 } } TheCut <- log10(maxM*sqrt(2*asNumeric(MyNum)))*fudge2 myPrimes <- as.bigz(facBase) CoolList <- SieveLists(LenFBase, facBase, LenB2, SieveDist, MyInterval) GetMatrix <- SieverMod(LenFBase, facBase, LenB2, SqrDiff, M, CoolList, LnFB, TheCut, LenFBase+1L) if (GetMatrix[[4]]) { newmat <- GetMatrix[[1]]; NewSD <- GetMatrix[[2]]; M <- GetMatrix[[3]] NonSplitFacs <- which(abs(NewSD[,1L])>1L) newmat <- newmat[-NonSplitFacs, ] M <- M[-NonSplitFacs] lenM <- length(M) if (class(newmat) == "matrix") { if (nrow(newmat) > 0) { PosAns <- SolutionSearch(newmat,M,MyNum,myPrimes) } else { PosAns <- vector() } } else { newmat <- matrix(newmat, nrow = 1) PosAns <- vector() } } else { newmat <- matrix(integer(0),ncol=(LenFBase+1L)) PosAns <- vector() } Atemp <- .mpfr2bigz(trunc(sqrt(sqrt(mpfr(2*MyNum))/maxM))) if (Atemp < max(facBase)) {Atemp <- max(facBase)}; myPoly <- 0L while (length(PosAns)==0L) {LegTest <- TRUE while (LegTest) { Atemp <- nextprime(Atemp) Legendre <- asNumeric(ExpBySquaringBig(MyNum,(Atemp-1L)/2,Atemp)) if (Legendre == 1) {LegTest <- FALSE} } A <- Atemp^2 Btemp <- max(TonelliShanks(MyNum, Atemp)) B2 <- (Btemp + (MyNum - Btemp^2) * inv.bigz(2*Btemp,Atemp))%%A C <- as.bigz((B2^2 - MyNum)/A) myPoly <- myPoly + 1L polySieveD <- lapply(1:LenFBase, function(x) { AInv <- inv.bigz(A,facBase[x]) asNumeric(c(((SieveDist[[x]][1]-B2)*AInv)%%facBase[x], ((SieveDist[[x]][2]-B2)*AInv)%%facBase[x])) }) M1 <- A*MyInterval + B2 SqrDiff <- matrix(A*pow.bigz(MyInterval,2) + 2*B2*MyInterval + C,nrow=length(M1),ncol=1L) CoolList <- SieveLists(LenFBase, facBase, LenB2, polySieveD, MyInterval) myPrimes <- c(myPrimes,Atemp) LenP <- length(myPrimes) GetMatrix <- SieverMod(LenFBase, facBase, LenB2, SqrDiff, M1, CoolList, LnFB, TheCut, LenP+1L) if (GetMatrix[[4]]) { n2mat <- GetMatrix[[1]]; N2SD <- GetMatrix[[2]]; M1 <- GetMatrix[[3]] n2mat[,LenP+1L] <- rep(2L,nrow(N2SD)) if (length(N2SD) > 0) {NonSplitFacs <- which(abs(N2SD[,1L])>1L)} else {NonSplitFacs <- LenB2} if (length(NonSplitFacs)<2*LenB) { M1 <- M1[-NonSplitFacs]; lenM1 <- length(M1) n2mat <- n2mat[-NonSplitFacs,] if (lenM1==1L) {n2mat <- matrix(n2mat,nrow=1)} if (ncol(newmat) < (LenP+1L)) { numCol <- (LenP + 1L) - ncol(newmat) newmat <- cbind(newmat,matrix(rep(0L,numCol*nrow(newmat)),ncol=numCol)) } newmat <- rbind(newmat,n2mat); lenM <- lenM+lenM1; M <- c(M,M1) if (class(newmat) == "matrix") { if (nrow(newmat) > 0) { PosAns <- SolutionSearch(newmat,M,MyNum,myPrimes) } } } } } EndTime <- Sys.time() TotTime <- EndTime - BegTime print(format(TotTime)) return(PosAns) }

古いQSアルゴリズムを使用

> library(gmp) > library(Rmpfr) > n3 <- prod(nextprime(urand.bigz(2, 40, 17))) > system.time(t5 <- QuadSieveAll(n3,0.1,myps)) user system elapsed 164.72 0.77 165.63 > system.time(t6 <- factorize(n3)) user system elapsed 0.1 0.0 0.1 > all(t5[sort.list(asNumeric(t5))]==t6[sort.list(asNumeric(t6))]) [1] TRUE

新しいMuli-Polynomial QSアルゴリズムを使用

> QuadSieveMultiPolysAll(n3) [1] "4.952 secs" Big Integer ('bigz') object of length 2: [1] 342086446909 483830424611 > n4 <- prod(nextprime(urand.bigz(2,50,5))) > QuadSieveMultiPolysAll(n4) ## With old algo, it took over 4 hours [1] "1.131717 mins" Big Integer ('bigz') object of length 2: [1] 166543958545561 880194119571287 > n5 <- as.bigz("94968915845307373740134800567566911") ## 35 digits > QuadSieveMultiPolysAll(n5) [1] "3.813167 mins" Big Integer ('bigz') object of length 2: [1] 216366620575959221 438925910071081891 > system.time(factorize(n5)) ## It appears we are reaching the limits of factorize user system elapsed 131.97 0.00 131.98

補足：上記の番号n5は非常に興味深い番号です。それをチェックしてくださいここ

ブレークポイント!!!!

> n6 <- factorialZ(38) + 1L ## 45 digits > QuadSieveMultiPolysAll(n6) [1] "22.79092 mins" Big Integer ('bigz') object of length 2: [1] 14029308060317546154181 37280713718589679646221 > system.time(factorize(n6)) ## Shut it down after 2 days of running

最新の勝利（50桁）

> n9 <- prod(nextprime(urand.bigz(2,82,42))) > QuadSieveMultiPolysAll(n9) [1] "12.9297 hours" Big Integer ('bigz') object of length 2: [1] 2128750292720207278230259 4721136619794898059404993 ## Based off of some crude test, factorize(n9) would take more than a year.

QSは一般に、小さい数値ではポラードのrhoアルゴリズムほどには機能せず、数値が大きくなるにつれてQSの能力が明らかになり始めることに注意してください。誰でも、いつものように、これが誰かに刺激を与えることを願っています！乾杯！

George Dontas · Answer

次のアプローチは、非常に大きな数（文字列として渡す必要がある）の場合でも、正しい結果を提供します。そして、それは本当に速いです。

# TEST # x <- as.bigz("12345678987654321") # all_divisors(x) # all_divisors(x*x) # x <- pow.bigz(2,89)-1 # all_divisors(x) library(gmp) options(scipen =30) sort_listz <- function(z) { #========================== z <- z[order(as.numeric(z))] # sort(z) } # function sort_listz mult_listz <- function(x,y) { do.call('c', lapply(y, function(i) i*x)) } all_divisors <- function(x) { #========================== if (abs(x)<=1) return(x) else { factorsz <- as.bigz(factorize(as.bigz(x))) # factorize returns up to # e.g. x= 12345678987654321 factors: 3 3 3 3 37 37 333667 333667 factorsz <- sort_listz(factorsz) # vector of primes, sorted prime_factorsz <- unique(factorsz) #prime_ekt <- sapply(prime_factorsz, function(i) length( factorsz [factorsz==i])) prime_ekt <- vapply(prime_factorsz, function(i) sum(factorsz==i), integer(1), USE.NAMES=FALSE) spz <- vector() # keep all divisors all <-1 n <- length(prime_factorsz) for (i in 1:n) { pr <- prime_factorsz[i] pe <- prime_ekt[i] all <- all*(pe+1) #counts all divisors prz <- as.bigz(pr) pse <- vector(mode="raw",length=pe+1) pse <- c( as.bigz(1), prz) if (pe>1) { for (k in 2:pe) { prz <- prz*pr pse[k+1] <- prz } # for k } # if pe>1 if (i>1) { spz <- mult_listz (spz, pse) } else { spz <- pse; } # if i>1 } #for n spz <- sort_listz (spz) return (spz) } } # function factors_all_divisors #====================================

洗練されたバージョン、非常に高速。コードはシンプルで読みやすく、クリーンなままです。

テスト

#Test 4 (big prime factor) x <- pow.bigz(2,256)+1 # = 1238926361552897 * 93461639715357977769163558199606896584051237541638188580280321 system.time(z2 <- all_divisors(x)) # user system elapsed # 19.27 1.27 20.56 #Test 5 (big prime factor) x <- as.bigz("12345678987654321321") # = 3 * 19 * 216590859432531953 system.time(x2 <- all_divisors(x^2)) #user system elapsed #25.65 0.00 25.67

Joseph Wood · Answer

メジャーアップデート

以下は私の最新のR因数分解アルゴリズムです。これははるかに高速で、 rle 関数に敬意を表しています。

アルゴリズム3（更新）

_library(gmp) MyFactors <- function(MyN) { myRle <- function (x1) { n1 <- length(x1) y1 <- x1[-1L] != x1[-n1] i <- c(which(y1), n1) list(lengths = diff(c(0L, i)), values = x1[i], uni = sum(y1)+1L) } if (MyN==1L) return(MyN) else { pfacs <- myRle(factorize(MyN)) unip <- pfacs$values pv <- pfacs$lengths n <- pfacs$uni myf <- unip[1L]^(0L:pv[1L]) if (n > 1L) { for (j in 2L:n) { myf <- c(myf, do.call(c,lapply(unip[j]^(1L:pv[j]), function(x) x*myf))) } } } myf[order(asNumeric(myf))] ## 'order' is faster than 'sort.list' } _

以下は新しいベンチマークです（Dirk Eddelbuettelが言うようにここ、 "経験論と議論することはできません"）：

ケース1（大きな素因数）

_set.seed(100) myList <- lapply(1:10^3, function(x) sample(10^6, 10^5)) benchmark(SortList=lapply(myList, function(x) sort.list(x)), OrderFun=lapply(myList, function(x) order(x)), replications=3, columns = c("test", "replications", "elapsed", "relative")) test replications elapsed relative 2 OrderFun 3 59.41 1.000 1 SortList 3 61.52 1.036 ## The times are limited by "gmp::factorize" and since it relies on ## pseudo-random numbers, the times can vary (i.e. one pseudo random ## number may lead to a factorization faster than others). With this ## in mind, any differences less than a half of second ## (or so) should be viewed as the same. x <- pow.bigz(2,256)+1 system.time(z1 <- MyFactors(x)) user system elapsed 14.94 0.00 14.94 system.time(z2 <- all_divisors(x)) ## system.time(factorize(x)) user system elapsed ## user system elapsed 14.94 0.00 14.96 ## 14.94 0.00 14.94 all(z1==z2) [1] TRUE x <- as.bigz("12345678987654321321") system.time(x1 <- MyFactors(x^2)) user system elapsed 20.66 0.02 20.71 system.time(x2 <- all_divisors(x^2)) ## system.time(factorize(x^2)) user system elapsed ## user system elapsed 20.69 0.00 20.69 ## 20.67 0.00 20.67 all(x1==x2) [1] TRUE _

ケース2（小さい数値）

_set.seed(199) samp <- sample(10^9, 10^5) benchmark(JosephDivs=sapply(samp, MyFactors), DontasDivs=sapply(samp, all_divisors), OldDontas=sapply(samp, Oldall_divisors), replications=10, columns = c("test", "replications", "elapsed", "relative"), order = "relative") test replications elapsed relative 1 JosephDivs 10 470.31 1.000 2 DontasDivs 10 567.10 1.206 ## with vapply(..., USE.NAMES = FALSE) 3 OldDontas 10 626.19 1.331 ## with sapply _

ケース3（完全に徹底するため）

_set.seed(97) samp <- sample(10^6, 10^4) benchmark(JosephDivs=sapply(samp, MyFactors), DontasDivs=sapply(samp, all_divisors), CottonDivs=sapply(samp, get_all_factors), ChaseDivs=sapply(samp, FUN), replications=5, columns = c("test", "replications", "elapsed", "relative"), order = "relative") test replications elapsed relative 1 JosephDivs 5 22.68 1.000 2 DontasDivs 5 27.66 1.220 3 CottonDivs 5 126.66 5.585 4 ChaseDivs 5 554.25 24.438 _

元の投稿

コットン氏のアルゴリズムは非常に素晴らしいRの実装です。強引な方法は、これまでのところしか得られず、多数で失敗します（これもめちゃくちゃ遅いです）。さまざまなニーズを満たす3つのアルゴリズムを提供しました。最初のアルゴリズム（1月15日に投稿した元のアルゴリズムであり、わずかに更新されています）は、効率的で正確で、他の言語に簡単に翻訳できる組み合わせアプローチを提供するスタンドアロンの因数分解アルゴリズムです。 2番目のアルゴリズムは、非常に高速で、数千の数をすばやく因数分解する必要がある場合に非常に役立つふるいです。 3つ目は、短い（上記に投稿）が、2 ^ 70未満の任意の数に優れた強力なスタンドアロンアルゴリズムです（元のコードからほとんどすべてを廃棄しました）。 Richie Cottonが_plyr::count_関数を使用したことからインスピレーションを得ました（_plyr::count_と非常によく似たリターンを持つ独自のrle関数を作成するようになりました）、GeorgeDontasのクリーンな処理方法些細なケース（つまり、if (n==1) return(1)）、およびZelazny7によって提供された質問の解決策はbigzベクトルに関して持っていました。

アルゴリズム1（元）

_library(gmp) factor2 <- function(MyN) { if (MyN == 1) return(1L) else { max_p_div <- factorize(MyN) prime_vec <- max_p_div <- max_p_div[sort.list(asNumeric(max_p_div))] my_factors <- powers <- as.bigz(vector()) uni_p <- unique(prime_vec); maxp <- max(prime_vec) for (i in 1:length(uni_p)) { temp_size <- length(which(prime_vec == uni_p[i])) powers <- c(powers, pow.bigz(uni_p[i], 1:temp_size)) } my_factors <- c(as.bigz(1L), my_factors, powers) temp_facs <- powers; r <- 2L temp_facs2 <- max_p_div2 <- as.bigz(vector()) while (r <= length(uni_p)) { for (i in 1:length(temp_facs)) { a <- which(prime_vec > max_p_div[i]) temp <- mul.bigz(temp_facs[i], powers[a]) temp_facs2 <- c(temp_facs2, temp) max_p_div2 <- c(max_p_div2, prime_vec[a]) } my_sort <- sort.list(asNumeric(max_p_div2)) temp_facs <- temp_facs2[my_sort] max_p_div <- max_p_div2[my_sort] my_factors <- c(my_factors, temp_facs) temp_facs2 <- max_p_div2 <- as.bigz(vector()); r <- r+1L } } my_factors[sort.list(asNumeric(my_factors))] } _

アルゴリズム2（ふるい）

_EfficientFactorList <- function(n) { MyFactsList <- lapply(1:n, function(x) 1) for (j in 2:n) { for (r in seq.int(j, n, j)) {MyFactsList[[r]] <- c(MyFactsList[[r]], j)} }; MyFactsList} _

1から100,000までのすべての数値を2秒未満で因数分解します。このアルゴリズムの効率を理解するために、ブルートフォース法を使用して1〜100,000を因数分解する時間は3分近くかかります。

_system.time(t1 <- EfficientFactorList(10^5)) user system elapsed 1.04 0.00 1.05 system.time(t2 <- sapply(1:10^5, MyFactors)) user system elapsed 39.21 0.00 39.23 system.time(t3 <- sapply(1:10^5, all_divisors)) user system elapsed 49.03 0.02 49.05 TheTest <- sapply(1:10^5, function(x) all(t2[[x]]==t3[[x]]) && all(asNumeric(t2[[x]])==t1[[x]]) && all(asNumeric(t3[[x]])==t1[[x]])) all(TheTest) [1] TRUE _

最終的な考え

大きな数の因数分解についてのDontas氏の最初のコメントは、2 ^ 200を超える数のように、本当に本当に大きな数についてはどうでしょうか。このページでどのアルゴリズムを選択しても、それらのほとんどは Pollard-Rhoアルゴリズムを使用する_gmp::factorize_に依存しているため、すべて非常に長い時間がかかることがわかります。このことから question 、このアルゴリズムは2 ^ 70未満の数に対してのみ妥当です。私は現在、 Quadratic Sieve を実装する独自のfactorizeアルゴリズムに取り組んでいます。これにより、これらのアルゴリズムがすべて次のレベルに引き上げられます。

Joseph Wood · Answer

この質問が最初に行われて以来、R言語では多くの変更がありました。 numbersパッケージのバージョン_0.6-3_には、数値のすべての因数を取得するのに非常に役立つ関数divisorsが含まれていました。ほとんどのユーザーのニーズを満たしますが、生の速度を探している場合、またはより大きな数で作業している場合は、別の方法が必要になります。私は、このような問題を目的とした高度に最適化された関数を含む2つの新しいパッケージを作成しました（この質問に部分的に触発されました。追加する場合があります）。最初のものはRcppAlgosで、もう1つはbigIntegerAlgosです。

`RcppAlgos`

RcppAlgosには、_2^53 - 1_未満の数の約数を取得するための2つの関数が含まれています：divisorsRcpp（多くの数の完全な因数分解をすばやく取得するためのベクトル化された関数）＆divisorsSieve（範囲全体の完全な因数分解）。まず、divisorsRcppを使用して多くの乱数を因数分解します。

_library(gmp) ## for all_divisors by @GeorgeDontas library(RcppAlgos) library(numbers) options(scipen = 999) set.seed(42) testSamp <- sample(10^10, 10) ## vectorized so you can pass the entire vector as an argument testRcpp <- divisorsRcpp(testSamp) testDontas <- lapply(testSamp, all_divisors) identical(lapply(testDontas, as.numeric), testRcpp) [1] TRUE _

そして今、divisorsSieveを使用して、範囲全体で多くの数値を因数分解します。

_system.time(testSieve <- divisorsSieve(10^13, 10^13 + 10^5)) user system elapsed 0.586 0.014 0.602 system.time(testDontasSieve <- lapply((10^13):(10^13 + 10^5), all_divisors)) user system elapsed 54.327 0.187 54.655 identical(lapply(testDontasSieve, asNumeric), testSieve) [1] TRUE _

divisorsRcppとdivisorsSieveはどちらも柔軟で効率的な優れた関数ですが、_2^53 - 1_に制限されています。

`bigIntegerAlgos`

bigIntegerAlgosパッケージは、非常に多数用に設計されたdivisorsBigとquadraticSieveの2つの関数を備えています。それらは Cライブラリgmp に直接リンクします。 divisorsBigの場合、次のようになります。

_library(bigIntegerAlgos) ## testSamp is defined above... N.B. divisorsBig is not quite as ## efficient as divisorsRcpp. This is so because divisorsRcpp ## can take advantage of more efficient data types..... it is ## still blazing fast!! See the benchmarks below for reference. testBig <- divisorsBig(testSamp) identical(testDontas, testBig) [1] TRUE _

そして、これが私の元の投稿で定義されたベンチマークです（N.B. MyFactorsはdivisorsRcppとdivisorsBigに置き換えられています）。

_## Case 2 library(rbenchmark) set.seed(199) samp <- sample(10^9, 10^5) benchmark(RcppAlgos=divisorsRcpp(samp), bigIntegerAlgos=divisorsBig(samp), DontasDivs=lapply(samp, all_divisors), replications=10, columns = c("test", "replications", "elapsed", "relative"), order = "relative") test replications elapsed relative 1 RcppAlgos 10 8.021 1.000 2 bigIntegerAlgos 10 15.246 1.901 3 DontasDivs 10 400.284 49.905 ## Case 3 set.seed(97) samp <- sample(10^6, 10^4) benchmark(RcppAlgos=divisorsRcpp(samp), bigIntegerAlgos=divisorsBig(samp), numbers=lapply(samp, divisors), ## From the numbers package DontasDivs=lapply(samp, all_divisors), CottonDivs=lapply(samp, get_all_factors), ChaseDivs=lapply(samp, FUN), replications=5, columns = c("test", "replications", "elapsed", "relative"), order = "relative") test replications elapsed relative 1 RcppAlgos 5 0.098 1.000 2 bigIntegerAlgos 5 0.330 3.367 3 numbers 5 11.613 118.500 4 DontasDivs 5 16.093 164.214 5 CottonDivs 5 60.428 616.612 6 ChaseDivs 5 342.608 3496.000 _

次のベンチマークは、divisorsBig関数の基礎となるアルゴリズムの真の力を示しています。因数分解される数は_10_の累乗であるため、素因数分解のステップはほとんど完全に無視できます（たとえば、私のマシンではsystem.time(factorize(pow.bigz(10,30)))レジスタ_0_）。したがって、タイミングの違いは、素因数を組み合わせてすべての因数を生成する速度にのみ起因します。

_library(microbenchmark) powTen <- pow.bigz(10,30) microbenchmark(divisorsBig(powTen), all_divisors(powTen), unit = "relative") Unit: relative expr min lq mean median uq max neval divisorsBig(powTen) 1.00000 1.0000 1.00000 1.00000 1.00000 1.00000 100 all_divisors(powTen) 32.35507 33.3054 33.28786 33.31253 32.11571 40.39236 100 ## Negative numbers show an even greater increase in efficiency negPowTen <- powTen * -1 microbenchmark(divisorsBig(negPowTen), all_divisors(negPowTen), unit = "relative") Unit: relative expr min lq mean median uq max neval divisorsBig(negPowTen) 1.00000 1.00000 1.0000 1.00000 1.00000 1.00000 100 all_divisors(negPowTen) 46.42795 46.22408 43.7964 47.93228 45.33406 26.64657 100 _

`quadraticSieve`

quadraticSieveの2つのデモンストレーションを残しておきます。

_n5 <- as.bigz("94968915845307373740134800567566911") system.time(print(quadraticSieve(n5))) Big Integer ('bigz') object of length 2: [1] 216366620575959221 438925910071081891 user system elapsed 4.154 0.021 4.175 ## original time was 3.813167 mins or 228.8 seconds ~ 50x slower n9 <- prod(nextprime(urand.bigz(2, 82, 42))) system.time(print(quadraticSieve(n9))) Big Integer ('bigz') object of length 2: [1] 2128750292720207278230259 4721136619794898059404993 user system elapsed 1010.404 2.715 1013.184 ## original time was 12.9297 hours or 46,547 seconds ~ 46x slower _

ご覧のとおり、quadraticSieveは元のQuadSieveMultiPolysAllよりもはるかに高速ですが、まだ多くの作業が行われていません。 1分以内に_n9_をファクタリングするという現在の目標を使用して、この関数を改善するための進行中の研究があります。 quadraticSieveをベクトル化し、divisorsBigをquadraticSieveと統合する計画もあります。現在のところ、_gmp::factorize_が使用するのと同じアルゴリズムに制限されています。