web-dev-qa-db-ja.com

Fastest way to write huge data to a text file in Java

I have to write huge data to a text [csv] file. I used BufferedWriter to write the data, and it took around 40 seconds to write 174 MB. Is this the fastest speed Java can offer?

bufferedWriter = new BufferedWriter ( new FileWriter ( "fileName.csv" ) );

Note: these 40 seconds include the time of iterating and fetching the records from the ResultSet as well. :). 174 MB corresponds to 400,000 rows in the ResultSet.
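For reference, a minimal sketch of the pattern in the question, using try-with-resources and an explicit buffer size (the file name and row contents are placeholders, not from the question):

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Arrays;

public class CsvWriteSketch {
    // Writes the given rows through a BufferedWriter with an explicit
    // buffer size and returns the number of rows written.
    public static int writeRows(String fileName, Iterable<String> rows) throws IOException {
        int count = 0;
        // try-with-resources flushes and closes the writer even on error
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(fileName), 1 << 16)) {
            for (String row : rows) {
                writer.write(row);
                writer.newLine();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        int n = writeRows("fileName.csv", Arrays.asList("a,b,c", "1,2,3"));
        System.out.println(n + " rows written");
    }
}
```

Whether the larger buffer helps at all is exactly what the benchmark in the answer below measures.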

64
Rakesh Juyal

You might try removing the BufferedWriter and just using the FileWriter directly. On a modern system there's a good chance you're just writing to the drive's cache memory anyway.

It takes me in the range of 4-5 seconds to write 175 MB (4 million strings) -- this is on a dual-core 2.4 GHz Dell running Windows XP with an 80 GB, 7200-RPM Hitachi disk.

Can you isolate how much of the time is record retrieval vs. file writing?

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

public class FileWritingPerfTest {


private static final int ITERATIONS = 5;
private static final double MEG = (Math.pow(1024, 2));
private static final int RECORD_COUNT = 4000000;
private static final String RECORD = "Help I am trapped in a fortune cookie factory\n";
private static final int RECSIZE = RECORD.getBytes().length;

public static void main(String[] args) throws Exception {
    List<String> records = new ArrayList<String>(RECORD_COUNT);
    int size = 0;
    for (int i = 0; i < RECORD_COUNT; i++) {
        records.add(RECORD);
        size += RECSIZE;
    }
    System.out.println(records.size() + " 'records'");
    System.out.println(size / MEG + " MB");

    for (int i = 0; i < ITERATIONS; i++) {
        System.out.println("\nIteration " + i);

        writeRaw(records);
        writeBuffered(records, 8192);
        writeBuffered(records, (int) MEG);
        writeBuffered(records, 4 * (int) MEG);
    }
}

private static void writeRaw(List<String> records) throws IOException {
    File file = File.createTempFile("foo", ".txt");
    try {
        FileWriter writer = new FileWriter(file);
        System.out.print("Writing raw... ");
        write(records, writer);
    } finally {
        // comment this out if you want to inspect the files afterward
        file.delete();
    }
}

private static void writeBuffered(List<String> records, int bufSize) throws IOException {
    File file = File.createTempFile("foo", ".txt");
    try {
        FileWriter writer = new FileWriter(file);
        BufferedWriter bufferedWriter = new BufferedWriter(writer, bufSize);

        System.out.print("Writing buffered (buffer size: " + bufSize + ")... ");
        write(records, bufferedWriter);
    } finally {
        // comment this out if you want to inspect the files afterward
        file.delete();
    }
}

private static void write(List<String> records, Writer writer) throws IOException {
    long start = System.currentTimeMillis();
    for (String record: records) {
        writer.write(record);
    }
    writer.flush();
    writer.close();
    long end = System.currentTimeMillis();
    System.out.println((end - start) / 1000f + " seconds");
}
}
96
David Moles

Try memory mapped files (takes 300 ms to write 174 MB on my machine, Core 2 Duo, 2.5 GB RAM):

byte[] buffer = "Help I am trapped in a fortune cookie factory\n".getBytes();
int number_of_lines = 400000;

FileChannel rwChannel = new RandomAccessFile("textfile.txt", "rw").getChannel();
ByteBuffer wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, 0, buffer.length * number_of_lines);
for (int i = 0; i < number_of_lines; i++)
{
    wrBuf.put(buffer);
}
rwChannel.close();
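A variant of the same memory-mapped idea with explicit resource handling is sketched below; the `long` size computation and the `force()` call are additions here, not part of the answer above:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedWriteSketch {
    // Maps the target region of the file into memory and fills it with
    // 'count' copies of 'line'; returns the resulting file size in bytes.
    public static long writeLines(String path, byte[] line, int count) throws IOException {
        long size = (long) line.length * count; // long math avoids int overflow
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            for (int i = 0; i < count; i++) {
                buf.put(line);
            }
            buf.force(); // ask the OS to flush the mapped region to disk
        }
        return size;
    }

    public static void main(String[] args) throws IOException {
        byte[] line = "Help I am trapped in a fortune cookie factory\n".getBytes();
        System.out.println(writeLines("textfile.txt", line, 400000) + " bytes");
    }
}
```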
36
Deepak Agarwal

Just for the statistics:

The machine is an old Dell with a new SSD

CPU: Intel Pentium D 2.8 GHz

SSD:Patriot Inferno 120GB SSD

4000000 'records'
175.47607421875 MB

Iteration 0
Writing raw... 3.547 seconds
Writing buffered (buffer size: 8192)... 2.625 seconds
Writing buffered (buffer size: 1048576)... 2.203 seconds
Writing buffered (buffer size: 4194304)... 2.312 seconds

Iteration 1
Writing raw... 2.922 seconds
Writing buffered (buffer size: 8192)... 2.406 seconds
Writing buffered (buffer size: 1048576)... 2.015 seconds
Writing buffered (buffer size: 4194304)... 2.282 seconds

Iteration 2
Writing raw... 2.828 seconds
Writing buffered (buffer size: 8192)... 2.109 seconds
Writing buffered (buffer size: 1048576)... 2.078 seconds
Writing buffered (buffer size: 4194304)... 2.015 seconds

Iteration 3
Writing raw... 3.187 seconds
Writing buffered (buffer size: 8192)... 2.109 seconds
Writing buffered (buffer size: 1048576)... 2.094 seconds
Writing buffered (buffer size: 4194304)... 2.031 seconds

Iteration 4
Writing raw... 3.093 seconds
Writing buffered (buffer size: 8192)... 2.141 seconds
Writing buffered (buffer size: 1048576)... 2.063 seconds
Writing buffered (buffer size: 4194304)... 2.016 seconds

As you can see, the raw method is slower than the buffered ones.

Your transfer speed is likely not going to be limited by Java. Instead I would suspect (in no particular order):

  1. the speed of transfer from the database
  2. the speed of transfer to the disk

If you read the complete dataset and then write it out to disk, that will take longer, since the JVM will have to allocate memory, and the db read/disk write will happen sequentially. Instead, I would write out to the buffered writer for every read that you make from the db, so the operation will be closer to a concurrent one (I don't know if you're doing that).
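That interleaving can be sketched without a real database; below, a plain Iterator stands in for the ResultSet, and each row is written as soon as it is fetched rather than being collected in memory first:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.util.Iterator;
import java.util.List;

public class StreamingExportSketch {
    // Writes each row as soon as it is fetched instead of collecting
    // all rows in memory first; returns the number of rows written.
    public static int export(Iterator<String> rows, BufferedWriter writer) throws IOException {
        int count = 0;
        while (rows.hasNext()) {       // stands in for resultSet.next()
            writer.write(rows.next()); // write immediately after each fetch
            writer.newLine();
            count++;
        }
        writer.flush();
        return count;
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        int n = export(List.of("row1", "row2", "row3").iterator(), new BufferedWriter(sink));
        System.out.println(n + " rows:\n" + sink);
    }
}
```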

5
Brian Agnew

For these bulky reads from the DB you may want to tune your Statement's fetch size. It could save a lot of round trips to the DB.

http://download.oracle.com/javase/1.5.0/docs/api/java/sql/Statement.html#setFetchSize%28int%29
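A minimal sketch of what that tuning looks like in JDBC; the query, table name, and the CSV-joining helper are placeholder assumptions, not from the answer:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class FetchSizeSketch {
    // Joins column values into one CSV line (no quoting; illustration only).
    public static String joinCsv(String... fields) {
        return String.join(",", fields);
    }

    // Streams rows of a two-column query to the writer, hinting the driver
    // to fetch 1000 rows per round trip.
    public static void export(Connection conn, BufferedWriter writer)
            throws SQLException, IOException {
        try (Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(1000); // fewer round trips to the DB
            try (ResultSet rs = stmt.executeQuery("SELECT col1, col2 FROM big_table")) {
                while (rs.next()) {
                    writer.write(joinCsv(rs.getString(1), rs.getString(2)));
                    writer.write('\n');
                }
            }
            writer.flush();
        }
    }
}
```

Note that setFetchSize is only a hint; how it behaves depends on the JDBC driver.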

4
gpeche
package all.is.well;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import junit.framework.TestCase;

/**
 * @author Naresh Bhabat
 *
 * The following implementation helps to deal with extra large files in Java.
 * This program has been tested with a 2 GB input file.
 * There are some points where extra logic can be added in the future.
 *
 * Please note: if we want to deal with a binary input file, then instead of
 * reading lines we need to read bytes from the file object.
 *
 * It uses RandomAccessFile, which is almost like a streaming API.
 *
 * ****************************************
 * Notes regarding the executor framework and its timings.
 * Please note: ExecutorService executor = Executors.newFixedThreadPool(10);
 *
 *         For 10 threads: total time required for reading and writing the text:
 *         349.317 seconds
 *
 *         For 100: total time required for reading and writing the text: 464.042 seconds
 *
 *         For 1000: total time required for reading and writing the text: 466.538 seconds
 *         For 10000: total time required for reading and writing the text: 479.701 seconds
 *
 */
public class DealWithHugeRecordsinFile extends TestCase {

        static final String FILEPATH = "C:\\springbatch\\bigfile1.txt.txt";
        static final String FILEPATH_WRITE = "C:\\springbatch\\writinghere.txt";
        static volatile RandomAccessFile fileToWrite;
        static volatile RandomAccessFile file;
        static volatile String fileContentsIter;
        static volatile int position = 0;

        public static void main(String[] args) throws IOException, InterruptedException {
                long currentTimeMillis = System.currentTimeMillis();

                try {
                        fileToWrite = new RandomAccessFile(FILEPATH_WRITE, "rw");//for random writes, shared across threads
                        file = new RandomAccessFile(FILEPATH, "r");//for random reads
                        seriouslyReadProcessAndWriteAsynch();

                } catch (IOException e) {
                        // TODO Auto-generated catch block
                        e.printStackTrace();
                }
                Thread currentThread = Thread.currentThread();
                System.out.println(currentThread.getName());
                long currentTimeMillis2 = System.currentTimeMillis();
                double time_seconds = (currentTimeMillis2 - currentTimeMillis) / 1000.0;
                System.out.println("Total time required for reading the text in seconds " + time_seconds);

        }

        /**
         * @throws IOException
         * Something  asynchronously serious
         */
        public static void seriouslyReadProcessAndWriteAsynch() throws IOException {
                ExecutorService executor = Executors.newFixedThreadPool(10);//see the class comment for thread-count timings
                while (true) {
                        final String readLine = file.readLine();
                        if (readLine == null) {
                                break;
                        }
                        Runnable genuineWorker = new Runnable() {
                                @Override
                                public void run() {
                                        // do the hard processing here in this thread; this example
                                        // burns some time and swallows an exception in the write method.
                                        writeToFile(FILEPATH_WRITE, readLine);
                                        // System.out.println(" :" +
                                        // Thread.currentThread().getName());

                                }
                        };
                        executor.execute(genuineWorker);
                }
                executor.shutdown();
                try {
                        // wait for all submitted tasks to finish instead of busy-waiting
                        executor.awaitTermination(Long.MAX_VALUE, java.util.concurrent.TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                }
                System.out.println("Finished all threads");
                file.close();
                fileToWrite.close();
        }

        /**
         * @param filePath
         * @param data
         */
        // synchronized: multiple worker threads share the same RandomAccessFile
        private static synchronized void writeToFile(String filePath, String data) {
                try {
                        // fileToWrite.seek(position);
                        data = "\n" + data;
                        if (!data.contains("Randomization")) { // filter specific to the author's test file
                                return;
                        }
                        System.out.println("Let us do something time consuming to make this thread busy" + (position++) + "   :" + data);
                        System.out.println("Let's burn some cycles in this loop");
                        // busy loop to simulate time-consuming processing
                        int i = 1000;
                        while (i > 0) {
                                i--;
                        }
                        fileToWrite.write(data.getBytes());
                        throw new Exception(); // simulate a failure to show that processing continues
                } catch (Exception exception) {
                        System.out.println("exception was thrown but still we are able to proceed further"
                                        + " \n This can be used for marking failure of the records");
                        //exception.printStackTrace();

                }

        }
}
3
RAM