Richard Jonas

Fast file for writing terms

Recently I needed to write a massive amount of sensor data into a database, and I quickly ran into the database's limitations. After some analysis I found that the data can be written independently, partitioned by the source it comes from. So the solution was to write the sensor data into separate files, one per source.

There were no real problems with that solution until I needed to implement a Bitcask-like merge operation. During that operation we open a data file for reading, create a new file for writing, read all records from the first file, check whether a retention condition still holds, and write the record into the new file if we need to keep it. This requires a massive number of small writes (around 1 KB each). The speed of the copy wasn't very convincing, to put it gently.
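The merge pass described above can be sketched roughly like this. This is a hedged illustration, not the actual implementation: the length-prefixed record framing (`<<Size:32>>`), the `Keep` predicate and the assumption that both files were opened in binary mode are all mine.

```erlang
%% Illustrative sketch of the merge pass: copy each record that still
%% satisfies the retention predicate Keep/1 from the old file to the new
%% one. Assumes a hypothetical <<Size:32, Record/binary>> on-disk format
%% and files opened with [raw, binary, ...].
merge(OldFd, NewFd, Keep) ->
    case file:read(OldFd, 4) of
        {ok, <<Size:32>>} ->
            {ok, Rec} = file:read(OldFd, Size),
            case Keep(Rec) of
                true  -> ok = file:write(NewFd, <<Size:32, Rec/binary>>);
                false -> ok
            end,
            merge(OldFd, NewFd, Keep);
        eof ->
            ok
    end.
```

Every surviving record costs one `file:write/2` here, which is exactly the flood of small writes that turned out to be slow.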

Erlang file types

In Erlang there are two types of file we can use. The first is the (non-raw) file, which spawns a dedicated process for the file, so every file operation is a message sent to that process, which reacts to the message and reads or writes the data. One can feel that this works well with larger binaries, but it won't perform brilliantly if the binaries are small. The other is the raw file, where no controlling process is spawned, so all we have is a wrapped Erlang port. Even then, according to some fprof profiling, a big share of the computing time is spent on port communication.
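The two variants differ only in the options passed to `file:open/2` (the filename `"data.bin"` is just a placeholder):

```erlang
%% Plain (process-backed) file: every call below is a message round-trip
%% to the file's controlling process.
{ok, Dev} = file:open("data.bin", [write, binary]),
ok = file:write(Dev, <<"chunk">>),
ok = file:close(Dev),

%% Raw file: no controlling process, the calls go straight to the port
%% driver from the calling process.
{ok, Raw} = file:open("data.bin", [raw, write, binary]),
ok = file:write(Raw, <<"chunk">>),
ok = file:close(Raw).
```

Note that a raw file can only be used from the process that opened it, which is usually a fair trade for skipping the message passing.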

Fast file

That is what drove me to implement a fast_file based on Joe Armstrong's idea. Instead of writing the data somewhere that requires a cross-context call (a kernel call or a port command), let us collect the data in a buffer, and when the buffer grows big enough, flush it in one go.

The fast file module defines a record which holds one buffer for both reading and writing. Yes, one buffer. If we are writing data, we use it as a write buffer. If we want to read, the buffer is synced first and then used as a read buffer. So a fast file also remembers the last operation.
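The write side of the idea can be sketched as follows. The module name, the record fields and the 64 KB threshold are my own choices for illustration, not the actual fast_file API:

```erlang
%% Minimal sketch of buffered writing on top of a raw file. Data is
%% accumulated in an iolist; the port is only touched when the buffer
%% exceeds the limit.
-module(bufwriter).
-export([open/1, write/2, close/1]).

-define(BUF_LIMIT, 64 * 1024).  %% flush once the buffer exceeds 64 KB

-record(bf, {fd, buf = [], size = 0}).

open(Path) ->
    {ok, Fd} = file:open(Path, [raw, binary, append]),
    {ok, #bf{fd = Fd}}.

%% Appending to an iolist is O(1); byte_size/1 tracks how much is pending.
write(#bf{buf = Buf, size = Sz} = B, Bin) when is_binary(Bin) ->
    B1 = B#bf{buf = [Buf, Bin], size = Sz + byte_size(Bin)},
    case B1#bf.size >= ?BUF_LIMIT of
        true  -> flush(B1);
        false -> {ok, B1}
    end.

flush(#bf{fd = Fd, buf = Buf} = B) ->
    ok = file:write(Fd, Buf),
    {ok, B#bf{buf = [], size = 0}}.

close(B) ->
    {ok, #bf{fd = Fd}} = flush(B),
    file:close(Fd).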


I wrote small and bigger chunks of binaries into a normal Erlang file, a raw file and a fast file. I ran the tests on my laptop (Core i5 2.4 GHz, 6 GB RAM, 640 GB HDD 5400 rpm, ext4).

Test            Normal file   Raw file   Fast file
100 big              280 ms      15 ms       24 ms
1000 big           2 336 ms     123 ms      222 ms
10000 small          338 ms     153 ms        7 ms
100000 small       2 366 ms   1 604 ms       79 ms
200000 small       4 854 ms   3 088 ms      163 ms

In the case of one million writes, only the fast file didn't run into a timeout (763 ms). We can see that buffering is still very much worth it.

How dangerous is it to buffer data?

I can see questions coming like: what if the process, the Erlang VM or the OS crashes? Also, since fast_file works on an ever-changing record, we need to thread the updated fast file record through every read and write. Using a normal file is much more comfortable: we have an {ok, file:io_device()}, and reads and writes leave the io device (the port, in most cases) unchanged.

If the process crashes, we lose the data that hasn't been written yet. The good news is that we don't cross record boundaries when writing, so we don't need to repair the file when we open it after a crash. In the case of an Erlang VM crash, the story is the same. In the case of an OS crash, it depends on how the OS handles the file buffer. Linux supports a commit=nrsecs option when mounting a device: every nrsecs seconds Linux syncs all data to the device. If the crash happens between two commits, there is a chance of data loss.
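For ext4 the commit interval is set at mount time; a hypothetical /etc/fstab entry (device and mount point are placeholders) would look like this:

```shell
# Sync dirty data to /dev/sda1 every 5 seconds instead of the default.
# A crash can still lose up to commit= seconds' worth of writes.
/dev/sda1  /data  ext4  defaults,commit=5  0  2
```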

Until I find a good place for my implementation, you can check Joe's elib1_fast_write.erl.

