Brain Teaser

Tired of 4th-grade arithmetic puzzles disguised as great advances for mankind?

Discover something worthier of your time, where failure is a possibility, and not a shameful one. The exercise is below for you to fill in. Submit your answers, and after 10 days we will publish the solution.



Sequential data files are collections of fixed-size records, but unless they are read by a program that knows the record size, and how it divides into individual fields, they are just sequences of bytes.
But in the absence of such a program, how about guessing the record size from the data itself?

The rules

• Assume the files contain some mix of textual and numeric data, or at least are not entirely random; they have a fixed structure and are not compressed. How can one guess a plausible record size, based on the regularity of the data alone, when all one gets is up to the first megabyte of the file?
• One does not even get the total file size – which could have been used to extract divisors. And the technique should work whether the data is ASCII, EBCDIC, or any other encoding, and whether the numeric data is binary, packed decimal, or anything else.
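To make "the regularity of the data alone" concrete, here is one possible direction, offered purely as a hedged sketch and not as the solution we will publish: score each candidate record size by how often a byte equals the byte one candidate-stride later. Fixed columns (field tags, padding, high-order bytes of small numbers) repeat every record, so the true size and its multiples stand out, and the smallest candidate near the peak score is a reasonable guess. The function name, sampling rate, and tolerance below are all illustrative assumptions.

```python
# Sketch only: compare each byte to the byte one candidate-stride later.
# Fixed columns repeat every record, so the match rate peaks at the true
# record size and its multiples.  Thresholds are illustrative assumptions.

def guess_record_size(data: bytes, max_size: int = 4096) -> int:
    limit = min(max_size, len(data) // 4)   # need several records to compare
    scores: dict[int, float] = {}
    for size in range(2, limit + 1):
        n = len(data) - size
        step = max(1, n // 5000)            # sample offsets to keep batch runs fast
        hits = total = 0
        for i in range(0, n, step):
            hits += data[i] == data[i + size]
            total += 1
        scores[size] = hits / total
    if not scores:
        return 1                            # file too short to say anything
    peak = max(scores.values())
    # Multiples of the true size score almost as high as the size itself,
    # so return the smallest candidate within a small tolerance of the peak.
    for size in sorted(scores):
        if scores[size] >= peak - 0.02:
            return size
    return 1
```

Note that this particular heuristic makes no assumption about the character encoding or the numeric representation: it only exploits byte-level repetition at the record period.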

The challenge

You can either provide:

• a description of how you would do it
• or a prototype implementation in a language of your choice.

The technique should be entirely automatic and not rely on human visualization or validation. It should be applicable to thousands of data files in batch mode, returning a plausible record size for each of them without user interaction.

And if this is too simple for your taste, you can go for the next level of the challenge: beyond the record size, recognize the individual fields – textual, numeric, or even dates and timestamps.
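For that next level, one naive direction (again only a sketch under assumptions of our own, not the published solution) is to look at each byte offset within the record across all records, and group adjacent offsets by the kind of values they hold, for example printable ASCII versus arbitrary binary:

```python
# Sketch only: tag each byte offset within the record by the values seen
# at that offset across all records, then merge adjacent offsets with the
# same tag into runs.  The two-way text/binary split is a deliberate
# simplification; real data would need more classes (packed decimal,
# timestamps, EBCDIC text, ...).

def classify_columns(data: bytes, rec_size: int) -> list[tuple[int, int, str]]:
    n_recs = len(data) // rec_size
    kinds = []
    for off in range(rec_size):
        col = [data[r * rec_size + off] for r in range(n_recs)]
        kinds.append("text" if all(32 <= b < 127 for b in col) else "binary")
    runs, start = [], 0
    for off in range(1, rec_size + 1):
        if off == rec_size or kinds[off] != kinds[start]:
            runs.append((start, off, kinds[start]))  # half-open [start, off)
            start = off
    return runs
```

A run of offsets that is always printable is a candidate text field; a run of arbitrary bytes is a candidate binary number, and finer heuristics (digit-only columns, valid packed-decimal nibbles, plausible date ranges) could refine the classes from there.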

Good luck!

[qsm quiz=1]