Discussion:
[Haskell-cafe] Roman to Decimal Algorithms
Andrew Savige
2009-06-07 09:59:34 UTC
Permalink
I recently played in a code golf Roman to Decimal challenge (in Perl,
Python, Ruby and PHP). In playing this game, I found some interesting
short algorithms for converting from Roman Numerals to Decimal. Because
I think some of these are new, I'd like to present them here, in case
they are not really new after all, or in case there are problems with
these algorithms that I've overlooked.

I'd like to eventually write up a "rosetta code" article comparing the
implementation of these algorithms in various languages, including
Haskell. Since I'm only a very occasional Haskell programmer, I thought
it best to get feedback from Haskell experts before inflicting any of
my Haskell code on a wider audience. Hence this post.

To keep this post reasonably brief, note that these algorithms are for
"modern" Roman Numerals only, limited to the range 1-3999, and with no
error checking. That is, the code below assumes that the input is always
a well formed Roman Numeral. I have tested each algorithm against every
modern Roman Numeral in the range 1-3999.

As a starting point, note this "HaskellWiki" function:

romanToInt :: String -> Int
romanToInt = fst
              . foldr (\p (t,s) -> if p >= s then (t+p,p) else (t-p,p)) (0,0)
              . map (fromJust . flip lookup (zip "IVXLCDM" [1,5,10,50,100,500,1000]))

taken from http://haskell.cs.yale.edu/haskellwiki/Roman_numerals.
I'm going to essentially duplicate the functionality of this code.

{-# OPTIONS_GHC -fglasgow-exts -Wall #-}
import Data.Char (toUpper)

rtoa :: Char -> Int
rtoa 'M' = 1000
rtoa 'D' =  500
rtoa 'C' =  100
rtoa 'L' =   50
rtoa 'X' =   10
rtoa 'V' =    5
rtoa 'I' =    1
rtoa r   = error $ "Invalid rtoa char:" ++ show r

urtoa :: Char -> Int
urtoa = rtoa . toUpper

romanToInt :: String -> Int
romanToInt = foldl1 (\t n -> t+n-t`mod`n*2) . map urtoa

The essential difference between this solution and the HaskellWiki
one is the use of the "running total" for state rather than the
"previous value". I see this as an improvement mainly because it
is shorter -- though you might argue that it is less clear.

An alternative way to express the romanToInt function is:

romanToInt = foldl (\t c -> t+(urtoa c)-t`mod`(urtoa c)*2) 0

I'm open to persuasion as to which is better Haskell style.

Since I'm not playing golf anymore, the rtoa function above, though
hardly short, seemed to me to be the simplest and clearest way to
express converting a single Roman Numeral to its corresponding
arabic number. Again, suggestions for the "best"/most efficient
way to do this in Haskell are most welcome.

A common, and often winning technique, in golf is to perform these
sorts of conversions by concocting a "magic formula". For fun, I
rewrote rtoa using a magic formula I used in the golf game:

rtoa c = 10^(205558`mod`(ord c)`mod`7)`mod`9995

I'm not suggesting that magic formulae are useful outside of golf and
this second rtoa function, though shorter, is much less clear. I might
add that this particular magic formula appears to be less useful in
Haskell golf than the other languages because `mod` is five times
longer than the % operator of the other languages. :)

By way of explanation, notice that this formula:

 205558`mod`(ord c)`mod`7

maps I->0, X->1, C->2, M->3, V->4, L->5, D->6 as shown below:

 Roman   m      10^m   10^m`mod`9995
 -----   -      -----  -------------
   M     3       1000      1000
   D     6    1000000       500
   C     2        100       100
   L     5     100000        50
   X     1         10        10
   V     4      10000         5
   I     0          1         1

Noticing this, you can replace the 205558`mod`(ord c)`mod`7 magic
formula with a function that returns a string index (index() in
Perl and Python). I am sometimes overwhelmed by the quantity and
richness of all the functions in the GHC Haskell libraries.
I eventually found a Haskell solution that seemed to work:

{-# OPTIONS_GHC -fglasgow-exts -XOverloadedStrings -Wall #-}
import Data.ByteString.Char8 (elemIndex)
import Data.Maybe (fromJust)

rtoa c = 10^(fromJust (elemIndex c "IXCMVLD"))`mod`9995

I got this to work by trial and error and have no clue what this
"-XOverloadedStrings" and "Data.ByteString.Char8" business really
means. If there is a better Haskell way of finding the numeric
index of a particular character in a string, please let me know.

Alternatively, you could write the rtoa function using an approach
taken from the original HaskellWiki solution:

rtoa = fromJust . flip lookup (zip "IVXLCDM" [1,5,10,50,100,500,1000])

What is your recommendation as to the "best"/most efficient way
of writing the rtoa function in Haskell?

Finally, here is an example complete test program.
Suggestions for improving the style of this code are welcome.

{-# OPTIONS_GHC -fglasgow-exts -Wall #-}
import Data.Char (toUpper)
import Data.List (concat, intersperse)

rtoa :: Char -> Int
rtoa 'M' = 1000
rtoa 'D' =  500
rtoa 'C' =  100
rtoa 'L' =   50
rtoa 'X' =   10
rtoa 'V' =    5
rtoa 'I' =    1
rtoa r   = error $ "Invalid rtoa char:" ++ show r

urtoa :: Char -> Int
urtoa = rtoa . toUpper

-- one is derived from http://haskell.cs.yale.edu/haskellwiki/Roman_numerals
one :: String -> Int
one = fst . foldr (\p (t,s) -> if p >= s then (t+p,p) else (t-p,p)) (0,0) . map urtoa

two :: String -> Int
two = foldl1 (\t n -> t+n-t`mod`n*2) . map urtoa

three :: String -> Int
three = foldl (\t c -> t+(urtoa c)-t`mod`(urtoa c)*2) 0

myshow :: String -> (String -> Int) -> String -> String
myshow capt fn val = capt ++ val ++ " " ++ (show (fn val))

testdata :: [String]
testdata =  [ "I", "xiv", "DXVI", "CMLXIII" ]

main :: IO ()
main = do
  putStrLn $ (concat $ intersperse "\n" (map (\c -> (myshow "one  : " one  c))  testdata))
  putStrLn $ (concat $ intersperse "\n" (map (\c -> (myshow "two  : " two c))   testdata))
  putStrLn $ (concat $ intersperse "\n" (map (\c -> (myshow "three: " three c)) testdata))

Cheers,
/-\



Need a Holiday? Win a $10,000 Holiday of your choice. Enter now.http://us.lrd.yahoo.com/_ylc=X3oDMTJxN2x2ZmNpBF9zAzIwMjM2MTY2MTMEdG1fZG1lY2gDVGV4dCBMaW5rBHRtX2xuawNVMTEwMzk3NwR0bV9uZXQDWWFob28hBHRtX3BvcwN0YWdsaW5lBHRtX3BwdHkDYXVueg--/SIG=14600t3ni/**http%3A//au.rd.yahoo.com/mail/tagline/creativeholidays/*http%3A//au.docs.yahoo.com/homepageset/%3Fp1=other%26p2=au%26p3=mailtagline
Tillmann Rendel
2009-06-07 16:47:31 UTC
Permalink
Hi Andrew,
Post by Andrew Savige
Noticing this, you can replace the 205558`mod`(ord c)`mod`7 magic
formula with a function that returns a string index (index() in
Perl and Python). I am sometimes overwhelmed by the quantity and
richness of all the functions in the GHC Haskell libraries.
Have you tried hoogle?

http://haskell.org/hoogle/?hoogle=String+-%3E+Maybe+Int

This suggests Data.List.elemIndex.

Tillmann
Andrew Savige
2009-06-07 22:34:50 UTC
Permalink
Post by Tillmann Rendel
Have you tried hoogle?
  http://haskell.org/hoogle/?hoogle=String+-%3E+Maybe+Int
This suggests Data.List.elemIndex.
Hadn't heard of hoogle before. Will use it from now on though. :)
Hoogle's suggestion of Data.List.elemIndex works like a charm.

Thanks,
/-\



Need a Holiday? Win a $10,000 Holiday of your choice. Enter now.http://us.lrd.yahoo.com/_ylc=X3oDMTJxN2x2ZmNpBF9zAzIwMjM2MTY2MTMEdG1fZG1lY2gDVGV4dCBMaW5rBHRtX2xuawNVMTEwMzk3NwR0bV9uZXQDWWFob28hBHRtX3BvcwN0YWdsaW5lBHRtX3BwdHkDYXVueg--/SIG=14600t3ni/**http%3A//au.rd.yahoo.com/mail/tagline/creativeholidays/*http%3A//au.docs.yahoo.com/homepageset/%3Fp1=other%26p2=au%26p3=mailtagline
Martijn van Steenbergen
2009-06-07 18:25:15 UTC
Permalink
Hi Andrew,

I haven't read your whole message but if shortness is your goal, here
Post by Andrew Savige
rtoa c = 10^(205558`mod`(ord c)`mod`7)`mod`9995
I'm not suggesting that magic formulae are useful outside of golf and
this second rtoa function, though shorter, is much less clear. I might
add that this particular magic formula appears to be less useful in
Haskell golf than the other languages because `mod` is five times
longer than the % operator of the other languages. :)
Why not rename mod like so?

(%)=mod
rtoa c=10^(205558%ord c%7)%9995
Post by Andrew Savige
Alternatively, you could write the rtoa function using an approach
rtoa = fromJust . flip lookup (zip "IVXLCDM" [1,5,10,50,100,500,1000])
You can write this as:

Just rtoa=flip lookup (zip "IVXLCDM" [1,5,10,50,100,500,1000])

Which is another few characters shorter. :-)

HTH,

Martijn.
Andrew Savige
2009-06-07 22:47:23 UTC
Permalink
Post by Martijn van Steenbergen
Why not rename mod like so?
(%)=mod
rtoa c=10^(205558%ord c%7)%9995
Thanks Martijn. My first Haskell golfing tip. :)

I found I had to write it with an extra set of parens to get it to work:

rtoa c=(10^(205558%ord c)%7)%9995

Though it wasn't my original intention to play Haskell golf, assuming
upper case Roman Numerals, we can now write the whole thing as:

(%)=mod
romanToInt=foldl1(\t n->t+n-t%n*2).map(\c->(10^(205558%ord c)%7)%9995)

Can anyone shorten that further?

Thanks,
/-\



Need a Holiday? Win a $10,000 Holiday of your choice. Enter now.http://us.lrd.yahoo.com/_ylc=X3oDMTJxN2x2ZmNpBF9zAzIwMjM2MTY2MTMEdG1fZG1lY2gDVGV4dCBMaW5rBHRtX2xuawNVMTEwMzk3NwR0bV9uZXQDWWFob28hBHRtX3BvcwN0YWdsaW5lBHRtX3BwdHkDYXVueg--/SIG=14600t3ni/**http%3A//au.rd.yahoo.com/mail/tagline/creativeholidays/*http%3A//au.docs.yahoo.com/homepageset/%3Fp1=other%26p2=au%26p3=mailtagline
Loading...