I wrote a Haskell function that groups words by anagrams. I'm trying to learn OCaml, but I'm a little confused as to how use pattern matching in OCaml. Could someone help translate this to OCaml for me? Thank you!
This function takes a list of strings, and partitions it into a list of string lists, grouped by anagrams.
import Data.List
groupByAnagrams :: [String] -> [[String]]
groupByAnagrams [] = []
groupByAnagrams (x:xs) = let (listOfAnagrams, listOfNonAnagrams) = (partitionByAnagrams (sort x) xs)
in
(x:listOfAnagrams):(groupByAnagrams listOfNonAnagrams)
This helper function takes a sorted string sortedStr
, and a list of strings (the reason the string is sorted is so that I don't have to call sort on it every iteration). The string list is partitioned into two lists; one consisting of the strings that are anagrams to sortedStr
, the other consisting of the strings that are not. The function returns the tuple that consists of these two lists.
partitionByAnagrams :: String -> [String] -> ([String], [String])
partitionByAnagrams sortedStr [] = ([], [])
partitionByAnagrams sortedStr (x:xs)
| (sortedStr == (sort x)) = let (listOfAnagrams, listOfNonAnagrams) = (partitionByAnagrams sortedStr xs)
in
(x:listOfAnagrams, listOfNonAnagrams)
| otherwise = let (listOfAnagrams, listOfNonAnagrams) = (partitionByAnagrams sortedStr xs)
in
(listOfAnagrams, x:listOfNonAnagrams)
This is just a test case:
test1 = mapM_ print (groupByAnagrams ["opts", "alerting", "arrest", "bares", "drapes", "drawer", "emits", "least", "mate", "mates", "merit", "notes", "palest", "parses", "pores", "pots", "altering", "rarest", "baser", "parsed", "redraw", "items", "slate", "meat", "meats", "miter", "onset", "pastel", "passer", "poser", "spot", "integral", "raster", "bears", "rasped", "reward", "mites", "stale", "meta", "steam", "mitre", "steno", "petals", "spares", "prose", "stop", "relating", "raters", "braes", "spared", "warder", "smite", "steal", "tame", "tames", "remit", "stone", "plates", "sparse", "ropes", "tops", "triangle", "starer", "saber", "spread", "warred", "times", "tales", "team", "teams", "timer", "tones", "staple", "spears", "spore"])
**EDIT!!! This is a rewritten version of my function. Thanks to jrouquie for pointing out the inefficiency! **EDITED AGAIN ON 10/7 - used pattern matching on tuples for clarity, no need for all those fsts and snds.
groupByAnagrams2 :: [String] -> [[String]]
groupByAnagrams2 str = groupBySnd $ map (\s -> (s, (sort s))) str
groupBySnd :: [(String, String)] -> [[String]]
groupBySnd [] = []
groupBySnd ((s1,s2):xs) = let (listOfAnagrams, listOfNonAnagramPairs) = (partitionBySnd s2 xs)
in
(s1:listOfAnagrams):(groupBySnd listOfNonAnagramPairs)
partitionBySnd :: String -> [(String, String)] -> ([String], [(String, String)])
partitionBySnd sortedStr [] = ([], [])
partitionBySnd sortedStr ((s, sSorted):ss)
| (sortedStr == sSorted) = let (listOfAnagrams, listOfNonAnagramPairs) = (partitionBySnd sortedStr ss)
in
(s:listOfAnagrams, listOfNonAnagramPairs)
| otherwise = let (listOfAnagrams, listOfNonAnagramPairs) = (partitionBySnd sortedStr ss)
in
(listOfAnagrams, (s, sSorted):listOfNonAnagramPairs)
sortedStr
is indeed sorted only once, each element of the remainig listxs
is sorted once for each call topartitionByAnagrams
. You might want to sort (a copy of) the whole list once and for all. (This uses more memory.) – jrouquie["foo", "bar", "qux", "egg"]
, your code callspartitionByAnagrams (sort x) xs
4 times, withxs
taking resp. values["bar", "qux", "egg"]
,["qux", "egg"]
,["egg"]
,[]
. SincepartitionByAnagrams x xs
sorts each element ofxs
once,"egg"
is sorted 3 times. More precisely, this code callssort
Θ(n²) times, while it could call it only O(n) times. (edit : yes, you're right.) – jrouquie