I wrote a Haskell function that groups words by anagrams. I'm trying to learn OCaml, but I'm a little confused as to how use pattern matching in OCaml. Could someone help translate this to OCaml for me? Thank you!
This function takes a list of strings, and partitions it into a list of string lists, grouped by anagrams.
import Data.List
groupByAnagrams :: [String] -> [[String]]
groupByAnagrams [] = []
groupByAnagrams (x:xs) = let (listOfAnagrams, listOfNonAnagrams) = (partitionByAnagrams (sort x) xs)
in
(x:listOfAnagrams):(groupByAnagrams listOfNonAnagrams)
This helper function takes a sorted string sortedStr, and a list of strings (the reason the string is sorted is so that I don't have to call sort on it every iteration). The string list is partitioned into two lists; one consisting of the strings that are anagrams to sortedStr, the other consisting of the strings that are not. The function returns the tuple that consists of these two lists.
partitionByAnagrams :: String -> [String] -> ([String], [String])
partitionByAnagrams sortedStr [] = ([], [])
partitionByAnagrams sortedStr (x:xs)
| (sortedStr == (sort x)) = let (listOfAnagrams, listOfNonAnagrams) = (partitionByAnagrams sortedStr xs)
in
(x:listOfAnagrams, listOfNonAnagrams)
| otherwise = let (listOfAnagrams, listOfNonAnagrams) = (partitionByAnagrams sortedStr xs)
in
(listOfAnagrams, x:listOfNonAnagrams)
This is just a test case:
test1 = mapM_ print (groupByAnagrams ["opts", "alerting", "arrest", "bares", "drapes", "drawer", "emits", "least", "mate", "mates", "merit", "notes", "palest", "parses", "pores", "pots", "altering", "rarest", "baser", "parsed", "redraw", "items", "slate", "meat", "meats", "miter", "onset", "pastel", "passer", "poser", "spot", "integral", "raster", "bears", "rasped", "reward", "mites", "stale", "meta", "steam", "mitre", "steno", "petals", "spares", "prose", "stop", "relating", "raters", "braes", "spared", "warder", "smite", "steal", "tame", "tames", "remit", "stone", "plates", "sparse", "ropes", "tops", "triangle", "starer", "saber", "spread", "warred", "times", "tales", "team", "teams", "timer", "tones", "staple", "spears", "spore"])
**EDIT!!! This is a rewritten version of my function. Thanks to jrouquie for pointing out the inefficiency! **EDITED AGAIN ON 10/7 - used pattern matching on tuples for clarity, no need for all those fsts and snds.
groupByAnagrams2 :: [String] -> [[String]]
groupByAnagrams2 str = groupBySnd $ map (\s -> (s, (sort s))) str
groupBySnd :: [(String, String)] -> [[String]]
groupBySnd [] = []
groupBySnd ((s1,s2):xs) = let (listOfAnagrams, listOfNonAnagramPairs) = (partitionBySnd s2 xs)
in
(s1:listOfAnagrams):(groupBySnd listOfNonAnagramPairs)
partitionBySnd :: String -> [(String, String)] -> ([String], [(String, String)])
partitionBySnd sortedStr [] = ([], [])
partitionBySnd sortedStr ((s, sSorted):ss)
| (sortedStr == sSorted) = let (listOfAnagrams, listOfNonAnagramPairs) = (partitionBySnd sortedStr ss)
in
(s:listOfAnagrams, listOfNonAnagramPairs)
| otherwise = let (listOfAnagrams, listOfNonAnagramPairs) = (partitionBySnd sortedStr ss)
in
(listOfAnagrams, (s, sSorted):listOfNonAnagramPairs)
sortedStris indeed sorted only once, each element of the remainig listxsis sorted once for each call topartitionByAnagrams. You might want to sort (a copy of) the whole list once and for all. (This uses more memory.) - jrouquie["foo", "bar", "qux", "egg"], your code callspartitionByAnagrams (sort x) xs4 times, withxstaking resp. values["bar", "qux", "egg"],["qux", "egg"],["egg"],[]. SincepartitionByAnagrams x xssorts each element ofxsonce,"egg"is sorted 3 times. More precisely, this code callssortΘ(n²) times, while it could call it only O(n) times. (edit : yes, you're right.) - jrouquie