157
votes

When using ToList(), is there a performance impact that needs to be considered?

I was writing a query to retrieve files from a directory, using the following code:

string[] imageArray = Directory.GetFiles(directory);

However, since I like to work with List<> instead, I decided to put in...

List<string> imageList = Directory.GetFiles(directory).ToList();

So, is there some sort of performance impact that should be considered when deciding to do a conversion like this - or only to be considered when dealing with a large number of files? Is this a negligible conversion?

8
+1 interested to know the answer here too. IMHO unless the app is performance critical, I think I'd always use a List<T> in favour of a T[] if it makes the code more logical/readable/maintainable (unless of course the conversion was causing noticeable performance problems in which case I'd re-visit it I guess). - Sepster
Creating a list from an array should be super cheap. - leppie
@Sepster I only specify the data type as specifically as I need to do a job. If I don't have to call Add or Remove, I would leave it as IEnumerable<T> (or even better var) - p.s.w.g
I think, in this case it's better to call EnumerateFiles instead of GetFiles, so only one array will be created. - tukaef
GetFiles(directory), as it is implemented in .NET currently, pretty much does new List<string>(EnumerateFiles(directory)).ToArray(). So GetFiles(directory).ToList() creates a list, creates an array from that, then creates a list again. Like 2kay says, you should be preferring to do EnumerateFiles(directory).ToList() here. - Joren

8 Answers

194
votes

IEnumerable.ToList()

Yes, IEnumerable<T>.ToList() does have a performance impact: it is an O(n) operation, though it will likely only require attention in performance-critical code.

The ToList() operation uses the List<T>(IEnumerable<T> collection) constructor. This constructor must make a copy of the array (more generally, of the IEnumerable<T>); otherwise, future modifications of the list would also change the source T[], which generally wouldn't be desirable.

I would like to reiterate that this will only make a difference with a huge list; copying chunks of memory is quite a fast operation.
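To illustrate the copy described above, here is a minimal sketch. The class name, method name, and file names are made up for the example; it only shows that changing the list leaves the source array alone.

```csharp
using System;
using System.Linq;

public static class ToListCopyDemo
{
    // Mutates a list built from the array, then returns the array's first element.
    public static string MutateCopy(string[] source)
    {
        var list = source.ToList();   // O(n) copy into the list's own storage
        list[0] = "changed.png";      // replaces an element in the list only
        list.Add("extra.png");        // grows the list; the array keeps its length
        return source[0];
    }

    public static void Main()
    {
        string[] files = { "a.png", "b.png" };
        Console.WriteLine(MutateCopy(files)); // prints "a.png"
        Console.WriteLine(files.Length);      // prints 2
    }
}
```

Note that only the references are copied for reference types such as string; the strings themselves are not duplicated.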

Handy tip, As vs To

You'll notice in LINQ there are several methods that start with As (such as AsEnumerable()) and To (such as ToList()). The methods that start with To require a conversion like the one above (i.e. they may impact performance), whereas the methods that start with As do not and just require a cast or some other simple operation.
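A small sketch of the As/To distinction for LINQ to Objects, assuming a List<int> as the source (the class and method names are invented for the example): AsEnumerable() hands back the same object under a different static type, while ToList() allocates a new list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AsVersusToDemo
{
    // True when AsEnumerable() returns the very same object it was given.
    public static bool AsReturnsSameObject(List<int> source)
    {
        IEnumerable<int> view = source.AsEnumerable();
        return ReferenceEquals(view, source);
    }

    // True when ToList() returns the very same object it was given.
    public static bool ToReturnsSameObject(List<int> source)
    {
        List<int> copy = source.ToList();
        return ReferenceEquals(copy, source);
    }

    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3 };
        Console.WriteLine(AsReturnsSameObject(numbers)); // True: no copy was made
        Console.WriteLine(ToReturnsSameObject(numbers)); // False: a new list was built
    }
}
```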

Additional details on List<T>

Here is a little more detail on how List<T> works in case you're interested :)

A List<T> is built on a construct called a dynamic array, which is resized on demand; each resize copies the contents of the old array to a new, larger array. So it starts off small and grows as required.

This is the difference between the Capacity and Count properties of List<T>. Capacity is the size of the array behind the scenes; Count is the number of items in the List<T>, which is always <= Capacity. So when adding an item would push Count past Capacity, the capacity is doubled and the array is copied.
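The relationship between Count and Capacity can be observed directly. The sketch below (names invented for the example) records each distinct Capacity a list passes through while items are added. The exact growth sequence is an implementation detail of the runtime, so the code prints it rather than assuming it.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Adds `count` items and records every distinct Capacity value along the way.
    public static List<int> CapacitiesSeen(int count)
    {
        var list = new List<int>();
        var seen = new List<int> { list.Capacity };
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != seen[seen.Count - 1])
            {
                seen.Add(list.Capacity); // the backing array was just reallocated
            }
        }
        return seen;
    }

    public static void Main()
    {
        // Each printed value marks one reallocation-and-copy of the backing array.
        Console.WriteLine(string.Join(", ", CapacitiesSeen(33)));
    }
}
```

If the final size is known in advance, passing it to the List<T>(int capacity) constructor avoids the intermediate reallocations.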

41
votes

Is there a performance impact when calling toList()?

Yes of course. Theoretically even i++ has a performance impact, it slows the program for maybe a few ticks.

What does .ToList do?

When you invoke .ToList, the code calls Enumerable.ToList(), an extension method that returns new List<TSource>(source). In the corresponding constructor, in the worst case, it goes through the source container and adds the items one by one to a new container. So it has little effect on performance, and it is very unlikely to be the performance bottleneck of your application.

What's wrong with the code in the question

Directory.GetFiles goes through the folder and returns all the file names at once, in memory. There is a potential risk that the string[] consumes a lot of memory, slowing everything down.

What should be done then

It depends. If you (as well as your business logic) guarantee that the number of files in the folder is always small, the code is acceptable. But it's still advisable to use the lazy version, Directory.EnumerateFiles, available since .NET 4. This is much more like a query: it is not executed immediately, and you can chain further query operators onto it, like:

Directory.EnumerateFiles(myPath).Any(s => s.Contains("myfile"))

which will stop searching the path as soon as a file whose name contains "myfile" is found. This obviously performs better than .GetFiles.
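Here is a self-contained sketch of both uses of EnumerateFiles: the early-exiting Any() query, and building a list directly as suggested in the comments on the question. The helper names and the scratch directory are hypothetical and exist only for this demo.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class EnumerateFilesDemo
{
    // Lazy: the directory walk stops at the first matching file name.
    public static bool ContainsFileNamed(string directory, string fragment)
    {
        return Directory.EnumerateFiles(directory)
                        .Any(path => Path.GetFileName(path).Contains(fragment));
    }

    // Materializes the lazy sequence straight into a list, without calling GetFiles.
    public static List<string> ListFiles(string directory)
    {
        return Directory.EnumerateFiles(directory).ToList();
    }

    public static void Main()
    {
        // Hypothetical scratch directory, created and removed by the demo itself.
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        try
        {
            File.WriteAllText(Path.Combine(dir, "myfile.txt"), "");
            File.WriteAllText(Path.Combine(dir, "other.txt"), "");
            Console.WriteLine(ContainsFileNamed(dir, "myfile")); // True
            Console.WriteLine(ListFiles(dir).Count);             // 2
        }
        finally
        {
            Directory.Delete(dir, recursive: true);
        }
    }
}
```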

22
votes

Is there a performance impact when calling toList()?

Yes, there is. Using the extension method Enumerable.ToList() constructs a new List<T> object from the IEnumerable<T> source collection, which of course has a performance impact.

However, understanding List<T> may help you determine if the performance impact is significant.

List<T> uses an array (T[]) to store the elements of the list. Arrays cannot be extended once they are allocated, so List<T> uses an over-sized array to store the elements of the list. When the List<T> grows beyond the size of the underlying array, a new array has to be allocated and the contents of the old array have to be copied to the new, larger array before the list can grow.

When a new List<T> is constructed from an IEnumerable<T> there are two cases:

  1. The source collection implements ICollection<T>: Then ICollection<T>.Count is used to get the exact size of the source collection, and a matching backing array is allocated before all elements of the source collection are copied to the backing array using ICollection<T>.CopyTo(). This operation is quite efficient and will probably map to some CPU instruction for copying blocks of memory. However, in terms of performance, memory is required for the new array and CPU cycles are required for copying all the elements.

  2. Otherwise the size of the source collection is unknown and the enumerator of IEnumerable<T> is used to add each source element one at a time to the new List<T>. Initially the backing array is empty and an array of size 4 is created. Then when this array is too small the size is doubled so the backing array grows like this 4, 8, 16, 32 etc. Every time the backing array grows it has to be reallocated and all elements stored so far have to be copied. This operation is much more costly compared to the first case where an array of the correct size can be created right away.

    Also, if your source collection contains say 33 elements the list will end up using an array of 64 elements wasting some memory.

In your case the source collection is an array which implements ICollection<T> so the performance impact is not something you should be concerned about unless your source array is very large. Calling ToList() will simply copy the source array and wrap it in a List<T> object. Even the performance of the second case is not something to worry about for small collections.
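The two cases can be compared side by side. In this sketch, Lazy is a made-up iterator method whose result does not implement ICollection<T>; the capacities are printed rather than hard-coded because they are implementation details.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConstructionCasesDemo
{
    // An iterator method: its result does not implement ICollection<T>,
    // so the list cannot learn the element count up front.
    public static IEnumerable<int> Lazy(int count)
    {
        for (int i = 0; i < count; i++)
        {
            yield return i;
        }
    }

    public static void Main()
    {
        int[] array = Enumerable.Range(0, 33).ToArray();

        var fromArray = new List<int>(array);   // case 1: size known, one allocation
        var fromLazy = new List<int>(Lazy(33)); // case 2: grows while enumerating

        Console.WriteLine($"from array: {fromArray.Count} items, capacity {fromArray.Capacity}");
        Console.WriteLine($"from lazy:  {fromLazy.Count} items, capacity {fromLazy.Capacity}");
    }
}
```

Both lists hold the same 33 items; the difference shows up only in how much backing storage was allocated along the way.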

5
votes

It will be as (in)efficient as doing:

var list = new List<T>(items);

If you decompile the constructor that takes an IEnumerable<T>, you will see it does a few things:

  • Check whether collection implements ICollection<T> and, if so, read collection.Count. For an array, list, etc. this should be O(1). A lazy IEnumerable<T> does not take this path; it is executed when it is enumerated below.

  • If collection implements ICollection<T>, it will save the items in an internal array using the ICollection<T>.CopyTo method. It should be O(n), where n is the length of the collection.

  • If collection does not implement ICollection<T>, it will iterate through the items of the collection and add them one by one to the internal array, growing it as needed.

So, yes, it will consume more memory, since it has to create a new list, and in the worst case it will be O(n), since it iterates through the collection to copy each element.
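The branches listed above can be sketched roughly as follows. CopyToList is a hypothetical helper written for illustration, not the framework's source code; it only mirrors the shape of the logic.

```csharp
using System;
using System.Collections.Generic;

public static class ListCopySketch
{
    // Hypothetical helper (not the framework implementation) mirroring the branches above.
    public static List<T> CopyToList<T>(IEnumerable<T> items)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));

        if (items is ICollection<T> collection)
        {
            // Size is known up front: allocate once, then bulk-copy.
            var sized = new List<T>(collection.Count);
            sized.AddRange(collection);
            return sized;
        }

        // Size is unknown: enumerate and let the list grow as needed.
        var grown = new List<T>();
        foreach (T item in items)
        {
            grown.Add(item);
        }
        return grown;
    }

    // A lazy sequence used to exercise the second branch.
    public static IEnumerable<int> Lazy(int count)
    {
        for (int i = 0; i < count; i++)
        {
            yield return i;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CopyToList(new[] { 1, 2, 3 }).Count); // 3
        Console.WriteLine(CopyToList(Lazy(5)).Count);           // 5
    }
}
```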

5
votes

"is there a performance impact that needs to be considered?"

The issue with your precise scenario is that first and foremost your real concern about performance would be from the hard-drive speed and efficiency of the drive's cache.

From that perspective, the impact is surely negligible to the point that NO it need not be considered.

BUT ONLY if you really need the features of the List<> structure to possibly either make you more productive, or your algorithm more friendly, or some other advantage. Otherwise, you're just purposely adding an insignificant performance hit, for no reason at all. In which case, naturally, you shouldn’t do it! :)

4
votes

ToList() creates a new List and puts the elements in it, which means there is a cost associated with calling ToList(). For a small collection the cost won't be very noticeable, but for a huge collection ToList() can cause a performance hit.

Generally you should not use ToList() unless the work you are doing cannot be done without converting the collection to a List. For example, if you just want to iterate through the collection, you don't need to call ToList().

If you are querying a data source, for example a database using LINQ to SQL, then the cost of ToList() is much higher. With delayed execution, items are loaded only when needed (which can be beneficial in many scenarios); calling ToList() instead loads all the items from the database into memory immediately.
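Delayed execution can be illustrated without a database. In the sketch below, Query is a made-up LINQ to Objects stand-in for a remote source, and the Loads counter tracks how many items have actually been produced; a real LINQ to SQL provider behaves differently in the details, but the eager/lazy contrast is the same.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how many items the fake data source has produced so far.
    public static int Loads;

    // Hypothetical stand-in for a remote query: items are produced only on demand.
    public static IEnumerable<int> Query(int count)
    {
        for (int id = 1; id <= count; id++)
        {
            Loads++;
            yield return id;
        }
    }

    public static void Main()
    {
        IEnumerable<int> query = Query(1000);
        Console.WriteLine(Loads); // 0: defining the query loads nothing

        int first = query.First();
        Console.WriteLine(Loads); // 1: only the first item was needed

        List<int> all = query.ToList();
        Console.WriteLine(Loads); // 1001: ToList() pulled every item into memory
    }
}
```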

3
votes

Compared with the cost of retrieving the file list, ToList() is negligible. But that is not necessarily true in other scenarios; it really depends on where you use it.

  • When calling it on an array, list, or other collection, you create a copy of the collection as a List<T>. The performance here depends on the size of the collection. You should do it only when really necessary.

    In your example, you call it on an array. Because an array implements ICollection<T>, its items are bulk-copied into a newly created list. So the performance impact depends on the number of files.

  • When calling on an IEnumerable<T>, you materialize the IEnumerable<T> (usually a query).

2
votes

ToList will create a new list and copy the elements from the original source into it. The only cost is copying the elements, so it depends on the size of the source.