Often I see developers debating String vs. string as if it were a simple style decision, no different than discussing the position of braces or tabs vs. spaces: a meaningless distinction with no right answer, just a matter of finding a decision everyone can agree on. The debate between String and string, though, is not a simple style debate; it has the potential to radically change the semantics of a program.

The keyword string has a concrete meaning in C#. It is the type System.String, which exists in the core runtime assembly. The runtime intrinsically understands this type and provides the capabilities developers expect for strings in .NET. Its presence is so critical to C# that if that type doesn't exist, the compiler exits before even attempting to parse a line of code. Hence string has a precise, unambiguous meaning in C# code.
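
A quick way to see that equivalence, as a minimal illustration (the AliasExample class is just scaffolding):

// string and System.String name exactly the same type; no conversion is involved.
class AliasExample {
  void Example() {
    string a = "hello";
    System.String b = a;                                  // same type, direct assignment
    bool same = typeof(string) == typeof(System.String);  // always true
  }
}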

The identifier String, though, has no concrete meaning in C#. It is an identifier that goes through the same name lookup rules as Widget, Student, etc … It could bind to string, or it could bind to a type in another assembly entirely, whose purpose may be entirely different from string. Worse, it could be defined in a way such that code like String s = "hello"; continues to compile.

class TricksterString { 
  void Example() {
    String s = "Hello World"; // Okay but probably not what you expect.
  }
}

class String {
  // A project-defined String with an implicit conversion from the real string,
  // which is why the assignment above compiles against the wrong type.
  public static implicit operator String(string s) => null;
}

The actual meaning of String will always depend on name resolution. That means it depends on all the source files in the project and all the types defined in all the referenced assemblies. In short, it requires quite a bit of context to know what it means.
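
That context dependence is also why the binding has to be pinned down explicitly when an identifier named String is unavoidable; a minimal sketch (the alias name BclString is arbitrary):

using BclString = System.String;   // a using alias always binds to System.String

class Consumer {
  // String s0 = "maybe";          // depends on every String type in scope
  BclString s1 = "hello";          // unambiguous, regardless of the project
  global::System.String s2 = "hi"; // fully qualified, also unambiguous
}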

It's true that in the vast majority of cases String and string will bind to the same type. But using String still means developers are leaving their program open to interpretation in places where there is only one correct answer. When String does bind to the wrong type, it can leave developers debugging for hours, filing bugs on the compiler team, and generally wasting time that could've been saved by using string.

Another way to visualize the difference is with this sample:

string s1 = 42; // Errors 100% of the time 
String s2 = 42; // Might error, might not, depends on the code
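
All it takes for the second line to compile is a conversion somewhere in scope; a minimal sketch:

class String {
  // With this type in scope, String s2 = 42; compiles and s2 ends up null.
  public static implicit operator String(int i) => null;
}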

Many will argue that while this information is technically accurate, using String is still fine because it's exceedingly rare that a code base would define a type with this name, or that when String is defined it's a sign of a bad code base.

The reality, though, is quite different. Defining String happens with some regularity, as demonstrated by the following BigQuery query:

SELECT  
  sample_path, sample_repo_name
FROM `fh-bigquery.github_extracts.contents_net_cs`
WHERE 
  NOT STRPOS(sample_repo_name, 'coreclr') > 0
  AND NOT STRPOS(sample_repo_name, 'corefx') > 0
  AND NOT STRPOS(sample_repo_name, 'roslyn') > 0
  AND NOT STRPOS(sample_repo_name, 'corert') > 0
  AND NOT STRPOS(sample_repo_name, 'mono') > 0
  AND STRPOS(content, 'class String ') > 0
LIMIT 100

Looking through these results you’ll see that String is defined for a number of completely valid purposes: reflection helpers, serialization libraries, lexers, protocols, etc … For any of these libraries String vs. string has real consequences depending on where the code is used.

So remember: when you see the String vs. string debate, it is about semantics, not style. Choosing string gives crisp meaning to your code base. Choosing String isn't wrong, but it leaves the door open for surprises in the future.

Note: This discussion is not limited to string. It also applies to object, int, long, short, etc … essentially any of the type keywords introduced in C# 1.0.
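
The same trick works for any of them; a minimal sketch with a hypothetical project-defined Int32:

class Int32 { }

class Example {
  int a = 42;             // always System.Int32
  // Int32 b = 42;        // would not compile: binds to the class above
  System.Int32 c = 42;    // fully qualified, always the BCL type
}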

