How to remove diacritics/accents in C#
Solutions on how to remove the diacritics/accents from a text in C# for the various versions of .NET
Some time ago, an application I was building required all data to be stored without diacritics/accents, for example converting "á" to "a". Searching the web, I found a Stack Overflow answer that met this requirement, so I decided to write a blog post about it.
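Solution for .NET Framework
On .NET Framework the solution works natively, without installing any additional libraries, because the ISO-8859-8 encoding ships with the framework. .NET 5 and later behave like .NET Core in this respect and need the extra step described in the .NET Core section.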
using System;
using System.Text;

namespace LatinizeExample
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            Console.WriteLine("crème brûléeĆČ Leoš Janácek Dvořák".Latinize());
            Console.ReadLine();
        }
    }

    public static class StringExtensions
    {
        // ISO-8859-8 has no accented Latin letters; when encoding, they are replaced by a best-fit plain letter.
        private static readonly Encoding latinizeEncoding = Encoding.GetEncoding("ISO-8859-8");

        public static string Latinize(this string value)
        {
            // Round-trip through the encoding: the accents are lost when the bytes are produced.
            var strBytes = latinizeEncoding.GetBytes(value);
            return latinizeEncoding.GetString(strBytes);
        }
    }
}
An example of its usage is:
"crème brûléeĆČ Leoš Janácek Dvořák".Latinize();
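For comparison, here is a minimal sketch of a different, commonly used technique that does not depend on the ISO-8859-8 encoding at all: Unicode normalization. It is not the method from the Stack Overflow answer this post is based on. The class name DiacriticsSketch and the method name RemoveDiacritics are hypothetical names chosen for illustration. The idea is to decompose each accented letter into a base letter plus combining marks, then drop the marks.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of the original solution.
public static class DiacriticsSketch
{
    public static string RemoveDiacritics(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        // FormD splits "é" into "e" followed by a combining acute accent.
        string decomposed = value.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Skip the combining marks; keep every other character unchanged.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                builder.Append(c);
            }
        }

        // Recompose whatever is left into the usual composed form.
        return builder.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this sketch, DiacriticsSketch.RemoveDiacritics("crème brûlée") returns "creme brulee". Letters that have no canonical decomposition, such as "ø" or "ł", pass through unchanged, so the results can differ from the encoding-based approach.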
Solution for .NET Core
On .NET Core, the ISO-8859-8 encoding is not built into the framework, so the solution is to add the NuGet package System.Text.Encoding.CodePages. To install it, run one of the following commands:
Using the Package Manager Console: Install-Package System.Text.Encoding.CodePages
Using the .NET CLI: dotnet add package System.Text.Encoding.CodePages
Once installed, you have to register the encoding provider CodePagesEncodingProvider at the start of the application. This example is for a console application:
using System;
using System.Text;

namespace LatinizeExample
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            // If using .NET Core, add the NuGet package System.Text.Encoding.CodePages
            // and then register the provider at the start of the application,
            // before the ISO-8859-8 encoding is requested for the first time.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

            Console.WriteLine("crème brûléeĆČ Leoš Janácek Dvořák".Latinize());
            Console.ReadLine();
        }
    }

    public static class StringExtensions
    {
        private static readonly Encoding latinizeEncoding = Encoding.GetEncoding("ISO-8859-8");

        public static string Latinize(this string value)
        {
            var strBytes = latinizeEncoding.GetBytes(value);
            return latinizeEncoding.GetString(strBytes);
        }
    }
}
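Because the provider has to be registered before the encoding is requested, it is easy to forget the call in Main, for example in a test project or a class library. One possible variation, sketched here as an assumption and not taken from the original answer, is to do the registration inside a static constructor so the extension class takes care of itself. The names SelfRegisteringStringExtensions and ToLatinized are hypothetical and were chosen so they do not clash with the Latinize method above.

```csharp
using System;
using System.Text;

// Hypothetical variant of StringExtensions that registers the provider itself.
public static class SelfRegisteringStringExtensions
{
    private static readonly Encoding LatinizeEncoding;

    // An explicit static constructor runs once, right before the first use of the class.
    static SelfRegisteringStringExtensions()
    {
        // Make the code page encodings available, then look up ISO-8859-8.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        LatinizeEncoding = Encoding.GetEncoding("ISO-8859-8");
    }

    public static string ToLatinized(this string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        // Same round-trip as Latinize: encode to ISO-8859-8 bytes, then decode back.
        byte[] bytes = LatinizeEncoding.GetBytes(value);
        return LatinizeEncoding.GetString(bytes);
    }
}
```

With this version, a call such as "crème brûlée".ToLatinized() needs no setup code in Main. The trade-off is that the registration becomes a hidden side effect of using the class, which may or may not suit your application.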
Resources
How to remove or Latinize diacritics/accents in C# (.NET Framework) (github.com)