How to remove diacritics/accents in C#

How to remove diacritics/accents from text in C#, with solutions for the various versions of .NET

Some time ago, an application I was building required all data to be stored without diacritics/accents, for example "á" stored as "a". Searching the web, I found a Stack Overflow answer that met this requirement, so I decided to write a blog post about it.

Solution for .NET Framework

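This solution works natively, without installing any additional libraries, because the ISO-8859-8 encoding ships with the framework. Encoding the string to ISO-8859-8 and decoding it back replaces each accented letter with its closest unaccented equivalent.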

using System;
using System.Text;

namespace LatinizeExample
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            Console.WriteLine("crème brûléeĆČ Leoš Janácek Dvořák".Latinize());

            Console.ReadLine();
        }
    }

    public static class StringExtensions
    {
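        // ISO-8859-8 has no accented Latin letters, so encoding to it
        // substitutes the closest unaccented character where one exists.
        private static readonly Encoding latinizeEncoding = Encoding.GetEncoding("ISO-8859-8");

        public static string Latinize(this string value)
        {
            // The accents are dropped here, when the string is encoded to bytes.
            var strBytes = latinizeEncoding.GetBytes(value);

            // Decoding the bytes returns the unaccented text as a string.
            return latinizeEncoding.GetString(strBytes);
        }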
    }
}

An example of its usage is:

"crème brûléeĆČ Leoš Janácek Dvořák".Latinize();
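To cross-check the result, or to avoid depending on a specific code page, a different and widely used technique is Unicode normalization. The sketch below is not the method this post is based on: it decomposes each character (FormD) and drops the combining marks. The class name NormalizationExtensions and the method name RemoveDiacritics are hypothetical names chosen for this example.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class NormalizationExtensions
{
    // Hypothetical helper: removes combining marks via Unicode decomposition.
    public static string RemoveDiacritics(this string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        // FormD splits "é" into "e" followed by a combining acute accent.
        string decomposed = value.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Keep every character except the combining (non-spacing) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                builder.Append(c);
            }
        }

        // Recompose whatever is left into the usual composed form.
        return builder.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

Letters that have no decomposition, such as "ø" or "ł", pass through this sketch unchanged, so the two approaches can give different results for some inputs.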

Solution for .NET Core

On .NET Core, the ISO-8859-8 encoding is not built into the framework. The solution is to add the NuGet package System.Text.Encoding.CodePages. To install it, run one of the following commands.
Using Package Manager:
Install-Package System.Text.Encoding.CodePages
Using the .NET CLI:
dotnet add package System.Text.Encoding.CodePages

Once installed, you have to register the encoding provider CodePagesEncodingProvider at the start of the application, before the first call to Encoding.GetEncoding("ISO-8859-8"). This example is for a console application:

using System;
using System.Text;

namespace LatinizeExample
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            // On .NET Core, add the NuGet package System.Text.Encoding.CodePages,
            // then register the provider at the start of the application.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

            Console.WriteLine("crème brûléeĆČ Leoš Janácek Dvořák".Latinize());

            Console.ReadLine();
        }
    }

    public static class StringExtensions
    {
        private static readonly Encoding latinizeEncoding = Encoding.GetEncoding("ISO-8859-8");

        public static string Latinize(this string value)
        {
            var strBytes = latinizeEncoding.GetBytes(value);

            return latinizeEncoding.GetString(strBytes);
        }
    }
}
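Because the provider has to be registered before the encoding is requested, one option is to do both inside the extension class itself. The following is a minimal sketch of that idea, assuming the System.Text.Encoding.CodePages package is available; it is a variant of the StringExtensions class above, not part of the original answer, and the null check is an addition.

```csharp
using System;
using System.Text;

public static class StringExtensions
{
    private static readonly Encoding latinizeEncoding;

    // The static constructor runs once, before Latinize is first used,
    // so the provider is always registered before GetEncoding is called.
    static StringExtensions()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        latinizeEncoding = Encoding.GetEncoding("ISO-8859-8");
    }

    public static string Latinize(this string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        // Round-trip through ISO-8859-8, as in the example above.
        var strBytes = latinizeEncoding.GetBytes(value);

        return latinizeEncoding.GetString(strBytes);
    }
}
```

With this variant, Main no longer needs its own Encoding.RegisterProvider call, which helps when the extension lives in a shared library and the host application cannot be relied on to register the provider.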

Resources

How to remove or Latinize diacritics/accents in C# (.NET Framework) (github.com)

ISO/IEC 8859-8

CodePagesEncodingProvider Class

EncodingInfo Class
