Featured Post

Dynamics GP 2019 is now Released

It is now released, Microsoft Dynamics GP 2019 which moves the system to the modern product life cycle   " Beginning with the October...

Wednesday, September 16, 2015

SQL Server - Determining the language of a name: Arabic English or Multilingual.


As part of Data Cleansing project, I struggled with one specific case while developing custom algorithms to cleanse, detect and smartly handle duplicates in master files. The case was primarily related to fields with mixed language. 

The original legacy system has fields for the name in which; sometimes, both Arabic and Language characters are recorded. Theoretically speaking, there must be a very simple way out of this in SQL Server, which is not supposed to be time-consuming neither a reinvention of the wheel. 

There was someone in the Stack Overflow community talking about something similar but not precisely the exact thing, he was proposing a way to determine the language of the field by using the Patter Index function in SQL server, his way is quite simple and to the point, which looks for any of the alphabetical characters within the passed string, and return the position of this character. If the return value is larger than 0, it means that any of the alphabetical characters was found (either English or Arabic). Although, that would never be comprehensive in my case as some fields may contain both characters. The function is illustrated below:


CREATE   FUNCTION [dbo].[Fn_CheckName] ( @string NVARCHAR(MAX) )
RETURNS NVARCHAR(100)
BEGIN
    DECLARE @Value NVARCHAR(100)
    IF ( PATINDEX(N'%[Ø£-ÙŠ]%', RTRIM(@string)) > 0 )
        BEGIN
            SET @Value = 'A'
        END
    ELSE
        IF ( PATINDEX(N'%[A-Za-z]%', RTRIM(@string)) > 0 )
            BEGIN
                SET @Value = 'E'
            END

    RETURN @Value
END

GO

Now, the alternative of checking the pattern within the name which could be misinterpreted and processed, is to consider the first and last characters of the string, and compare the result with the original result retrieved by the function above. As follows:


SELECT  ( SELECT    dbo.Fn_CheckName(NAME)
        ) AS Language ,
        LEN(NAME) Length ,
        ( SELECT    dbo.Fn_CheckName(SUBSTRING(NAME, 1, 1))
        ) AS F_Character_Language ,
              ( SELECT    dbo.Fn_CheckName(SUBSTRING(NAME, 2, 1))
        ) Second_Character_Checkname,
        ( SELECT    dbo.Fn_CheckName(SUBSTRING(NAME, LEN(name), 1)))   
AS L_Character_Language ,     
        name
FROM    [dbo].[namelist]


Here is the overall criteria when breaking down the name into characters and check the language of the first and last character to determine which is an Arabic, English or Multilingual field.


 
Process Visualization



Best Regards,
Mahmoud M. AlSaadi


1 comment:

  1. BE SMART AND BECOME RICH IN LESS THAN 3DAYS....It all depends on how fast 
    you can be to get the new PROGRAMMED blank ATM card that is capable of
    hacking into any ATM machine,anywhere in the world. I got to know about 
    this BLANK ATM CARD when I was searching for job online about a month 
    ago..It has really changed my life for good and now I can say I'm rich and 
    I can never be poor again. The least money I get in a day with it is about 
    $50,000.(fifty thousand USD) Every now and then I keeping pumping money 
    into my account. Though is illegal,there is no risk of being caught 
    ,because it has been programmed in such a way that it is not traceable,it 
    also has a technique that makes it impossible for the CCTVs to detect 
    you..For details on how to get yours today, email the hackers on : (
    atmmachinehackers1@gmail.com ). Tell your 
    loved once too, and start to live large. That's the simple testimony of how 
    my life changed for good...Love you all ...the email address again is ;
    atmmachinehackers1@gmail.com

    ReplyDelete