Friday 26 August 2011

Stripping out unwanted characters from a string variable

Question:
I'm using Oracle9i Database. How can I strip out unwanted characters from a string variable? For example, I would like to be able to call a function named stripped_string like this:
DECLARE
l_before VARCHAR2 (100) :=
'This is my string';
l_after VARCHAR2 (100);
BEGIN
l_after := stripped_string
(l_before, 'is');
DBMS_OUTPUT.put_line (l_after);
END;
/

and set l_after to 'Th my trng'.
Answer:
This question was answered in the January/February 2006 issue of Oracle Magazine.
While Oracle9i Database does not provide a built-in function specifically to remove characters from a string, you can easily adapt TRANSLATE to this purpose.
The TRANSLATE function allows you to replace characters in an expression that appear in a from string to their corresponding characters in the to string.
Suppose that I want to replace all instances of the letter e in my name with the letter X. I will then execute this code:
BEGIN
DBMS_OUTPUT.PUT_LINE (
TRANSLATE (
'Steven Feuerstein'
, 'e', 'X')
);
END;
/
And see this result:
StXvXn FXuXrstXin
Note that TRANSLATE is case-sensitive. If I run this variation on the above block
BEGIN
DBMS_OUTPUT.PUT_LINE (
TRANSLATE (
'Steven Feuerstein'
, 'E', 'X')
);
END;
/
I see these results:
Steven Feuerstein
From these demonstrations, you might quickly conclude that to remove all the e's from my name, I can simply call TRANSLATE as follows:
TRANSLATE ('Steven Feuerstein', 'e', NULL)
or
TRANSLATE ('Steven Feuerstein', 'e', '')
Alas, this will not work as desired. Both expressions will return NULL, since the empty string is treated as a NULL, and generally if you provide a NULL argument to an Oracle function, it will return a NULL (there are some exceptions to this rule, like the NVL function).
To use TRANSLATE as a character-stripper, then, I need to finesse the function a bit. For example, I can strip out e's from my name with this block of code:
BEGIN
DBMS_OUTPUT.PUT_LINE (
TRANSLATE (
'Steven Feuerstein'
, '#e', '#')
);
END;
/
Here is the resulting output:
Stvn Furstin
Notice what I have done: I included in both the from string and the to string the same first character. But in the to string, that leading character is the only character. TRANSLATE goes one by one through each character in the from string and replaces it with the character in the to string at the same position. Consequently, # is replaced by # and e is replaced by NULL. So all instances of e disappear: TRANSLATE has done the job for me.
Before you rush off to apply TRANSLATE in this fashion, make sure you know how to best implement it as a character stripper. Consider the following block
BEGIN
DBMS_OUTPUT.PUT_LINE (
TRANSLATE (
'Steven Feuerstein'
, 'ee', 'e')
);
END;
/
and its output:
Steven Feuerstein
TRANSLATE did not, in this case, remove the e's from my name, because the leading character for replacement was also an e, and that translation of e to e took precedence over the second translation of e to NULL. So you need to make sure that the leading character in both strings is not the same as any of the characters you wish to strip from the string.
Now, you could simply try to remember all this, but you would be much better off encapsulating all this information into a function so it can do the remembering and applying of these TRANSLATE-related rules on your behalf.
Consider the stripped_string function in Listing 1. This simple program offers the following advantages:
Code Listing 1: The STRIPPED_STRING function
CREATE OR REPLACE FUNCTION stripped_string (
expression_in IN VARCHAR2
,characters_in IN VARCHAR2
,placeholder_in IN VARCHAR2 DEFAULT CHR(1)
)
RETURN VARCHAR2
IS
BEGIN
RETURN TRANSLATE ( expression_in
, placeholder_in || characters_in
, placeholder_in
);
END stripped_string;
/
• It hides the use of TRANSLATE to perform the character stripping. You no longer need to be aware of how to use TRANSLATE in this fashion.
• It contains the knowledge about the need to provide a leading character in both from and to strings. By default, it uses the nonprinting CHR(1) ("Start of header" character) as the placeholder for TRANSLATE-based stripping. It is very unlikely that you will ever have a string with CHR(1) in it, so you can usually call the function without specifying a placeholder character. If, however, you ever need to override that placeholder, you can do so by calling the function and specifying all three arguments.
Listing 2 includes examples of calls to stripped_string.
Code Listing 2: Using STRIPPED_STRING with different inputs
BEGIN
DBMS_OUTPUT.put_line (stripped_string ('Steven Feuerstein', 'e'));
DBMS_OUTPUT.put_line (stripped_string (NULL, 'e'));
DBMS_OUTPUT.put_line (stripped_string ('Steven Feuerstein', 'e', NULL));
DBMS_OUTPUT.put_line (stripped_string ('Steven Feuerstein', NULL));
DBMS_OUTPUT.put_line (stripped_string ('Steven Feuerstein', 'e', 'e'));
DBMS_OUTPUT.put_line (stripped_string ('Steven Feuerstein', 'e', 't'));
DBMS_OUTPUT.put_line (stripped_string ('Steven Feuerstein', 'etn'));
END;
/
This process of information hiding is certainly useful for this exercise with TRANSLATE. It is absolutely critical, however, with the more-complex programs you will write for your application. Take the time to encapsulate implementation of business rules and formulas in their own functions, such as stripped_string. The resulting code will be much easier to debug and maintain.
I wrote the simple test script in Listing 3 to exercise stripped_string.
Code Listing 3: Testing STRIPPED_STRING
DECLARE
l_success BOOLEAN DEFAULT TRUE;

PROCEDURE report_failure (description_in IN VARCHAR2)
IS
BEGIN
l_success := FALSE;
DBMS_OUTPUT.put_line ('stripped_string failure!');
DBMS_OUTPUT.put_line (' ' || description_in);
END report_failure;
BEGIN
IF stripped_string ('abc', 'b') != 'ac'
THEN
report_failure ('Default placeholder');
END IF;

IF stripped_string ('abc', 'abc') IS NOT NULL
THEN
report_failure ('All characters removed');
END IF;

IF stripped_string (NULL, 'abc') IS NOT NULL
THEN
report_failure ('NULL input string');
END IF;

IF stripped_string ('abcdefb', 'b') != 'acdef'
THEN
report_failure ('Repeats of character');
END IF;

IF stripped_string ('abc', 'a', '#') != 'bc'
THEN
report_failure ('# as placeholder');
END IF;

IF stripped_string ('abc', NULL) != 'abc'
THEN
report_failure ('NULL character list');
END IF;

IF l_success
THEN
DBMS_OUTPUT.put_line ('stripped_string passed its tests!');
END IF;
END;
/

No comments:

Post a Comment