Matt Snider JavaScript Resource

Understanding JavaScript and Frameworks

Friday, January 18, 2008

String Functions

Strings in JavaScript do not have as many helper functions as one might like, so you will probably need to write a collection of methods yourself. You can either create a static utility object or extend “String.prototype”. String is one of the few objects whose prototype is safe to extend, because you do not need to use “for … in” on strings.
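To make the two options concrete, here is a minimal sketch of the same helper written both ways. The name StringUtil is hypothetical and only there for illustration. The loop at the end shows why “for … in” matters: a method assigned directly to a prototype is enumerable, so it appears during enumeration.

```javascript
// Option 1: a static utility object. The name StringUtil is hypothetical.
var StringUtil = {
  remove: function (str, rx) {
    return String(str).replace(rx, '');
  }
};

// Option 2: extend String.prototype so every string gets the method.
String.prototype.remove = function (rx) {
  return this.replace(rx, '');
};

var viaUtility = StringUtil.remove('a1b2c3', /[0-9]/g);
var viaPrototype = 'a1b2c3'.remove(/[0-9]/g);

// A method assigned this way is enumerable, so "for ... in" over a string
// yields the character indexes and then the inherited method name.
var keys = [];
for (var key in 'ab') {
  keys.push(key);
}

console.log(viaUtility, viaPrototype, keys);
```

Since strings are normally walked by index rather than with “for … in”, the extra key rarely causes trouble in practice, which is the point made above.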

For our discussion today, I will use YAHOO.lang.augmentObject to augment String.prototype. It is a simple method that adds the properties of the object in parameter2 to the object in parameter1. For more information, see YUI yahoo.js.
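If you do not have YUI at hand, the idea can be sketched in a few lines. The function name augment is hypothetical and this is not YUI's implementation. It assumes that properties already present on the receiver are left alone, which is how I recall the YUI 2 default behaving; check yahoo.js before relying on that.

```javascript
// Simplified stand-in for the idea behind YAHOO.lang.augmentObject.
// The name "augment" is hypothetical; this is not YUI's code.
function augment(receiver, supplier) {
  for (var key in supplier) {
    // Copy only the supplier's own properties, and (assumption) skip
    // anything the receiver already has.
    if (Object.prototype.hasOwnProperty.call(supplier, key) && !(key in receiver)) {
      receiver[key] = supplier[key];
    }
  }
  return receiver;
}

var target = { trim: 'existing' };

augment(target, {
  trim: 'replacement',
  remove: function (rx) { return String(this.value).replace(rx, ''); }
});

console.log(target.trim, typeof target.remove);
```

One consequence of skipping existing properties: in engines that already ship a native String.prototype.trim, a helper like this would keep the native version rather than install a custom one.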

Next we need to consider what functionality is most needed and missing: word capitalization, stripping characters (alpha, numbers, etc.), stripping tags (script, or all HTML tags), and trimming white space. Obviously, you may want a lot more, but these are the ones I use most:

Example 1: Extending String.prototype

YAHOO.lang.augmentObject(String.prototype, {

    /**
     * Capitalizes the first letter of every word; ucfirst ensures that all non-first letters are lower-case
     *
     * @method capitalize
     * @param ucfirst {boolean} OPTIONAL: when truthy, converts non-first letters to lower-case
     * @return {string} the converted string
     * @public
     */
    capitalize: function(ucfirst) {
        // splitting on word boundaries keeps spaces and punctuation as their own tokens
        var words = this.split(/\b/g), rs = [];

        Core.batch(words, function(w, i) {
            // whitespace-only tokens are copied through unchanged
            rs[i] = w.trim() ?
                w.charAt(0).toUpperCase() + (ucfirst ? w.substring(1).toLowerCase() : w.substring(1)) :
                w;
        });

        // the tokens already carry the original spacing, so join without a separator
        return rs.join('');
    },

    /**
     * Checks if a string contains any of the strings in the argument set
     *
     * @method contains
     * @param argument {string} as many strings as you want to test
     * @return {boolean} true, if string contains any of the arguments
     * @public
     */
    contains: function() {
        // "this" is not the string inside the callback, so keep a reference
        var str = this, hasValue = false;

        Core.batch(arguments, function(arg) {
            hasValue = -1 < str.indexOf(arg);
            // terminates iteration if this becomes true
            return hasValue;
        });

        return hasValue;
    },

    /**
     * Removes the rx pattern from the string
     *
     * @method remove
     * @param rx {regex} a regex to find characters to remove
     * @public
     */
    remove: function(rx) {
        return this.replace(rx, '');
    },

    /**
     * Remove all non-alpha characters; space ok
     *
     * @method stripNonAlpha
     * @public
     */
    stripNonAlpha: function() {
        return this.remove(/[^A-Za-z ]+/g);
    },

    /**
     * Remove all non-alpha-numeric characters; space ok
     *
     * @method stripNonAlphaNumeric
     * @public
     */
    stripNonAlphaNumeric: function() {
        return this.remove(/[^A-Za-z0-9 ]+/g);
    },

    /**
     * Removes non-numeric characters, except minus and decimal
     *
     * @method stripNonNumeric
     * @public
     */
    stripNonNumeric: function() {
        return this.remove(/[^0-9\-\.]/g);
    },

    /**
     * Remove all characters that are 0-9
     *
     * @method stripNumeric
     * @public
     */
    stripNumeric: function() {
        return this.remove(/[0-9]/g);
    },

    /**
     * Removes HTML script tags (and their contents) from the string
     *
     * @method stripScripts
     * @public
     */
    stripScripts: function() {
        return this.remove(new RegExp("(?:<script.*?>)((\n|\r|.)*?)(?:<\/script>)", "img"));
    },

    /**
     * Removes HTML tags from the string
     *
     * @method stripTags
     * @public
     */
    stripTags: function() {
        return this.remove(/<\/?[^>]+>/gi);
    },

    /**
     * Removes the white spaces at the front and end of the string
     * OPTIMIZED: http://blog.stevenlevithan.com/archives/faster-trim-javascript
     *
     * @method trim
     * @public
     */
    trim: function() {
        return this.remove(/^\s\s*/).remove(/\s\s*$/);
    }
});
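To see what the strip patterns do without pulling in YUI or Core.batch, the same regular expressions can be tried through a plain function. This is only a sketch. The standalone remove(str, rx) signature is an assumption made so the snippet runs on its own, and so is the opening `<script.*?>` part of the script pattern.

```javascript
// Standalone stand-in for the remove() method; taking the string as a
// parameter is an assumption so the sketch runs without YUI.
function remove(str, rx) {
  return str.replace(rx, '');
}

// Letters and spaces only.
var alphaOnly = remove('R2-D2 rocks!', /[^A-Za-z ]+/g);

// Letters, digits and spaces only.
var alphaNumeric = remove('R2-D2 rocks!', /[^A-Za-z0-9 ]+/g);

// Digits, minus and decimal point only.
var numericOnly = remove('$-1,234.50', /[^0-9\-\.]/g);

// Everything except the digits.
var noDigits = remove('abc123', /[0-9]/g);

// All tags removed, text content kept.
var noTags = remove('<p>Hello <b>world</b></p>', /<\/?[^>]+>/gi);

// Script elements removed together with their contents.
var scriptRx = new RegExp("(?:<script.*?>)((\n|\r|.)*?)(?:<\/script>)", "img");
var noScripts = remove('a<script>alert(1)</script>b', scriptRx);

console.log(alphaOnly, alphaNumeric, numericOnly, noDigits, noTags, noScripts);
```

Note that stripping tags with a regex is a convenience for display text, not a sanitizer for untrusted HTML.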

I have put a lot of thought into some of these methods, such as trim: I use it frequently, so I made sure it uses the most efficient regex I could find. As you can see, many string manipulations can and should be handled by regex, so it is a good idea to understand regex. Steven’s Blog is a great place for your regex questions, especially when looking for the best way to write an expression.
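As a quick illustration of the trim approach, here is a standalone sketch of the two-step version. The function name trimTwoPass is hypothetical. It also compares the result with the native String.prototype.trim that engines supporting ECMAScript 5 or later provide.

```javascript
// Standalone version of the two-step trim; the name trimTwoPass is
// hypothetical. Each replace handles one end of the string.
function trimTwoPass(str) {
  return str
    .replace(/^\s\s*/, '')   // leading white space
    .replace(/\s\s*$/, '');  // trailing white space
}

var padded = ' \t  hello world \n ';
var result = trimTwoPass(padded);

// Engines that support ECMAScript 5 or later have a native trim as well.
var matchesNative = result === padded.trim();

console.log(JSON.stringify(result), matchesNative);
```

White space inside the string is left untouched; only the two ends are affected.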

Most of these are pretty easy to understand, especially if you look at my comments (feel free to leave a comment if you have questions). The best part is that, because you have extended “String.prototype”, every String throughout your entire project will be able to use them. You will often find tasks that are specific to the current project but not relevant to every project. For example, Mint.com is a financial site where I often need to search for numbers and/or currency, so I have special methods for that project. It is best to keep these in a separate file and bring them into your project as necessary.

posted by Matt Snider at 5:28 pm  

11 Comments »

  1. Very useful! Thanks for posting this

    Comment by Dimitry — January 18, 2008 @ 6:30 pm

  2. 1) Something seems to have “eaten” the backslashes in those regular expressions.

    2) Not all of the characters in the regular expressions were converted into X/HTML character entities correctly.

    3) You have some “curly” quotes in there which will probably cause problems.

    4) The statement in the trim function can be simplified to this:
    return this.replace(/^\s*|\s*$/g, '');
    (In case the backslashes are messed up here too, there should be one backslash preceding each of the two “s” characters.)

    Comment by Kravvitz — January 21, 2008 @ 4:21 pm

  3. Thanks, Kravvitz.

    I missed that WordPress drops the backslashes. This messed up most of the expressions in this article. I went ahead and replaced them with HTMLEntities and I believe all the expressions are correct now.

    Comment by admin — January 21, 2008 @ 4:28 pm

  4. I also include ltrim, rtrim, and chop (the code pasted here was mangled by the comment form).

    Comment by Badotz — June 30, 2008 @ 2:50 pm

  5. Wow, did *that* ever get mangled :-(

    /**
    * Trim leading spaces
    */
    ltrim: function(s) { return s.replace(/\\s*((\\S+\\s*)*)/, '$1'); },

    Comment by Badotz — June 30, 2008 @ 2:51 pm

  6. Too many backslashes that time :-/ What a P.O.S.

    Comment by Badotz — June 30, 2008 @ 2:52 pm

  7. Last chance:

    /**
    * Trim leading spaces
    */
    ltrim: function(s) { return s.replace(/\s*((\S+\s*)*)/, '$1'); },

    Comment by Badotz — June 30, 2008 @ 2:53 pm

  8. /**
    * Trim trailing spaces
    */
    rtrim: function(s) { return s.replace(/((\s*\S+)*)\s*/, '$1'); },

    Comment by Badotz — June 30, 2008 @ 2:53 pm

  9. /**
    * Trim leading and trailing spaces
    */
    trim: function(s) { return this.ltrim(this.rtrim(s)); },

    Comment by Badotz — June 30, 2008 @ 2:53 pm

  10. And finally:

    /**
    * chop a string into segments of 'n' chars
    */
    chop: function(str, len, glue) {
        var out = [];
        len = ((len === undefined || isNaN(len)) ? str.length : parseInt(len, 10));
        while (str != '')
        {
            if (str.length > len)
            {
                var tmp = String(str).substr(0, len);
                var x = tmp.lastIndexOf(' ');
                if (x > 0)
                {
                    // break at the last space inside the segment
                    out.push(this.trim(tmp.substr(0, x)));
                    str = this.trim(str.substr(x + 1));
                }
                else
                {
                    // no space to break on: cut at len without dropping a character
                    out.push(tmp.substr(0, len));
                    str = str.substr(len);
                }
            }
            else
            {
                out.push(str);
                str = '';
            }
        }
        return ((glue === undefined) ? out : out.join(glue));
    },

    Comment by Badotz — June 30, 2008 @ 2:54 pm

  11. Thanks for the additional methods Badotz.

    Comment by Matt Snider — June 30, 2008 @ 4:03 pm
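For anyone who wants to try the helpers from the comments above, here is a self-contained sketch of them as plain functions. It makes a few assumptions that differ from the posted code: the trims use simpler anchored patterns instead of the posted ones, the remainder after a hard cut starts at len (so no character is skipped), and a non-positive len falls back to the whole string to avoid an endless loop.

```javascript
// Plain-function sketch of the helpers from the comments. The anchored
// trim patterns are a swapped-in simplification, not the posted regexes.
function ltrim(s) { return s.replace(/^\s+/, ''); }
function rtrim(s) { return s.replace(/\s+$/, ''); }
function trim(s) { return ltrim(rtrim(s)); }

// Chop a string into segments of at most len characters, breaking on the
// last space inside a segment when there is one.
function chop(str, len, glue) {
  var out = [];
  len = (len === undefined || isNaN(len)) ? str.length : parseInt(len, 10);
  if (!(len > 0)) { len = str.length; }   // assumption: guard against len <= 0

  while (str !== '') {
    if (str.length > len) {
      var tmp = str.slice(0, len);
      var x = tmp.lastIndexOf(' ');
      if (x > 0) {
        out.push(trim(tmp.slice(0, x)));
        str = trim(str.slice(x + 1));
      } else {
        out.push(tmp);
        str = str.slice(len);   // continue right after the cut
      }
    } else {
      out.push(str);
      str = '';
    }
  }
  return glue === undefined ? out : out.join(glue);
}

var wrapped = chop('the quick brown fox', 10);
var hardCut = chop('abcdefghij', 4, '|');

console.log(wrapped, hardCut);
```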
