102
votes

What is the best way to check if a URL is valid in Java?

If tried to call new URL(urlString) and catch a MalformedURLException, but it seems to be happy with anything that begins with http://.

I'm not concerned about establishing a connection, just validity. Is there a method for this? An annotation in Hibernate Validator? Should I use a regex?

Edit: Some examples of accepted URLs are http://*** and http://my favorite site!.

8
How do you define validity if you're not going to establish a connection?Michael Myers
Can you give an example of something which isn't a valid URL that the URL constructor accepts?uckelman
@mmyers: Validity should be determined by RFCs 2396 and 2732, the ones which define what a URL is.uckelman
@uckelman: Just about anything. "http://***" works. "http://my favorite site!" works. I can't get it to throw an exception (when http:// is at the beginning.)Eric Wilson
possible duplicate of Validating URL in JavaJasonB

8 Answers

105
votes

Consider using the Apache Commons UrlValidator class

UrlValidator urlValidator = new UrlValidator();
urlValidator.isValid("http://my favorite site!");

There are several properties that you can set to control how this class behaves, by default http, https, and ftp are accepted.

61
votes

Here is way I tried and found useful,

URL u = new URL(name); // this would check for the protocol
u.toURI(); // does the extra checking required for validation of URI 
8
votes

I'd love to post this as a comment to Tendayi Mawushe's answer, but I'm afraid there is not enough space ;)

This is the relevant part from the Apache Commons UrlValidator source:

/**
 * This expression derived/taken from the BNF for URI (RFC2396).
 */
private static final String URL_PATTERN =
        "/^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?/";
//         12            3  4          5       6   7        8 9

/**
 * Schema/Protocol (ie. http:, ftp:, file:, etc).
 */
private static final int PARSE_URL_SCHEME = 2;

/**
 * Includes hostname/ip and port number.
 */
private static final int PARSE_URL_AUTHORITY = 4;

private static final int PARSE_URL_PATH = 5;

private static final int PARSE_URL_QUERY = 7;

private static final int PARSE_URL_FRAGMENT = 9;

You can easily build your own validator from there.

5
votes

My favorite approach, without external libraries:

try {
    URI uri = new URI(name);

    // perform checks for scheme, authority, host, etc., based on your requirements

    if ("mailto".equals(uri.getScheme()) {/*Code*/}
    if (uri.getHost() == null) {/*Code*/}

} catch (URISyntaxException e) {
}
5
votes

The most "foolproof" way is to check for the availability of URL:

public boolean isURL(String url) {
  try {
     (new java.net.URL(url)).openStream().close();
     return true;
  } catch (Exception ex) { }
  return false;
}
3
votes

I didn't like any of the implementations (because they use a Regex which is an expensive operation, or a library which is an overkill if you only need one method), so I ended up using the java.net.URI class with some extra checks, and limiting the protocols to: http, https, file, ftp, mailto, news, urn.

And yes, catching exceptions can be an expensive operation, but probably not as bad as Regular Expressions:

final static Set<String> protocols, protocolsWithHost;

static {
  protocolsWithHost = new HashSet<String>( 
      Arrays.asList( new String[]{ "file", "ftp", "http", "https" } ) 
  );
  protocols = new HashSet<String>( 
      Arrays.asList( new String[]{ "mailto", "news", "urn" } ) 
  );
  protocols.addAll(protocolsWithHost);
}

public static boolean isURI(String str) {
  int colon = str.indexOf(':');
  if (colon < 3)                      return false;

  String proto = str.substring(0, colon).toLowerCase();
  if (!protocols.contains(proto))     return false;

  try {
    URI uri = new URI(str);
    if (protocolsWithHost.contains(proto)) {
      if (uri.getHost() == null)      return false;

      String path = uri.getPath();
      if (path != null) {
        for (int i=path.length()-1; i >= 0; i--) {
          if ("?<>:*|\"".indexOf( path.charAt(i) ) > -1)
            return false;
        }
      }
    }

    return true;
  } catch ( Exception ex ) {}

  return false;
}
3
votes

Judging by the source code for URI, the

public URL(URL context, String spec, URLStreamHandler handler)

constructor does more validation than the other constructors. You might try that one, but YMMV.

2
votes

validator package:

There seems to be a nice package by Yonatan Matalon called UrlUtil. Quoting its API:

isValidWebPageAddress(java.lang.String address, boolean validateSyntax, 
                      boolean validateExistance) 
Checks if the given address is a valid web page address.

Sun's approach - check the network address

Sun's Java site offers connect attempt as a solution for validating URLs.

Other regex code snippets:

There are regex validation attempts at Oracle's site and weberdev.com.