gekomad / scala-regex-collection   2.0.0

Apache License 2.0 GitHub

A collection of useful regular expression patterns

Scala versions: 3.x 2.13 2.12

Scala regex collection

scala-regex-collection is a pure Scala collection of regular expression patterns.

Add the library to your project

The latest version of the library is available for Scala 2.12, 2.13 and 3.0.

libraryDependencies += "com.github.gekomad" %% "scala-regex-collection" % "2.0.0"

Using Library

Patterns

You can use the predefined patterns or define your own.

Email

Ciphers

  • UUID (1CC3CCBB-C749-3078-E050-1AACBE064651)
  • MD5 (23f8e84c1f4e7c8814634267bd456194)
  • SHA1 (1c18da5dbf74e3fc1820469cf1f54355b7eec92d)
  • SHA256 (000020f89134d831f48541b2d8ec39397bc99fccf4cc86a3861257dbe6d819d1)

URL, IP, MAC Address

  • IP (10.192.168.1)
  • IP_6 (2001:db8:a0b:12f0::1)
  • URLs (http://abc.def.com)
  • Youtube (https://www.youtube.com/watch?v=9bZkp7q19f0)
  • Facebook (https://www.facebook.com/thesimpsons - https://www.facebook.com/pages/)
  • Twitter (https://twitter.com/rtpharry)
  • MAC Address (fE:dC:bA:98:76:54)

HEX

  • HEX (#F0F0F0 - 0xF0F0F0)

Bitcoin

Phone numbers

Date time

Crontab

Codes

Concurrency

Strings

Logs

  • Apache error ([Fri Dec 16 02:25:55 2005] [error] [client 1.2.3.4] Client sent malformed Host header)

Numbers

Coordinates

Programming

Credit Cards

Use the library

Validate String

Returns an Option[String] containing the matched string, or None if the input does not match.

import com.github.gekomad.regexcollection._
import com.github.gekomad.regexcollection.Validate.validate
import java.time.LocalDateTime

assert(validate[Email]("[email protected]") == Some("[email protected]"))
assert(validate[Email]("baz") == None)
assert(validate[MD5]("fc42757b4142b0474d35fcddb228b304") == Some("fc42757b4142b0474d35fcddb228b304"))
assert(validate[LocalDateTime]("2000-12-31T11:21:19") == Some("2000-12-31T11:21:19"))
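
The difference between validating a string and searching inside it can be sketched with the standard library alone. This is not the library's implementation: it assumes that validate requires the whole input to match (the examples above are consistent with that but do not prove it), and the names WholeMatchSketch, validateWhole, searchFirst and the simplified md5 pattern are hypothetical.

```scala
import scala.util.matching.Regex

object WholeMatchSketch {
  // Hypothetical, simplified MD5 pattern: exactly 32 hexadecimal digits.
  val md5: Regex = "[a-fA-F0-9]{32}".r

  // Whole-input validation: Some(s) only when the entire string matches.
  def validateWhole(re: Regex, s: String): Option[String] =
    if (re.pattern.matcher(s).matches()) Some(s) else None

  // Search: the first matching substring, wherever it occurs in s.
  def searchFirst(re: Regex, s: String): Option[String] =
    re.findFirstIn(s)
}
```

With these definitions, a string such as "md5=" followed by a valid hash fails validateWhole but is still found by searchFirst.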

findAll

Example: extracting all email addresses from a string

import com.github.gekomad.regexcollection.Email
import com.github.gekomad.regexcollection.Validate.findAll

assert(findAll[Email]("bar [email protected] hi hello [email protected]") == List("[email protected]", "[email protected]"))
assert(findAll[Email]("[email protected]") == List("[email protected]"))
assert(findAll[Email]("ddddd") == List())
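
For comparison, the same result shape (a List of every match, empty when nothing matches) can be obtained with scala.util.matching.Regex directly. The sketch below is only an illustration: FindAllSketch, findAllEmails and the deliberately loose simpleEmail pattern are hypothetical and much less strict than the library's Email pattern.

```scala
import scala.util.matching.Regex

object FindAllSketch {
  // Hypothetical, simplified email pattern used for illustration only.
  val simpleEmail: Regex = """[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}""".r

  // Collect every non-overlapping match, in order of appearance.
  def findAllEmails(s: String): List[String] =
    simpleEmail.findAllIn(s).toList
}
```

The typed findAll[Email] call removes the need to carry the pattern around, since the pattern is looked up from the type.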

findFirst

Example: extracting the first email address from a string

trait Bar
import com.github.gekomad.regexcollection.Validate.findFirst
import com.github.gekomad.regexcollection.Validate.findFirstIgnoreCase
import com.github.gekomad.regexcollection.Collection.Validator

implicit val myValidator: Validator[Bar] = Validator[Bar]("""Bar@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*""")
assert(findFirstIgnoreCase[Bar]("bar [email protected] hi hello [email protected] 123 [email protected]") == Some("[email protected]"))
assert(findFirst[Bar]("bar [email protected] hi hello [email protected] 123 [email protected]") == Some("[email protected]"))

Get pattern

Returns the pattern currently used for a type. For example, for the Email type:

import com.github.gekomad.regexcollection.Email
import com.github.gekomad.regexcollection.Validate.regexp

assert(regexp[Email] == """[a-zA-Z0-9\.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*""")

Modify default pattern

The default pattern of any type can be overridden. For example, for Email:

import com.github.gekomad.regexcollection.Email
import com.github.gekomad.regexcollection.Validate.validate
import com.github.gekomad.regexcollection.Collection.Validator

val email = "abc,a@%.d"

// with the default pattern the string does not match
assert(validate[Email](email) == None)

// with a custom pattern the string matches
implicit val validator: Validator[Email] = Validator[Email](""".+@.+\..+""")
assert(validate[Email](email) == Some("abc,a@%.d"))
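
One way this kind of override can work in Scala is through implicit precedence: a default instance placed in a companion object is used only when no implicit of the same type is visible in the local scope. The sketch below shows that mechanism with stand-in types. Mail, Check, check and the patterns are all hypothetical, and nothing here is taken from the library's source.

```scala
object OverrideSketch {
  // Hypothetical stand-ins; these are not the library's own definitions.
  trait Mail
  final case class Check[A](pattern: String)

  object Check {
    // Default instance: found through the companion only as a fallback.
    implicit val defaultMail: Check[Mail] = Check[Mail]("""[a-z]+@[a-z]+\.[a-z]+""")
  }

  // Whole-string match against whichever Check[A] instance is resolved.
  def check[A](s: String)(implicit c: Check[A]): Option[String] =
    if (s.matches(c.pattern)) Some(s) else None

  // No local implicit here, so the companion's default is used.
  def withDefault(s: String): Option[String] = check[Mail](s)

  def withOverride(s: String): Option[String] = {
    // A local implicit takes precedence over the companion's default.
    implicit val loose: Check[Mail] = Check[Mail](""".+@.+\..+""")
    check[Mail](s)
  }
}
```

Because the override is an ordinary local value, it affects only the scope in which it is declared.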

Matching your own type

Defining a pattern for the Bar type

trait Bar

import com.github.gekomad.regexcollection.Validate.validate
import com.github.gekomad.regexcollection.Validate.validateIgnoreCase
import com.github.gekomad.regexcollection.Collection.Validator

// pattern for strings starting with "Bar"
implicit val myValidator: Validator[Bar] = Validator[Bar]("Bar.*")

assert(validate[Bar]("a string") == None)
assert(validate[Bar]("Bar foo") == Some("Bar foo"))
assert(validate[Bar]("bar foo") == None)
assert(validateIgnoreCase[Bar]("bar foo") == Some("bar foo"))

findAllIgnoreCase

Retrieve all email addresses using findAll and findAllIgnoreCase

trait Bar
import com.github.gekomad.regexcollection.Collection.Validator
import com.github.gekomad.regexcollection.Validate._

// get all of Alice's email addresses
implicit val myValidator: Validator[Bar] = Validator[Bar]("""Alice@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*""")

val s = "bar [email protected] hi hello [email protected] 123 [email protected]"
assert(findAll[Bar](s) == List("[email protected]"))
assert(findAllIgnoreCase[Bar](s) == List("[email protected]", "[email protected]"))
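
In plain regex terms, case-insensitive matching is commonly obtained by prefixing the pattern with the embedded flag (?i). The sketch below shows the effect on a simplified pattern. Whether the library uses this flag internally is an assumption, and IgnoreCaseSketch, the pattern and the example.* addresses are hypothetical.

```scala
import scala.util.matching.Regex

object IgnoreCaseSketch {
  // Hypothetical pattern: addresses whose local part is exactly "Alice".
  val alice: Regex = """Alice@[a-zA-Z0-9.-]+""".r

  // The same pattern with the embedded case-insensitive flag.
  val aliceIgnoreCase: Regex = ("(?i)" + alice.regex).r

  val text = "bar alice@example.com hi hello Alice@example.org 123 Bob@example.net"

  // Only the capitalised address matches the case-sensitive pattern.
  val exact: List[String] = alice.findAllIn(text).toList

  // Both spellings match once case is ignored.
  val anyCase: List[String] = aliceIgnoreCase.findAllIn(text).toList
}
```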

Using a function pattern

Instead of a regular expression, a type can be matched with a function of type String => Option[String].

Example matching even numbers

trait Foo

import com.github.gekomad.regexcollection.Validate.validate
import com.github.gekomad.regexcollection.Collection.Validator

// Some(s) when s parses as an even Int, None otherwise
def even: String => Option[String] = s =>
  scala.util.Try(s.toInt).toOption.filter(_ % 2 == 0).map(_ => s)

implicit val validator: Validator[Foo] = Validator[Foo](even)

assert(validate[Foo]("42") == Some("42"))
assert(validate[Foo]("41") == None)
assert(validate[Foo]("hello") == None)
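
Function patterns are useful for checks that a regular expression cannot express well. As a further sketch, the function below accepts only calendar dates that really exist, delegating to java.time for the check. FunctionPatternSketch and isoDate are hypothetical names. With a hypothetical marker trait IsoDate, the function would presumably be registered in the same way as even above, as Validator[IsoDate](isoDate).

```scala
import java.time.LocalDate
import scala.util.Try

object FunctionPatternSketch {
  // Some(s) when s is a valid ISO-8601 date such as 2024-02-29,
  // None for malformed input or impossible dates such as 2021-02-30.
  val isoDate: String => Option[String] =
    s => Try(LocalDate.parse(s)).toOption.map(_ => s)
}
```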

Bugs and Feedback

For bugs, questions and discussions, please use GitHub Issues.

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Special Thanks

To regexlib.com