src/bookofnim/deepdives/dataWrangling


data wrangling

bookmark

TLDR
  • for parsing configuration files, see packaging
  • constructing a regular expression is expensive; save it to a var if you can reuse it
  • re
    • follows perl 5 (see pcre spec link)
    • is an impure module: requires C's PCRE library to be available at runtime
  • pegs is meant to replace re
  • scanf can be extended with arbitrary procs for data wrangling
  • parseutils
    • provides many declarative wrappers utilizing while loops
    • prefer this over manually looping through haystacks looking for needles
    • sometimes faster than using re module
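
A minimal sketch of the parseutils style, assuming a made-up "key = value" line (the input and variable names are hypothetical). Each proc returns how many characters it consumed, so the position is advanced step by step instead of hand-writing a loop over the haystack:

```nim
import std/parseutils

# Hypothetical input, only for illustration.
let line = "port = 8080"

var key: string
var port: int

# Every parseutils proc returns the number of characters it consumed,
# so pos is advanced by adding each result.
var pos = parseIdent(line, key, 0)   # read the identifier
pos += skipWhitespace(line, pos)     # skip blanks before '='
pos += skip(line, "=", pos)          # skip the '=' token itself
pos += skipWhitespace(line, pos)     # skip blanks after '='
let consumed = parseInt(line, port, pos)

echo key, " -> ", port
```

A return value of 0 from parseInt means nothing was parsed, which is the usual way to detect bad input with these procs.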

links

TODOs

  • re
    • study the expression: no clue what this means

re

  • everything works as expected, only things i found interesting are listed
  • wrapper around pcre pkg
  • supports up to 20 capturing subpatterns, going by the MaxSubPatterns const (i have also seen 40 mentioned)
  • start param changes where the scan starts, but returned positions are always relative to the beginning of the input
  • findAll and split can be iterated
  • =~ is particularly useful
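
A rough sketch of the points above, using made-up inputs. The helper parseKeyValue and all variable names are hypothetical; only the std/re calls come from the module:

```nim
import std/re

# Compile once and reuse; building a Regex is the expensive part.
let word = re"\w+"
let keyValue = re"(\w+)\s*=\s*(\w+)"
let input = "alpha beta gamma"

# start moves where the scan begins, but the returned index is still
# counted from the beginning of input.
let firstAt = find(input, word)
let laterAt = find(input, word, start = 5)

# findAll and split are also available as iterators.
var words: seq[string]
for w in findAll(input, word):
  words.add w

var parts: seq[string]
for p in split(input, re"\s+"):
  parts.add p

# =~ calls match and injects a `matches` array holding the capture groups.
proc parseKeyValue(line: string): (string, string) =
  ## Hypothetical helper, not part of std/re.
  if line =~ keyValue:
    result = (matches[0], matches[1])

let pair = parseKeyValue("name = nim")
echo firstAt, " ", laterAt, " ", words, " ", parts, " ", pair
```

When the line does not match, parseKeyValue returns a pair of empty strings, since result is never assigned.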

re metacharacters

  • the usual suspects
  • \ddd character with octal code ddd, or backreference
  • \x{hh} character with hex code hh
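
A small sketch of the two less common escapes, assuming they are written with a leading backslash as in PCRE. The sample strings are made up:

```nim
import std/re

# \1 is a backreference to the first capture group: here, a doubled character.
let doubled = re"(\w)\1"
let doubledAt = find("hello", doubled)

# \x{41} is the character with hex code 41, i.e. 'A'.
let hexAt = find("xAy", re"\x{41}")

# \101 has no matching capture group in this pattern, so PCRE reads it
# as octal code 101, which is also 'A'.
let octAt = find("xAy", re"\101")

echo doubledAt, " ", hexAt, " ", octAt
```

Whether \ddd is read as octal or as a backreference depends on how many capture groups precede it, so the hex form is the less ambiguous choice.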

re exceptions

  • RegexError raised when the re syntax is invalid
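
A sketch of guarding against bad patterns, e.g. ones that come from user input. tryCompile is a hypothetical helper, not something std/re provides:

```nim
import std/re

proc tryCompile(pattern: string): bool =
  ## Hypothetical helper: reports whether `pattern` compiles.
  try:
    discard re(pattern)
    result = true
  except RegexError:
    # Raised by re() when the pattern syntax is invalid.
    result = false

echo tryCompile(r"\d+")        # a valid pattern
echo tryCompile("(unclosed")   # missing closing parenthesis
```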

re types

  • Regex
  • RegexFlag enum
    • reIgnoreCase
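    • reMultiLine ^ and $ also match at newlines within the input
    • reDotAll . matches anything, including newlines
    • reExtended ignore whitespace and # comments in the pattern
    • reStudy study the expression (extra analysis up front to speed up matching; can be omitted if the expression is used only once)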
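
A quick sketch of how each flag changes matching, with made-up inputs. Flags go in as a set in the second argument of re(); note that passing your own set replaces the default {reStudy}:

```nim
import std/re

let text = "Nim\nnim"

# reIgnoreCase: both spellings are found.
let anyCase = findAll(text, re("nim", {reIgnoreCase}))

# reMultiLine: ^ and $ can anchor at the newline inside text.
let plainAt = find(text, re"^nim$")
let multiAt = find(text, re("^nim$", {reMultiLine}))

# reDotAll: . is allowed to match the newline.
let dotPlain = contains("a\nb", re"a.b")
let dotAll = contains("a\nb", re("a.b", {reDotAll}))

# reExtended: whitespace and # comments inside the pattern are ignored.
let digits = re(r"\d+   # one or more digits", {reExtended, reStudy})
let isNumber = match("42", digits)

echo anyCase, " ", plainAt, " ", multiAt, " ", dotPlain, " ", dotAll, " ", isNumber
```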

re consts

  • MaxReBufSize high(cint)
  • MaxSubPatterns 20

re procs

  • find
  • match

Consts

lost = "lost something in this string, can you help me find it?"