
Mailing List Archive: Python: Python

Asking for advice: Using Python for data validation

 

 



juan.alcolea at bt

Sep 11, 2001, 4:29 AM

Post #1 of 4
Asking for advice: Using Python for data validation

Hi!

I need some advice. On a small project I am currently working on, we must
receive and load data from very different sources and very different
geographic locations. The data comes as plain ASCII text files, and we are
having a lot of problems with the quality of the data we receive. A lot of
time is wasted trying to load badly formatted or incomplete data, finding
out where the offending data is and what the problem is, asking the remote
administrator to correct and resend the files, etc.

I'm thinking about using python to code a set of scripts that perform some
data validation (format, completeness) of the files *before* they are sent
to us, so any error is detected as close to the source as possible (and as
far from us as possible ;-) in order to minimize this bad-data time waste.

The questions are:

- Do you think that Python is a good choice for this task? Please note that
the scripts must run on very different platforms (NT, *nix, maybe Mac...).
I'm fairly new to Python, and although I'm impressed with it, I'm not sure
about it being really and easily portable unless you're a C & OS guru...

- Is there any module or library specially designed for this kind of task?
(parsing text data files with fixed or variable length fields, validating
date formats, etc...)


Big thanks in advance!




Juan Jesús Alcolea Picazo - jjalcolea [at] inad





aleax at aleax

Sep 11, 2001, 5:57 AM

Post #2 of 4
Asking for advice: Using Python for data validation [In reply to]

<juan.alcolea [at] bt> wrote in message
news:mailman.1000207993.9968.python-list [at] python
...
"""
I'm thinking about using python to code a set of scripts that perform some
data validation (format, completeness) of the files *before* they are sent
to us, so any error is detected as close to the source as possible (and as
far from us as possible ;-) in order to minimize this bad-data time waste.
"""
OK, good general problem statement.


"""
- Do you think that Python is a good choice for this task? Please note that
the scripts must run on very different platforms (NT, *nix, maybe Mac...).
I'm fairly new to Python, and although I'm impressed with it, I'm not sure
about it being really and easily portable unless you're a C & OS guru...
"""
It's quite portable across Win32, Unix-like systems, and maybe Mac, at
least, as long as you watch out for a few gotchas (time.strptime comes to
mind: unfortunately it is NOT available in Win32 Python!). Any non-portable
aspects would quickly emerge when you run halfway-decent tests anyway,
and they're easy to fix.


"""
- Is there any module or library specially designed for this kind of task?
(parsing text data files with fixed or variable length fields, validating
date formats, etc...)
"""
Not a single module or library, as far as I know. Built-in objects
(such as strings and file objects) and modules (particularly regular
expressions) take you most of the way, and it's not hard to find 3rd
party modules for the rest of the task -- for date/time parsing, in
particular, I recommend eGenix's "mxDateTime" module.
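
As a rough illustration of the date-validation step (not part of the
original post): current Python versions ship a 'datetime' module in the
standard library, and its strptime is a portable alternative to the
third-party module recommended above. The helper name is_valid_date and
the default format are assumptions made for this sketch.

```python
from datetime import datetime


def is_valid_date(text, fmt="%Y-%m-%d"):
    """Return True if text parses as a real calendar date in format fmt.

    Hypothetical helper: the name and the default format are assumptions.
    """
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        # Raised for a wrong layout and for impossible dates alike.
        return False
    return True
```

Note that strptime accepts non-padded values such as "2001-9-11", so pair
it with a regular expression if the field width must be exact.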


Alex


monch at hereandnow

Sep 11, 2001, 6:04 AM

Post #3 of 4
Asking for advice: Using Python for data validation [In reply to]

Assuming the data is plain text, then any language with regular expressions
should work pretty well. Python should be fine; use the 're' module to
get regular expression support. I've done lots of this sort of
work with Perl in the past, so I know Perl is also pretty well suited to
this type of task.
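
A minimal sketch of what this could look like with the 're' module,
assuming a made-up record layout (a 6-digit ID, a name, and an amount with
two decimals, separated by semicolons). The pattern and the function name
find_bad_lines are illustrative only; a real file format would need its
own pattern.

```python
import re

# Hypothetical layout: 6-digit ID ; non-empty name ; amount with 2 decimals
RECORD_RE = re.compile(r"\d{6};[^;]+;\d+\.\d{2}")


def find_bad_lines(lines):
    """Return (line_number, text) pairs for lines that do not match."""
    bad = []
    for number, line in enumerate(lines, start=1):
        # Strip only the line terminator so both \n and \r\n files work.
        text = line.rstrip("\r\n")
        if not RECORD_RE.fullmatch(text):
            bad.append((number, text))
    return bad
```

Reporting the line number with each failure lets the sender locate the
offending record without help from the receiving side.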

If the data's XML (I know you said "plain ascii text"...), then you'd want a
language and/or module that understands XML-formatted data. Again Python
should be just fine.
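
If the files did turn out to be XML, a well-formedness check is a
reasonable first validation step. The sketch below uses the standard
library's xml.etree.ElementTree (available in current Python versions);
the function name xml_error is an assumption for illustration.

```python
import xml.etree.ElementTree as ET


def xml_error(text):
    """Return None if text is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None
```

This only checks that the document parses; validating its content against
a schema is a separate step.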


chrishbarker at home

Sep 12, 2001, 10:41 AM

Post #4 of 4
Asking for advice: Using Python for data validation [In reply to]

juan.alcolea [at] bt wrote:

> - Do you think that Python is a good choice for this task? Please note that
> the scripts must run on very different platforms (NT, *nix, maybe Mac...).

Yes.

> I'm fairly new to Python, and although I'm impressed with it, I'm not sure
> about it being really and easily portable unless you're a C & OS guru...

Python itself is VERY portable, although some of the extension modules
are less so.

> - Is there any module or library specially designed for this kind of task?

Not the whole task, but...

> (parsing text data files with fixed or variable length fields,

Check out SciPy's io functions for help with this.
(http://www.scipy.org/)
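
For simple fixed-length records, plain string slicing may already be
enough. The sketch below does not use SciPy; it relies only on built-in
strings, and the layout (field names, offsets, record length) is entirely
hypothetical.

```python
# Hypothetical layout: (field name, start offset, end offset)
LAYOUT = [("id", 0, 6), ("name", 6, 26), ("date", 26, 34)]
RECORD_LENGTH = 34


def parse_fixed_width(line):
    """Split one fixed-width record into a dict of stripped field values."""
    text = line.rstrip("\r\n")  # drop the line terminator, keep padding
    if len(text) != RECORD_LENGTH:
        raise ValueError(
            f"expected {RECORD_LENGTH} characters, got {len(text)}"
        )
    return {name: text[start:end].strip() for name, start, end in LAYOUT}
```

Checking the record length first catches truncated or over-long lines
before any individual field is examined.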

> validating date formats, etc...)

mxDateTime is excellent for this.

Both of those are available for *nix and Win32, but I'm not so sure
about the Mac.



--
Christopher Barker, Ph.D.
ChrisHBarker [at] home
http://members.home.net/barkerlohmann
Oil Spill Modeling
Water Resources Engineering
Coastal and Fluvial Hydrodynamics
