Data Science and Engineering at Enterprise Scale

Published by: Research Desk
Released: Nov 26, 2019

Recently I helped create an upskilling curriculum for data science. It was aimed at people already in the industry with some tech background, though not in big data—those who didn’t have months to spend, but needed hands-on experience to get started. Our team debated which technologies would be most important for people to learn, given time constraints in the course. Jupyter, Docker, Spark, and TensorFlow each came up as key technologies. We started building examples to tie the tools together into practical workflows, plus a sampler of machine learning, data visualization, and associated topics.

At about the same time, I saw a preview of this book. “Hey, that’s it!” we recognized. “That’s just the right mix of tools, techniques, and real-world examples.”

I met Jerome years ago while guest lecturing for a data engineering fellowship. We were focused on Apache Spark in that course.

Although other components described here—Jupyter, Docker, Anaconda, deep learning, vector embedding, etc.—existed at the time, it wasn’t clear how they’d evolve and become important together. Later Jerome adapted IBM training materials for Spark to use in a training program at O’Reilly where we were both teaching. It’s been a pleasure to see him grow in this field.

Jerome has a passion for education and developer advocacy that shows throughout the pages here. Beyond demonstrating these open source technologies, he enjoys showcasing them in context, giving people tools they need to succeed in their work.
